BACKGROUND: Reliable classification of ischemic stroke (IS) etiological subtypes is required in research and clinical practice, but the predictive properties of these subtypes in population studies with incomplete investigations are poorly understood. AIMS: To compare the prognosis of etiologically classified IS subtypes and to use machine learning (ML) to classify incompletely investigated IS cases. METHODS: In a 9-year follow-up of a prospective study of 512,726 Chinese adults, 22,216 incident IS cases, confirmed by clinical adjudication of medical records, were assigned to subtypes (large artery atherosclerosis [LAA], small artery occlusion [SAO], cardioaortic embolism [CE], or undetermined etiology) using a modified Causative Classification System for Ischemic Stroke (CCS), which also graded each case as "evident," "probable," or "possible." For incompletely investigated IS cases in which CCS yielded an undetermined etiology, an ML model was developed to predict IS subtypes from baseline risk factors and screening for cardioaortic sources of embolism. The 5-year risks of subsequent stroke and all-cause mortality (estimated using cumulative incidence functions and 1 minus Kaplan-Meier estimates, respectively) for the ML-predicted IS subtypes were compared with those for the etiologically classified IS subtypes. RESULTS: Among 7443 IS cases with evident or probable etiology, 66% had SAO, 32% had LAA, and 2% had CE, although the ratio of SAO to LAA cases varied by region in China. CE had the highest rates of subsequent stroke and mortality (43.5% and 40.7%, respectively), followed by LAA (43.2% and 17.4%) and SAO (38.1% and 11.1%). ML provided classifications for cases with undetermined etiology and incomplete clinical data (24% of all IS cases; n = 5276), with areas under the curve (AUCs) for unseen cases of 0.99 (0.99-1.00) for CE, 0.67 (0.64-0.70) for LAA, and 0.70 (0.67-0.73) for SAO. ML-predicted IS subtypes yielded subsequent stroke and all-cause mortality rates comparable to those of the etiologically classified IS subtypes. CONCLUSION: This study highlighted substantial heterogeneity in the prognosis of IS subtypes and the utility of ML approaches for classifying IS cases with incomplete clinical investigations.
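The abstract reports 5-year risks of subsequent stroke with cumulative incidence functions and of all-cause mortality with 1 minus Kaplan-Meier estimates. The sketch below is a generic, stdlib-only Python illustration of those two estimators (the cumulative incidence is computed in the Aalen-Johansen form), not the study's code. The function names, the outcome coding (0 = censored, 1 = subsequent stroke, 2 = death before any subsequent stroke), and the toy data are all hypothetical assumptions made for illustration.

```python
from collections import Counter, defaultdict


def _tally(times, codes):
    """Group records by distinct follow-up time, counting each outcome code (0 = censored)."""
    table = defaultdict(Counter)
    for t, c in zip(times, codes):
        table[t][c] += 1
    return sorted(table.items())


def one_minus_km(times, events, horizon):
    """1 minus the Kaplan-Meier survival estimate at `horizon` (events: 1 = event, 0 = censored)."""
    surv, at_risk = 1.0, len(times)
    for t, counts in _tally(times, events):
        if t > horizon:
            break
        d = sum(n for c, n in counts.items() if c != 0)  # events at time t
        surv *= 1.0 - d / at_risk
        at_risk -= sum(counts.values())  # events and censorings leave the risk set
    return 1.0 - surv


def cumulative_incidence(times, causes, cause, horizon):
    """Aalen-Johansen cumulative incidence of `cause` at `horizon`; other nonzero codes compete."""
    surv, cif, at_risk = 1.0, 0.0, len(times)
    for t, counts in _tally(times, causes):
        if t > horizon:
            break
        # P(event-free just before t) times the cause-specific hazard at t
        cif += surv * counts[cause] / at_risk
        d_all = sum(n for c, n in counts.items() if c != 0)  # events of any cause at t
        surv *= 1.0 - d_all / at_risk
        at_risk -= sum(counts.values())
    return cif


# Hypothetical toy cohort (years of follow-up); codes: 0 = censored,
# 1 = subsequent stroke, 2 = death before any subsequent stroke.
times = [1, 2, 2, 3, 4, 5, 5, 6]
causes = [1, 2, 0, 1, 0, 2, 1, 0]

stroke_cif = cumulative_incidence(times, causes, cause=1, horizon=5)
naive_km = one_minus_km(times, [1 if c == 1 else 0 for c in causes], horizon=5)
print(round(stroke_cif, 3))  # 0.475
print(round(naive_km, 3))    # 0.533 (deaths treated as censored)
```

On these made-up numbers, treating deaths as censored inflates the 5-year stroke risk (0.533 versus 0.475). This is the usual motivation for cumulative incidence functions when death competes with the outcome of interest, whereas all-cause mortality has no competing event, so 1 minus Kaplan-Meier applies directly. Ties are handled by counting events before censorings at the same time, a common convention; the study's own handling of ties and censoring is not described in the abstract.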

Original publication

Journal article

Int J Stroke

Pages: 847-855

Keywords: China, Ischemic stroke, classification, etiology, machine learning, Humans, Adult, Stroke, Ischemic Stroke, Prospective Studies, Follow-Up Studies, East Asian People, Prognosis, Atherosclerosis, Risk Factors, Embolism, Brain Ischemia