Heterogeneity in the diagnosis and prognosis of ischemic stroke subtypes: 9-year follow-up of 22,000 cases in Chinese adults.
Chun M., Qin H., Turnbull I., Sansome S., Gilbert S., Hacker A., Wright N., Zhu T., Clifton D., Bennett D., Guo Y., Pei P., Lv J., Yu C., Yang L., Li L., Lu Y., Chen Z., Cairns BJ., Chen Y., Clarke R.
BACKGROUND: Reliable classification of ischemic stroke (IS) etiological subtypes is required in research and clinical practice, but the predictive properties of these subtypes in population studies with incomplete investigations are poorly understood. AIMS: To compare the prognosis of etiologically classified IS subtypes and to use machine learning (ML) to classify incompletely investigated IS cases. METHODS: In a 9-year follow-up of a prospective study of 512,726 Chinese adults, 22,216 incident IS cases, confirmed by clinical adjudication of medical records, were assigned subtypes using a modified Causative Classification System for Ischemic Stroke (CCS) (large artery atherosclerosis (LAA), small artery occlusion (SAO), cardioaortic embolism (CE), or undetermined etiology) and classified by CCS as "evident," "probable," or "possible" IS cases. For incompletely investigated IS cases where CCS yielded an undetermined etiology, an ML model was developed to predict IS subtypes from baseline risk factors and screening for cardioaortic sources of embolism. The 5-year risks of subsequent stroke and all-cause mortality (measured using cumulative incidence functions and 1 minus Kaplan-Meier estimates, respectively) for the ML-predicted IS subtypes were compared with those for etiologically classified IS subtypes. RESULTS: Among 7443 IS cases with evident or probable etiology, 66% had SAO, 32% had LAA, and 2% had CE, but the relative proportions of SAO and LAA cases varied by region in China. CE had the highest rates of subsequent stroke and mortality (43.5% and 40.7%, respectively), followed by LAA (43.2% and 17.4%) and SAO (38.1% and 11.1%). ML provided classifications for cases with undetermined etiology and incomplete clinical data (24% of all IS cases; n = 5276), with areas under the curve (AUC) of 0.99 (0.99-1.00) for CE, 0.67 (0.64-0.70) for LAA, and 0.70 (0.67-0.73) for SAO in unseen cases. ML-predicted IS subtypes yielded rates of subsequent stroke and all-cause mortality comparable to those of the etiologically classified IS subtypes. CONCLUSION: This study highlighted substantial heterogeneity in the prognosis of IS subtypes and the utility of ML approaches for classification of IS cases with incomplete clinical investigations.
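
The two risk measures named in the METHODS (1 minus the Kaplan-Meier estimate for all-cause mortality, and a cumulative incidence function for subsequent stroke) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the coding of outcomes (0 = censored, 1 = event of interest, 2 = competing event), the choice of the Aalen-Johansen estimator with death treated as the competing event, and the toy follow-up data are all assumptions made here for illustration only.

```python
from collections import Counter


def one_minus_km(times, events, horizon):
    """Return 1 minus the Kaplan-Meier survival estimate at `horizon`.

    times  : follow-up time for each person (e.g. years)
    events : 1 if the person died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    for t in sorted(deaths):
        if t > horizon:
            break
        # Everyone still under follow-up at time t is at risk.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
    return 1.0 - survival


def cumulative_incidence(times, causes, horizon, cause=1):
    """Return the Aalen-Johansen cumulative incidence of `cause` at `horizon`.

    causes : 0 = censored, 1 = event of interest, 2 = competing event
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    event_free = 1.0  # probability of no event of any kind before time t
    cif = 0.0
    for t in event_times:
        if t > horizon:
            break
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        d_any = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
    return cif


# Hypothetical toy cohort of six people (invented, not study data).
toy_times = [1, 2, 2, 3, 4, 6]
toy_deaths = [1, 0, 1, 0, 1, 0]   # for the mortality estimate
toy_causes = [1, 0, 2, 1, 2, 0]   # 1 = subsequent stroke, 2 = death

mortality_5y = one_minus_km(toy_times, toy_deaths, horizon=5)
stroke_5y = cumulative_incidence(toy_times, toy_causes, horizon=5, cause=1)
print(f"5-year mortality risk: {mortality_5y:.3f}")
print(f"5-year subsequent stroke risk: {stroke_5y:.3f}")
```

A cumulative incidence function is used for subsequent stroke because a person who dies can no longer have a stroke. Applying 1 minus Kaplan-Meier to stroke would treat those deaths as censoring and tend to overstate the risk.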