In an article recently published in the journal Information, researchers demonstrated the feasibility of using heterogeneous machine learning (ML) classifiers and eXplainable artificial intelligence (XAI) to predict strokes with greater transparency.
Background
Stroke is one of the most common causes of mortality among older individuals worldwide. This cerebrovascular condition occurs when a leak or blockage in a blood vessel reduces or stops blood flow to the brain.
Although some strokes are fatal or leave an individual incapacitated, many are treatable. Severe brain damage or loss of life can be prevented if the stroke is recognized and diagnosed early. The advent of devices such as Fitbits and smartwatches, which continuously capture large amounts of health data, has led to the extensive application of ML to assess the collected data.
Big data and ML solutions can be utilized for different health-related applications, such as prognosis, diagnosis, and decision support systems. These solutions can also assist doctors in optimizing treatments and prescriptions and help patients decide on scheduling follow-up appointments. Artificial intelligence (AI) algorithms can also perform sophisticated operations, such as imaging and text recognition, accurate detection and prediction of diseases, and remote health treatment, on substantial amounts of data.
Thus, AI/ML classifiers can be used to predict strokes. For instance, a stochastic gradient descent (SGD) classifier has been reported to categorize stroke disease with 95% accuracy. Similarly, the random forest (RF) algorithm has been used for early stroke prediction with 96% accuracy.
ML and XAI in stroke risk prediction
In this paper, researchers proposed using heterogeneous ML classifiers and XAI to predict strokes. Five XAI techniques, namely Anchor, local interpretable model-agnostic explanations (LIME), explain like I'm 5 (ELI5), QLattice, and Shapley additive explanations (SHAP), were used to understand the stroke predictions better.
XAI is a collection of tools and frameworks that plays a key role in maintaining accountability and transparency of ML algorithms by interpreting and understanding predictions made by these algorithms. Thus, the explainability factor can significantly improve transparency in predictive analysis, which is crucial in the healthcare sector, and enable healthcare stakeholders to interpret deep learning (DL) and ML models more confidently.
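To make the idea of attributing a prediction to individual features concrete, the following minimal sketch computes exact Shapley values, the quantity that SHAP approximates, for a single patient. It is an illustration only and not the authors' code: the `toy_risk` function, its weights, the `patient` values, and the `reference` baseline are all hypothetical, and real SHAP tooling relies on approximations because this brute-force enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions


# Hypothetical linear risk score over (age, hypertension, average glucose).
# The weights are invented for illustration and carry no clinical meaning.
def toy_risk(x):
    age, hypertension, glucose = x
    return 0.01 * age + 0.2 * hypertension + 0.002 * glucose


patient = [70, 1, 200]    # hypothetical patient
reference = [40, 0, 100]  # hypothetical baseline patient
phi = shapley_values(toy_risk, patient, reference)
print(dict(zip(["age", "hypertension", "avg_glucose"], phi)))
```

The contributions sum to the difference between the patient's score and the baseline score, which is the property that makes this kind of explanation easy to read: each feature receives a share of the gap.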
Researchers used a public stroke dataset that contained information about 5110 patients with 12 attributes, with stroke as the target variable. The raw data were transformed into understandable and usable forms through data preprocessing. The borderline synthetic minority oversampling technique (Borderline-SMOTE) was used to balance the training dataset to ensure appropriate model training. Researchers employed four feature selection techniques, namely the Harris Hawks algorithm, particle swarm optimization, mutual information, and Pearson’s correlation, to select the most suitable features.
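As a rough illustration of how mutual information ranks candidate features, the sketch below scores two discrete features against a binary stroke label. It is a simplified stand-in rather than the study's pipeline: the data are invented, the feature names `hypertension` and `coin_flip` are hypothetical, and the function handles only discrete values, whereas continuous attributes such as age or BMI would need binning or a dedicated estimator.

```python
from collections import Counter
from math import log


def mutual_information(feature, target):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(feature)
    count_x = Counter(feature)
    count_y = Counter(target)
    count_xy = Counter(zip(feature, target))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi


# Invented toy data: eight patients, four of whom had a stroke.
stroke = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "hypertension": [0, 0, 0, 1, 1, 1, 1, 0],  # mostly tracks the label
    "coin_flip": [0, 1, 0, 1, 0, 1, 0, 1],     # independent of the label
}

scores = {name: mutual_information(values, stroke) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

A feature that is statistically independent of the label scores zero, so ranking by this score and keeping the top entries is one simple way to perform the selection step described above.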
ML ensemble models can be used through several techniques, including bagging, stacking, and boosting. Researchers used stacking as it allowed them to train multiple models to address similar issues and then combine all findings to develop a more potent model.
Three stacks were constructed on two levels. The first stack contained k-nearest neighbors (KNN), RF, decision tree, and logistic regression ML techniques, while the second stack consisted of tree-based boosting algorithms, namely extreme gradient boosting (XGBoost), categorical boosting (CatBoost), adaptive boosting (AdaBoost), and light gradient-boosting machine (LightGBM). These two stacks were ensembled to construct the final stack. XAI techniques were employed to decipher the predictions made by these models.
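The stacking idea, where base models make predictions and a meta-model learns how to combine them, can be sketched in a few lines. This is a deliberately small stand-in and not the architecture from the paper: the `Stump` and `LogisticMeta` classes, the helper functions, and the eight-patient dataset are all hypothetical, a single stack is shown instead of three, and the meta-model is trained on in-sample base predictions, whereas practical stacking usually trains it on out-of-fold predictions to limit leakage.

```python
import math


class Stump:
    """Base learner: thresholds one feature midway between the class means."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [row[self.feature] for row, label in zip(X, y) if label == 1]
        neg = [row[self.feature] for row, label in zip(X, y) if label == 0]
        pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
        self.threshold = (pos_mean + neg_mean) / 2
        self.sign = 1 if pos_mean >= neg_mean else -1
        return self

    def predict(self, X):
        return [1 if self.sign * (row[self.feature] - self.threshold) > 0 else 0
                for row in X]


class LogisticMeta:
    """Meta-learner: logistic regression fitted by stochastic gradient descent."""

    def fit(self, Z, y, lr=0.5, epochs=500):
        self.w = [0.0] * len(Z[0])
        self.b = 0.0
        for _ in range(epochs):
            for z, label in zip(Z, y):
                err = self._prob(z) - label
                self.w = [w - lr * err * zi for w, zi in zip(self.w, z)]
                self.b -= lr * err
        return self

    def _prob(self, z):
        score = sum(w * zi for w, zi in zip(self.w, z)) + self.b
        return 1.0 / (1.0 + math.exp(-score))

    def predict(self, Z):
        return [1 if self._prob(z) >= 0.5 else 0 for z in Z]


def stack_features(X, base_learners):
    """Turn base-learner predictions into one meta-feature row per sample."""
    columns = [learner.predict(X) for learner in base_learners]
    return [list(row) for row in zip(*columns)]


def fit_stack(X, y, base_learners, meta):
    for learner in base_learners:
        learner.fit(X, y)
    meta.fit(stack_features(X, base_learners), y)


def predict_stack(X, base_learners, meta):
    return meta.predict(stack_features(X, base_learners))


# Invented toy data: columns are (age, average glucose level).
X = [[45, 85], [50, 90], [38, 100], [60, 95],
     [72, 180], [68, 210], [80, 150], [75, 190]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

base = [Stump(0), Stump(1)]
meta = LogisticMeta()
fit_stack(X, y, base, meta)
print(predict_stack([[70, 200], [40, 80]], base, meta))
```

In the multi-level arrangement described above, the same pattern would be repeated: each first-level stack produces its own combined prediction, and those outputs become the inputs of the final stack.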
Significance of the study
Researchers successfully predicted the risk of strokes using the ML and XAI-based approach. A customized novel ensemble-stacking architecture was designed and utilized to improve the stroke prediction performance using baseline classifiers.
Mutual information showed the best performance among the four feature selection techniques. Results obtained from these techniques demonstrated that body mass index (BMI), age, heart disease, hypertension, and average glucose level were the most crucial features/variables for stroke prediction. Thus, these features were utilized in further analysis.
The multi-stack of ML models demonstrated exceptional performance, achieving 96% precision, accuracy, and recall. Moreover, the proposed model could be effectively understood and interpreted using the five XAI techniques employed in this study, namely SHAP, QLattice, Anchor, ELI5, and LIME. A comparison of the proposed algorithm with other related algorithms also showed both the effectiveness and the greater transparency of the model.
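For readers unfamiliar with how these three metrics differ, the short sketch below computes them from true and predicted labels. The labels are invented for illustration and are unrelated to the study's results; the function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = stroke)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / len(y_true),
        # Share of predicted strokes that were real strokes.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real strokes that the model caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented labels for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall matters especially in this setting because a missed stroke (a false negative) is usually costlier than a false alarm, which is one reason class balancing is applied before training on data where stroke cases are rare.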
To summarize, the findings of this study demonstrated the feasibility of using this proposed model to predict strokes with high accuracy, resulting in more efficient and personalized patient care. However, extensive testing, scalability assessments, and multiple external validations must be performed before employing this framework on a large scale in healthcare institutions.
Additionally, more high-quality features should be added to the dataset, and advanced imaging modalities should be considered when predicting strokes in the future. DL techniques could also be employed for larger datasets.
Journal reference:
- S, S., Chadaga, K., Sampathila, N., Prabhu, S., Chadaga, R., & S, S. K. (2023). Multiple Explainable Approaches to Predict the Risk of Stroke Using Artificial Intelligence. Information, 14(8), 435. https://doi.org/10.3390/info14080435, https://www.mdpi.com/2078-2489/14/8/435