Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction

By Dr Silpaja Chandrasekar, PhD
Reviewed by Susha Cheriyedath, M.Sc.
Apr 24 2024

In a paper published in the journal Scientific Reports, researchers developed an explainable machine learning (ML) model to predict high-risk metabolic dysfunction-associated steatohepatitis (MASH) using data from the National Health and Nutrition Examination Surveys (NHANES) conducted from 2017 to March 2020. The model, built on the ensemble method extreme gradient boosting (XGBoost), achieved high performance metrics.

Impact of predictors on high-risk MASLD prediction using SHAP values. The summary Shapley additive explanations (SHAP) plot shows the importance and impact of the training variables (predictors) on the XGB MASLD FAST≥0.35 model. SHAP values on the x-axis quantify each predictor's influence, with positive values favoring a high-risk MASLD prediction and negative values favoring a no high-risk MASLD prediction. The predictors, including ALT, BMI, GGT, age, and platelet count, are ranked by magnitude of impact. Predictor values are color-coded, with red indicating higher values and blue indicating lower values (e.g., an ALT of 120 U/L in red and 12 U/L in blue). Image Credit: https://www.nature.com/articles/s41598-024-59183-4

Their model outperformed traditional serologic biomarkers such as the fibrosis-4 index (FIB4) and the aspartate aminotransferase to platelet ratio index (APRI). Alanine aminotransferase (ALT), gamma-glutamyl transferase (GGT), platelet count, waist circumference, and age were the top predictors, and the model offers a promising tool for the early identification of patients with high-risk MASH, especially in resource-limited settings.

Background

Past work has highlighted metabolic dysfunction-associated steatotic liver disease (MASLD) as a prevalent global issue that affects approximately 25% of the population worldwide and 34% in the United States. Around 20% of individuals with MASLD progress to MASH, which is characterized by liver cell inflammation and damage. Current identification methods rely on liver biopsy, but non-invasive approaches such as the FibroScan-AST (FAST) score offer alternatives. However, these methods may not be universally accessible. ML techniques such as gradient boosting have shown promise for diagnosing MASH from routine clinical data, which could improve patient management and outcomes.
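For readers unfamiliar with the FAST score, the sketch below shows how it combines VCTE liver stiffness (LSM, in kPa), the controlled attenuation parameter (CAP, in dB/m), and serum AST (in U/L) into a single probability between 0 and 1. It uses the logistic equation as it is commonly reported for the original FAST derivation; the function names are illustrative, and the coefficients should be checked against the primary FAST publication before any real use. The two thresholds are the ones this study used to define high-risk MASLD.

```python
import math


def fast_score(lsm_kpa: float, cap_db_m: float, ast_u_l: float) -> float:
    """Return the FAST score (0 to 1) from LSM (kPa), CAP (dB/m), and AST (U/L).

    Coefficients follow the commonly reported FAST equation; verify them
    against the original publication before relying on this sketch.
    """
    if lsm_kpa <= 0 or ast_u_l <= 0:
        raise ValueError("LSM and AST must be positive")
    z = -1.65 + 1.07 * math.log(lsm_kpa) + 2.66e-8 * cap_db_m ** 3 - 63.3 / ast_u_l
    # Logistic transform e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^-z).
    return 1.0 / (1.0 + math.exp(-z))


def high_risk_flags(score: float) -> dict:
    """Binary outcome labels at the two FAST thresholds used in the study."""
    return {"FAST>=0.35": score >= 0.35, "FAST>=0.67": score >= 0.67}


if __name__ == "__main__":
    # Hypothetical example values, not taken from the study.
    s = fast_score(lsm_kpa=8.0, cap_db_m=300.0, ast_u_l=40.0)
    print(round(s, 3), high_risk_flags(s))
```

Higher liver stiffness, higher CAP, and higher AST all push the score upward, so a single participant can cross the 0.35 threshold while remaining below 0.67.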

NHANES Data Analysis

The data were sourced from the NHANES conducted between 2017 and March 2020 and comprised a nationally representative sample of the noninstitutionalized US population. The study included participants aged 18 years or older who met specific criteria, such as testing negative for Hepatitis B and C antibodies and having no history of high alcohol use.

Quality control measures excluded participants with missing liver elastography data or unacceptable vibration-controlled transient elastography (VCTE) measurements. The study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines and obtained approval from the National Center for Health Statistics (NCHS) research ethics review board.
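The inclusion and quality-control steps described above amount to a sequence of filters over participant records. The sketch below is a minimal illustration under stated assumptions: the `Participant` fields are hypothetical (they are not NHANES variable names), and the VCTE reliability rule (at least 10 valid readings and an IQR/median ratio below 30%) is a commonly used convention assumed here, since the article does not state the exact criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    # Hypothetical record layout for illustration; not actual NHANES variable names.
    age: int
    hbv_negative: bool
    hcv_negative: bool
    heavy_alcohol_use: bool
    lsm_median_kpa: Optional[float]  # median liver stiffness; None if elastography is missing
    lsm_iqr_kpa: Optional[float]     # interquartile range of the stiffness readings
    n_valid_readings: int


def acceptable_vcte(p: Participant, min_readings: int = 10, max_iqr_ratio: float = 0.30) -> bool:
    """Assumed reliability rule: >= 10 valid readings and IQR/median < 30%."""
    if p.lsm_median_kpa is None or p.lsm_iqr_kpa is None or p.lsm_median_kpa <= 0:
        return False  # missing or unusable elastography data
    return p.n_valid_readings >= min_readings and p.lsm_iqr_kpa / p.lsm_median_kpa < max_iqr_ratio


def eligible(p: Participant) -> bool:
    """Inclusion criteria described in the article, plus the assumed VCTE rule."""
    return (
        p.age >= 18
        and p.hbv_negative
        and p.hcv_negative
        and not p.heavy_alcohol_use
        and acceptable_vcte(p)
    )


def build_cohort(participants: List[Participant]) -> List[Participant]:
    """Keep only participants who pass every inclusion and quality-control filter."""
    return [p for p in participants if eligible(p)]
```

Keeping each criterion in its own function makes it easy to count how many participants each filter removes, which is the kind of flow diagram TRIPOD-style reporting expects.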

The outcome of interest was high-risk MASLD, defined by FAST scores of ≥ 0.35 and ≥ 0.67. Predictors included demographic information, physical examination findings, laboratory values, past medical history, and serologic biomarkers of liver fibrosis. These biomarkers were calculated and interpreted using established cutoff values and included the Fibrosis-4 Index (FIB4), the non-alcoholic fatty liver disease (NAFLD) fibrosis score (NFS), the APRI, and measures based on body mass index (BMI), the aspartate aminotransferase to ALT (AST/ALT) ratio, and the presence of type 2 diabetes mellitus.
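The serologic comparators mentioned above are simple closed-form scores. The sketch below implements their widely published formulas (FIB4, APRI, and NFS) and a two-cutoff categorization. The APRI upper limit of normal for AST (40 U/L) and the FIB4 and NFS cutoffs are commonly used values assumed here; the article does not list the exact values the study applied, and the function names are illustrative.

```python
import math


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """FIB4 = (age x AST) / (platelets x sqrt(ALT)); platelets in 10^9/L."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l: float, platelets_10e9_l: float, ast_uln_u_l: float = 40.0) -> float:
    """APRI = (AST / upper limit of normal) x 100 / platelets; the 40 U/L ULN is an assumption."""
    return (ast_u_l / ast_uln_u_l) * 100.0 / platelets_10e9_l


def nfs(age_years, bmi, ifg_or_diabetes, ast_u_l, alt_u_l, platelets_10e9_l, albumin_g_dl):
    """NAFLD fibrosis score; ifg_or_diabetes is True for impaired fasting glucose or diabetes."""
    return (
        -1.675
        + 0.037 * age_years
        + 0.094 * bmi
        + 1.13 * (1 if ifg_or_diabetes else 0)
        + 0.99 * (ast_u_l / alt_u_l)
        - 0.013 * platelets_10e9_l
        - 0.66 * albumin_g_dl
    )


def categorize(value: float, low_cutoff: float, high_cutoff: float) -> str:
    """Map a score onto low / indeterminate / high using two cutoffs."""
    if value < low_cutoff:
        return "low"
    if value > high_cutoff:
        return "high"
    return "indeterminate"


# Widely used cutoffs, assumed here; not confirmed as the study's exact values.
FIB4_CUTOFFS = (1.30, 2.67)
NFS_CUTOFFS = (-1.455, 0.676)
```

For a hypothetical 50-year-old with AST 40 U/L, ALT 25 U/L, and a platelet count of 200 x 10^9/L, `fib4(50, 40, 25, 200)` gives 2.0, which falls in the indeterminate band. Indeterminate results of this kind are one reason a model that uses many routine variables at once can outperform single-formula scores.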

The XGBoost algorithm was used to develop the ML model, with the data split into training, validation, and test sets. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The researchers used the Shapley additive explanations (SHAP) approach to interpret the XGBoost model and applied techniques to handle missing data.
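The evaluation metrics listed above all derive from comparing predicted probabilities with true labels. The minimal sketch below computes them from scratch. The 0.5 decision threshold is an assumption, since the article does not report the operating threshold, and with a low-prevalence outcome such as high-risk MASLD the threshold choice strongly affects sensitivity and PPV. AUROC, by contrast, does not depend on any threshold.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a denominator is zero (e.g., no predicted positives).
    return num / den if den else float("nan")


def threshold_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV, and NPV at a chosen probability threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))
```

The pairwise AUROC loop is quadratic and meant only to make the definition concrete; for thousands of participants, a rank-based implementation such as scikit-learn's `roc_auc_score` is the usual choice.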

Patient characteristics were summarized, and t-tests were used to compare means between the high-risk and no high-risk MASLD groups. The study adhered to the ethical principles outlined by the NCHS to ensure that participants' rights, welfare, and privacy were protected throughout the research process.
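The group comparison above can be illustrated with a two-sample t statistic. The article does not specify which t-test variant was used. The sketch below assumes Welch's unequal-variance form, which suits groups as unbalanced as high-risk versus no high-risk MASLD, and returns the t statistic with the Welch-Satterthwaite degrees of freedom. The input values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    # variance() is the sample (n - 1) variance; dividing by n gives the squared standard error of the mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Turning the statistic into a p-value requires the t distribution's CDF, which the standard library lacks; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and returns the p-value directly.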

MASLD Analysis Insights

The study included 5156 participants who met the inclusion criteria, and the prevalence of high-risk MASLD was 5.8% at FAST ≥ 0.35 and 1.1% at FAST ≥ 0.67. The median age was 55 years, and approximately 48% of participants were women. Compared with the no high-risk MASLD group, participants with high-risk MASLD had a higher prevalence of diabetes; higher BMI, waist circumference, and liver enzyme levels; and altered lipid profiles.

The XGBoost MASLD models showed strong discrimination across the FAST score thresholds, with AUROC values ranging from 0.91 to 0.97. XGBoost also outperformed the traditional ML models it was compared against in predicting high-risk MASLD. In addition, the XGBoost MASLD model outperformed traditional serologic biomarkers, with higher AUROC, sensitivity, specificity, PPV, and NPV than the FIB4, NFS, and APRI scores. SHAP analysis identified ALT, GGT, BMI, platelet count, and age as key predictors of high-risk MASLD, with SHAP values quantifying how much, and in which direction, each predictor shifted individual predictions.
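To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values by enumerating every subset of features for a tiny, hand-written risk function. Both the toy model and its thresholds are hypothetical and are not the study's XGBoost model. Replacing "absent" features with a single baseline record is a simplification of how SHAP averages over a background dataset. The key property shown is that the attributions sum exactly to the difference between the prediction and the baseline prediction.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    model: Callable[[List[float]], float], x: Sequence[float], baseline: Sequence[float]
) -> List[float]:
    """Exact Shapley values by enumerating every feature subset (cost grows as 2^n)."""
    n = len(x)

    def value(subset: set) -> float:
        # Features in the subset take the explained record's values; the rest take baseline values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: List[float]) -> float:
    """Hypothetical tree-like risk score over [ALT (U/L), BMI (kg/m^2), age (years)]."""
    alt, bmi, age = z
    score = 0.0
    if alt > 40:
        score += 0.3
    if bmi > 30:
        score += 0.2
    if alt > 40 and age > 50:  # interaction term, split equally between ALT and age
        score += 0.1
    return score
```

For a hypothetical participant with ALT 80 U/L, BMI 32, and age 60, explained against a baseline of ALT 20 U/L, BMI 25, and age 40, `exact_shapley(toy_risk, [80, 32, 60], [20, 25, 40])` attributes 0.35 to ALT, 0.2 to BMI, and 0.05 to age, which together account for the full 0.6 prediction. For tree ensembles such as XGBoost, the `shap` package's `TreeExplainer` computes these values efficiently instead of enumerating subsets.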

By scrutinizing false-positive and false-negative predictions, the study identified where the model faltered, providing insights for refinement. Understanding the reasons behind these errors enables targeted improvements, such as fine-tuning model parameters or incorporating additional relevant predictors, to improve the model's precision and reliability. This iterative refinement process helps the model better reflect real-world scenarios and clinical complexities.
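A basic form of the error analysis described above is to split predictions into true and false positives and negatives and then compare predictor values across those groups. The article does not describe how the authors performed this step. The sketch below is a hypothetical illustration, and the feature name and values in the usage example are made up.

```python
from statistics import mean
from typing import Dict, List, Mapping, Optional, Sequence


def error_groups(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, List[int]]:
    """Return row indices for true/false positives and negatives."""
    groups: Dict[str, List[int]] = {"TP": [], "FP": [], "TN": [], "FN": []}
    for i, (t, p) in enumerate(zip(y_true, y_pred)):
        # First letter: was the prediction correct? Second letter: what was predicted?
        key = ("T" if t == p else "F") + ("P" if p == 1 else "N")
        groups[key].append(i)
    return groups


def group_means(
    rows: Sequence[Mapping[str, float]], groups: Dict[str, List[int]], feature: str
) -> Dict[str, Optional[float]]:
    """Mean of one feature within each group (None when a group is empty)."""
    return {k: mean(rows[i][feature] for i in idx) if idx else None for k, idx in groups.items()}


if __name__ == "__main__":
    rows = [{"ALT": 90}, {"ALT": 70}, {"ALT": 30}, {"ALT": 20}]  # hypothetical values
    g = error_groups([1, 0, 1, 0], [1, 1, 0, 0])
    print(group_means(rows, g, "ALT"))
```

In this toy example, the false positive has a high ALT and the false negative a low ALT, the kind of pattern that would suggest the model leans heavily on ALT and might benefit from additional predictors.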

Continuous assessment and enhancement of the model allow clinicians to confidently integrate it into their decision-making process, thereby optimizing patient care and outcomes. Ultimately, these efforts contribute to advancing the field of predictive modeling in healthcare, paving the way for more accurate and effective tools to support clinical practice.

Conclusion

To sum up, the study showcased the potential of explainable ML in identifying high-risk MASH. The developed XGBoost model outperformed traditional serologic tests, offering a more comprehensive approach to detection. The capability of the model to detect heterogeneous subphenotypes suggests potential enhancements in diagnosis and management, which could optimize clinical outcomes. Further exploration of its clinical applications is warranted, with possible implications for resource-limited settings.

Journal reference:

Njei, B., et al. (2024). An Explainable Machine Learning Model for Prediction of High-Risk Non-Alcoholic Steatohepatitis. Scientific Reports, 14:1, 8589. https://doi.org/10.1038/s41598-024-59183-4, https://www.nature.com/articles/s41598-024-59183-4

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Chandrasekar, Silpaja. (2024, April 24). Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction. AZoAi. Retrieved on April 21, 2025 from https://www.azoai.com/news/20240424/Explainable-ML-for-High-Risk-Non-Alcoholic-Steatohepatitis-Prediction.aspx.
MLA
Chandrasekar, Silpaja. "Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction". AZoAi. 21 April 2025. <https://www.azoai.com/news/20240424/Explainable-ML-for-High-Risk-Non-Alcoholic-Steatohepatitis-Prediction.aspx>.
Chicago
Chandrasekar, Silpaja. "Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction". AZoAi. https://www.azoai.com/news/20240424/Explainable-ML-for-High-Risk-Non-Alcoholic-Steatohepatitis-Prediction.aspx. (accessed April 21, 2025).
Harvard
Chandrasekar, Silpaja. 2024. Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction. AZoAi, viewed 21 April 2025, https://www.azoai.com/news/20240424/Explainable-ML-for-High-Risk-Non-Alcoholic-Steatohepatitis-Prediction.aspx.