Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction

In a paper published in the journal Scientific Reports, researchers developed an explainable machine learning (ML) model to predict high-risk metabolic dysfunction-associated steatohepatitis (MASH). The model, an ensemble-based extreme gradient boosting (XGBoost) classifier, was built on National Health and Nutrition Examination Survey (NHANES) data from 2017 to March 2020 and achieved high performance metrics.

Impact of predictors on high-risk MASLD prediction using SHAP values. The summary Shapley Additive Explanations (SHAP) plot shows the importance and impact of the training variables (predictors) on the XGBoost MASLD model at FAST ≥ 0.35. The SHAP values on the x-axis quantify the influence of each predictor, with positive values favoring a high-risk MASLD prediction and negative values favoring a prediction of no high-risk MASLD. The predictors, including ALT, BMI, GGT, age, and platelet count, are ranked by magnitude of impact. Predictor values are color-coded, with red indicating higher values and blue indicating lower values (e.g., an ALT of 120 U/L in red and 12 U/L in blue). Image Credit: https://www.nature.com/articles/s41598-024-59183-4

Their model outperformed traditional biomarkers such as the fibrosis-4 index (FIB-4) and the aspartate aminotransferase to platelet ratio index (APRI), with alanine aminotransferase (ALT), gamma-glutamyl transferase (GGT), platelet count, waist circumference, and age as the top predictors. It offers a promising tool for early identification of high-risk MASH patients, especially in resource-limited settings.

Background

Past work has highlighted metabolic dysfunction-associated steatotic liver disease (MASLD) as a prevalent global issue that affects approximately 25% of the population worldwide and 34% in the United States. Around 20% of individuals with MASLD progress to MASH, which is characterized by liver cell inflammation and damage. Definitive identification currently relies on liver biopsy, although non-invasive approaches such as the FibroScan-AST (FAST) score offer alternatives. However, these methods may not be universally accessible. ML techniques such as gradient boosting have shown promise in diagnosing MASH from routine clinical data, which could improve patient management and outcomes.

NHANES Data Analysis

The data were sourced from the NHANES conducted between 2017 and March 2020 and comprised a nationally representative sample of the noninstitutionalized US population. The study included participants aged 18 years or older who met specific criteria, such as testing negative for Hepatitis B and C antibodies and having no history of high alcohol use.

Quality control measures were applied to exclude participants with missing liver elastography data or unacceptable vibration-controlled transient elastography (VCTE) measurements. The study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines and obtained approval from the National Center for Health Statistics (NCHS) research ethics review board.

The outcome variable of interest was high-risk MASLD, defined at two FAST score thresholds: ≥ 0.35 and ≥ 0.67. Predictors included demographic information, physical exam findings, laboratory values, past medical history, and serologic biomarkers of liver fibrosis. The serologic biomarkers, assessed against established cutoff values, were the Fibrosis-4 Index (FIB4), the non-alcoholic fatty liver disease (NAFLD) fibrosis score (NFS), and the APRI. Body mass index (BMI), the aspartate aminotransferase to ALT (AST/ALT) ratio, and the presence of type 2 diabetes mellitus were also considered.
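For readers unfamiliar with these scores, the sketch below computes FIB-4, APRI, and the AST/ALT ratio from routine laboratory values using their standard published formulas, and labels a participant at the two FAST thresholds named above. It is an illustration only, not the study's code: the function names, the example participant, and the AST upper limit of normal of 40 U/L are assumptions, and the FAST score itself is taken as a given input rather than computed.

```python
import math

def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))

def apri(ast_u_l, platelets_10e9_l, ast_uln_u_l=40.0):
    """APRI = (AST / upper limit of normal AST) x 100 / platelet count.

    The default upper limit of normal (40 U/L) is an assumption; labs differ.
    """
    return (ast_u_l / ast_uln_u_l) * 100.0 / platelets_10e9_l

def ast_alt_ratio(ast_u_l, alt_u_l):
    """Simple ratio of AST to ALT."""
    return ast_u_l / alt_u_l

def is_high_risk(fast_score, threshold):
    """Binary outcome label: FAST score at or above the chosen threshold."""
    return fast_score >= threshold

# Hypothetical participant, not taken from NHANES.
patient = {"age": 55, "ast": 48.0, "alt": 64.0, "platelets": 200.0, "fast": 0.41}

scores = {
    "fib4": fib4(patient["age"], patient["ast"], patient["alt"], patient["platelets"]),
    "apri": apri(patient["ast"], patient["platelets"]),
    "ast_alt": ast_alt_ratio(patient["ast"], patient["alt"]),
    "high_risk_035": is_high_risk(patient["fast"], 0.35),
    "high_risk_067": is_high_risk(patient["fast"], 0.67),
}
print(scores)
```

The same participant can be high-risk at the 0.35 threshold and not at 0.67, which is why the study reports the two outcome definitions separately.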

The XGBoost algorithm was employed to develop the ML model, with the data split into training, validation, and test sets. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The researchers used the Shapley additive explanations (SHAP) approach to interpret the XGBoost model and implemented techniques to handle missing data.
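The evaluation metrics listed above can be illustrated with a short sketch in plain Python. The labels and predicted probabilities are made up for illustration, the 0.5 decision threshold is an assumption, and the AUROC is computed by direct pairwise comparison, which is simple but suitable only for small samples.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # precision
        "npv": _ratio(tn, tn + fn),
    }

def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))

# Made-up labels (1 = high-risk) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 cutoff

metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, y_score)
print(metrics, auc)
```

AUROC is threshold-free, while the other five metrics depend on the cutoff used to turn a probability into a yes/no prediction.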

Statistical analyses included t-tests to describe patient characteristics and compare means between high-risk MASLD outcome groups. The study adhered to ethical principles outlined by the NCHS to ensure participant rights, welfare, and privacy were upheld throughout the research process.

MASLD Analysis Insights

The study encompassed 5156 subjects meeting the inclusion criteria, revealing a prevalence of high-risk MASLD at FAST ≥ 0.35 and FAST ≥ 0.67 of 5.8% and 1.1%, respectively. Among the subjects, the median age was 55 years, with approximately 48% being women. Notable differences emerged between the high-risk MASLD and no high-risk MASLD groups, characterized by a higher prevalence of diabetes, elevated BMI, waist circumference, liver enzymes, and altered lipid profiles among those classified as high-risk MASLD individuals.
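The reported prevalences imply roughly how many high-risk cases the cohort contains, and low prevalence has a direct effect on PPV. The sketch below works through that arithmetic. The case counts are approximations derived from the rounded percentages above, and the sensitivity and specificity values are hypothetical, chosen only to show the effect; they are not results from the study.

```python
def approximate_cases(n_subjects, prevalence):
    """Approximate case count implied by a rounded prevalence figure."""
    return round(n_subjects * prevalence)

def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

N_SUBJECTS = 5156
cases_035 = approximate_cases(N_SUBJECTS, 0.058)  # FAST >= 0.35
cases_067 = approximate_cases(N_SUBJECTS, 0.011)  # FAST >= 0.67

# Hypothetical test characteristics, for illustration only.
SENS, SPEC = 0.85, 0.90
ppv_035 = ppv_at_prevalence(SENS, SPEC, 0.058)
ppv_067 = ppv_at_prevalence(SENS, SPEC, 0.011)

print(cases_035, cases_067)
print(round(ppv_035, 3), round(ppv_067, 3))
```

With the same sensitivity and specificity, PPV falls as the outcome becomes rarer, so PPV figures at the two FAST thresholds are not directly comparable.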

The XGBoost MASLD models showcased robust predictive accuracy across various FAST score thresholds, with AUROC ranging from 0.91 to 0.97. Comparative assessments with traditional ML models highlighted the superior performance of XGBoost, underscoring its efficacy in predicting high-risk MASLD. The XGBoost MASLD model outperformed traditional serologic biomarkers, showcasing higher AUROC, sensitivity, specificity, PPV, and NPV compared to FIB4, NFS, and APRI scores. Employing the SHAP approach facilitated the interpretation of the XGBoost MASLD models, revealing ALT, GGT, BMI, platelet count, and age as significant predictors influencing high-risk MASLD prediction. SHAP values elucidated the contributions of these predictors to model predictions, shedding light on their respective impacts.
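To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a single prediction of a toy risk function by enumerating every coalition of features. It is not the study's model or the tree-based algorithm typically used with XGBoost: the scoring function, its coefficients, the participant, and the baseline values are all hypothetical, and "removing" a feature is approximated by setting it to its baseline value.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by coalition enumeration.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this suits toy cases only.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi

def toy_risk(f):
    """Hypothetical risk score with an ALT x BMI interaction term."""
    return (0.01 * f["alt"] + 0.03 * f["bmi"]
            - 0.002 * f["platelets"] + 0.0002 * f["alt"] * f["bmi"])

participant = {"alt": 120.0, "bmi": 34.0, "platelets": 180.0}  # hypothetical
baseline = {"alt": 25.0, "bmi": 27.0, "platelets": 250.0}      # hypothetical

phi = shapley_values(toy_risk, participant, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the participant's score and the baseline score, which is the additivity property that lets a SHAP plot attribute each prediction to individual predictors.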

The study pinpointed areas where the predictive model faltered by scrutinizing false positive and false negative predictions, providing crucial insights for refinement. Understanding the reasons behind these inaccuracies allows for targeted improvements, such as fine-tuning model parameters or incorporating additional relevant predictors, to enhance the precision and reliability of the model. This iterative refinement process ensures the predictive model evolves to reflect real-world scenarios and clinical complexities better.

Continuous assessment and enhancement of the model allow clinicians to confidently integrate it into their decision-making process, thereby optimizing patient care and outcomes. Ultimately, these efforts contribute to advancing the field of predictive modeling in healthcare, paving the way for more accurate and effective tools to support clinical practice.

Conclusion

To sum up, the study showcased the potential of explainable ML in identifying high-risk MASH. The developed XGBoost model outperformed traditional serologic tests, offering a more comprehensive approach to detection. The capability of the model to detect heterogeneous subphenotypes suggests potential enhancements in diagnosis and management, which could optimize clinical outcomes. Further exploration of its clinical applications is warranted, with possible implications for resource-limited settings.

Journal reference:
https://www.nature.com/articles/s41598-024-59183-4

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2024, April 24). Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction. AZoAi. Retrieved on November 21, 2024 from https://www.azoai.com/news/20240424/Explainable-ML-for-High-Risk-Non-Alcoholic-Steatohepatitis-Prediction.aspx.

  • MLA

    Chandrasekar, Silpaja. "Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction". AZoAi. 21 November 2024. <https://www.azoai.com/news/20240424/Explainable-ML-for-High-Risk-Non-Alcoholic-Steatohepatitis-Prediction.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction". AZoAi. https://www.azoai.com/news/20240424/Explainable-ML-for-High-Risk-Non-Alcoholic-Steatohepatitis-Prediction.aspx. (accessed November 21, 2024).

  • Harvard

    Chandrasekar, Silpaja. 2024. Explainable ML for High-Risk Non-Alcoholic Steatohepatitis Prediction. AZoAi, viewed 21 November 2024, https://www.azoai.com/news/20240424/Explainable-ML-for-High-Risk-Non-Alcoholic-Steatohepatitis-Prediction.aspx.
