In a paper published in the journal Scientific Reports, researchers developed an explainable machine learning (ML) model to predict high-risk metabolic dysfunction-associated steatohepatitis (MASH). The ensemble-based extreme gradient boosting (XGBoost) model was built on National Health and Nutrition Examination Survey (NHANES) data from 2017 to March 2020 and achieved high performance metrics.
Their model outperformed traditional biomarkers such as the fibrosis-4 index (FIB-4) and the aspartate aminotransferase to platelet ratio index (APRI), with alanine aminotransferase (ALT), gamma-glutamyl transferase (GGT), platelet count, waist circumference, and age as the top predictors. It offers a promising tool for early identification of high-risk MASH patients, especially in resource-limited settings.
Background
Past work has highlighted metabolic dysfunction-associated steatotic liver disease (MASLD) as a prevalent global issue that affects approximately 25% of the population worldwide and 34% in the United States. Around 20% of individuals with MASLD progress to MASH, characterized by liver cell inflammation and damage. Current identification methods rely on liver biopsy, and non-invasive approaches like the fibroscan-AST (FAST) score offer alternatives. However, these methods may not be universally accessible. ML techniques such as gradient boosting have shown promise in diagnosing MASH from routine clinical data, which could improve patient management and outcomes.
NHANES Data Analysis
The data were sourced from the NHANES conducted between 2017 and March 2020 and comprised a nationally representative sample of the noninstitutionalized US population. The study included participants aged 18 years or older who met specific criteria, such as testing negative for Hepatitis B and C antibodies and having no history of high alcohol use.
Quality control measures were applied to exclude participants with missing liver elastography data or unacceptable vibration-controlled transient elastography (VCTE) measurements. The study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines and obtained approval from the National Center for Health Statistics (NCHS) research ethics review board.
The outcome variable of interest was high-risk MASLD, defined at two FAST score thresholds: ≥ 0.35 and ≥ 0.67. Predictors included demographic information, physical exam findings, laboratory values, past medical history, and serologic biomarkers of liver fibrosis. The derived measures included the Fibrosis-4 Index (FIB-4), the non-alcoholic fatty liver disease (NAFLD) fibrosis score (NFS), body mass index (BMI), the aspartate aminotransferase to ALT (AST/ALT) ratio, the presence of type 2 diabetes mellitus, and the APRI, with the fibrosis biomarkers assessed against established cutoff values.
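For readers unfamiliar with the serologic biomarkers named above, the following minimal sketch shows how FIB-4, APRI, and the AST/ALT ratio are commonly computed from routine laboratory values. The function names and the example patient are hypothetical, and the AST upper limit of normal of 40 U/L is an assumption that varies between laboratories; this is not the study's own code.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, platelets_10e9_l, ast_upper_limit=40.0):
    """APRI = (AST / upper limit of normal AST) / platelet count x 100.

    ast_upper_limit=40.0 is an assumed default, not a value from the study.
    """
    return (ast_u_l / ast_upper_limit) / platelets_10e9_l * 100.0


def ast_alt_ratio(ast_u_l, alt_u_l):
    """Simple ratio of AST to ALT."""
    return ast_u_l / alt_u_l


# Hypothetical patient: age 55, AST 40 U/L, ALT 36 U/L, platelets 200 x 10^9/L
example = {
    "fib4": fib4(55, 40, 36, 200),
    "apri": apri(40, 200),
    "ast_alt": ast_alt_ratio(40, 36),
}
print(example)
```

Platelet counts are expected in units of 10^9/L and enzyme levels in U/L; mixing units would silently distort all three indices.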
The XGBoost algorithm was employed to develop an ML model, with data split into training, validation, and test sets. Model performance was evaluated using several metrics, including area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The researchers used the Shapley additive explanations (SHAP) approach to interpret the XGBoost model and implemented techniques to handle missing data.
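The evaluation metrics listed above can be illustrated with a short, standard-library sketch. It does not reproduce the study's XGBoost pipeline; the labels, scores, and the 0.5 decision threshold below are made-up toy values, and the function names are hypothetical. AUROC is computed here with the pairwise (Mann-Whitney) definition: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class or prediction group is empty.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auroc(y_true, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy data (not from the study): 3 high-risk and 5 non-high-risk subjects.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
metrics["auroc"] = auroc(y_true, scores)
print(metrics)
```

The pairwise AUROC is quadratic in sample size, which is fine for illustration; a rank-based implementation would be preferable for a dataset of several thousand subjects.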
Statistical analyses included t-tests to describe patient characteristics and compare means between high-risk MASLD outcome groups. The study adhered to ethical principles outlined by the NCHS to ensure participant rights, welfare, and privacy were upheld throughout the research process.
MASLD Analysis Insights
The study encompassed 5156 subjects meeting the inclusion criteria, revealing a prevalence of high-risk MASLD at FAST ≥ 0.35 and FAST ≥ 0.67 of 5.8% and 1.1%, respectively. Among the subjects, the median age was 55 years, with approximately 48% being women. Notable differences emerged between the high-risk MASLD and no high-risk MASLD groups, characterized by a higher prevalence of diabetes, elevated BMI, waist circumference, liver enzymes, and altered lipid profiles among those classified as high-risk MASLD individuals.
The XGBoost MASLD models showed robust predictive performance across the FAST score thresholds, with AUROC ranging from 0.91 to 0.97. Comparisons with traditional ML models highlighted the superior performance of XGBoost in predicting high-risk MASLD. The XGBoost MASLD model also outperformed traditional serologic biomarkers, achieving higher AUROC, sensitivity, specificity, PPV, and NPV than the FIB-4, NFS, and APRI scores. The SHAP approach facilitated interpretation of the XGBoost MASLD models, revealing ALT, GGT, BMI, platelet count, and age as significant predictors of high-risk MASLD. SHAP values quantified the contribution of each predictor to individual model predictions.
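To give a sense of what a SHAP-style attribution means, the sketch below computes exact Shapley values for a tiny, made-up scoring function by averaging each feature's marginal contribution over all feature orderings. Everything here is an assumption for illustration: `toy_score`, its coefficients, and the patient and baseline values are hypothetical and unrelated to the study's model, and "absent" features are simply replaced by a single baseline value. The SHAP library uses more efficient algorithms for tree models and averages over background data instead.

```python
from itertools import permutations


def toy_score(f):
    # Hypothetical risk score with an ALT x GGT interaction term.
    return (0.02 * f["alt"] + 0.01 * f["ggt"]
            + 0.0002 * f["alt"] * f["ggt"] + 0.005 * f["age"])


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Features not yet "added" keep their baseline value. The cost grows
    factorially with the number of features, so this only suits toy cases.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]        # switch this feature to its real value
            updated = predict(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


patient = {"alt": 60, "ggt": 80, "age": 55}    # hypothetical subject
reference = {"alt": 20, "ggt": 20, "age": 50}  # hypothetical baseline

contributions = shapley_values(toy_score, patient, reference)
print(contributions)
```

By construction, the contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes SHAP values readable as per-feature shares of a prediction.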
By scrutinizing false positive and false negative predictions, the study pinpointed areas where the predictive model faltered, providing crucial insights for refinement. Understanding the reasons behind these inaccuracies allows for targeted improvements, such as fine-tuning model parameters or incorporating additional relevant predictors, to enhance the precision and reliability of the model. This iterative refinement process helps the predictive model better reflect real-world scenarios and clinical complexities.
Continuous assessment and enhancement of the model allow clinicians to confidently integrate it into their decision-making process, thereby optimizing patient care and outcomes. Ultimately, these efforts contribute to advancing the field of predictive modeling in healthcare, paving the way for more accurate and effective tools to support clinical practice.
Conclusion
To sum up, the study showcased the potential of explainable ML in identifying high-risk MASH. The developed XGBoost model outperformed traditional serologic tests, offering a more comprehensive approach to detection. The capability of the model to detect heterogeneous subphenotypes suggests potential enhancements in diagnosis and management, which could optimize clinical outcomes. Further exploration of its clinical applications is warranted, with possible implications for resource-limited settings.