In an article published in the journal Scientific Reports, researchers explored the prevalence and prediction of anemia among young girls in Ethiopia using machine learning (ML) algorithms. Analyzing data from the 2016 Ethiopian Demographic and Health Survey (EDHS), the authors evaluated various predictors of anemia and identified the random forest classifier as the most effective model.
Key determinants included socioeconomic factors, demographic characteristics, and lifestyle choices. The findings suggested targeted interventions to address anemia among young girls in Ethiopia.
Background
Anemia, characterized by a deficiency in red blood cells or hemoglobin levels, poses significant health risks globally, particularly among young women. This demographic group, due to heightened physiological needs for essential nutrients like iron and folic acid, is particularly susceptible to anemia, exacerbated by factors such as intestinal parasitic infestations prevalent in developing countries. While anemia among reproductive-age women has been extensively studied, research specifically focusing on young women, particularly in Ethiopia, has been limited.
Previous studies in Ethiopia have primarily used traditional statistical methods to analyze anemia prevalence and its determinants. However, these approaches may overlook intricate relationships within the data. This paper addressed the gap by employing ML techniques to predict anemia and identify its predictors among young girls in Ethiopia. Because ML algorithms can handle nonlinear data and capture complex interrelationships among predictors, the study aimed to provide a more nuanced understanding of anemia prevalence and its associated factors.
The research used data from the 2016 EDHS and applied eight ML algorithms, together with association rule mining, to predict anemia and identify its predictors. In doing so, it contributed a novel approach to analyzing anemia among young women, providing insights that policymakers can use to design targeted interventions for this vulnerable group.
Predicting Anemia Among Young Girls in Ethiopia Using Advanced ML Techniques
The research utilized data from the 2016 EDHS, a nationally representative cross-sectional survey conducted in Ethiopia. Covering nine regional states and two city administrations, the survey included a diverse sample of women aged 15-49. For this study, a weighted sample of 5,642 young girls was analyzed. The primary objective was to predict anemia among these young girls and identify its predictors using advanced ML algorithms.
Various demographic, socioeconomic, and lifestyle factors were considered as potential predictors of anemia, including age group, religion, wealth index, occupation, media exposure, educational status, source of drinking water, family size, body mass index, altitude, type of residence, and region. Data preprocessing involved cleaning, handling missing values, and addressing imbalanced categories in the outcome variable. Multiple data balancing techniques were employed to improve model performance.
Eight state-of-the-art ML algorithms were applied, including decision tree, random forest, extreme gradient boost, light gradient boosting machine, support vector machine, logistic regression, k-nearest neighbor, and Gaussian Naïve Bayes. Evaluation metrics such as accuracy, sensitivity, specificity, F1 score, and area under the curve (AUC) were used to assess model performance. Additionally, 10-fold cross-validation was utilized to validate the models.
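To make the evaluation metrics above concrete, the following is a minimal sketch of how they can be computed for a binary anemia classifier. The labels and scores are invented for illustration and are not taken from the study; the function names are likewise hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 for binary labels (1 = anemic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): true labels and predicted probabilities of anemia.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.6, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can be misleading when the outcome is imbalanced, which is one reason sensitivity, specificity, F1, and AUC are typically reported alongside it.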
Furthermore, feature engineering techniques such as one-hot encoding and label encoding were applied, along with dimensionality reduction methods to streamline the input variables. Model interpretability was enhanced through Shapley additive explanations (SHAP) analysis to understand feature importance and association rule mining to uncover patterns related to anemia.
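The difference between the two encoding schemes can be illustrated with a short sketch. The category values below are made up for the example, and the helper functions are hypothetical rather than the authors' code.

```python
def label_encode(values, order=None):
    """Map each category to an integer code.

    If `order` is given, codes follow that order (useful for ordinal
    variables); otherwise categories are sorted alphabetically.
    """
    categories = list(order) if order is not None else sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Illustrative values only (not survey data).
wealth = ["poor", "middle", "rich", "poor"]

# Ordinal variable: keep the natural ordering poor < middle < rich.
wealth_codes, wealth_map = label_encode(wealth, order=["poor", "middle", "rich"])
print(wealth_codes)   # [0, 1, 2, 0]

# Nominal variable: one indicator column per category, no implied order.
wealth_onehot, columns = one_hot_encode(wealth)
print(columns)        # ['middle', 'poor', 'rich']
print(wealth_onehot)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Label encoding suits variables with a meaningful order, whereas one-hot encoding avoids implying an order among categories such as region or religion.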
Ethical considerations were addressed, ensuring compliance with ethical standards and obtaining informed consent from respondents. Overall, the study provided valuable insights for policymakers to develop targeted interventions and mitigate the prevalence of anemia among young girls in Ethiopia.
Findings and Predictive Analysis Insights
The study analyzed data from 5,642 young girls in Ethiopia, revealing that 25.43% of them were anemic. Socio-demographic characteristics highlighted disparities, with most respondents aged 15–19, living in rural areas, and identifying as Orthodox Christian. Feature selection using the Boruta algorithm identified influential variables for predicting anemia status. Household sex and smoking status were deemed unimportant and excluded from further analysis.
Data balancing techniques, including the synthetic minority oversampling technique (SMOTE), enhanced model performance, with the random forest model outperforming the others and achieving an AUC of 0.824. Hyperparameter tuning via grid search further improved precision, recall, and F1 score. Among the selected algorithms, random forest, extreme gradient boosting, and support vector machine performed best, with AUC values of 0.824, 0.776, and 0.736, respectively.
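The core idea behind SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbors. The version below is a simplified illustration for numeric features only, with invented points and a hypothetical function name; it is not the implementation used in the study, and categorical predictors would need a variant designed for them.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create `n_new` synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = [i for i in range(len(minority)) if i != base_idx]
        others.sort(key=lambda i: sum((a - b) ** 2 for a, b in zip(base, minority[i])))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: u = 0 gives the base point, u = 1 gives the neighbour.
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority-class points (illustrative only).
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority_points, n_new=6)
for point in new_points:
    print(tuple(round(c, 3) for c in point))
```

Oversampling of this kind is normally applied to the training folds only, so that synthetic samples do not leak into the data used for evaluation.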
SHAP value interpretation revealed region, media exposure, and marital status as significant predictors, while association rule mining identified key factors influencing anemia likelihood, such as age, region, and wealth index. For instance, a 96.3% likelihood of anemia was associated with young girls aged 15–19 from Dire Dawa with primary education and poor wealth index. These findings provided valuable insights for targeted interventions to address anemia among young girls in Ethiopia, emphasizing the importance of socio-demographic factors and regional disparities.
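Figures such as the 96.3% above are typically read as the confidence of an association rule, that is, the share of records matching the rule's conditions that also have the outcome. Assuming that interpretation, the sketch below computes support, confidence, and lift for a single rule on a handful of invented records; the item names and data are hypothetical and do not reproduce the study's results.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(1 for r in records if items <= r) / len(records)


def rule_stats(records, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    both = support(records, set(antecedent) | set(consequent))
    conf = both / support(records, antecedent)
    return {
        "support": both,
        "confidence": conf,                                # P(consequent | antecedent)
        "lift": conf / support(records, consequent),       # > 1 means positive association
    }


# Invented records (illustrative only): each is a set of attribute=value items.
records = [
    {"age=15-19", "wealth=poor", "anemic"},
    {"age=15-19", "wealth=poor", "anemic"},
    {"age=15-19", "wealth=poor"},
    {"age=20-24", "wealth=rich"},
    {"age=15-19", "wealth=rich"},
    {"age=20-24", "wealth=poor", "anemic"},
]

stats = rule_stats(records, {"age=15-19", "wealth=poor"}, {"anemic"})
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

A high confidence is most informative when read together with support and lift, since a rule that covers very few records can reach a high confidence by chance.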
Insights and Implications
By evaluating eight different algorithms, including random forest and support vector machine, the authors demonstrated the predictive capability of ML models for anemia, pointing toward automated screening tools in healthcare. Moreover, SHAP value analysis revealed key predictors such as media exposure and wealth status, shedding light on nuanced risk factors. However, the absence of regression coefficients and the limited data sources posed constraints. Even so, the study offered avenues for targeted interventions and policy decisions to mitigate anemia's impact. Ultimately, this research underscored the potential of ML in healthcare and the need to address anemia among vulnerable populations.
Conclusion
In conclusion, the ML analysis of anemia prevalence among young girls in Ethiopia underscored the value of targeted interventions informed by predictive models. Using advanced algorithms, the study identified socio-demographic determinants such as media exposure and wealth status as pivotal factors. However, challenges such as data limitations and model interpretability remain to be addressed.
Despite these limitations, the study has clear implications for public health interventions and highlights the potential of ML in mitigating anemia's impact. Continued research and tailored interventions are crucial for effectively addressing anemia among vulnerable populations in Ethiopia and beyond.
Journal reference:
- Zemariam, A. B., Yimer, A., Abebe, G. K., Wondie, W. T., Abate, B. B., Alamaw, A. W., Yilak, G., Melaku, T. M., & Ngusie, H. S. (2024). Employing supervised machine learning algorithms for classification and prediction of anemia among youth girls in Ethiopia. Scientific Reports, 14(1), 9080. https://doi.org/10.1038/s41598-024-60027-4, https://www.nature.com/articles/s41598-024-60027-4