Harnessing Machine Learning for Predicting School Dropout


By Muhammad Osama. Reviewed by Susha Cheriyedath, M.Sc. Feb 22 2024

In an article published in the journal Scientific Reports, researchers from Mexico used several machine learning (ML) algorithms to build predictive models that identify students at risk of dropping out, so that those students can be given appropriate support. The models target school dropout at the secondary and higher education levels.

*Study: Harnessing Machine Learning for Predicting School Dropout. Image credit: Elnur/Shutterstock*

Background

ML is a branch of artificial intelligence that enables computers to learn from data and perform tasks that normally require human intelligence. It is commonly divided into two main categories: supervised and unsupervised learning. In supervised learning, the computer is given a set of input-output pairs and learns to map new inputs to the desired outputs. In unsupervised learning, the computer is given input data without explicit labels and learns to uncover patterns or structures within the data.

ML has been widely used in various fields, such as medicine, engineering, finance, and education. In education, it can help improve the quality and effectiveness of teaching and learning processes and address challenges such as student retention, performance, and satisfaction. School dropout is a complex phenomenon with multiple causes and consequences, influenced by individual, family, school, and social factors. ML is well suited to modeling and predicting it because it can handle large and heterogeneous datasets and capture nonlinear relationships among these factors.

About the Research

In the present paper, the authors aimed to develop a model for predicting school dropout with 90% reliability. They used data from the 2010 and 2020 housing and population censuses and the 2015 intercensal survey conducted by the National Institute of Statistics and Geography (INEGI). These data sets included information about the residents and households in Mexico's 32 states and 2,457 municipalities, including factors such as ethnicity, birth, education, health services, economic issues, and other relevant characteristics.

The study selected 20 variables from the data sources based on their correlation with the target variable, which was the academic level of the individuals. The target variable indicated whether the individual had completed or dropped out of secondary or higher education. The selected variables included demographic, socioeconomic, and educational factors, such as age, gender, marital status, occupation, income, school attendance, school type, and school location. The researchers cleaned and homogenized the data, discarding incomplete, duplicate, and unspecified records and retaining only the records of people over 14 years old who entered secondary or higher education. The final dataset consisted of 1,080,782 records.
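The cleaning and variable-selection steps described above can be sketched in a few lines of Python. This is a minimal illustration on invented toy records, not the authors' pipeline: the field names (`age`, `attends_school`, `income`, `completed`), the `"unspecified"` marker, and the reading of "over 14 years old" as age 15 or more are all assumptions made for the example.

```python
import math

def clean_records(records, required_fields, min_age=15):
    """Drop incomplete, unspecified, and duplicate records, then filter by age.

    Assumes "age" is one of the required fields (hypothetical schema).
    """
    seen = set()
    cleaned = []
    for rec in records:
        values = tuple(rec.get(f) for f in required_fields)
        # Discard records with missing or "unspecified" entries.
        if any(v is None or v == "unspecified" for v in values):
            continue
        # "Over 14 years old" is read here as age >= 15 (an assumption).
        if rec["age"] < min_age:
            continue
        # Discard exact duplicates on the required fields.
        if values in seen:
            continue
        seen.add(values)
        cleaned.append(rec)
    return cleaned

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def rank_by_correlation(records, features, target, top_k):
    """Return the top_k features with the largest |correlation| to the target."""
    ys = [r[target] for r in records]
    scored = [(abs(pearson([r[f] for r in records], ys)), f) for f in features]
    scored.sort(reverse=True)
    return [f for _, f in scored[:top_k]]

# Invented toy records; "completed" is 1 if the level was finished, 0 if dropped.
fields = ["age", "attends_school", "income", "completed"]
raw = [
    {"age": 16, "attends_school": 1, "income": 3.0, "completed": 1},
    {"age": 17, "attends_school": 1, "income": 1.0, "completed": 1},
    {"age": 16, "attends_school": 1, "income": 3.0, "completed": 1},   # duplicate
    {"age": 13, "attends_school": 1, "income": 2.0, "completed": 1},   # too young
    {"age": 18, "attends_school": 0, "income": None, "completed": 0},  # incomplete
    {"age": 19, "attends_school": 0, "income": 2.0, "completed": 0},
    {"age": 20, "attends_school": 0, "income": 4.0, "completed": 0},
]

cleaned = clean_records(raw, fields)
selected = rank_by_correlation(
    cleaned, ["age", "attends_school", "income"], "completed", top_k=2
)
```

On this toy input, four records survive cleaning and the two variables kept are `attends_school` and `age`. The study applied the same idea at a much larger scale, keeping 20 variables.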

The authors built predictive models with artificial neural networks (ANN), support vector machines (SVM), Bayesian optimization, random forest (RF), and linear ridge and Lasso regression. These techniques were chosen because they have proven effective and competitive in solving regression problems. The performance of each technique was compared in terms of reliability and processing time using several evaluation metrics: the coefficient of determination, the mean squared error, and the root mean squared error. The study used 80% of the data for training and 20% for testing.
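As a rough illustration of the evaluation setup, the sketch below shows an 80/20 split together with the three metrics named above, using their standard textbook definitions. The helper names, the random seed, and the toy values are assumptions for the example; the article does not say how the authors shuffled or split their data.

```python
import math
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy usage: ten rows split 80/20, then metrics on invented predictions.
train, test = train_test_split(list(range(10)))
y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.9, 0.1, 0.8, 0.2]
scores = {
    "mse": mse(y_true, y_pred),
    "rmse": rmse(y_true, y_pred),
    "r2": r2(y_true, y_pred),
}
```

For these invented predictions the MSE is about 0.025 and the coefficient of determination is about 0.9. A value close to 1 means the model explains most of the variance in the target.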

Research Findings

The outcomes showed that all the ML techniques achieved high reliability, above 91%. The authors identified the ANN as the best technique in terms of reliability and processing time, with a reported reliability of 99%; SVM and Bayesian optimization followed, with reported reliabilities of 99.5% and 99.4%, respectively. RF and the linear ridge and Lasso regressions reached lower values, reported as 91.3% and 91.1%. The error rates of all techniques were below 10%, which was the convergence criterion established by the authors. The ANN also had the shortest processing time, while random forest required the most computing power.

Several tests were also performed to optimize the parameters and structure of the ANN, such as the number of layers, the number of neurons, the activation function, and the optimization algorithm. The authors found that the best ANN configuration was a multilayer perceptron with four hidden layers of two neurons each, using the adaptive moment estimation (ADAM) optimization algorithm and the rectified linear unit (ReLU) activation function. This network was able to learn from the data and predict the probability of school dropout for each individual based on the input variables.
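To make the reported architecture concrete, the sketch below builds a multilayer perceptron with 20 inputs (one per selected variable), four hidden ReLU layers of two neurons each, and a single output. The single linear output unit, the random weight initialization, and the function names are assumptions for illustration. Only the forward pass is shown; training with ADAM is omitted.

```python
import random

def relu(x):
    """Rectified linear unit."""
    return x if x > 0 else 0.0

def init_mlp(n_inputs, hidden_sizes, n_outputs, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """Forward pass: ReLU on hidden layers, linear output layer (assumed)."""
    activations = x
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        is_output = i == len(layers) - 1
        activations = z if is_output else [relu(v) for v in z]
    return activations

def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

# 20 input variables, four hidden layers of two neurons, one output.
layers = init_mlp(20, [2, 2, 2, 2], 1)
n_params = count_parameters(layers)
prediction = forward(layers, [1.0] * 20)
```

Under these assumptions the network has only 63 trainable parameters (42 in the first hidden layer, 6 in each of the next three, and 3 in the output layer), which is consistent with the short processing time the article reports for the ANN.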

The study also identified the most influential variables in predicting school dropout using a feature importance method. The most influential variables were school attendance, school type, school location, occupation, income, and marital status. These variables reflect the economic, social, and educational factors that affect students' decisions to continue or abandon their studies.
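The article does not say which feature importance method the authors used. One common model-agnostic option is permutation importance: shuffle one input column at a time and measure how much the prediction error grows. The sketch below uses that technique as a stand-in, with a hypothetical `toy_model` and invented data, so it illustrates the idea rather than reproducing the study's analysis.

```python
import random

def model_mse(model, X, y):
    """Mean squared error of a callable model over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = model_mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(model_mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical stand-in model that looks only at feature 0."""
    return 1.0 if row[0] >= 0.5 else 0.0

# Invented data: feature 0 determines the target, feature 1 is noise.
X = [[1.0, 0.3], [0.0, 0.9], [1.0, 0.5], [0.0, 0.1], [1.0, 0.7], [0.0, 0.2]]
y = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
importances = permutation_importance(toy_model, X, y)
```

Here shuffling feature 1 leaves the error unchanged, because the toy model ignores it, while shuffling feature 0 increases the error. A variable such as school attendance would rank highly for the same reason if the trained model relied on it.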

Conclusion

In summary, the paper comprehensively demonstrated the feasibility and usefulness of applying ML to predict school dropout. The authors indicated that the best ML approach was the ANN. They also highlighted the most influential variables in predicting school dropout, which can aid in understanding the causes and consequences of this issue.

The research has several applications and implications for the educational sector, including providing timely support to at-risk students and evaluating the impact of various policies and programs. Additionally, the researchers proposed developing an open platform for institutions to access and utilize the data and predictions, facilitating ongoing model improvement with new data.

Journal reference:

Jiménez-Gutiérrez, A.L., Mota-Hernández, C.I., Mezura-Montes, E. et al. Application of the performance of machine learning techniques as support in the prediction of school dropout. Sci Rep 14, 3957 (2024). https://doi.org/10.1038/s41598-024-53576-1, https://www.nature.com/articles/s41598-024-53576-1


Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Osama, Muhammad. (2024, February 22). Harnessing Machine Learning for Predicting School Dropout. AZoAi. Retrieved on April 19, 2025 from https://www.azoai.com/news/20240222/Harnessing-Machine-Learning-for-Predicting-School-Dropout.aspx.
MLA
Osama, Muhammad. "Harnessing Machine Learning for Predicting School Dropout". AZoAi. 19 April 2025. <https://www.azoai.com/news/20240222/Harnessing-Machine-Learning-for-Predicting-School-Dropout.aspx>.
Chicago
Osama, Muhammad. "Harnessing Machine Learning for Predicting School Dropout". AZoAi. https://www.azoai.com/news/20240222/Harnessing-Machine-Learning-for-Predicting-School-Dropout.aspx. (accessed April 19, 2025).
Harvard
Osama, Muhammad. 2024. Harnessing Machine Learning for Predicting School Dropout. AZoAi, viewed 19 April 2025, https://www.azoai.com/news/20240222/Harnessing-Machine-Learning-for-Predicting-School-Dropout.aspx.