Predicting Upper Secondary Education Dropout Using Machine Learning

In a recent study published in the journal Scientific Reports, researchers introduced a novel predictive model employing machine learning techniques to identify students at risk of dropping out of upper secondary education. Their approach aims to pinpoint the most influential features for predicting dropout.

Proposed research workflow. Our process begins with data collection over 13 years, from kindergarten to the end of upper secondary education (Step 1), followed by data processing which includes cleaning and imputing missing feature values (Step 2). We then apply four machine learning models for dropout and non-dropout classification (Step 3), and evaluate these models using 6-fold cross-validation, focusing on performance metrics and ROC curves (Step 4). Image Credit: https://www.nature.com/articles/s41598-024-63629-0

Background

Education is widely considered a key factor for social and economic development and individual empowerment, yet many students face the risk of not completing upper secondary education, which can severely impact their careers. Therefore, understanding and preventing school dropouts is crucial.

School dropout is a complex phenomenon with multiple influencing factors, including individual, family, school, and societal elements. Previous research, often relying on traditional statistical methods and short-term data, has attempted to identify dropout predictors and develop interventions. However, these approaches may not fully capture the dynamic and longitudinal nature of dropout.

About the Research

In this paper, the authors aimed to use machine learning techniques to predict upper secondary education dropout from a comprehensive 13-year longitudinal dataset, with predictor data spanning kindergarten to Grade 9. Machine learning can handle large and complex datasets, uncover hidden patterns and relationships, and provide actionable insights into dropout prediction.

The dataset included data from the "First Steps" follow-up study and its extension, the "School Path: From First Steps to Secondary and Higher Education" study, which followed approximately 2,000 children born in 2000 in four municipalities in Finland. It covered a wide range of features such as family background, individual factors, behavior, motivation, engagement, bullying, health behavior, media usage, cognitive skills, and academic outcomes.

The target variable was the participant’s status 3.5 years after starting upper secondary education, as determined from school registers. Participants who had not completed upper secondary education by this time were coded as having dropped out.
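The coding rule for the target variable can be sketched in a few lines of Python. This is only an illustration: the function name, the use of months as the unit, and the treatment of a completion exactly at the 3.5-year mark as "completed" are assumptions, not details taken from the study.

```python
from typing import Optional

# 3.5 years expressed in months (the follow-up window described above).
FOLLOW_UP_MONTHS = 42


def code_dropout(months_to_completion: Optional[int]) -> int:
    """Return 1 for dropout and 0 for non-dropout.

    months_to_completion is a hypothetical field: the number of months
    between starting and completing upper secondary education, or None
    if the registers show no completion.
    """
    if months_to_completion is None:
        return 1  # no completion recorded: coded as dropout
    # Assumption: finishing exactly at the 42-month mark counts as completed.
    return 0 if months_to_completion <= FOLLOW_UP_MONTHS else 1


labels = [code_dropout(m) for m in (36, 42, 48, None)]
```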

The study employed four supervised classification algorithms: balanced random forest (BRF), easy ensemble (AdaBoost ensemble), bagging decision tree, and random subspace boosting (AdaBoost). These algorithms were chosen for their ability to handle imbalanced datasets, because dropout cases were far fewer than non-dropout cases. The performance of each algorithm was assessed using six-fold cross-validation: the dataset is split into six subsets, and in each of six iterations one subset serves as the test set while the remaining five serve as the training set.
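The six-fold splitting described above can be sketched in plain Python. This is a minimal illustration rather than the authors' pipeline: the function name `k_fold_indices` is hypothetical, and shuffling with a fixed seed is an assumption, since the article does not say how the folds were drawn or whether they were stratified by dropout status.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 6, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Assumption: shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for test_fold in range(k):
        test_idx = folds[test_fold]
        # Every fold except the current test fold goes into the training set.
        train_idx = [
            j for fold_no, fold in enumerate(folds) if fold_no != test_fold for j in fold
        ]
        yield train_idx, test_idx


for train_idx, test_idx in k_fold_indices(n_samples=20, k=6):
    # A classifier would be fitted on train_idx and scored on test_idx here.
    pass
```

Each sample lands in the test set exactly once, so averaging a metric over the six iterations uses every participant for evaluation without ever testing on data the model was trained on.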

To examine whether accurate predictions could be made as early as the end of primary school, the researchers compared the performance of the algorithms using data up to Grade 9 and data up to Grade 6. This comparison aimed to determine the earliest point at which reliable dropout predictions could be achieved.
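One way to set up such a comparison is to restrict the feature set by the grade at which each feature was measured. The sketch below assumes a hypothetical mapping from feature name to grade (kindergarten coded as 0); the feature names are invented for illustration and are not the study's variable names.

```python
from typing import Dict, List


def features_up_to(feature_grades: Dict[str, int], max_grade: int) -> List[str]:
    """Return the names of features measured at or before max_grade."""
    return sorted(name for name, grade in feature_grades.items() if grade <= max_grade)


# Hypothetical features with the grade at which each was measured (0 = kindergarten).
feature_grades = {
    "rapid_naming_k": 0,
    "reading_fluency_g3": 3,
    "arithmetic_g6": 6,
    "reading_comprehension_g9": 9,
}

grade6_features = features_up_to(feature_grades, max_grade=6)
grade9_features = features_up_to(feature_grades, max_grade=9)
```

The same models would then be trained once on each feature list, so that any difference in performance reflects only the extra data from Grades 7 to 9.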

Research Findings

The results showed that BRF performed best among the four algorithms, achieving a mean area under the ROC curve (AUC) of 0.65 with data up to Grade 9 and 0.61 with data up to Grade 6. It also achieved the highest balanced accuracy, a metric that accounts for class imbalance by averaging recall over the dropout and non-dropout classes (equivalent to accuracy computed with class-balanced sample weights).

The balanced accuracy was 0.61 with data up to Grade 9 and 0.59 with data up to Grade 6. These results indicated that the algorithm could classify dropout and non-dropout status better than chance (0.5 on both metrics) as early as Grade 6, with only a slight decrease in performance compared to using data up to Grade 9.
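For readers unfamiliar with the two metrics, the following plain-Python sketch shows how balanced accuracy and AUC are computed for a binary problem. The function names are illustrative, the labels are assumed to be 1 for dropout and 0 for non-dropout, and both classes are assumed to be present; the study itself would have relied on a library implementation.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of recall on the positive class and recall on the negative class."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    sensitivity = sum(1 for p in pos if p == 1) / len(pos)
    specificity = sum(1 for p in neg if p == 0) / len(neg)
    return (sensitivity + specificity) / 2


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# One dropout among ten students; the model predicts "non-dropout" for everyone.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
# Plain accuracy would be 0.9, but balanced accuracy exposes the failure:
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Both metrics sit at 0.5 for a classifier with no discriminative ability, which is the baseline against which the reported values of about 0.6 should be read.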

Moreover, the authors identified the most influential features for predicting dropout based on the BRF algorithm's feature scores. The top 20 features, averaged across the six folds of cross-validation, belonged to two domains: academic outcomes and cognitive skills. The academic outcomes domain included reading fluency, reading comprehension, arithmetic, multiplication, and Programme for International Student Assessment (PISA) scores, measured at various grades from Grade 1 to Grade 9.

The cognitive skills domain included rapid automatized naming and vocabulary, measured in kindergarten. These features were consistent with previous literature highlighting the importance of early academic and cognitive skills for later educational attainment and dropout risk. 
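Averaging feature scores across folds and keeping the highest-ranked ones can be sketched as follows. This is an illustration under assumptions: `top_features` is a hypothetical helper, each fold is assumed to report a score for the same set of features, and the feature names and scores are invented.

```python
from typing import Dict, List, Tuple


def top_features(fold_scores: List[Dict[str, float]], k: int = 20) -> List[Tuple[str, float]]:
    """Average each feature's score over the folds and return the k highest."""
    names = fold_scores[0].keys()
    means = {
        name: sum(scores[name] for scores in fold_scores) / len(fold_scores)
        for name in names
    }
    # Sort by mean score, highest first, and keep the top k.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:k]


# Invented scores from two folds, for illustration only.
fold_scores = [
    {"reading_fluency": 0.5, "arithmetic": 0.25, "media_usage": 0.0},
    {"reading_fluency": 0.25, "arithmetic": 0.25, "media_usage": 0.125},
]
ranking = top_features(fold_scores, k=2)
```

Averaging over folds reduces the chance that a feature ranks highly only because of one particular train/test split.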

This research demonstrates the potential of machine learning to predict upper secondary education dropout using comprehensive, long-term data. Early identification of at-risk students, even as early as the end of primary school, enables timely interventions and support strategies. The paper also underscores the importance of early cognitive and academic skills in predicting later dropout risk.

Conclusion

In summary, the machine learning-based approach showed promise for forecasting upper secondary education dropout. It classified dropout and non-dropout status with above-chance accuracy as early as Grade 6, using data from various domains such as family background, individual factors, behavior, motivation, engagement, bullying, health behavior, media usage, cognitive skills, and academic outcomes.

Moreover, the most influential features for predicting dropout, such as reading fluency, reading comprehension, and arithmetic skills, were identified. Overall, the researchers suggested that machine learning could support educators in identifying and intervening with at-risk students, thereby enhancing educational outcomes and reducing societal costs.

Journal reference:
Scientific Reports, https://www.nature.com/articles/s41598-024-63629-0

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Osama, Muhammad. (2024, June 14). Predicting Upper Secondary Education Dropout Using Machine Learning. AZoAi. Retrieved on November 21, 2024 from https://www.azoai.com/news/20240614/Predicting-Upper-Secondary-Education-Dropout-Using-Machine-Learning.aspx.

  • MLA

    Osama, Muhammad. "Predicting Upper Secondary Education Dropout Using Machine Learning". AZoAi. 21 November 2024. <https://www.azoai.com/news/20240614/Predicting-Upper-Secondary-Education-Dropout-Using-Machine-Learning.aspx>.

  • Chicago

    Osama, Muhammad. "Predicting Upper Secondary Education Dropout Using Machine Learning". AZoAi. https://www.azoai.com/news/20240614/Predicting-Upper-Secondary-Education-Dropout-Using-Machine-Learning.aspx. (accessed November 21, 2024).

  • Harvard

    Osama, Muhammad. 2024. Predicting Upper Secondary Education Dropout Using Machine Learning. AZoAi, viewed 21 November 2024, https://www.azoai.com/news/20240614/Predicting-Upper-Secondary-Education-Dropout-Using-Machine-Learning.aspx.
