In a recent publication in the journal PLOS ONE, researchers presented a machine learning technique that explores the wealth of information generated by modern large-scale assessments (LSAs) in education.
Background
Education is a top priority on global government development agendas due to its far-reaching impact on societal progress, labor productivity, general well-being, and economic advancement. Evaluating the effectiveness of educational systems is essential, requiring in-depth exploration of the underlying structures and mechanisms governing student achievements. LSAs have emerged as crucial tools for comprehensive educational evaluation from both national and international perspectives.
Comparing LSA outcomes across different educational systems at various levels is valuable but complex. Such comparisons assume that economic, political, and social factors within geographical boundaries significantly influence education, and these factors vary from place to place. Traditional statistical models have attempted to mitigate bias in comparisons, but they have limitations. Studies comparing educational systems have revealed regional disparities but often lack in-depth analysis or an effective framework. The current study introduces a novel benchmark approach that leverages data mining and machine learning to evaluate the effectiveness of educational systems in various regions of Brazil, offering flexibility and the potential to identify systems that exceed expectations.
Analyzing educational effectiveness in Brazilian regions
To assess the effectiveness of educational systems in various regions of Brazil, the authors employ an educational effectiveness framework that juxtaposes observed student performance at the school level with expected performance, considering student, family, and school attributes. Their primary interest lies in understanding performance discrepancies across subpopulations, using an efficient function that predicts student achievement outcomes, is trained on all the training data, and is then tested separately on each subpopulation.
The authors employed the slicing analysis (SA) technique to test educational effectiveness by conducting paired hypothesis tests. These tests compare the performance of a model when tested in specific educational systems, revealing differences in their effectiveness. The study utilizes binary classifiers for systematic testing of paired comparisons, making the analysis more accessible for various machine learning algorithms.
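The paired comparison described above can be illustrated with a short sketch. The article does not state which hypothesis test the authors used, so the two-proportion z-test below is an assumption chosen for simplicity, and all counts are invented for illustration. The names `slice_a`, `slice_b`, and `two_proportion_z_test` are hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: both slices share the same error rate.

    x1, x2: number of misclassified positive schools in each slice
    n1, n2: number of positive schools in each slice
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)               # common rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:                                  # degenerate case: rates all 0 or all 1
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return z, p_value

# Invented counts (assumption): false negatives among positive schools.
slice_a = {"false_negatives": 30, "positives": 40}   # FNR = 0.75
slice_b = {"false_negatives": 10, "positives": 40}   # FNR = 0.25

z, p_value = two_proportion_z_test(
    slice_a["false_negatives"], slice_a["positives"],
    slice_b["false_negatives"], slice_b["positives"],
)
print(f"z = {z:.3f}, p = {p_value:.2e}")
```

A small p-value here would suggest that the same model errs at different rates on the two slices, which is the kind of difference the paired tests are meant to surface.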
The empirical data come from the Brazilian National Secondary School Exam (ENEM). Data preprocessing standardized and transformed the variables, resulting in 41 input variables for analysis. For modeling and evaluation, three machine learning classifiers (logistic regression, random forest, and AdaBoost) are employed. A leave-one-group-out cross-validation setting is used, iterating through the dataset with each state (or region) acting as the held-out test set in a different iteration. This approach ensures a comprehensive evaluation of model variance and generalization across the entire dataset.
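To make the leave-one-group-out idea concrete, the following minimal sketch shows only the splitting step, using the standard library. The state labels are toy data and the helper name `leave_one_group_out` is hypothetical; it is not the authors' code. In practice, scikit-learn's `LeaveOneGroupOut` splitter in `sklearn.model_selection` provides equivalent behavior.

```python
from collections import defaultdict

def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) once per group."""
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        # Every sample from every other group goes into the training set.
        train_idx = sorted(
            i for g, idxs in by_group.items() if g != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx

# Hypothetical state label for each school (toy data).
school_states = ["PA", "MA", "PE", "PA", "MA", "PE", "PA"]

splits = list(leave_one_group_out(school_states))
for state, train_idx, test_idx in splits:
    print(state, "train:", train_idx, "test:", test_idx)
```

Each state is held out exactly once, so a model is always evaluated on schools from a state it never saw during training.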
The study also includes a comparison with traditional hierarchical linear models (HLM) at the state level. HLM computes a measure of effectiveness for all states simultaneously, regardless of their comparability due to contextual differences. In contrast, the proposed machine learning approach offers more flexible and paired estimates, adapting to specific comparisons between states.
Results and analysis
Model Evaluation: Experimental results display the average area under the ROC curve (AUC) for each year and its standard deviation during cross-validation. The probability scores from the cross-validation procedure for 2019 indicate that schools across the country have a similar likelihood of being classified as high achievers, regardless of the region. An exception is observed in the South region, which lacks schools in the first decile. However, similar estimates for all other deciles across regions indicate that the model is well-calibrated nationwide, aligning with state-level results in 2019 and preceding years.
Comparison: Despite the models' strong performance, significant disparities in LSA scores persist nationwide. Results consistently show that the South and Southeast regions have more schools above the median, while the North and Northeast regions exhibit poorer results over the years. Class imbalance and varying class sizes pose challenges for comparisons. Subpopulations with a lower concentration of the positive class tend to have higher false negative rates (FNR) and lower false positive rates (FPR), while those with a higher concentration of the positive class show the opposite pattern.
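For reference, the two error rates are FNR = FN / (FN + TP) and FPR = FP / (FP + TN). The sketch below computes them per subpopulation; the labels, predictions, and region codes are invented toy data, and the function name `error_rates_by_group` is hypothetical.

```python
from collections import defaultdict

def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: (FNR, FPR)}; 1 = above-median school, 0 = below.

    A rate is None when the group has no schools of the relevant true class.
    """
    counts = defaultdict(lambda: {"fn": 0, "tp": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1
    rates = {}
    for group, c in counts.items():
        positives = c["fn"] + c["tp"]
        negatives = c["fp"] + c["tn"]
        fnr = c["fn"] / positives if positives else None
        fpr = c["fp"] / negatives if negatives else None
        rates[group] = (fnr, fpr)
    return rates

# Toy labels, predictions, and region codes (all invented).
y_true = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
regions = ["NE"] * 5 + ["S"] * 5

rates = error_rates_by_group(y_true, y_pred, regions)
for region, (fnr, fpr) in rates.items():
    print(f"{region}: FNR = {fnr:.2f}, FPR = {fpr:.2f}")
```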
Brazilian Regions: Tabulated results present the FNR and FPR by region for all combinations of variables for 2019. Looking at the full model, the FNR metric reveals lower model confidence in Northeastern and Northern schools compared to others. Northeastern schools above the median are disproportionately classified as below the median at an 85 percent rate, a significant difference compared to all other regions, including the North. Because these schools achieve more than the model expects given their context, this suggests that Northeastern schools have more effective policies when controlling for contextual variables.
Brazilian States: In the case of states, results show the FNR for all states within regions with similar overall scores in 2019. Regional analysis results confirm that almost all Northeastern states are more effective than those in the North. Pará (PA) from the North is an exception, ranking first. Among Northeastern states, Maranhão (MA) and Pernambuco (PE) perform poorly. MA is noteworthy for having the lowest fraction of positive schools in the Northeast, which would make it appear the most effective if a naive model misclassified positive schools equally everywhere. However, MA has the lowest FNR in the Northeast, indicating relatively less effective policies.
Conclusion
In summary, the researchers introduced machine learning techniques to explore the information generated by LSAs. An empirical examination in Brazil supports the usefulness of the proposed method for examining variations in effectiveness across Brazilian educational administrative units at both regional and state levels between 2009 and 2019.