In an article published in the journal Scientific Reports, researchers evaluated machine learning (ML) models for predicting anomalies in daily maximum temperature (Tmax) over India from March to June. They found that the optimal model was especially effective in April and May, when it can complement numerical forecasts.
Background
India's diverse geography and climatic variations make accurate temperature forecasting critical for sectors such as agriculture, energy, and disaster management. Predicting Tmax 10 days in advance gives these sectors more time to prepare. Previous efforts to predict heat waves in India have relied on numerical models.
Recently, ML models have been used to forecast Tmax anomalies, especially in regions with high Tmax variability that are prone to heat waves. Trained on historical weather data, satellite imagery, and other relevant variables, these models offer a data-driven approach to improving temperature predictions. They typically rely on regression algorithms that learn the relationship between multiple meteorological parameters and temperature. Because they are flexible and adaptable, they can complement numerical models while accounting for local factors.
However, ML models for this task are still at an early stage of development, and no systematic analysis of ML models for forecasting extreme temperatures across India has been performed. This work attempts to fill this gap by predicting Tmax anomalies with various ML models at a 10-day lead time.
About the Study
In this study, ten ML models were assessed for their ability to predict daily Tmax anomalies in India at a 10-day lead time from March to June over the period 1982-2020. These included the Adaptive Boosting (AdaBoost) regressor with a Multi-layer Perceptron (MLP) or a Support Vector Machine regressor (SVR) as the base regressor, the Gradient Boosting regressor (GBM), CatBoost, Light Gradient Boosting Machine (LightGBM), XGBoost, and Bagging regressors. The models were implemented using Scikit-learn and fine-tuned by varying parameters such as the number of neurons, activation functions, and solvers. Preprocessing techniques, including standardization, min-max normalization, power transformation, and robust scaling, were applied to the input data, and principal component analysis (PCA) was used for feature reduction to improve model performance.
The researchers comprehensively evaluated the ML models' ability to predict Tmax anomalies in India during the critical months of March to June, focusing on extreme events with Tmax anomalies exceeding 4°C. The assessment used the anomaly correlation coefficient (ACC) and root mean square error (RMSE), as well as the models' ability to predict extreme Tmax anomalies. Ensemble models were also explored, since their collective predictions often outperform those of individual models.
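The two headline scores can be written in a few lines of NumPy. This is a hedged sketch assuming that ACC denotes the centred anomaly correlation coefficient, as is standard in forecast verification; an uncentred variant that skips the mean removal also exists, and the study's exact formulation is not specified here. The sample arrays are hypothetical.

```python
import numpy as np

def anomaly_correlation(pred, obs):
    """Centred anomaly correlation between predicted and observed anomalies."""
    f = np.asarray(pred, dtype=float)
    o = np.asarray(obs, dtype=float)
    fa = f - f.mean()  # remove the sample mean from each series
    oa = o - o.mean()
    denom = np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2))
    if denom == 0.0:
        return float("nan")  # undefined when either series is constant
    return float(np.sum(fa * oa) / denom)

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    f = np.asarray(pred, dtype=float)
    o = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((f - o) ** 2)))

# Hypothetical Tmax anomalies (°C) for five target days.
obs = np.array([0.5, 1.2, 3.8, 4.6, -0.7])
pred = np.array([0.3, 1.0, 3.1, 4.9, -0.2])
print(f"ACC = {anomaly_correlation(pred, obs):.3f}, RMSE = {rmse(pred, obs):.3f} °C")
```

ACC rewards getting the pattern of ups and downs right regardless of amplitude, while RMSE penalizes magnitude errors, which is why the two are reported together.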
The study identified configurations that offer high ACC and low RMSE. To ensure their adequacy, the statistical properties of the models' predictions were compared with those of the observed Tmax anomalies. The researchers also evaluated the accuracy of extreme Tmax anomaly predictions using the hit rate (HR) and false alarm rate (FAR). In addition, they used permutation feature importance and Granger causality tests to determine the relative importance of different input regions and how they influence Tmax variations in India. The researchers emphasized that further numerical model experiments are needed to elucidate the physical processes through which these regions affect Tmax variations.
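Hit rate and false alarm rate come from a 2×2 contingency table of predicted versus observed extreme events. The sketch below uses the 4°C anomaly threshold mentioned above and assumes the common contingency-table definitions: HR = hits / (hits + misses) and FAR = false alarms / (false alarms + correct negatives). Some studies instead report the false alarm *ratio*, false alarms / (hits + false alarms); which definition the study used is not stated here. The sample values are hypothetical.

```python
import numpy as np

def contingency(pred, obs, threshold=4.0):
    """Count hits, misses, false alarms, and correct negatives for events
    where the anomaly exceeds `threshold` (°C)."""
    p = np.asarray(pred, dtype=float) > threshold
    o = np.asarray(obs, dtype=float) > threshold
    hits = int(np.sum(p & o))
    misses = int(np.sum(~p & o))
    false_alarms = int(np.sum(p & ~o))
    correct_negatives = int(np.sum(~p & ~o))
    return hits, misses, false_alarms, correct_negatives

def hit_rate(pred, obs, threshold=4.0):
    """Fraction of observed extreme events that were predicted."""
    h, m, _, _ = contingency(pred, obs, threshold)
    return h / (h + m) if (h + m) else float("nan")

def false_alarm_rate(pred, obs, threshold=4.0):
    """Fraction of non-events that were wrongly predicted as extreme."""
    _, _, f, c = contingency(pred, obs, threshold)
    return f / (f + c) if (f + c) else float("nan")

# Hypothetical observed and predicted Tmax anomalies (°C).
obs = [5.0, 4.5, 1.0, 0.0, 6.0]
pred = [4.8, 2.0, 4.2, 0.5, 5.1]
print(f"HR = {hit_rate(pred, obs):.2f}, FAR = {false_alarm_rate(pred, obs):.2f}")
```

A good extreme-event forecast pushes HR towards 1 while keeping FAR near 0; lowering the prediction threshold typically raises both.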
Results and Discussion
Various ML models were employed to predict daily Tmax anomalies from March to June over regions of India where the standard deviation of Tmax is high. After extensive analysis, the researchers identified an optimal model, which predicted Tmax anomalies well and achieved a higher hit rate and lower false alarm rate for extreme temperatures from March to May.
Benchmarking against persistence and Climate Forecast System (CFS) reforecasts showed that AdaBoost with MLP outperformed the other models in predicting Tmax anomalies and matched the CFS in forecasting extreme temperatures.
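Persistence is the simplest benchmark: the anomaly observed today is used as the forecast for the target day. The sketch below assumes that standard definition applied at the study's 10-day lead and pairs each persistence forecast with its verifying observation so it can be scored with the same RMSE used for the ML models. The anomaly series is synthetic and hypothetical.

```python
import numpy as np

def persistence_forecast(anomalies, lead=10):
    """Return (forecast, verifying_obs): the anomaly on day t is taken as
    the forecast for day t + lead."""
    a = np.asarray(anomalies, dtype=float)
    if lead <= 0 or lead >= a.size:
        raise ValueError("lead must be positive and shorter than the series")
    return a[:-lead], a[lead:]

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

# Hypothetical 60-day series of daily Tmax anomalies (°C).
rng = np.random.default_rng(1)
anoms = np.cumsum(rng.normal(scale=0.5, size=60))

forecast, target = persistence_forecast(anoms, lead=10)
print(f"persistence RMSE at 10-day lead: {rmse(forecast, target):.2f} °C")
```

An ML forecast for the same target days would be scored against `target` with the same function; a model that cannot beat this number adds little skill, which is how the text characterizes the model's performance in March and June.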
However, the model was less useful in March and June, when its performance was similar to that of persistence. This suggests that ML models can complement numerical models in predicting Tmax over India, particularly during the challenging months of April and May.
These findings highlight the potential of ML models to complement numerical predictions. Using permutation feature importance and Granger causality, the researchers also identified the regions that most strongly influence Tmax anomalies. Future work aims to extend the predictions to all grid points in India and to explore hybrid models that incorporate more data sources for greater accuracy.
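Permutation feature importance asks how much worse a fitted model gets when one input is shuffled, breaking its link with the target. The sketch below is a minimal NumPy version scored by the increase in RMSE, with a hypothetical toy model that uses only the first of three "regional" predictors. Scikit-learn's `sklearn.inspection.permutation_importance` provides a library implementation, and `statsmodels.tsa.stattools.grangercausalitytests` offers the Granger causality test; neither is claimed to be what the study used.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

def permutation_importance_rmse(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when each input column is shuffled.
    Larger values mean the model relies more on that input."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    baseline = rmse(predict(X), y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle column j only, leaving the other inputs intact.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(rmse(predict(X_perm), y) - baseline)
        importances[j] = np.mean(increases)
    return importances

def toy_model(data):
    """Hypothetical fitted model that depends only on the first predictor."""
    return 3.0 * data[:, 0]

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0]
imp = permutation_importance_rmse(toy_model, X, y)
print(imp)
```

Because the toy model ignores the second and third predictors, shuffling them leaves its error unchanged, so their importance is exactly zero while the first predictor's is large.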
Conclusion
In conclusion, this research addresses the crucial task of predicting extreme Tmax anomalies in India using ML models. It evaluates ML model performance, identifies suitable ML models, and investigates the key input attributes and regions that influence Tmax anomalies. These findings contribute to the growing field of ML in climate and weather prediction, offering valuable insights for improving extreme temperature forecasts in India.
While this study focused on area-averaged Tmax anomalies over regions with high Tmax variability, future work will aim to extend these predictions to cover all grid points across India. Furthermore, there is potential for improved predictions by incorporating a wider range of data sources, including numerical weather prediction models, and exploring more complex deep learning models.