AI Improves Air Pollution Prediction Accuracy


By Muhammad Osama
Reviewed by Susha Cheriyedath, M.Sc.
Aug 16 2024

A recent review published in the journal Environmental Research examined how well machine learning (ML) algorithms predict ambient air pollution levels compared to traditional statistical methods.


The researchers focused on three key pollutants: nitrogen dioxide (NO₂), ultrafine particles (UFPs), and black carbon (BC). These pollutants have high spatial and temporal variability and significant health impacts. The review aimed to determine if ML methods offer better performance than traditional statistical techniques in capturing these variations.

Background

Air pollution is a major global health issue that contributes to illness and death worldwide. Accurate exposure assessments are crucial for understanding the risks it poses. Pollutants such as NO₂, UFPs, and BC fluctuate widely in space and time, making them challenging to model with traditional methods.

Land use regression (LUR) models are often used to estimate outdoor air pollution, but these models may struggle with the complex, non-linear relationships between pollution levels and environmental factors. ML techniques, which can better capture these non-linear relationships, have become increasingly popular in air pollution modeling.
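To make the point about non-linear relationships concrete, here is a minimal sketch in Python. It fits a straight line and a one-split regression tree (a "stump") to a small synthetic dataset in which concentration drops sharply beyond a certain distance from a road. The data, the variable names, and the stump itself are illustrative assumptions, not taken from the review; a single stump is far simpler than the random forest or XGBoost models the reviewed studies used, and the fit here is in-sample rather than cross-validated.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_linear(x, y):
    """Ordinary least squares with one predictor; returns a prediction function."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda v: intercept + slope * v


def fit_stump(x, y):
    """Depth-1 regression tree: pick the split that minimizes squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no valid threshold between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yi for _, yi in pairs[:i]]
        right = [yi for _, yi in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, (lambda v: left_mean if v <= threshold else right_mean)


# Synthetic, illustrative data: distance from a road (arbitrary units)
# and a pollutant concentration that falls off abruptly, not linearly.
distance = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
concentration = [42, 38, 40, 12, 9, 11, 10, 8, 11, 9]

linear_model = fit_linear(distance, concentration)
threshold, stump_model = fit_stump(distance, concentration)

r2_linear = r_squared(concentration, [linear_model(d) for d in distance])
r2_stump = r_squared(concentration, [stump_model(d) for d in distance])

print(f"linear R2 = {r2_linear:.3f}, stump R2 = {r2_stump:.3f}, split at {threshold}")
```

On this toy data the stump fits better than the straight line because the relationship is a step rather than a slope. Real comparisons would use held-out data and many predictors.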

About the Review

In this study, the authors aimed to assess the performance of ML methods in predicting ambient concentrations of NO₂, UFPs, and BC compared to statistical regression models. To identify relevant studies, they searched two major scientific databases, Scopus and Web of Science, for research published up to June 13, 2024.

The studies had to meet specific criteria: they needed to report spatial or spatiotemporal models using both ML and statistical regression methods for the same pollutants and datasets, focus on outdoor UFPs, NO₂, or BC, include a quantitative assessment of model performance, and be peer-reviewed articles with original data.

The researchers identified 38 eligible studies with 46 model comparisons. These studies were conducted in various countries and covered urban, regional, and global spatial extents. The researchers extracted detailed information on study designs, modeling methods, and performance metrics, including the coefficient of determination (R²) and root mean square error (RMSE).
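For readers unfamiliar with the two metrics, the following Python sketch shows one common way to compute them. It assumes R² is defined as one minus the ratio of residual to total sum of squares; some studies instead report the squared correlation between observed and predicted values, so the definition used in any given paper may differ. The concentration values are hypothetical and purely illustrative.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical NO2 concentrations (illustrative values only)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"R2 = {r_squared(observed, predicted):.3f}")   # R2 = 0.948
print(f"RMSE = {rmse(observed, predicted):.3f}")      # RMSE = 2.550
```

A higher R² means the model explains more of the variation in measured concentrations, while a lower RMSE means smaller typical prediction errors.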

Statistical methods ranged from linear regression techniques, like multiple linear regression (MLR) and stepwise linear regression (SLR), to nonlinear and regularized methods such as generalized additive models (GAM) and least absolute shrinkage and selection operator (LASSO). The ML methods included random forest (RF), artificial neural networks (ANN), extreme gradient boosting (XGBoost), and convolutional neural networks (CNN), among others.

Key Results

The review found that ML methods outperformed statistical regression models in 34 of the 46 model comparisons (about 74%). On average, the best ML models showed an increase of 0.12 in R² and a 20% decrease in RMSE compared to the best statistical models. Tree-based methods, such as RF and XGBoost, were the most frequently used and best-performing ML approaches, surpassing other methods in 12 of 17 multi-model comparisons. ANN models, by contrast, often performed the worst among the evaluated ML methods.
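As a rough illustration of how such a head-to-head comparison can be summarized, the Python sketch below computes the absolute gain in R² and the percentage change in RMSE for one pair of models. The function name `compare_models` and the input values are hypothetical, chosen only so the output resembles the averages quoted above; they are not figures from any individual study in the review.

```python
def compare_models(r2_ml, rmse_ml, r2_stat, rmse_stat):
    """Return (absolute R2 gain, percent RMSE change) of ML vs. statistical model.

    A positive R2 gain and a negative RMSE change both favor the ML model.
    """
    r2_gain = r2_ml - r2_stat
    rmse_change_pct = 100.0 * (rmse_ml - rmse_stat) / rmse_stat
    return r2_gain, rmse_change_pct


# Hypothetical metrics for one comparison (illustrative only)
gain, change = compare_models(r2_ml=0.75, rmse_ml=4.0, r2_stat=0.625, rmse_stat=5.0)

print(f"R2 gain: {gain:+.3f}")          # R2 gain: +0.125
print(f"RMSE change: {change:+.1f}%")   # RMSE change: -20.0%
```

Note that R² differences are reported in absolute terms, while RMSE differences are relative, because RMSE depends on the pollutant's units and concentration range.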

ML methods provided greater performance gains for spatiotemporal models (predicting hourly, daily, or monthly pollutant levels) than for spatial models (predicting annual or seasonal averages). This may be because linear, non-regularized statistical methods struggle with the complexity of short-term pollutant variations.

Interestingly, nonlinear and regularized statistical regression methods, such as GAM and LASSO, sometimes performed similarly to ML models, especially for spatial models. This suggests that flexible regression techniques can match ML performance in some scenarios.

Applications

This review has significant implications for air pollution exposure assessment and epidemiological research. Accurate modeling of ambient air pollutant concentrations is crucial for estimating individual exposures and understanding the health impacts of air pollution. The superior performance of ML, particularly tree-based methods, in predicting spatiotemporal variations of NO₂, UFPs, and BC suggests that these techniques could improve exposure assessments and epidemiological studies.

The insights from this study can guide the choice of modeling approaches for different air pollutants and study designs. For example, nonlinear and regularized statistical methods may be adequate for modeling spatial patterns, while ML techniques are likely to offer the most benefit for spatiotemporal modeling.

Conclusion

The review concluded that ML methods, especially tree-based algorithms, generally outperformed traditional statistical regression techniques in predicting the spatial and temporal variations of key air pollutants. It highlighted the potential of ML to enhance air pollution exposure assessment and contribute to more accurate epidemiological studies.

The review also emphasized the need for further research comparing a broader range of statistical and ML methods, including deep learning methods, in various contexts. It further called for standardized reporting guidelines for methodologies and results to ensure transparency, reproducibility, and comparability across studies.

Journal reference:

Vachon, J., Kerckhoffs, J., Buteau, S., & Smargiassi, A. Do Machine Learning Methods Improve Prediction of Ambient Air Pollutants with High Spatial Contrast? A Systematic Review. Environmental Research, 2024, 119751. DOI: 10.1016/j.envres.2024.119751, https://www.sciencedirect.com/science/article/pii/S0013935124016566




