In a paper published in the journal Water, researchers used artificial intelligence (AI) to predict groundwater levels in the water-scarce Bilate watershed in southern Ethiopia, comparing several machine learning models. Gradient Boosting Regression (GBR) performed best, with a high R-squared value and a median absolute error (MAE) reported in meters. The resulting predictive model can support sustainable borehole-drilling decisions for the region's irrigation and drinking water access.
Background
In Ethiopia's water-scarce environment, where rainfed agriculture prevails, this study seeks to harness the potential of AI to predict groundwater levels. With 95% of Ethiopia's agricultural areas reliant on rainfall and most of the population engaged in small-scale farming, poverty and food insecurity are significant challenges. Drought and unpredictable climate change-induced rainfall further exacerbate crop yield fluctuations, especially for cereals.
The study centers on the Bilate watershed in southern Ethiopia and uses non-time series data from 75 existing boreholes. By applying machine learning models, the research aims to predict groundwater levels accurately, enabling informed decisions on productive drilling locations and thereby offering a viable response to the region's water scarcity and its impact on agriculture.
Proposed Method
This study analyzes groundwater level data from the Bilate watershed in southern Ethiopia. The dependent variable is the static water level, collected from 75 boreholes in 2007 by the Arba Minch Water Technology Institute. Despite its age, the dataset remains suitable for analysis, because static water levels typically change by no more than a few centimeters over the intervening years. Such variations are negligible relative to the precision the analysis requires.
This historical dataset serves as a valuable resource for showcasing the effectiveness of machine learning in predicting water levels. A feature selection process was used to determine which variables significantly influence water level predictions. This approach combined domain expertise with a backward stepwise feature selection method using random forest regression (RFR). It identified a subset of twenty independent variables for modeling and analysis, encompassing factors such as elevation, soil type, meteorological variables (e.g., precipitation, specific humidity, wind speed, and land surface temperature), and vegetation (i.e., the Normalized Difference Vegetation Index (NDVI)) for three seasons spanning October 2005 to September 2007.
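The backward stepwise idea can be illustrated with a minimal Python sketch. It is not the authors' procedure: the study scored feature subsets with random forest regression in R, whereas here the scoring is a pluggable callback, and the demo uses a made-up `toy_cv_error` function with invented signal values. The feature names come from the article, but the numbers say nothing about which variables actually matter in the Bilate data.

```python
def backward_elimination(features, score, min_features=1):
    """Drop one feature at a time while doing so does not worsen the score.

    `score` maps a list of feature names to an error (lower is better).
    In practice it would be a cross-validated model error; here it is
    any callable supplied by the caller.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > min_features:
        # Score every subset obtained by dropping a single feature.
        trials = [
            (score([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        trial_score, drop = min(trials)
        if trial_score > best:
            break  # every possible removal makes the score worse
        selected.remove(drop)
        best = trial_score
    return selected, best


# Hypothetical stand-in for a cross-validated error. The signal values
# are invented purely so the example has something to eliminate.
SIGNAL = {"elevation": 6.0, "precipitation": 3.0, "soil_type": 1.5}


def toy_cv_error(subset):
    explained = sum(SIGNAL.get(name, 0.0) for name in subset)
    # A small per-feature penalty mimics the cost of uninformative inputs.
    return 12.0 - explained + 0.25 * len(subset)


candidates = ["elevation", "precipitation", "soil_type", "ndvi", "wind_speed"]
selected, best = backward_elimination(candidates, toy_cv_error)
print(selected, best)
```

With a real model, `score` would refit the regressor on each candidate subset, so the cost grows quickly with the number of features; this is one reason such procedures are often combined with domain expertise, as the study did.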
The study focuses primarily on cultivated and grassland areas, because under local conditions boreholes for agricultural irrigation are typically not placed in forested, wetland, or shrubland areas. A training dataset of 63 randomly selected observations was used to build and evaluate the machine learning models, with performance assessed on a testing set consisting of the remaining 12 observations. Multiple experiments were performed for each algorithm to check that the models generalize to unseen data. The algorithms analyzed were Multiple Linear Regression (MLR), Multivariate Adaptive Regression Spline (MARS), Artificial Neural Networks (ANN), RFR, and GBR. To expand predictions beyond the borehole locations and cover the entire Bilate region, the authors generated a grid with a 100 m × 100 m resolution, extracted the values of the same 20 independent variables for each grid point, and processed the data in the same way as the original borehole dataset.
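The grid step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: coordinates are taken to be in a projected system measured in meters, the bounding box and the tiny elevation raster are invented, and the raster is assumed to share the grid's origin and cell size. The authors worked in QGIS and R, so the function names `make_grid` and `sample_raster` are hypothetical.

```python
import math


def make_grid(xmin, ymin, xmax, ymax, cell=100.0):
    """Return (x, y) centers of square cells covering the bounding box."""
    n_cols = math.ceil((xmax - xmin) / cell)
    n_rows = math.ceil((ymax - ymin) / cell)
    return [
        (xmin + (col + 0.5) * cell, ymin + (row + 0.5) * cell)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


def sample_raster(raster, x, y, xmin, ymin, cell=100.0):
    """Look up the value of the raster cell containing point (x, y).

    Assumes raster[row][col] with row 0 starting at ymin and col 0 at xmin.
    """
    col = int((x - xmin) // cell)
    row = int((y - ymin) // cell)
    return raster[row][col]


# Invented 2 x 3 elevation raster (meters above sea level), not study data.
elevation = [
    [1500, 1510, 1520],
    [1530, 1540, 1550],
]

grid = make_grid(0.0, 0.0, 300.0, 200.0)
features = [sample_raster(elevation, x, y, 0.0, 0.0) for x, y in grid]
print(len(grid), features)
```

Repeating the lookup for each of the 20 predictor layers would give every grid point the same feature vector layout as a borehole record, so a fitted model can be applied to the grid without changes.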
This broader analysis aims to provide a comprehensive picture of predicted water levels across the entire Bilate region. Data preparation and visualization relied primarily on Quantum Geographic Information System (QGIS) 3.24.1 and R 4.1.3. Resampling methods, including Leave-One-Out Cross-Validation (LOOCV) and bootstrapping, were employed to assess model performance and reduce bias and variance in the models. Model accuracy and predictive capability were measured with evaluation metrics including mutual information (MI), root mean square error (RMSE), MAE, and R-squared.
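To make the resampling and metrics concrete, here is a minimal Python sketch of LOOCV combined with RMSE, absolute-error, and R-squared calculations. Several things are assumptions for illustration: a single-predictor least-squares line stands in for the study's models (it is not GBR), the elevation and water-level values are invented, and MAE is shown in both its median form (the expansion used in this article) and its mean form. Mutual information and bootstrapping are omitted for brevity.

```python
import math
from statistics import median


def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def median_abs_error(y_true, y_pred):
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def mean_abs_error(y_true, y_pred):
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Placeholder model: ordinary least squares on one predictor."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def loocv_predictions(xs, ys):
    """Predict each observation from a model fitted on all the others."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


# Invented illustrative values, not data from the study.
elevation_m = [1200.0, 1350.0, 1500.0, 1650.0, 1800.0, 1950.0]
water_level_m = [12.0, 18.5, 22.0, 30.5, 33.0, 41.0]

held_out = loocv_predictions(elevation_m, water_level_m)
print("RMSE:", rmse(water_level_m, held_out))
print("Median AE:", median_abs_error(water_level_m, held_out))
print("R-squared:", r_squared(water_level_m, held_out))
```

Because every prediction comes from a model that never saw the corresponding observation, the resulting metrics estimate out-of-sample error, which matters for a dataset as small as 75 boreholes.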
Study Results
The authors conducted an extensive analysis of various machine-learning algorithms for predicting groundwater depth in the Bilate region of Ethiopia. They employed a carefully chosen training and testing data partition, allowing for a robust model performance comparison. The study encompassed a range of algorithms, including MLR, MARS, ANN, RFR, and GBR. Through thorough evaluation and comparison, GBR emerged as the top-performing model, closely followed by RFR. These models exhibited their effectiveness in predicting groundwater levels, providing valuable insights for stakeholders involved in groundwater management and borehole drilling decisions. However, the study also highlighted the limitations of specific models, such as MLR's inability to capture complex relationships and ANN's tendency to overfit with small and less diverse datasets.
Furthermore, the paper extended its analysis to evaluate model performance in a practical context by comparing predicted water levels at grid points with actual measurements from boreholes. This grid point prediction approach offered insights into the models' applicability for real-world drilling decisions. Despite a slight reduction in prediction accuracy when relying solely on grid point data, the models still proved valuable for guiding drilling locations, particularly for sustainable irrigation purposes.
The study culminates in a high-resolution map of predicted groundwater levels, which serves as a practical tool for stakeholders, including drilling companies, governmental agencies, and local farmers, and supports informed decisions about groundwater utilization in the Bilate region of Ethiopia. The research contributes to both groundwater prediction methodology and its practical application.
Conclusion
In summary, this research highlights AI's potential in locating sustainable drilling sites for water in water-scarce regions. It utilized non-time series data and found GBR to be the best-performing model for groundwater prediction. The model produced a high-resolution map for Bilate, aiding irrigation decisions. However, future studies must address limitations like data size, variable constraints, and model complexities.