In a paper published in the journal Water, researchers examined groundwater quality and its suitability for drinking and irrigation in Sakrand, a city in the province of Sindh, Pakistan. They sought to determine the region's Water Quality Index (WQI) with an Artificial Intelligence (AI) prediction model intended to speed up the process and improve accuracy.
The study collected 80 groundwater samples and trained classifiers on both raw and normalized data. The classifiers examined were Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Ensemble Trees (ET), and Discriminant Analysis (DA), implemented in MATLAB (MATrix LABoratory). SVM proved the most effective classifier on both the raw and the normalized dataset, with prediction accuracies of 90.8% and 89.2%, respectively.
Background
Clean groundwater is essential for drinking, industry, and agriculture. Population growth, industrialization, and changing lifestyles have increased water demand, making groundwater a crucial source. Groundwater quality is often threatened by chemical and microbial contamination, particularly in regions that rely heavily on groundwater because surface water is limited, and human activities and environmental change have worsened this contamination. Assessing groundwater quality is therefore essential for understanding and managing these threats to a vital resource.
Related Work
Previous studies have highlighted growing groundwater contamination, particularly in Pakistan's Punjab and Sindh provinces, driven by rapid industrialization, mining, and agriculture. These regions report significant contamination, compounded by increased human activity and environmental change, which have degraded water quality and contributed to more severe waterborne diseases.
Traditional WQI computation is time-consuming and prone to errors in the sub-index calculations. Prior research has applied AI models, including SVM, Naive Bayes (NB), Random Forest (RF), KNN, and Extreme Gradient Boosting (XGBoost), to predict WQI accurately, and these studies have consistently found AI-based models effective for addressing groundwater contamination and its impacts on human health.
Proposed Method
The study area was Sakrand, in Pakistan's Sindh province. The region lies 25 meters above sea level and has a harsh climate, with cold, dry winters, scorching, arid summers, and monsoon rains from July to September. Wheat and cotton are the main crops on the delta plain, and groundwater is sourced from the nearby Indus River. The local groundwater is relatively less saline, owing to factors such as low cropping intensity and canal seepage. Groundwater levels range from 1.5 to 12 meters, and the flow direction varies between westerly and south-westerly. The primary aquifer is sandy and non-artesian.
Between April and May 2022, 80 groundwater samples were collected from shallow aquifers (<35 m) in Sakrand. The researchers filtered the samples through 0.45 μm filters and recorded each sampling location with a Global Positioning System (GPS) receiver. Cations and anions were analyzed following the standard procedures of the American Public Health Association (APHA 2005).
For cation analysis, the samples were acidified with nitric acid (HNO₃) to a pH (potential of hydrogen) below 2.0. To compute the WQI, the researchers measured the following physicochemical parameters using appropriate methods: electrical conductivity (EC), total dissolved solids (TDS), temperature, pH, total alkalinity, chloride (Cl⁻), bicarbonate (HCO₃⁻), nitrate (NO₃⁻), sulfate (SO₄²⁻), calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), potassium (K⁺), iron (Fe²⁺), and arsenic (As).
The WQI was calculated to assess whether the groundwater is suitable for human consumption, using World Health Organization (WHO) guidelines as the reference standards. The computation followed three steps. First, each parameter was assigned a weight. Second, the relative weight of each parameter was derived from these weights.
In the third step, each parameter was assigned a quality rating based on its measured concentration and the corresponding WHO standard, yielding a sub-index for each parameter. The WQI was obtained by summing these sub-indices. The resulting WQI values were grouped into three classes (excellent, good, and poor), which served as labels for the machine learning classification models.
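The three-step calculation can be sketched as a weighted-arithmetic index. This is a minimal sketch under stated assumptions: the parameter subset, the weights, the WHO-style guideline values, and the three class cut-offs (below 50 excellent, 50 to 100 good, above 100 poor) are illustrative placeholders rather than the values used in the study, and the helpers `wqi` and `wqi_class` are hypothetical.

```python
# Minimal weighted-arithmetic WQI sketch. The weights, guideline limits, and
# class cut-offs are illustrative assumptions, not the study's values.

# Hypothetical parameter weights w_i (1 = least, 5 = most important for health).
WEIGHTS = {"pH": 4, "TDS": 5, "Cl": 3, "NO3": 5, "As": 5}

# Hypothetical guideline limits S_i (mg/L, except pH), of the kind often taken
# from WHO guidelines in WQI studies.
STANDARDS = {"pH": 8.5, "TDS": 1000.0, "Cl": 250.0, "NO3": 50.0, "As": 0.01}


def wqi(sample: dict[str, float]) -> float:
    """Return the WQI of one sample given its measured concentrations C_i."""
    total_weight = sum(WEIGHTS.values())
    index = 0.0
    for name, weight in WEIGHTS.items():
        relative_weight = weight / total_weight             # step 2: W_i = w_i / sum(w)
        rating = sample[name] / STANDARDS[name] * 100.0     # step 3: q_i = C_i / S_i * 100
        index += relative_weight * rating                   # sub-index SI_i = W_i * q_i
    return index                                            # WQI = sum of SI_i


def wqi_class(value: float) -> str:
    """Map a WQI value to one of three classes using assumed cut-offs."""
    if value < 50:
        return "excellent"
    if value <= 100:
        return "good"
    return "poor"


if __name__ == "__main__":
    # Made-up concentrations for illustration only.
    sample = {"pH": 7.5, "TDS": 600.0, "Cl": 150.0, "NO3": 10.0, "As": 0.005}
    score = wqi(sample)
    print(f"WQI = {score:.2f} ({wqi_class(score)})")  # prints: WQI = 53.77 (good)
```

Many published WQI schemes rate pH relative to an ideal value instead of dividing by the upper limit, and many use five classes rather than three; `WEIGHTS`, `STANDARDS`, and `wqi_class` would need to match whichever scheme is being reproduced.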
Classification Learner is a MATLAB app that provides a range of classifiers, including decision trees, SVM, KNN, ensemble trees (ET), and discriminant analysis (DA). The study used it to train supervised classification models on a labeled dataset.
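As a rough analogue of the Classification Learner workflow, the sketch below compares the four classifier families examined in the study (SVM, KNN, ET, DA) on raw and standardized features with 5-fold cross-validation. It swaps in scikit-learn for MATLAB: `SVC(kernel="linear")` stands in for linear SVM, a bagged decision-tree ensemble for ET, and `LinearDiscriminantAnalysis` for DA. The data are synthetic (from `make_classification`, with columns rescaled to mimic parameters measured in different units), so the printed accuracies say nothing about the study's results; `compare_classifiers` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, folds=5):
    """Return {name: (raw accuracy, normalized accuracy)} from k-fold cross-validation."""
    models = {
        "linear SVM": SVC(kernel="linear"),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "ensemble trees": BaggingClassifier(DecisionTreeClassifier(), random_state=0),
        "discriminant analysis": LinearDiscriminantAnalysis(),
    }
    results = {}
    for name, model in models.items():
        raw = cross_val_score(model, X, y, cv=folds).mean()
        # The scaler is refit inside each fold, so validation folds never inform it.
        scaled = make_pipeline(StandardScaler(), model)
        normalized = cross_val_score(scaled, X, y, cv=folds).mean()
        results[name] = (raw, normalized)
    return results


if __name__ == "__main__":
    # Synthetic stand-in: 80 samples x 15 parameters, 3 WQI classes.
    X, y = make_classification(n_samples=80, n_features=15, n_informative=6,
                               n_redundant=0, n_classes=3, random_state=0)
    X = X * np.logspace(0, 2, X.shape[1])  # give columns very different scales
    for name, (raw, normalized) in compare_classifiers(X, y).items():
        print(f"{name:22s} raw={raw:.3f} normalized={normalized:.3f}")
```

Wrapping the scaler in a pipeline means each fold computes its normalization statistics from its own training portion only, which avoids leaking information from the validation fold into the reported accuracy.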
The study used a linear SVM to build the classifier model that assigns new samples to a class. Linear SVM achieved the highest prediction accuracy on both raw and normalized data, followed by KNN. The trained linear SVM model was then used to classify new samples into WQI classes (excellent, good, or poor). Prediction errors can stem from insufficient features, imbalanced training data, overfitting, or underfitting.
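Once a model is chosen, classifying new samples is a fit-then-predict step. The sketch below, again using scikit-learn in place of MATLAB, fits a linear SVM on a training split (optionally standardizing the features first) and predicts WQI class labels for held-out "new" samples. The label names, the 80/20 split, and the `train_linear_svm` helper are assumptions for illustration, and the data are synthetic. scikit-learn's `SVC` handles the three classes internally with a one-vs-one scheme.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

CLASSES = np.array(["excellent", "good", "poor"])  # assumed label names


def train_linear_svm(X_train, y_train, normalize=True):
    """Fit a linear SVM, optionally preceded by z-score standardization."""
    steps = [StandardScaler()] if normalize else []
    model = make_pipeline(*steps, SVC(kernel="linear"))
    return model.fit(X_train, y_train)


if __name__ == "__main__":
    X, labels = make_classification(n_samples=80, n_features=15, n_informative=6,
                                    n_redundant=0, n_classes=3, random_state=1)
    y = CLASSES[labels]  # map integer labels 0/1/2 to WQI class names
    X_train, X_new, y_train, y_new = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    for normalize in (False, True):
        model = train_linear_svm(X_train, y_train, normalize=normalize)
        predicted = model.predict(X_new)  # WQI class for each new sample
        print(f"normalize={normalize}: accuracy={accuracy_score(y_new, predicted):.3f}")
```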
Experimental Results
The linear SVM achieved a prediction accuracy of 90.8% on raw data. It correctly classified the samples with excellent and poor WQI but misclassified some samples with good WQI. With normalized data, accuracy remained high at 89.2%, and the researchers further evaluated the model on testing samples. These results show that the linear SVM model is effective for assessing groundwater quality; although normalization slightly lowered this accuracy, it improved accuracy on the testing samples (93.33% versus 86.67%).
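The summary does not say which normalization the study applied. A common choice is z-score standardization, which rescales each parameter to zero mean and unit standard deviation so that parameters measured in large units (EC or TDS, for example) do not dominate distance- or margin-based classifiers such as KNN and SVM. This minimal pure-Python sketch assumes z-scores computed with the sample standard deviation; `fit_zscore` and `apply_zscore` are hypothetical helpers. The key design point is that the statistics are estimated from training rows only and then reused unchanged for testing rows.

```python
from statistics import mean, stdev


def fit_zscore(rows):
    """Estimate per-column mean and sample standard deviation from training rows.

    Requires at least two training rows (statistics.stdev needs two values).
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [stdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize rows with previously estimated (training) statistics."""
    return [[(value - m) / s for value, m, s in zip(row, means, stds)] for row in rows]


if __name__ == "__main__":
    train = [[7.2, 450.0], [7.8, 1200.0], [7.5, 800.0]]  # made-up pH and EC values
    test = [[7.4, 950.0]]
    means, stds = fit_zscore(train)
    print(apply_zscore(test, means, stds))
```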
Conclusion
To summarize, this study used MATLAB's Classification Learner to build a prediction model that reduces the computation time needed to determine WQI classes. The prediction accuracy of several classifiers was evaluated on both raw and normalized data, and in both cases linear SVM performed best, with accuracies of 90.8% and 89.2%, respectively. On the testing data, the model was more accurate with normalized data, reaching 93.33% compared with 86.67% for raw data. These results indicate that the linear SVM model can accurately predict WQI classes for new data samples.
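The summary does not state how the 80 samples were divided between model training and testing. As a hedged arithmetic check, the four reported accuracies are consistent with 65 samples used for training and validation and 15 held out for testing (65 + 15 = 80). The correct-prediction counts below are inferred from the percentages, not reported, and the percentages alone do not uniquely determine the split.

```python
# Reported accuracy (%), decimals reported, and *inferred* counts (correct, total).
# The counts are assumptions chosen to match the percentages, not published data.
REPORTED = {
    "raw data, training/validation": (90.8, 1, 59, 65),
    "normalized data, training/validation": (89.2, 1, 58, 65),
    "normalized data, testing": (93.33, 2, 14, 15),
    "raw data, testing": (86.67, 2, 13, 15),
}


def consistent(reported: float, decimals: int, correct: int, total: int) -> bool:
    """True if correct/total, as a percentage, rounds to the reported value."""
    return abs(100 * correct / total - reported) < 0.5 * 10 ** -decimals


if __name__ == "__main__":
    for label, (pct, decimals, correct, total) in REPORTED.items():
        print(f"{label}: {correct}/{total} = {100 * correct / total:.2f}% "
              f"(reported {pct}%) -> {consistent(pct, decimals, correct, total)}")
```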