Machine Learning Algorithms for Predicting Water Quality Index

In a paper published in the journal Water, researchers explored the critical issue of groundwater quality and its significance for various purposes, including drinking and irrigation, by focusing on Sakrand, a city in the province of Sindh. They sought to determine the region's Water Quality Index (WQI) by employing an Artificial Intelligence (AI) prediction model to streamline the process and enhance accuracy.

Study: Machine Learning Algorithms for Predicting the Water Quality Index. Image credit: wertinio/Shutterstock

The study collected 80 data samples and trained classification learners on both raw and normalized data. The classifiers examined included Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Ensemble Tree (ET), and Discriminant Analysis (DA), implemented in MATLAB. The findings identified SVM as the most effective classifier, achieving the highest prediction accuracy on both the raw and normalized datasets.

Background

The availability of clean groundwater is essential for various purposes, including drinking, industry, and agriculture. However, increasing population, industrialization, and changing lifestyles have raised water demand, making groundwater a crucial source. Groundwater quality is often jeopardized by chemical and microbial contamination, particularly in regions relying heavily on groundwater due to limited surface water resources. This contamination has worsened due to human activities and environmental changes. Therefore, assessing groundwater quality is crucial to address the consequences of these challenges on this vital resource.

Related Work

Previous studies have highlighted the growing concern of groundwater contamination, particularly in Pakistan's Punjab and Sindh provinces, caused by rapid industrialization, mining activities, and agriculture. These regions have reported significant contamination issues, compounded by increased human activities and environmental changes that have deteriorated water quality, leading to more severe waterborne diseases.

Traditional WQI computation methods are time-consuming and prone to errors during sub-index calculations. Prior research has explored the application of AI models, including SVM, Naive Bayes (NB), Random Forest (RF), KNN, and Extreme Gradient Boosting (XGBoost), to predict WQIs accurately, consistently demonstrating the effectiveness of AI-based models in assessing groundwater contamination and its impacts on human health.

Proposed Method

The study was conducted in Sakrand, located in Pakistan's Sindh province. At an elevation of 25 meters above sea level, this region experiences a harsh climate characterized by cold, dry winters and scorching, arid summers, with monsoon rains from July to September. Wheat and cotton are the main crops in the delta plain, and the groundwater is fed by the nearby Indus River. Notably, the local groundwater is relatively low in salinity, which is attributed to factors such as low crop intensity and canal seepage. Groundwater levels range from 1.5 to 12 meters, with the flow direction varying between westerly and south-westerly. The primary aquifer consists of sand and lacks artesian water.

Between April and May 2022, 80 groundwater samples were gathered from shallow aquifers (<35 m) in Sakrand. Researchers passed these samples through 0.45 μm filters and recorded their geographic locations using a Global Positioning System (GPS). They analyzed cations and anions following the standard procedures outlined in the American Public Health Association's (APHA 2005) guidelines.

For cation analysis, they adjusted the samples' potential of hydrogen (pH) to less than 2.0 using nitric acid (HNO₃). Researchers assessed water quality based on the WQI by measuring various physicochemical parameters, namely electrical conductivity (EC), total dissolved solids (TDS), temperature, pH, total alkalinity, chloride (Cl⁻), bicarbonate (HCO₃⁻), nitrate (NO₃⁻), sulfate (SO₄²⁻), calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), potassium (K⁺), iron (Fe²⁺), and arsenic (As), using appropriate methods.

The WQI was calculated to assess groundwater suitability for human consumption, following the criteria outlined by the World Health Organization (WHO). The computation followed a three-step process. In the first step, a weight was assigned to each individual parameter. In the second step, the relative weight of each parameter was determined.

In the third step, a quality rating was assigned to each parameter based on its concentration and the corresponding WHO standard, resulting in sub-indices. The WQI was obtained by summing these sub-indices. The WQI values were divided into three classes (excellent, good, and poor), which were used as labels in machine learning classification models for further analysis.
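The three steps can be sketched in a few lines of Python. This is a minimal illustration of the commonly used weighted-arithmetic form of the WQI, assuming the relative weight is each weight divided by the sum of all weights and the quality rating is the concentration divided by its standard, times 100. The parameter subset, weights, permissible limits, and class cut-offs below are hypothetical placeholders, not values taken from the study or from WHO tables.

```python
# Sketch of a three-step weighted-arithmetic WQI.
# ASSUMPTION: all weights, limits, and class cut-offs are illustrative only.

PARAMS = {
    # name: (assigned weight w_i, permissible limit S_i in mg/L)
    "TDS": (5, 1000.0),
    "Cl": (3, 250.0),
    "NO3": (5, 50.0),
    "Na": (2, 200.0),
}


def relative_weights(params):
    """Step 2: relative weight W_i = w_i / sum of all w_j."""
    total = sum(weight for weight, _ in params.values())
    return {name: weight / total for name, (weight, _) in params.items()}


def wqi(sample, params=PARAMS):
    """Step 3: sum the sub-indices SI_i = W_i * q_i, with q_i = C_i / S_i * 100."""
    rel = relative_weights(params)
    total = 0.0
    for name, (_, limit) in params.items():
        quality_rating = sample[name] / limit * 100.0
        total += rel[name] * quality_rating
    return total


def wqi_class(value, excellent_below=50.0, good_below=100.0):
    """Map a WQI value to one of three classes (cut-offs are hypothetical)."""
    if value < excellent_below:
        return "excellent"
    if value < good_below:
        return "good"
    return "poor"


# A hypothetical sample in which every concentration is half its limit.
sample = {"TDS": 500.0, "Cl": 125.0, "NO3": 25.0, "Na": 100.0}
print(round(wqi(sample), 2), wqi_class(75.0))
```

Because the relative weights sum to one, a sample whose concentrations all sit at the same fraction of their limits gets a WQI equal to that fraction times 100, which is a quick sanity check for any implementation.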

Classification Learner is a MATLAB tool encompassing various classifiers, including decision trees, SVM, KNN, ET, and DA. This study used it to train a classification model on a known dataset for supervised machine learning.

The study employed a linear SVM to construct a classifier model for evaluating new data and determining their class. Linear SVM demonstrated the highest prediction accuracy for both raw and normalized data, followed by KNN. The trained linear SVM model was then used to classify new data into WQI classes. Inaccurate predictions can arise from insufficient features, imbalanced training data, overfitting, or underfitting.
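To illustrate what training on raw versus normalized data involves, here is a minimal Python sketch rather than the MATLAB workflow used in the study. It uses KNN, the runner-up classifier, instead of the linear SVM, because KNN can be written in a few lines without external libraries. Min-max scaling is an assumed normalization, since the article does not state which method was applied, and the two features (EC and pH) and all sample values are hypothetical.

```python
from collections import Counter


def min_max_fit(rows):
    """Return (min, max) per feature column, learned from training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(row, bounds):
    """Scale one row to [0, 1] per feature using previously fitted bounds."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, (lo, hi) in zip(row, bounds)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples nearest to the query."""
    nearest = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: [EC in uS/cm, pH] with WQI class labels.
train_X = [[400.0, 7.0], [420.0, 7.2], [1500.0, 8.4], [1450.0, 8.6]]
train_y = ["excellent", "excellent", "poor", "poor"]

# Raw features: EC dominates the distance because of its larger scale.
raw_prediction = knn_predict(train_X, train_y, [430.0, 7.1])

# Normalized features: both columns contribute on a comparable scale.
bounds = min_max_fit(train_X)
train_norm = [min_max_apply(row, bounds) for row in train_X]
norm_prediction = knn_predict(
    train_norm, train_y, min_max_apply([430.0, 7.1], bounds)
)
print(raw_prediction, norm_prediction)
```

The scaling bounds are fitted on the training rows only and then reused for new samples, so that information from the data being classified does not leak into the preprocessing step.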

Experimental Results

The linear SVM achieved a high prediction accuracy of 90.8% with raw data. The classifier accurately predicted samples with excellent and poor WQI states but misclassified some samples with good WQI. With normalized data, prediction accuracy remained high at 89.2%. The researchers also evaluated the model on testing samples, where accuracy reached 93.33% with normalized data compared to 86.67% with raw data. These results demonstrate the effectiveness of the linear SVM model in assessing groundwater quality and suggest that data normalization can improve its accuracy on new samples.
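A per-class breakdown of this kind is usually read off a confusion matrix. The following Python sketch shows how overall accuracy and per-class recall can be derived from true and predicted labels. The label lists are invented for illustration, chosen only to mimic the pattern of errors concentrated in the good class, and are not the study's data.

```python
from collections import Counter

CLASSES = ["excellent", "good", "poor"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall_per_class(matrix, classes=CLASSES):
    """Share of each true class that was predicted correctly."""
    return {
        name: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (name, row) in enumerate(zip(classes, matrix))
    }


# Hypothetical labels: errors are confined to the "good" class.
y_true = ["excellent"] * 3 + ["good"] * 4 + ["poor"] * 3
y_pred = (
    ["excellent"] * 3
    + ["good", "good", "excellent", "poor"]
    + ["poor"] * 3
)

matrix = confusion_matrix(y_true, y_pred)
print(matrix)
print(accuracy(y_true, y_pred), recall_per_class(matrix))
```

Overall accuracy can hide a weak class, which is why the per-class recall is worth reporting alongside it when the middle category borders both of the others.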

Conclusion

To summarize, this study used the Classification Learner tool in MATLAB to create a prediction model that reduces the computation time required for determining WQI states. The study evaluated the prediction accuracy of various classifiers on both raw and normalized data. In both cases, linear SVM emerged as the top performer, achieving prediction accuracies of 90.8% and 89.2% for raw and normalized data, respectively. When applied to testing data, the model was more accurate with normalized data, reaching 93.33% compared to 86.67% with raw data. These results indicate that the linear SVM model can accurately predict WQI classes for new data samples.
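As a rough arithmetic check, the reported percentages can be reproduced from whole-number counts. The sketch below assumes a hypothetical split of the 80 samples into 65 for training and 15 for testing. The article does not state the split, so the counts are an inference that happens to match the figures, not a reported fact.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified samples."""
    return 100.0 * correct / total


# ASSUMPTION: 65 training and 15 testing samples (not stated in the article).
TRAIN_SIZE, TEST_SIZE = 65, 15

train_raw = round(accuracy_percent(59, TRAIN_SIZE), 1)          # 59 of 65 correct
train_normalized = round(accuracy_percent(58, TRAIN_SIZE), 1)   # 58 of 65 correct
test_raw = round(accuracy_percent(13, TEST_SIZE), 2)            # 13 of 15 correct
test_normalized = round(accuracy_percent(14, TEST_SIZE), 2)     # 14 of 15 correct

print(train_raw, train_normalized, test_raw, test_normalized)
```

Under that assumption, the gap between raw and normalized data on the testing samples corresponds to a single sample, so the improvement attributed to normalization should be read with the small sample size in mind.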


Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2023, October 13). Machine Learning Algorithms for Predicting Water Quality Index. AZoAi. Retrieved on October 24, 2025 from https://www.azoai.com/news/20231013/Machine-Learning-Algorithms-for-Predicting-Water-Quality-Index.aspx.

  • MLA

    Chandrasekar, Silpaja. "Machine Learning Algorithms for Predicting Water Quality Index". AZoAi. 24 October 2025. <https://www.azoai.com/news/20231013/Machine-Learning-Algorithms-for-Predicting-Water-Quality-Index.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Machine Learning Algorithms for Predicting Water Quality Index". AZoAi. https://www.azoai.com/news/20231013/Machine-Learning-Algorithms-for-Predicting-Water-Quality-Index.aspx. (accessed October 24, 2025).

  • Harvard

    Chandrasekar, Silpaja. 2023. Machine Learning Algorithms for Predicting Water Quality Index. AZoAi, viewed 24 October 2025, https://www.azoai.com/news/20231013/Machine-Learning-Algorithms-for-Predicting-Water-Quality-Index.aspx.
