In an article published in the journal Scientific Reports, researchers from India, Saudi Arabia, the USA, and the UK developed and evaluated data-driven machine learning (ML) models to predict the effluent soluble chemical oxygen demand (SCOD) of wastewater treated by a two-stage anaerobic onsite sanitation system. They compared four ML algorithms: linear regression, decision tree (DT), random forest (RF), and artificial neural network (ANN). They also compared these models with the conventional anaerobic digestion (AD) model No. 1 (ADM1).
Background
AD is a biological process that converts organic matter into biogas, which can be used as a renewable energy source. It provides a low-cost and sustainable solution for sanitation and energy production, especially in developing countries where this technology is used to treat wastewater. However, AD is a complex and sensitive process that depends on various environmental and operational factors, such as temperature, flow, and load.
ADM1 is the most commonly used method for predicting and simulating the anaerobic digestion process. This mathematical model describes the biochemical reactions and mass balances involved in anaerobic digestion and can be applied to different types of anaerobic reactors and substrates. However, ADM1 has limitations, such as high computational complexity, heavy data requirements, high parameter uncertainty, and low adaptability to dynamic conditions. Moreover, ADM1 is not well suited to modeling the two-stage anaerobic onsite sanitation system.
About the Research
In the present paper, the authors designed and assessed several ML-based models to predict the effluent SCOD of wastewater. SCOD is a parameter that quantifies the amount of oxygen needed to oxidize dissolved organic matter in wastewater. This parameter reflects the organic content and biodegradability of wastewater. High SCOD levels in the effluent not only affect biogas production but also pose a threat to the environment. Therefore, predicting the effluent SCOD is essential for optimizing and controlling the AD process and reducing the operational cost and time of wastewater treatment.
The study used data collected from a laboratory-scale two-stage anaerobic treatment reactor that was operated for one year. The reactor consisted of two cylindrical chambers, the first acting as a septic tank and the second as an anaerobic filter, and it was fed with domestic wastewater. Influent and effluent parameters, including total alkalinity, total chemical oxygen demand (CODT), SCOD, total suspended solids (TSS), total Kjeldahl nitrogen (TKN), ammoniacal nitrogen (NH3-N), and nitrate nitrogen (NO3-N), were measured and analyzed.
The researchers applied linear regression, DT, RF, and ANN to predict the effluent SCOD from the influent parameters. They used data pre-processing to improve data quality and feature selection to reduce dimensionality, relying on correlation analysis and mutual information gain to identify the most important features for the prediction. ADM1 served as a benchmark: it was used to simulate the AD process and predict the effluent SCOD so that its predictions could be compared with those of the ML models.
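The correlation-analysis step can be illustrated with a minimal sketch. It ranks candidate influent features by the absolute value of their Pearson correlation with the effluent SCOD. This is not the authors' code: the function names `pearson` and `rank_features` are hypothetical, the numbers are made up for illustration and are not data from the study, and the mutual-information step mentioned above is not reproduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up illustrative values (mg/L), NOT measurements from the study.
influent = {
    "SCOD": [210.0, 250.0, 190.0, 300.0, 270.0],
    "TSS": [120.0, 110.0, 150.0, 100.0, 130.0],
    "NO3_N": [1.0, 1.0, 1.0, 1.0, 1.0],
}
effluent_scod = [60.0, 72.0, 55.0, 88.0, 79.0]

ranking = rank_features(influent, effluent_scod)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by the absolute value keeps strongly negative correlations in play, since a feature that moves opposite to the target can be as informative as one that moves with it. Pearson correlation only captures linear dependence, which is presumably one reason to pair it with a measure such as mutual information.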
The authors evaluated the ML models and the ADM1 model using the coefficient of determination, or R-squared score (R2), and the mean absolute percentage error (MAPE). R2 indicates how well the model fits the data, with values closer to 1 indicating a better fit. MAPE represents the average deviation of the model predictions from the actual values, expressed as a percentage of the actual values, so lower values indicate higher accuracy.
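For readers who want to see how these two metrics are computed, here is a small sketch using their standard textbook definitions. The function names and sample values are illustrative assumptions, not taken from the paper, and the study may have computed the metrics with a library rather than by hand.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Made-up effluent SCOD values (mg/L), NOT results from the study.
measured = [50.0, 80.0, 100.0, 120.0]
modelled = [55.0, 76.0, 90.0, 126.0]

print(f"R2   = {r2_score(measured, modelled):.3f}")
print(f"MAPE = {mape(measured, modelled):.2f}%")
```

The two metrics answer different questions: R2 compares the model against simply predicting the mean, while MAPE reports the typical relative size of an error. A model can therefore score well on one and less well on the other.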
Research Findings
The outcomes showed that the ANN model performed the best among all the employed ML techniques, achieving an R2 value of 0.959 and a MAPE value of 10.63%. The random forest model ranked second, with an R2 value of 0.955 and a MAPE value of 17.83%, while the decision tree model followed as the third best, with an R2 value of 0.951 and a MAPE value of 19.23%. The linear regression model exhibited the poorest performance, with an R2 value of 0.88 and a MAPE value of 35.87%.
The study also found that the ML models predicted better than the ADM1 model, which had an R2 value of 0.88 and a MAPE value of 36.12%. The authors attributed the advantage of the ML models to their data-driven nature, which enabled them to capture the complex and nonlinear relationships between the input and output variables without requiring detailed knowledge of the process kinetics. They also highlighted the advantages of the ML models over the ADM1 model in terms of simplicity, speed, and reliability.
Conclusion
In summary, the novel ML-based approach to predicting the effluent SCOD of wastewater is effective and efficient. It can help optimize and control the anaerobic digestion process, reducing the operational cost and time of wastewater treatment, and it offers benefits such as high treatment efficiency, low energy consumption, low sludge production, and low land requirements. Moreover, it provides a simple approach to forecasting the complex processes of wastewater treatment plants.
The researchers suggested that the ML techniques can be extended and adapted to other types of anaerobic reactors and substrates, as well as to other output parameters, such as biogas production and quality, nutrient recovery, and pathogen reduction. The models can also be integrated with online sensors and controllers to enable real-time monitoring and feedback control of the anaerobic digestion process.