In a study published in the journal Scientific Reports, researchers developed a hybrid deep neural network and multivariate water quality forecasting model for aquaculture ecosystems. The model combines Ensemble Empirical Mode Decomposition (EEMD), Multivariate Linear Regression (MLR), and a Long Short-Term Memory Neural Network (LSTM NN), and is referred to as the EEMD-MLR-LSTM NN model. It demonstrated promising accuracy in predicting water quality parameters and could be a valuable tool for improving water quality management in the aquaculture industry.
Background
Harmful Algal Blooms (HABs) are a significant global concern. They affect bodies of water such as oceans, rivers, lakes, and ponds, and are particularly relevant to the aquaculture industry. Many countries have documented HABs, and their occurrence may escalate due to global warming and human impact on marine environments. The international community recognizes HABs as a critical issue because they threaten human health, marine ecosystems, and local and regional economies.
Studies have explored the potential of precision aquaculture systems for early HAB detection, allowing aquafarmers and decision-makers to address the issue proactively. However, implementing precision aquaculture systems necessitates efficient decision-making based on continuous water quality parameter data. Traditional methods for measuring these parameters are often costly and labor-intensive, involving the collection of water samples, transportation to laboratories, and chemical analysis. These processes are time-consuming and prone to inefficiencies and errors, leading to delays in corrective actions. A more effective approach involves automatically monitoring and analyzing water quality parameters using artificial neural networks (ANNs).
Previous research has identified the need to address aquaculture water pollution through automated analysis and timely prediction of water quality parameters. Various methods, including numerical modeling, multivariate statistics, geo-statistics, and artificial intelligence, have been explored for forecasting water quality changes. Deep learning, particularly LSTM NN, has shown promise in overcoming challenges related to data complexity. However, further research is needed to enhance existing models in precision aquaculture.
Proposed Method
The proposed hybrid forecasting model employs the EEMD method to enhance the analysis of time-series datasets. EEMD adds Gaussian white noise to the dataset, which helps separate the different time scales in the series and improves the efficiency of Empirical Mode Decomposition (EMD). The enhanced decomposition allows essential components to be extracted from water quality time-series data.
EEMD iteratively decomposes the dataset into individual Intrinsic Mode Functions (IMFs) and a residue, addressing mode mixing issues associated with traditional EMD. By applying this method to the Chlorophyll-a (530 nm) and Turbidity time-series data, the hybrid model efficiently extracts relevant components for forecasting.
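As general background, the ensemble step of EEMD is commonly written as below. This is the standard textbook formulation rather than the paper's own notation. The symbols are assumptions introduced here: N is the ensemble size, w_i(t) is the i-th white-noise realization, c_{ij}(t) is the j-th IMF obtained from the i-th noisy copy, and J is the number of IMFs, assumed equal across copies.

```latex
\begin{align}
x_i(t) &= x(t) + w_i(t), \qquad i = 1, \dots, N \\
x_i(t) &= \sum_{j=1}^{J} c_{ij}(t) + r_i(t) \\
\mathrm{IMF}_j(t) &= \frac{1}{N} \sum_{i=1}^{N} c_{ij}(t), \qquad
r(t) = \frac{1}{N} \sum_{i=1}^{N} r_i(t) \\
x(t) &\approx \sum_{j=1}^{J} \mathrm{IMF}_j(t) + r(t)
\end{align}
```

Averaging over the ensemble cancels most of the added noise, which is why the final reconstruction is approximate rather than exact for a finite N.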
The deep learning LSTM NN is a critical part of the proposed hybrid forecasting model. LSTM NNs are a type of Recurrent Neural Network (RNN) designed to handle time-series datasets, offering significant advantages in learning long-term dependencies. Unlike standard RNNs, which employ a basic structure, LSTM NNs feature purpose-built memory cells that retain information over extended periods, which enables them to capture complex temporal patterns in the data. The model uses a stacked LSTM architecture with multiple hidden layers comprising numerous memory cells to process and analyze the decomposed water quality components obtained through the EEMD method. This combination of EEMD and LSTM NNs forms the foundation of the hybrid forecasting model, allowing for accurate and reliable water quality parameter predictions.
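For reference, the memory cell of a widely used LSTM formulation can be written as follows. The article does not state which LSTM variant the study used, so this is a sketch of the common form, and the symbol names (weights W and U, biases b, input x_t, hidden state h_t, cell state c_t) are assumptions introduced here.

```latex
\begin{align}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{align}
```

The forget gate f_t and input gate i_t control how much of the previous cell state is kept and how much new information is written, which is the mechanism behind the long-term retention described above. In a stacked architecture, the hidden state h_t of one layer serves as the input x_t of the next.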
The proposed hybrid forecasting model follows a systematic sequence of steps. It first preprocesses the water quality series data and then applies EEMD to decompose the data into IMFs and a residual item. These components are normalized and passed to the deep learning LSTM NN for forecasting. Finally, the model combines the individual forecasts through a summation operation and applies reverse normalization to obtain the final predicted values. Forecasting each decomposed component separately with the stacked LSTM lets the model capture intricate patterns in the water quality data, resulting in improved prediction accuracy and reliability.
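The normalize, forecast, denormalize, and sum steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: min-max scaling is assumed as the normalization, each component is assumed to be scaled and un-scaled independently before summation, and the hypothetical `persistence_forecast` function stands in for a trained LSTM.

```python
from typing import Callable, List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale values to [0, 1]; also return (lo, hi) so the scaling can be undone."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant component carries no variation; map it to zeros.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / span for v in values], lo, hi


def denormalize(value: float, lo: float, hi: float) -> float:
    """Reverse min-max scaling for a single value."""
    return value * (hi - lo) + lo


def persistence_forecast(history: Sequence[float]) -> float:
    """Hypothetical placeholder for a trained LSTM: repeat the last value."""
    return history[-1]


def forecast_from_components(
    components: Sequence[Sequence[float]],
    forecaster: Callable[[Sequence[float]], float] = persistence_forecast,
) -> float:
    """One-step forecast: scale each component, forecast it, un-scale, then sum."""
    total = 0.0
    for component in components:
        scaled, lo, hi = min_max_normalize(component)
        predicted_scaled = forecaster(scaled)
        total += denormalize(predicted_scaled, lo, hi)
    return total


# Illustrative components only (two made-up IMFs and a residual), not real data.
components = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [10.0, 8.0, 9.0]]
next_value = forecast_from_components(components)
```

Replacing `persistence_forecast` with the one-step prediction of a trained model would leave the surrounding steps unchanged.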
Experimental Analysis
Study Area and Dataset Acquisition: Loch Duart, a Scottish salmon aquafarm company in northwest Scotland, was the focus of this study. Researchers used an aquaculture dataset from 8 sea sites and 2 hatcheries. Chelsea Technologies Ltd collected Chlorophyll-a and Turbidity data using a TriLux multi-parameter sensor probe. This sensor, located at a sheltered site, featured solar-powered telemetry for remote data transmission. The dataset included 22,708 sets of time-series data collected between May and October 2020, monitoring Chlorophyll-a (470 nm), Chlorophyll-a (530 nm), and Turbidity.
Data Preprocessing and Correlation Analysis: The collected data underwent preprocessing, including handling missing data through linear interpolation. Researchers used Pearson's correlation coefficient to analyze the correlations between water quality parameters and Phytoplankton data. The results indicated positive correlations between Chlorophyll-a (470 nm), Chlorophyll-a (530 nm), Turbidity, and Phytoplankton data, highlighting the impact of these parameters on water quality.
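Both preprocessing steps are simple enough to sketch with the Python standard library. The example below is illustrative only: the function names are hypothetical, missing readings are assumed to be marked as `None`, and filling leading or trailing gaps with the nearest observed value is an assumption, since the article does not say how edge gaps were handled.

```python
import math
from typing import List, Optional, Sequence


def interpolate_missing(series: Sequence[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest observed values."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    # Assumption: gaps at either end take the nearest observed value.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    # Interior gaps: straight line between the two surrounding observations.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Made-up readings for illustration, not values from the Loch Duart dataset.
turbidity = interpolate_missing([1.0, None, 3.0, None, None, 6.0])
chlorophyll = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
r = pearson_r(turbidity, chlorophyll)
```

A value of `r` close to 1 indicates a strong positive linear relationship, which is the kind of result the study reports between the sensor parameters and the Phytoplankton data.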
Multivariate Linear Regression: Multivariate linear regression was employed to model the relationship between independent water quality parameters and a dependent parameter. The regression equation predicted the dependent parameter based on the independent variables. The regression coefficients were determined through the least squares method, enabling the prediction of future water quality parameters. This methodology enhances understanding and prediction of water quality in the aquaculture environment.
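A least-squares fit of the form y = b0 + b1*x1 + ... + bk*xk can be sketched without external libraries by solving the normal equations. This is a minimal illustration, assuming a small, well-conditioned problem. The function names and the sample numbers are hypothetical, and the article does not specify which parameters served as predictors or how the fit was computed in practice.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimising the sum of squared residuals."""
    rows = [[1.0, *row] for row in X]  # prepend a column of ones for the intercept
    n = len(rows[0])
    # Augmented matrix of the normal equations (A^T A) beta = A^T y.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * target for r, target in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    """Evaluate the fitted regression equation for one set of predictor values."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Synthetic data generated from y = 1 + 2*x1 + 3*x2, for illustration only.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [9.0, 8.0, 19.0, 18.0, 26.0]
beta = fit_mlr(X, y)
estimate = predict(beta, [6.0, 2.0])
```

For larger or noisier datasets, a numerically stabler solver such as a QR-based least-squares routine would usually be preferred over the normal equations.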
Evaluation Metrics and Model Accuracy: The study employed four performance evaluation metrics, Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE), to assess the accuracy of the hybrid EEMD-MLR-LSTM NN water quality forecasting model. The researchers computed hourly centered moving averages of the Loch Duart Salmon aquaculture data and decomposed them via EEMD into stable IMFs and a residual item, which enhanced trend extraction and correlated signal analysis. The model showed high accuracy in forecasting the presence of algal bacteria, which the study attributes to EEMD's stable IMF selection. Comparative analysis revealed the superiority of the hybrid EEMD-MLR-LSTM NN model over related models due to EEMD's feature-rich data decomposition.
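The four metrics have standard definitions, sketched below in plain Python. The function names and sample values are illustrative assumptions and are not results from the study. Note that MAPE is undefined when an observed value is zero, so this sketch rejects such inputs.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, expressed in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    ratios = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(ratios) / len(actual)


# Made-up observations and forecasts, for illustration only.
observed = [2.0, 4.0, 5.0]
forecast = [1.0, 4.0, 7.0]
scores = {
    "MAE": mae(observed, forecast),
    "MSE": mse(observed, forecast),
    "RMSE": rmse(observed, forecast),
    "MAPE": mape(observed, forecast),
}
```

Lower values indicate better forecasts for all four metrics. RMSE penalizes large errors more heavily than MAE, and MAPE allows comparison across parameters measured on different scales.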
Conclusion
To conclude, this study introduces a novel hybrid water quality forecasting model that combines the EEMD method, MLR, and a deep learning LSTM NN, based on monitored TriLux multi-parameter sensor data from Loch Duart Salmon aquaculture farms. The model accurately forecasts future water conditions, particularly for the early detection of harmful green biomass (algal blooms). This information is invaluable for effective aquaculture industry management. Future work will expand the model to include more water quality measurement sites.
Article Revisions
- Jul 11 2024 - Fixed broken journal link.