In this article published in the journal Nature, the authors aimed to improve water quality modeling in the Great Barrier Reef by matching ungauged catchments with gauged ones. They employed an explainable AI approach to identify catchment similarities and classify them based on their dissolved inorganic nitrogen (DIN) response categories.
Background
Water quality modeling is essential for understanding and managing the health of aquatic ecosystems. It involves predicting the concentrations of various substances, including DIN, which plays a crucial role in water quality assessment. Accurate modeling of DIN is vital for addressing environmental challenges and making informed decisions regarding land use and conservation efforts.
Modeling DIN in ungauged catchments, areas without prior data collection, presents a significant challenge. Traditional methods rely heavily on transferring data from gauged catchments, but such approaches are less suitable for DIN because it interacts in complex ways with both natural and human-induced factors. The problem is exacerbated by the limited availability of observed data in ungauged areas, which hinders the development of accurate water quality models.
DIN concentrations are influenced by a wide range of factors, both biotic and abiotic, leading to spatial and temporal variability. Existing classification methods primarily based on physical similarities among catchments do not account for these complex biotic influences, resulting in limitations in the predictive capabilities of water quality models.
While process-based models have proven effective for modeling abiotic processes, their applicability to constituents like DIN, which are influenced by biotic factors, remains largely unexplored. Research on the spatiotemporal scales necessary for accurate DIN modeling is notably deficient.
The authors of the present study addressed these research gaps in water quality modeling for DIN in ungauged catchments. They proposed an approach that used spatial data, specifically original vegetation data, as a proxy to categorize catchments based on their DIN responses. Integrating Artificial Neural Networks (ANN) with Explainable AI (XAI) allowed ungauged catchments to be matched to gauged ones while accounting for the interplay of biotic and abiotic factors affecting DIN.
Study Results
Catchment Matching using ANN-PR and XAI-SHAP
The results of catchment matching show that, except for the Mary Catchment, the ungauged portions of gauged catchments do not consistently classify together, and catchments do not necessarily classify with their nearest neighbors. The choice of spatial dataset used for matching led to different catchment matches. While Category 2 matched catchments generally clustered together spatially, Category 3 matched catchments had different distributions based on the dataset used. This indicates that different datasets reveal different spatial characteristics of the catchments.
Variable Feature Independence
The study revealed that each catchment had a unique combination and weighting of deviated features. The top 10% XAI-SHAP floristic structure variables could match the most similar gauged catchment based on the combination of deviated variables. It also identified catchments with unique combinations of deviated variables. Only variable combinations occurring in ungauged catchments, and not in the gauged ones, were identified.
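The matching idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the SHAP values are taken as given inputs, a feature is treated as "deviated" when it lies more than one standard deviation from the cross-catchment mean, and similarity is measured with a Jaccard index. The catchment names, feature names, threshold, and similarity measure are all hypothetical.

```python
from statistics import mean, pstdev

def top_fraction_features(shap_values, fraction=0.10):
    """Return the features ranked in the top `fraction` by mean absolute SHAP value."""
    importance = {name: mean(abs(v) for v in vals) for name, vals in shap_values.items()}
    ranked = sorted(importance, key=importance.get, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]

def deviated_features(catchments, features, threshold=1.0):
    """Map each catchment to the features lying more than `threshold`
    standard deviations from the cross-catchment mean (an assumed definition)."""
    result = {name: set() for name in catchments}
    for feature in features:
        values = [catchments[c][feature] for c in catchments]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a constant feature cannot deviate anywhere
        for c in catchments:
            if abs(catchments[c][feature] - mu) / sigma > threshold:
                result[c].add(feature)
    return result

def jaccard(a, b):
    """Overlap between two sets of deviated features (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_ungauged(deviations, gauged, ungauged):
    """Pair each ungauged catchment with the most similar gauged one,
    or with None when no gauged catchment shares a deviated feature."""
    matches = {}
    for u in ungauged:
        best = max(gauged, key=lambda g: jaccard(deviations[u], deviations[g]))
        matches[u] = best if jaccard(deviations[u], deviations[best]) > 0 else None
    return matches

# Toy, made-up inputs: per-feature SHAP values and per-catchment feature values.
shap_values = {
    "rainforest": [0.9, -0.8, 0.7],
    "woodland": [0.6, 0.5, -0.4],
    "grassland": [0.05, -0.02, 0.01],
    "mangrove": [0.01, 0.0, -0.03],
}
catchments = {
    "gauged_a":   {"rainforest": 10, "woodland": 0,  "grassland": 1, "mangrove": 1},
    "gauged_b":   {"rainforest": 0,  "woodland": 0,  "grassland": 1, "mangrove": 1},
    "gauged_c":   {"rainforest": 0,  "woodland": 0,  "grassland": 1, "mangrove": 1},
    "ungauged_1": {"rainforest": 10, "woodland": 0,  "grassland": 1, "mangrove": 1},
    "ungauged_2": {"rainforest": 0,  "woodland": 10, "grassland": 1, "mangrove": 1},
}

# With only four toy features, keep the top half rather than the top 10%.
top = top_fraction_features(shap_values, fraction=0.5)
deviations = deviated_features(catchments, top)
matches = match_ungauged(
    deviations,
    gauged=["gauged_a", "gauged_b", "gauged_c"],
    ungauged=["ungauged_1", "ungauged_2"],
)
```

In this toy data, `ungauged_1` shares its deviated feature with `gauged_a`, while `ungauged_2` deviates on a feature that no gauged catchment deviates on and therefore stays unmatched. That second case is the kind of unique combination the study reports for some ungauged catchments.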
ANN-WQ Simulator Performance
The performance of the ANN-WQ simulator was notably influenced by the combination of catchments included in the training datasets. Training using data grouped from multiple catchments generated satisfactory to very good performance for DIN simulation. However, simulations generated in the unsupervised environment for individual catchments showed flatline results. When datasets were discriminated by spatiotemporal regime, they achieved satisfactory to very good performance for most metrics.
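To see why a flatline simulation scores poorly, consider the Nash-Sutcliffe efficiency (NSE), a common goodness-of-fit metric in hydrology. The summary does not name the metrics the authors used, so NSE is an assumption here, and the DIN values below are made up for illustration.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

# Hypothetical DIN concentrations, not data from the study.
observed = [0.10, 0.40, 0.90, 0.30, 0.20]

# A flatline simulation that predicts the observed mean at every time step.
flatline = [sum(observed) / len(observed)] * len(observed)

# A simulation that follows the observed dynamics with small errors.
tracking = [0.12, 0.35, 0.85, 0.33, 0.18]

nse_flat = nash_sutcliffe(observed, flatline)
nse_track = nash_sutcliffe(observed, tracking)
```

A flatline at the observed mean yields an NSE of zero by construction, because its squared error equals the variance of the observations. A simulation that tracks the observed peaks and troughs moves the score towards one.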
Classifying Catchments: Variable Independence vs ANN-PR
While the ANN-PR approach matched all ungauged catchments to a gauged counterpart, the XAI-SHAP variable independence approach using relative variable distributions could not match 17 catchments. The ANN-PR matches using the Original Vegetation dataset most closely aligned with the XAI-SHAP landform and floristic structure dataset.
Verification of Catchment Classification for DIN Similarities
Both XAI-SHAP Variable Independence and ANN-PR techniques for catchment classification matched the pseudo-ungauged Herbert catchment to the gauged Mary catchment. Performance criteria clustered towards datasets discriminated to Mary and Category 1 flows only. Training data discriminated to the individually matched catchment (Mary) and discriminated to wet season flows achieved the best DIN simulations.
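Discriminating a training dataset in this way amounts to filtering records before training. The sketch below is a minimal illustration, assuming a simple record layout and a November to April wet season; the field names, the season definition, and the values are hypothetical and not taken from the study.

```python
from dataclasses import dataclass

# Assumed wet-season months; the study may define the season differently.
WET_SEASON_MONTHS = {11, 12, 1, 2, 3, 4}

@dataclass
class Record:
    catchment: str  # name of the gauged catchment the record comes from
    month: int      # calendar month, 1 to 12
    flow: float     # flow value (units left unspecified in this sketch)
    din: float      # observed DIN concentration

def discriminate(records, catchment, months):
    """Keep only records from one catchment that fall within the given months."""
    return [r for r in records if r.catchment == catchment and r.month in months]

# Toy records with made-up values.
records = [
    Record("Mary", 1, 120.0, 0.45),
    Record("Mary", 7, 15.0, 0.10),
    Record("Mary", 12, 95.0, 0.38),
    Record("Other", 2, 200.0, 0.60),
]

training_subset = discriminate(records, "Mary", WET_SEASON_MONTHS)
```

Only the two wet-season records from the matched catchment remain in `training_subset`, which would then be passed to the simulator in place of the pooled dataset.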
Discussion
This research focused on classifying ungauged catchments that flow into the Great Barrier Reef based on proxy data for drivers of DIN, using an explainable AI approach called XAI-SHAP. The study demonstrates the importance of proxy data for DIN drivers both in classifying catchments and in evaluating DIN simulation performance. Dataset complexity and consistency, training dataset arrangements, and prior knowledge of spatiotemporal similarities all play crucial roles in the performance of the ANN-WQ simulator.
- Dataset Complexity and Consistency: The complexity and representative flow patterns in datasets greatly influence the performance of the ANN-WQ simulator. Flatline simulations result from inadequate complexity in the dataset, and the relationship between flow, spatial data, and DIN response is essential for simulations.
- Training Dataset Influence: Training data arrangements that group catchments using prior knowledge of spatiotemporal similarities, or that discriminate datasets by flow regime, improve DIN simulation performance. These arrangements remove heteroscedasticity in the DIN response to flow.
- ANN-PR vs XAI-SHAP Classification: Catchments matched using ANN-PR are not always the same as those recommended to be matched by the XAI-SHAP deviation approach for variable independence. The XAI-SHAP approach provides insights into catchments grouped by known DIN-to-flow proxy drivers, showing that the most similar drivers of DIN are not necessarily the neighboring catchment.
- Practical Application: The study establishes Original Vegetation as a suitable proxy for DIN dynamics in water quality modeling. Out of 41 ungauged catchments, only 20 are suitable for data transfer with existing gauged catchments for water quality modeling. For the ungauged catchments that failed to match gauged ones, new monitoring and gauging sites are recommended to collect data representative of all DIN regimes.
Conclusion
To sum up, the authors successfully matched ungauged catchments that flow into the Great Barrier Reef with gauged catchments using ANN-PR and datasets related to Land Use and Original Vegetation. Additionally, the XAI-SHAP method was employed to explain similarities between catchments based on feature deviations and to group them into known spatiotemporal categories, which improved the performance of the ANN-WQ simulator.
However, it was found that not all catchments matched using ANN-PR shared deviated feature similarity with a spatiotemporal category, suggesting the need for further monitoring in those unmatched areas. Prior discrimination of data based on the spatiotemporal category of ungauged catchments significantly enhanced the ANN-WQ simulator's performance. These findings highlighted the value of XAI-SHAP in customizing catchment matching for water quality datasets, emphasizing the importance of knowledge derived from original vegetation data in this process.