In a paper published in the journal Water Research, researchers tackled the complex interaction between microplastics (MPs) and organic pollutants (OPs) in freshwater systems by introducing a novel machine learning (ML) approach to analyze 475 sorption data points from various sources. The approach incorporated a hybrid model, combining genetic algorithms and support vector machines, which exhibited impressive predictive power with a coefficient of determination (R2) of 0.93 and an error of 0.07.
Notably, this method identified key factors such as the chemical properties of MPs, excess molar refraction, and the hydrogen bonding of OPs as dominant influences on sorption mechanisms. This innovative study opens doors to a better understanding of the intricate process of OP sorption onto MPs.
Background
MPs have raised significant concerns recently due to their potential environmental risks and toxic effects. Their widespread distribution in freshwater systems, such as surface water, groundwater, and drinking water, has garnered substantial attention from both academia and the public. MPs of different sizes and shapes pose a threat to various organisms. One of the key concerns is their interaction with OPs, which can amplify their ecological impact. Understanding this interaction is crucial for assessing the potential risks associated with MPs.
Numerous studies have examined MP and OP interactions, highlighting the key role of sorption. Yet, traditional methods and models fall short of capturing the complexities. Quantitative structure-property relationship (QSPR) models like poly-parameter linear free-energy relationships (ppLFER) offer promise, though creating individual models for each MP is time-intensive.
ML methods, such as random forests (RF) and support vector machines (SVM), balance complexity but can lack interpretability. Integrating QSPR and ML could bridge this gap, enhancing the understanding of MPs' diverse interactions. Prior attempts with ppLFER and ML have constraints, underscoring the need for a comprehensive strategy that draws from diverse data sources. This approach is crucial for grasping the intricate dynamics of MP-OP interactions and revealing underlying fundamental mechanisms.
Proposed method
The paper evaluates various modeling approaches, including traditional methods, ML, and deep learning (DL), to address the challenge of understanding complex interactions between MPs and OPs. Several ML models, such as SVM, Gaussian process regression (GPR), extreme gradient boosting (XGB), and a genetic algorithm-support vector machine (GA-SVM) hybrid, are highlighted for their potential in environmental studies. Model performance is evaluated using R2, root mean square error (RMSE), mean absolute percentage error (MAPE), and mean square error (MSE).
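As a rough illustration of the four metrics named above, the following sketch computes them in plain Python. The function name `regression_metrics` and the sample values are made up for illustration and are not taken from the paper; MAPE is returned as a fraction rather than a percentage.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAPE (as a fraction), and MSE for two sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,  # undefined if y_true is constant
        "rmse": math.sqrt(mse),
        # MAPE is undefined when any observed value is zero
        "mape": sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
        "mse": mse,
    }


# Hypothetical observed and predicted values, for illustration only
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.1, 2.9, 4.2, 4.8]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

R2 rewards explained variance, while RMSE, MAPE, and MSE penalize error on different scales, which is presumably why studies of this kind report several of them together.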
To ensure robustness and reliability, the selected models undergo tests such as five-fold cross-validation, the introduction of Gaussian noise, and evaluation on data with features removed. Parameter sensitivity analysis is also conducted using the forward stepwise (FSW) method, the Shapley method, and global sensitivity analysis (GSA) to determine the importance of input features. The paper emphasizes the importance of comprehensively assessing model performance and interpretability in tackling the intricate dynamics of MP-OP interactions.
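The two robustness checks mentioned first, five-fold cross-validation and Gaussian noise injection, can be sketched as follows. This is a minimal example on synthetic data, assuming a one-variable least-squares line as a stand-in for the paper's models; all function names, the noise level, and the data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for any regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def r2_score(ys, preds):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=5, seed=0):
    """Shuffle indices, split into k folds, and score each held-out fold."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in held_out]
        scores.append(r2_score([ys[i] for i in held_out], preds))
    return scores


def add_gaussian_noise(values, sigma, seed=0):
    """Perturb each value with zero-mean Gaussian noise of std sigma."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Synthetic data: a noisy straight line (not the study's data)
data_rng = random.Random(42)
x = [i / 10 for i in range(50)]
y = [2.0 * xi + 1.0 + data_rng.gauss(0.0, 0.1) for xi in x]

clean_scores = k_fold_r2(x, y)
noisy_scores = k_fold_r2(add_gaussian_noise(x, sigma=0.5), y)
mean_clean = sum(clean_scores) / len(clean_scores)
mean_noisy = sum(noisy_scores) / len(noisy_scores)
print(f"mean R2 clean: {mean_clean:.3f}, with noisy inputs: {mean_noisy:.3f}")
```

A model whose cross-validated score degrades only gradually as the noise level grows is usually regarded as more stable than one whose score collapses.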
Experimental results
Data Parameter Range and Analysis
The results show that four key MP features influence MP-OP interactions: specific surface area (SSA), carbon content (C%), hydrogen-to-carbon ratio (H/C), and oxygen-to-carbon ratio (O/C). The logarithmic distribution coefficient (Kd) prevents data leakage in ML models. C% reflects homogeneity, which is stronger in carbon-rich MPs. H/C signifies aromaticity and is associated with robust MP-OP interaction, while O/C indicates polarity and sorption mechanisms. SSA defines porosity, and solute descriptors (E, S, A, B, V) characterize the OPs' chemical properties. E is normally distributed, while S, A, and V cluster due to polarity and volume. Ensuring feature independence is vital for the models, so where features were strongly correlated, SSA was selected as the representative. MP composition and solute descriptors are largely independent of one another, with A and B moderately correlated.
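For reference, the solute descriptors E, S, A, B, and V are those of the Abraham-type ppLFER, which is commonly written in the general form below. Whether the study uses exactly this form is an assumption; the lowercase coefficients and the constant c are system-specific fitted values and are not given here.

```latex
\log K_d = c + eE + sS + aA + bB + vV
```

Here E is the excess molar refraction, S the dipolarity/polarizability, A the hydrogen-bond acidity, B the hydrogen-bond basicity, and V the McGowan characteristic volume of the solute.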
Creation and Assessment of ML Models
Traditional, ML, hybrid, and DL models were tested for predicting OP sorption on MPs. Traditional and DL models fell short, while ML models like RF, GPR, XGB, and GA-SVM excelled. GA-SVM showed the best accuracy (R2=0.93) and stability. GPR, XGB, and the hybrid GA-SVM all improved prediction and stability over plain SVM. The hybrid GA-SVM stood out by automating parameter optimization and enhancing generalization, providing a strong solution for studying MP-OP interactions.
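The idea of automating parameter optimization with a genetic algorithm can be sketched as below. This is not the authors' implementation: it is a minimal genetic search over two log-scaled hyperparameters (labelled C and gamma here by assumption), and the fitness function is a toy stand-in. In a real GA-SVM, the fitness would instead be something like the cross-validated score of an SVM trained with those hyperparameters.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_sigma=0.3, seed=0):
    """Maximize fitness(genes) over real-valued genes within bounds.

    bounds is a list of (low, high) pairs, one per gene.
    """
    rng = random.Random(seed)

    def clip(value, low, high):
        return max(low, min(high, value))

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]             # elitism: keep the best as-is
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            # Uniform crossover: take each gene from either parent
            child = [m if rng.random() < 0.5 else d
                     for m, d in zip(mom, dad)]
            # Gaussian mutation, clipped back into the search bounds
            child = [clip(g + rng.gauss(0.0, mutation_sigma), lo, hi)
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = children
    return max(population, key=fitness)


def toy_fitness(genes):
    """Hypothetical stand-in for a cross-validated SVM score.

    Peaks at log10(C) = 1 and log10(gamma) = -2.
    """
    log_c, log_gamma = genes
    return -((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


best_log_c, best_log_gamma = genetic_search(
    toy_fitness, bounds=[(-2.0, 3.0), (-4.0, 1.0)]
)
best_c, best_gamma = 10 ** best_log_c, 10 ** best_log_gamma
print(f"best C ~ {best_c:.2f}, best gamma ~ {best_gamma:.4f}")
```

Searching in log space is a common choice for SVM hyperparameters because useful values can span several orders of magnitude.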
Analyzing Feature Importance in Model Predictions
Feature importance analysis was performed to understand the impact of input features on the predicted output (Kd). The GPR, XGB, and GA-SVM models, chosen for their predictive capabilities, were examined. SVM results were inconclusive due to improper hyperparameters. XGB and GA-SVM exhibited similar trends, while GPR differed. The Shapley-based MAS and FSW methods offered consistent results, unlike the Si index of GSA. Given GA-SVM's strong performance and the reliability of MAS and FSW, their results were chosen for interpretation.
Conclusion
Effective use of ML for environmental challenges requires careful model selection and integration. This study created a comprehensive predictive framework for understanding OP sorption onto MPs by combining a hybrid ML approach with the ppLFER model. The framework outperformed traditional, single ML, and DL models in data adaptation and prediction accuracy. Sensitivity analysis explored optimal parameter choices and revealed mechanistic insights into the sorption process. This novel sorbate-sorbent-based ML model enhances our understanding of complex interactions between organic compounds and microplastics.