In a paper published in the journal Scientific Reports, researchers conducted a comparative study on the unconfined compressive strength (UCS) of cohesive soil reconstituted with cement and lime. They used multiple ensemble-based machine learning (ML) classification and symbolic regression techniques to estimate UCS and found that it is most influenced by maximum dry density, consistency limits, and cement content, with minimal impact from optimum moisture content.
Background
Past work by various researchers has utilized ML techniques like evolutionary polynomial regression, gene expression programming, and artificial neural networks to predict the UCS of stabilized soils. Studies have shown the significant influence of factors such as cement and lime content, curing duration, and moisture content on UCS.
However, accurately predicting UCS remains challenging due to the diverse soil properties and the intricate interplay of various influencing factors that must be carefully navigated. Additionally, the need for extensive datasets and computational resources for robust ML models can be a significant hurdle.
Predicting Stabilized Soil UCS
The study collected 190 records from the literature on UCS test results for cohesive soil stabilized with cement and lime. Each record includes data on cement and lime weight ratios, liquid limit, plasticity index, optimum moisture content, maximum dry density, and UCS. Data preprocessing was conducted through data cleaning and dimensionality reduction to streamline the models.
The dataset was divided into training (140 records) and validation (50 records) sets. Statistical characteristics and a Pearson correlation matrix were provided to confirm that the two sets were comparable and to check the independence of the input parameters. Histograms and correlation plots were used to visualize data distribution and relationships between inputs and outputs.
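The split and the correlation check described above can be sketched in a few lines of plain Python. This is only an illustration: the records below are synthetic stand-ins rather than the study's data, and the field names (`cement`, `mdd`, `ucs`) and helper functions are hypothetical.

```python
import math
import random

def train_validation_split(records, n_train, seed=0):
    """Shuffle a copy of the records and split off the first n_train for training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Synthetic stand-in for the 190 literature records (NOT the study's data).
rng = random.Random(42)
records = []
for _ in range(190):
    cement = rng.uniform(0, 10)   # hypothetical cement content, % by weight
    mdd = rng.uniform(1.4, 2.0)   # hypothetical maximum dry density
    ucs = 100 * mdd + 20 * cement + rng.gauss(0, 5)  # made-up relationship
    records.append({"cement": cement, "mdd": mdd, "ucs": ucs})

train, validation = train_validation_split(records, n_train=140)
columns = {key: [r[key] for r in train] for key in ("cement", "mdd", "ucs")}
corr = correlation_matrix(columns)

print(len(train), len(validation))  # 140 50
```

A coefficient near zero between two inputs suggests they carry independent information, while a large coefficient between an input and UCS suggests a strong linear relationship.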
Eight ensemble-based ML classification techniques were used to predict UCS: gradient boosting (GB), Clark and Niblett 2 (CN2), naïve Bayes (NB), support vector machine (SVM), stochastic gradient descent (SGD), k-nearest neighbor (K-NN), decision tree (Tree), and random forest (RF). Their results were compared with those of an artificial neural network (ANN) and response surface methodology (RSM). The analysts developed these models using the Orange data mining software. The process involved reading and splitting the database, training the models, and evaluating them on the training and validation sets.
The study thoroughly assessed each model's performance by systematically comparing predicted outcomes against experimental data. Comprehensive frameworks and detailed descriptions of each technique were presented, elucidating their underlying principles, strengths, and constraints. Emphasis was placed on the critical role of meticulous model selection and fine-tuning processes in attaining precise and reliable predictions.
K-NN is a supervised learning algorithm for classification and regression that predicts an outcome from the training instances most similar to a new one. It involves choosing the value of K, calculating the distances between the new instance and the training instances, selecting the K nearest neighbors, and taking a majority vote among them for classification. It can handle both categorical and numerical features and can capture complex decision boundaries. The decision tree (Tree), another popular ML algorithm, recursively partitions the feature space to make predictions, is easy to interpret, handles various feature types, and captures nonlinear relationships. Although K-NN is computationally intensive and decision trees are prone to overfitting, both are widely used in domains like recommendation systems, image recognition, finance, and healthcare.
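The K-NN steps listed above (choose K, compute distances, pick the nearest neighbors, then vote) can be illustrated with a short pure-Python sketch. The toy feature pairs and strength labels are invented for illustration, Euclidean distance is an assumed choice, and the regression variant simply averages the neighbors' values instead of voting.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Indices of the k training points closest to the query (Euclidean distance)."""
    distances = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    return [i for _, i in distances[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Majority vote among the k nearest neighbors."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Average target value of the k nearest neighbors."""
    neighbors = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in neighbors) / len(neighbors)

# Toy data (hypothetical): (cement content, maximum dry density) pairs.
train_X = [(2.0, 1.50), (3.0, 1.60), (2.5, 1.55),
           (8.0, 1.90), (9.0, 1.95), (8.5, 1.85)]
strength_class = ["low", "low", "low", "high", "high", "high"]
ucs_values = [150, 170, 160, 400, 430, 410]

query = (8.2, 1.90)
print(knn_classify(train_X, strength_class, query, k=3))  # high
print(knn_regress(train_X, ucs_values, query, k=3))
```

Because the prediction depends on raw distances, features on very different scales are usually normalized first; that step is omitted here for brevity.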
ML Model Evaluation
The study systematically evaluated several ML models to predict the UCS of cohesive soils stabilized with cement and lime. Each model was configured and assessed on its ability to accurately forecast UCS values, which is crucial for applications in construction and soil stabilization projects. GB emerged as a standout performer, employing 100 trees with a learning rate of 0.1. It achieved low errors of 6% for training data and 5% for validation, along with high R2 values of 0.98 and 0.99, respectively.
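To show what "100 trees with a learning rate of 0.1" means in practice, here is a minimal pure-Python sketch of gradient boosting for a squared-error target. It is not the Orange model used in the study: the single-split stumps used as weak learners, the function names, and the toy data are all assumptions made for illustration.

```python
def fit_stump(X, residuals):
    """Least-squares regression stump: one feature, one threshold, two leaf means."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean

def fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, row)
                       for p, row in zip(predictions, X)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def mean_squared_error(y, predicted):
    return sum((a - b) ** 2 for a, b in zip(y, predicted)) / len(y)

# Toy data (hypothetical): (cement content, maximum dry density) -> UCS.
X = [(2.0, 1.50), (4.0, 1.60), (6.0, 1.70), (8.0, 1.80),
     (3.0, 1.90), (7.0, 1.55), (5.0, 1.65), (9.0, 1.95)]
y = [150, 210, 280, 350, 260, 270, 250, 420]

model = fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1)
baseline_mse = mean_squared_error(y, [sum(y) / len(y)] * len(y))
boosted_mse = mean_squared_error(y, [predict(model, row) for row in X])
print(baseline_mse, boosted_mse)
```

Each tree corrects only a fraction (the learning rate) of the remaining error, which is why many small steps are combined; a smaller learning rate generally needs more trees to reach the same training fit.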
In contrast, the CN2 model, designed with Laplace accuracy and generating 93 detailed rules, exhibited a higher training error of 11% and the same validation error of 5%. Despite the higher training error, CN2 maintained robust R2 values of 0.94 and 0.99, as illustrated in Figures 15 and 16 of the paper, highlighting its interpretability and ability to derive meaningful rules from the data. However, the NB model, known for its scalability but not for precision in this context, showed significant errors of 87% and 86% for training and validation, respectively, accompanied by lower R2 values of 0.17 and 0.41.
The SVM model, which used a polynomial kernel with specific parameters, achieved far lower errors than NB, at 12% for training and 6% for validation, along with respectable R2 values of 0.94 and 0.97. Overall, the evaluation highlighted each model's diverse strengths and weaknesses in predicting UCS for cement-lime stabilized cohesive soils, emphasizing the importance of selecting models based on accuracy and suitability for engineering applications.
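The error and R2 figures quoted in this section compare predicted UCS values with measured ones. The sketch below shows one common way to compute such figures. The article does not define the error percentage, so expressing it as root-mean-square error divided by the mean measured value is an assumption, and the numbers are invented.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def relative_error_percent(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed error definition)."""
    mean_obs = sum(observed) / len(observed)
    return 100 * rmse(observed, predicted) / mean_obs

# Invented measured and predicted UCS values, for illustration only.
observed = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]

print(round(r_squared(observed, predicted), 3))               # 0.992
print(round(relative_error_percent(observed, predicted), 1))  # 4.0
```

Reporting both numbers is useful because R2 measures how much of the variation in UCS the model explains, while the error percentage indicates how large a typical miss is relative to the strength being predicted.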
Conclusion
To sum up, this research compared eight ensemble-based ML classification techniques with ANN and RSM to estimate the UCS of cement-lime stabilized cohesive soil. GB and K-NN showed the highest accuracy (95%), while NB had the lowest (13%). ANN and RSM matched the accuracy of the SVM and Tree models. Sensitivity analysis indicated that UCS was most influenced by maximum dry density (MDD), followed by consistency limits and cement content. The models were complex and valid only within the studied parameter range.
Journal reference:
- Onyelowe, K. C., et al. (2024). Estimating soil strength stabilized with cement and lime at optimal compaction using ensemble-based multiple machine learning. Scientific Reports, 14:1, 15308. DOI:10.1038/s41598-024-66295-4, https://www.nature.com/articles/s41598-024-66295-4