In an article recently published in the journal Agriculture, researchers performed a comprehensive comparison of multiple machine learning (ML) models to predict the most effective microbial strain to mitigate the effects of droughts on crop production.
Background
Drought conditions pose a significant risk to global food security because they severely reduce crop growth and yield. Moreover, the severity and frequency of droughts are expected to increase substantially due to climate change, which makes it necessary to identify effective ways to address drought impacts.
Beneficial microbial strains, specifically plant-growth-promoting rhizobacteria (PGPR), can be used to mitigate the adverse effects of droughts. PGPR colonize the rhizosphere and promote crop growth by improving root system architecture to increase plant water uptake, producing plant growth hormones to stimulate plant development, and solubilizing soil nutrients to enhance nutrient availability.
Additionally, PGPR play a critical role in preserving soil health by facilitating soil aggregate formation, which improves soil structure and water-holding capacity and enables plants to withstand drought conditions better. Specific PGPR strains can decrease the emission of nitrous oxide, a greenhouse gas, from the soil, which can mitigate climate change impacts and improve nitrogen utilization efficiency in plants.
Thus, selecting proper microbial strains is crucial to promote sustainable agriculture. However, selecting the appropriate strains for specific environmental conditions and crops is extremely challenging due to the complexity of plant-microbe-soil interactions and the extreme diversity of these strains.
Predictive models powered by advanced ML algorithms can support the selection of suitable microbial strains. These models can evaluate large datasets of plant responses, environmental factors, and microbial traits to predict the strains that will be most effective under specific conditions.
Additionally, predictive models can handle complex, high-dimensional data and reveal hidden relationships and patterns. This enables them to make precise predictions even with noisy or incomplete data, making microbial strain selection significantly more effective and efficient.
Moreover, predictive models can facilitate the discovery of new beneficial microbial strains and the design of synthetic microbial communities tailored to specific environments and crops. These models can also be used to gain crucial insights into the mechanisms underlying plant-microbe-soil interactions. Thus, farmers and researchers can use these models to make more informed decisions on the use of microbial strains.
Comparative analysis of ML models
In this study, researchers performed a comparative analysis of several ML models, including naïve Bayes (NB), generalized linear model (GLM), logistic regression (LR), fast large margin (FLM), deep learning (DL), decision tree (DT), random forest (RF), and gradient boosted trees (GBTs), to predict the optimal microbial strains to mitigate drought impacts on crops.
Models were assessed on several metrics, including accuracy, standard deviation of results, gains, training time per 1000 rows of data, and total computation time. The data utilized in the study were obtained from two previous studies: one on using PGPR to decrease greenhouse gas emissions in strawberry cultivation under various soil moisture conditions, and one on the effects of rhizosphere bacteria on strawberry plants under water deficit conditions.
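Three of these metrics can be illustrated with a minimal Python sketch. The article does not describe how the study computed them, so the definitions below are common conventions rather than the authors' method, and the function names and per-run accuracy values are hypothetical.

```python
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def time_per_1000_rows(elapsed, n_rows):
    """Normalize a measured training time to a per-1000-rows figure."""
    return elapsed * 1000 / n_rows


# Hypothetical accuracies from five repeated runs of one model
# (illustrative values only, not results from the study).
run_accuracies = [0.90, 0.86, 0.94, 0.88, 0.92]

mean_acc = statistics.mean(run_accuracies)
std_acc = statistics.stdev(run_accuracies)  # sample standard deviation

print(f"mean accuracy: {mean_acc:.3f}")
print(f"standard deviation: {std_acc:.3f}")
```

A low standard deviation across runs suggests that a model's accuracy does not depend heavily on which rows it happened to be trained on.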
The data contained information on different microbial strains, their traits that promote plant growth, and the strawberry plant response to these strains under various soil moisture conditions. Researchers used a dataset of 1500 data points, with 70% of these data points being used for training and 30% for validation and testing of the ML models compared in this study.
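The 70/30 partition can be sketched in a few lines of Python. This is an illustration only: the article does not say how the rows were shuffled or how the 30% was divided between validation and testing, so the `split_dataset` helper, the seed, and the placeholder rows are assumptions.

```python
import random


def split_dataset(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into train and hold-out sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder row identifiers standing in for the 1500 data points.
rows = list(range(1500))
train_rows, holdout_rows = split_dataset(rows)

print(len(train_rows), len(holdout_rows))  # 1050 450
```

With 1500 data points, this split leaves 1050 rows for training and 450 rows for validation and testing.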
Significance of the study
The ML analysis identified multiple microbial strains, including Pantoea strains DKB68, DKB64, and DKB63, Pseudomonas strain PJ 1.1, and Azotobacter strain AJ 1.2, that could effectively mitigate drought impacts on strawberry plants. These results were consistent with the outcomes of the previous studies.
GBTs displayed the highest accuracy in predicting the beneficial microbial strain, followed by the DL model. However, the GLM was most efficient based on total computational time among all compared ML models, which indicated a significant trade-off between computational efficiency and accuracy.
Most ML models showed a 3–8% standard deviation in their results. GBTs displayed a standard deviation of only 4%, which indicated high consistency of the model results. The NB model was the quickest to train, with a training time per 1000 rows of 2014.9 units compared to 22,101.2 units for GBTs, indicating that GBTs require substantially more training time. GBTs demonstrated the highest gains at 68.0, followed by the LR model at 46.0. This result indicated the ability of GBTs to maximize the true positive rate.
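The size of these gaps is easier to see with a little arithmetic. The short Python sketch below uses only the figures quoted above; the variable names are illustrative, and the time unit is left unspecified because the article does not state it.

```python
# Training time per 1000 rows, as quoted above (unit not stated in the article).
nb_time = 2014.9
gbt_time = 22101.2

# Gains, as quoted above.
gbt_gain = 68.0
lr_gain = 46.0

time_ratio = gbt_time / nb_time
gain_difference = gbt_gain - lr_gain

print(f"GBT training time relative to NB: {time_ratio:.1f}x")
print(f"GBT gain advantage over LR: {gain_difference:.1f}")
```

In other words, GBTs took roughly 11 times as long to train as NB on the same amount of data, while leading the next-best model on gains by 22 points.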
Overall, the comparative analysis showed that the GBTs model is the most effective model for predicting beneficial microbial strains due to its high gains and accuracy. However, its need for significant computational resources and training time is a major disadvantage in real-world applications with time and resource constraints.
Models such as GLM, DT, and RF showed a balance between computational efficiency and accuracy, which could make them suitable for practical applications. Similarly, GLM and NB models can be used for applications that require quick results with moderate accuracy, DL models for applications that need nonlinearity and deep insights, and FLM models for applications with limited resources or data. Thus, available computational resources, ML model performance metrics, and specific application requirements must be considered when selecting an ML model for microbial strain prediction.
Journal reference:
- Miller, T., Mikiciuk, G., Kisiel, A., Mikiciuk, M., Paliwoda, D., Krzemińska, A., Kozioł, A., Brysiewicz, A. (2023). Machine Learning Approaches for Forecasting the Best Microbial Strains to Alleviate Drought Impact in Agriculture. Agriculture, 13(8), 1622. https://doi.org/10.3390/agriculture13081622, https://www.mdpi.com/2077-0472/13/8/1622