In a recent publication in the journal Agriculture, researchers employed the Least Squares Support Vector Machine (LSSVM) algorithm, fine-tuned using the Water Cycle Algorithm (WCA), to forecast sugarcane yield in primary production regions in China.
Background
Sugarcane accounts for over 90 percent of sugar production in China and is also used in food, ethanol, and electricity production. Traditional methods for sugarcane yield prediction face challenges because yield depends on factors such as precipitation, temperature, and field management. The current study employs WCA-LSSVM and compares its effectiveness in simulating regional production potential against other algorithms.
Data collection and model construction
Data: The dataset was collected from 2005 to 2019 in the four major sugarcane-producing provinces in southern China, namely Guangxi, Hainan, Guangdong, and Yunnan. It contains 270 samples, with 230 used for training and the remaining 40 for testing. Both meteorological and soil data are recorded. Wind and sunshine data were obtained from the weather forecasting center, and soil data from the Global Land Data Assimilation System (GLDAS).
Different machine learning algorithms were employed to construct prediction models.
Back-propagation neural networks (BPNN): The BPNN algorithm consists of an input layer, a hidden layer, and an output layer, all connected via thresholds and weights. The gradient descent approach is used to continuously modify these parameters to reduce the error between the expected value and the network's output.
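The training loop described above can be sketched as follows. This is a minimal illustration, not the network used in the study: the hidden-layer size, tanh activation, learning rate, and per-sample (stochastic) updates are all assumptions, and `train_bpnn` is a hypothetical name.

```python
import math
import random

def train_bpnn(X, y, hidden=4, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer network with per-sample gradient descent.

    Returns (predict_function, mean squared error recorded for each epoch).
    All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    # Weights and thresholds (biases) of the hidden and output layers.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
             for j in range(hidden)]
        out = sum(W2[j] * h[j] for j in range(hidden)) + b2
        return h, out

    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target  # gradient of 0.5 * err**2 w.r.t. the output
            total += err * err
            for j in range(hidden):
                # Error propagated back through the tanh hidden unit,
                # computed before W2[j] is modified.
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
        history.append(total / len(X))
    return (lambda x: forward(x)[1]), history
```

Calling `predict, history = train_bpnn(X, y)` on a small numeric dataset returns a prediction function together with the per-epoch error, which should fall as the weights and thresholds are adjusted.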
Random Forest: The Random Forest algorithm performs well in regression and classification tasks by combining multiple decision trees through ensemble learning. In random forest regression, each decision tree serves as a base learner that repeatedly splits the data according to thresholds on the input features, and the forest's prediction is the average of the individual tree outputs. Conceptually, Random Forest is an ensemble of decision trees, each depending on a random vector sampled independently and from the same distribution.
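A heavily simplified sketch of this idea is shown below, assuming bootstrap sampling and a single randomly chosen feature per split (a production random forest would normally evaluate a random subset of several features at each node). The function names and default settings are hypothetical and are not taken from the study.

```python
import random

def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, rng, depth=0, max_depth=4, min_size=2):
    """Grow a regression tree; every split uses one randomly chosen feature."""
    mean = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) < 2 * min_size:
        return mean  # leaf node: mean target of the samples that reached it
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
        right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
        if len(left) < min_size or len(right) < min_size:
            continue
        sse = sum_sq(left) + sum_sq(right)
        if best is None or sse < best[0]:
            best = (sse, threshold)
    if best is None:
        return mean
    threshold = best[1]
    left = [(r, t) for r, t in zip(rows, targets) if r[feature] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[feature] > threshold]
    return (
        feature,
        threshold,
        build_tree([r for r, _ in left], [t for _, t in left],
                   rng, depth + 1, max_depth, min_size),
        build_tree([r for r, _ in right], [t for _, t in right],
                   rng, depth + 1, max_depth, min_size),
    )

def tree_predict(node, x):
    """Follow the threshold tests down to a leaf value."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node

def fit_forest(X, y, n_trees=30, seed=0):
    """Fit each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def forest_predict(forest, x):
    """Regression output: the average of the individual tree predictions."""
    return sum(tree_predict(tree, x) for tree in forest) / len(forest)
```

Because each tree sees a different bootstrap sample and different random features, averaging their outputs tends to reduce the variance of any single tree.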
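LSSVM: LSSVM is a versatile machine learning method applicable to classification and regression. It modifies the standard SVM by using a squared-error loss with equality constraints, so that training reduces to solving a linear system rather than a quadratic programming problem. Through a kernel function, it maps a nonlinear problem into a high-dimensional space where it can be treated as a linear one.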
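To make the "linear system" point concrete, here is a minimal sketch of LSSVM regression in its usual textbook form, in which the bias `b` and coefficients `alpha` come from a single linear solve. The RBF kernel, the regularization value `gamma`, the kernel width `sigma`, and the helper names are illustrative assumptions, not the study's settings.

```python
import math

def rbf(a, b, sigma=1.0):
    """Gaussian RBF kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the system [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    solution = solve(A, [0.0] + list(y))
    return solution[0], solution[1:]  # bias b, coefficients alpha

def lssvm_predict(x, X, b, alpha, sigma=1.0):
    """Prediction f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))
```

A larger `gamma` fits the training data more closely, while a smaller value smooths the fit; these two values (`gamma` and `sigma`) are the kind of hyperparameters an optimizer such as WCA would tune.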
WCA: The WCA algorithm draws inspiration from the natural water cycle. Simulating how rainfall forms streams that flow into rivers and ultimately the ocean, the algorithm ranks candidate solutions into three levels: the ocean (the best solution), rivers, and streams. At each iteration, streams move toward rivers or the ocean and rivers move toward the ocean with a random step size, and a stream or river exchanges roles with its target whenever it finds a better solution. To avoid becoming trapped in local optima, WCA introduces evaporation and rainfall processes, which improve its search capability and prevent premature convergence.
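The loop below is a simplified sketch of these steps for minimizing a cost function. It assumes a round-robin assignment of streams to the ocean and rivers (the published algorithm assigns streams in proportion to solution quality), and the population size, step constant `C`, and evaporation distance `d_max` are illustrative values. `wca_minimize` is a hypothetical name.

```python
import random

def wca_minimize(cost, bounds, n_pop=30, n_rivers=3, max_iter=200,
                 d_max=1e-3, C=2.0, seed=0):
    """Simplified Water Cycle Algorithm sketch; returns the best point found."""
    rng = random.Random(seed)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def clip(point):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(point, bounds)]

    def flow(src, dst):
        # Move src a random fraction (between 0 and C) of the way toward dst.
        return clip([s + rng.random() * C * (d - s) for s, d in zip(src, dst)])

    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # After sorting: pop[0] is the ocean, pop[1..n_rivers] are rivers,
    # and the remaining individuals are streams.
    pop = sorted((random_point() for _ in range(n_pop)), key=cost)
    n_guides = 1 + n_rivers

    def guide_of(i):
        # Assumption: streams are assigned to guides in round-robin order.
        return (i - n_guides) % n_guides

    for _ in range(max_iter):
        # Streams flow toward their river or the ocean; swap roles if better.
        for i in range(n_guides, n_pop):
            g = guide_of(i)
            pop[i] = flow(pop[i], pop[g])
            if cost(pop[i]) < cost(pop[g]):
                pop[i], pop[g] = pop[g], pop[i]
        # Rivers flow toward the ocean; swap roles if better.
        for r in range(1, n_guides):
            pop[r] = flow(pop[r], pop[0])
            if cost(pop[r]) < cost(pop[0]):
                pop[r], pop[0] = pop[0], pop[r]
        # Evaporation and rainfall: when a river has almost reached the ocean,
        # its streams are re-created at random positions.
        for r in range(1, n_guides):
            if distance(pop[r], pop[0]) < d_max:
                for i in range(n_guides, n_pop):
                    if guide_of(i) == r:
                        pop[i] = random_point()
        d_max -= d_max / max_iter  # gradually tighten the evaporation test
    return min(pop, key=cost)
```

In a WCA-LSSVM setting, `cost` would be a validation error of the LSSVM evaluated at a candidate pair of hyperparameters, and `bounds` the search range for each of them.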
Variance-Based Sensitivity Analysis: This method decomposes the variance of the model output into terms attributable to each input and to interactions between inputs. The first-order sensitivity index represents the contribution of an input variable alone to the output variance, while the total-effect index combines that first-order contribution with all variance caused by the variable's interactions of any order. Comparing total-effect indices across variables shows the relative influence of each input on the output variance and offers insight into the model's dynamics.
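In the standard (Sobol) formulation, the two indices can be written as below. The notation is an assumption, since the article gives no formulas: Y is the model output, X_i is the i-th input, X_{~i} denotes all inputs except X_i, and the inputs are assumed independent.

```latex
% First-order index: share of output variance explained by X_i alone
S_i = \frac{\operatorname{Var}_{X_i}\!\left( \mathbb{E}_{X_{\sim i}}\left[ Y \mid X_i \right] \right)}{\operatorname{Var}(Y)}

% Total-effect index: first-order share plus all interactions involving X_i
S_{T_i} = 1 - \frac{\operatorname{Var}_{X_{\sim i}}\!\left( \mathbb{E}_{X_i}\left[ Y \mid X_{\sim i} \right] \right)}{\operatorname{Var}(Y)}

% For independent inputs
\sum_i S_i \le 1, \qquad \sum_i S_{T_i} \ge 1
```

The gap between S_{T_i} and S_i indicates how much of a variable's influence comes through interactions with other inputs rather than from the variable acting alone.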
Results and analysis
In the current study, researchers used climate and soil factors as inputs and the unit yield of sugarcane as the output of the predictive model. The model is applied to two instances that differ in how the test set is selected. In the first instance, test samples are selected based on equal variances from the dataset, with a focus on verifying the model's accuracy over time differences. In the second instance, the test samples are the last 40 entries of the dataset, which assesses the model's accuracy with regional differences. By using these two divisions, the study validates the model's efficacy within specific temporal and spatial ranges.
To identify the algorithm with the highest prediction accuracy and best generalization capability, three machine learning algorithms (BPNN, random forest, and LSSVM) were compared. Evaluation metrics, including root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), the coefficient of determination (R2), and symmetric MAPE (SMAPE), are used to compare algorithm performance. Results reveal that, in the first instance, LSSVM fits the data more closely than BPNN and random forest. In the second instance, LSSVM again performs best, with higher R2 values. Overall, LSSVM proves more accurate and generalizes better for sugarcane yield prediction.
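For reference, the five metrics can be computed as in the sketch below. These are the common textbook definitions; SMAPE in particular has several variants, and the article does not state which one the authors used, so the form here is an assumption. `regression_metrics` is a hypothetical helper name.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (%), SMAPE (%), and R2 for paired values.

    Assumes no true value is zero (needed for MAPE) and that the true
    values are not all identical (needed for R2).
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true)),
        "SMAPE": 100.0 / n * sum(
            2.0 * abs(e) / (abs(t) + abs(p))
            for e, t, p in zip(errors, y_true, y_pred)
        ),
        "R2": 1.0 - ss_res / ss_tot,
    }
```

Lower RMSE, MAE, MAPE, and SMAPE indicate smaller prediction errors, while an R2 closer to 1 indicates a closer fit.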
The study further explores LSSVM models with different kernel functions, namely linear, polynomial, and Gaussian radial basis function (RBF) kernels. Results indicate that the RBF kernel exhibits superior prediction performance. Following this, WCA is applied to optimize the LSSVM prediction model. The WCA-LSSVM model outperforms both the non-optimized LSSVM model and the LSSVM model with particle swarm optimization in terms of accuracy and fitting degree.
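The three kernel families compared here have the following common forms. The parameter names and defaults (`degree`, `coef0`, `sigma`) are illustrative assumptions and are not values reported in the study.

```python
import math

def linear_kernel(a, b):
    """K(a, b) = a . b"""
    return sum(x * y for x, y in zip(a, b))

def polynomial_kernel(a, b, degree=3, coef0=1.0):
    """K(a, b) = (a . b + coef0) ** degree"""
    return (linear_kernel(a, b) + coef0) ** degree

def rbf_kernel(a, b, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 * sigma^2))"""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

The RBF kernel depends only on the distance between two samples, which lets the model capture smooth nonlinear relationships between the inputs and yield.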
The authors conclude with a sensitivity analysis, emphasizing the significant influence of temperature and rainfall on sugarcane yield at different growth stages. Additionally, soil moisture and evapotranspiration are crucial factors affecting yield variations. The WCA-LSSVM model, incorporating parameter optimization, emerges as a competitive and applicable approach for predicting sugarcane yield across various temporal and spatial contexts.
Conclusion
In summary, researchers proposed machine learning algorithms for sugarcane yield prediction and selected the best-performing LSSVM model, fine-tuned using WCA. Results demonstrated the proposed model's superior accuracy, achieving lower RMSE and MAPE in predicting sugarcane yield in Guangxi compared to other models. The model's sensitivity analysis highlighted temperature, precipitation, and soil moisture as crucial factors. Noting the model's dependence on climatic variables, the study suggests incorporating additional variables for a more comprehensive prediction of sugarcane yield and emphasizes the need for future research on climate change effects in Guangxi.