A recent article published in the journal Scientific Reports introduced a comprehensive approach for predicting rice production using artificial intelligence (AI) and multi-source data. The researchers analyzed climate, remote sensing, soil properties, and agricultural statistics to train and test several machine learning (ML), deep learning (DL), and hybrid models. They aimed to identify the best input combinations and key factors affecting rice production across China.
Background
Rice is one of the most important crops in the world, especially in Asia, where it feeds about 2.4 billion people. China, the largest rice producer globally, contributes 28% of the world's rice supply. Ensuring the security of China’s rice harvest is important for sustainable food production and national food security. However, rice production faces challenges from various factors, including climate, soil quality, irrigation, fertilization, pests, diseases, and farming practices. Therefore, accurate and timely predictions of rice production are essential for policymakers and farmers to make informed decisions and optimize resource allocation.
Traditional methods, such as crop simulation and statistical regression models, often struggle with high data requirements, complex parameterization, low accuracy, and poor generalization. Recently, ML and DL have emerged as promising alternatives. These AI methods can leverage large and diverse datasets, capture complex and non-linear relationships, and provide robust and reliable predictions for crop yields.
About the Research
In this paper, the authors focused on the main rice cultivation regions in China, which account for about 96% of the country's total rice area and 94% of its rice production. They used four types of data as input variables: climate, remote sensing, soil properties, and agricultural statistics. The climate data included monthly precipitation, temperature, relative humidity, sunshine duration, and evapotranspiration.
Remote sensing data comprised three vegetation indices and two biophysical parameters derived from satellite images. Soil properties data covered soil depth, organic matter, pH, cation exchange capacity, porosity, bulk density, nitrogen, phosphorus, potassium (NPK), and soil texture for topsoil and subsoil layers. The agricultural statistics data included annual rice production and sown area for 64 rice districts from 2000 to 2017.
The study evaluated six AI models: four single models, namely random forest (RF), extreme gradient boosting (XGB), convolutional neural network (CNN), and long short-term memory (LSTM), and two hybrid models (RF-XGB and CNN-LSTM) that combined single models to enhance performance. The authors tested eleven scenarios of input variable combinations, such as using only climate data, only soil data, only remote sensing data, only sown area data, and all variables together.
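One way to organize such input scenarios is to group the predictor columns by data source and assemble each scenario from those groups. The sketch below is illustrative only: the column names, group contents, and scenario labels are hypothetical placeholders, and it covers just the combinations named in this article rather than all eleven scenarios from the paper.

```python
# Hypothetical column names grouped by data source (placeholders, not the
# paper's actual variable names).
GROUPS = {
    "climate": ["precipitation", "temperature", "relative_humidity",
                "sunshine_duration", "evapotranspiration"],
    "remote_sensing": ["vegetation_index_a", "vegetation_index_b"],
    "soil": ["organic_matter", "ph", "bulk_density"],
    "sown_area": ["sown_area"],
}

# Only the combinations mentioned in the article; labels are assumptions.
SCENARIOS = {
    "climate_only": ["climate"],
    "soil_only": ["soil"],
    "remote_sensing_only": ["remote_sensing"],
    "sown_area_only": ["sown_area"],
    "soil_plus_sown_area": ["soil", "sown_area"],
    "all_variables": ["climate", "remote_sensing", "soil", "sown_area"],
}


def columns_for(scenario):
    """Return the flat list of input columns used by a named scenario."""
    return [col for group in SCENARIOS[scenario] for col in GROUPS[group]]


if __name__ == "__main__":
    for name in SCENARIOS:
        print(name, columns_for(name))
```

Each model can then be trained once per scenario on the selected columns, which keeps the comparison across input combinations consistent.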
For model training and testing, 70% of the data was used for training, while the remaining 30% was used for testing. The models' accuracy was assessed using four performance metrics: root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), mean absolute error (MAE), and the coefficient of determination (R²).
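The split and the four metrics can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the random shuffling, the seed, and the function names are assumptions, and R² is taken here to be the squared Pearson correlation between observed and predicted values (an assumption made because the article reports it separately from NSE).

```python
import math
import random


def split_70_30(rows, seed=0):
    """Shuffle rows and return (train, test) in a 70/30 proportion."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus (error variance / observed variance)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r2(obs, pred):
    """Squared Pearson correlation (assumed definition of R² here)."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]    # toy values, not data from the study
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(rmse(observed, predicted), mae(observed, predicted))
    print(nse(observed, predicted), r2(observed, predicted))
```

For any set of errors, MAE can never exceed RMSE, and an NSE of 1 indicates a perfect match between predictions and observations.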
Research Findings
The results showed that the hybrid models outperformed the single models in predicting rice production, regardless of the input variable combinations. Among the models tested, the RF-XGB hybrid model achieved the best performance, with the highest R² of 0.97 and NSE of 0.97, and the lowest RMSE of 14.9 × 10⁴ tons and MAE of 5.85 × 10⁴ tons. The second-best was the CNN-LSTM hybrid model, which produced results close to those of the RF-XGB. The worst model was LSTM, which achieved the lowest R² of 0.68 and NSE of 0.67, and the highest RMSE of 43.9 × 10⁴ tons and MAE of 51.4 × 10⁴ tons.
The study also highlighted the varying impacts of different input variable combinations on model performance. The most effective combinations were scenario 8 (using soil variables and sown area) and scenario 11 (using all variables), both of which produced similar results. In contrast, scenario 6 (using only remote sensing data) was the least effective, with the lowest R² of 0.36 and NSE of 0.34, and the highest RMSE of 68.7 × 10⁴ tons and MAE of 54.9 × 10⁴ tons. Interestingly, using the sown area alone proved relatively effective, achieving a high R² of 0.83 and NSE of 0.82, with a low RMSE of 35.6 × 10⁴ tons and MAE of 28.6 × 10⁴ tons.
Further analysis of the predictor variables across different regions of China revealed that soil properties were the most influential, particularly in the east and southeast, where they accounted for 87% and 53% of the total importance, respectively. The sown area was the second most crucial factor, especially in northeast China, where it comprised 90% of the total importance. Climate variables, on the other hand, were the least impactful, mainly in northeast and east China, contributing only 3% and 4% to the total importance, respectively.
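Percentage shares of this kind can be obtained by summing per-variable importance scores within each data group and normalizing by the overall total. The sketch below shows that arithmetic under stated assumptions: the function name, variable names, and importance values are hypothetical and are not taken from the paper, and the article does not say which importance measure the authors used.

```python
def group_shares(importances, groups):
    """Sum per-variable importances by group and return percentage shares."""
    totals = {}
    for feature, value in importances.items():
        group = groups[feature]
        totals[group] = totals.get(group, 0.0) + value
    overall = sum(totals.values())
    return {group: 100.0 * value / overall for group, value in totals.items()}


if __name__ == "__main__":
    # Hypothetical importance scores for one region (illustrative only).
    importances = {
        "soil_ph": 0.3,
        "soil_organic_matter": 0.2,
        "sown_area": 0.4,
        "precipitation": 0.1,
    }
    # Mapping from each variable to its data group (also hypothetical).
    groups = {
        "soil_ph": "soil",
        "soil_organic_matter": "soil",
        "sown_area": "sown_area",
        "precipitation": "climate",
    }
    for group, share in group_shares(importances, groups).items():
        print(f"{group}: {share:.0f}%")
```

Running the same aggregation separately for each region gives a per-region breakdown comparable to the percentages reported above.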
Conclusion
In summary, all AI models proved effective for predicting rice production in China, with hybrid models surpassing individual models. The authors highlighted soil properties as the most crucial factors, followed by sown area and climate variables. They recommended enhancing soil management practices, such as increasing organic matter and clay content, to increase rice production in northeast and southeast China. They also suggested adapting agricultural practices to climate change, including adjusting sowing schedules and irrigation techniques, to mitigate adverse impacts on rice production in southeast China.
Journal reference:
- Mokhtar, A., He, H., Nabil, M. et al. Securing China’s rice harvest: unveiling dominant factors in production using multi-source data and hybrid machine learning models. Sci Rep 14, 14699 (2024). DOI: 10.1038/s41598-024-64269-0, https://www.nature.com/articles/s41598-024-64269-0