In a paper published in the journal Food Control, researchers investigated the efficacy of machine learning (ML) algorithms in predicting quality attributes of Prunoideae fruits such as peaches, apricots, and cherries. They employed extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), CatBoost, and random forest (RF) algorithms to develop models for soluble solids content (SSC) and titratable acidity (TA), integrating various hyperspectral denoising methods and feature extraction techniques.
Notably, the model combining multi-layer perceptron (MLP) feature extraction, Savitzky-Golay (SG) smoothing, and XGBoost (MLP-SG-XGBoost) excelled in predicting SSC for peaches and apricots. In contrast, the LGR-SG-LightGBM model, which uses logistic regression (LGR) for feature extraction, demonstrated superior accuracy in predicting TA for these fruits. These findings highlighted the potential of ML in enhancing fruit quality estimation and control practices.
Related Work
Previous research has underscored the importance of Prunoideae fruits like peaches, apricots, and cherries in the consumer fruit market, focusing on both their external and internal quality traits. Scientists have explored diverse approaches to enhance fruit quality, including modifying storage conditions for cherries and using hyperspectral imaging to identify freeze damage in peaches during transit.
Fruit quality assessment technologies have evolved to meet market needs, with ML algorithms, notably ensemble methods like RF and XGBoost, showing promise in non-destructive fruit testing. These algorithms have successfully predicted attributes such as sweetness and acidity in fruits like durian and grapes.
Sample Preparation Process
To prepare the Prunoideae fruit samples, cherries were carefully selected from a local orchard, while peaches and apricots were procured from a nearby fruit market. Each group consisted of 600 fruits free of defects or diseases, and three varieties of each fruit type were included to mitigate varietal differences. Before experimentation, all samples were stored in a controlled laboratory environment for 24 hours to reach a standardized room temperature, minimizing temperature-related influences on predictive accuracy.
The team obtained hyperspectral characteristic curves for the Prunoideae fruits using a hyper-spectrometer with a wavelength range of 180 nm to 1100 nm. Spectral data were collected under stable laboratory lighting conditions at different measurement points on each fruit. Because the hyper-spectrometer's background spectral intensity introduces noise into the spectral data, dark background correction was performed using the equipment's control software.
Spectral characteristic curves were obtained at various positions on the fruits to explore how the measurement position affects prediction accuracy. Meridian lines were chosen at the base, tip, and maximum diameter for peaches and apricots, with multiple measurement points along each line. Because cherries are smaller, their spectral data were collected from fewer measurement points along the fruit's longitudinal axis.
Average spectral data for each fruit layer and opposite spectral data were obtained to represent the overall spectral characteristics. Spectral data preprocessing, including denoising using methods like multiplicative scatter correction (MSC), was performed to improve data quality and enhance feature signals. Furthermore, feature extraction using MLP and logistic regression (LGR) was employed, followed by building analysis models using four ML algorithms: LightGBM, XGBoost, CatBoost, and RF.
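The preprocessing step can be illustrated with a minimal sketch. It assumes that SG denotes Savitzky-Golay smoothing and MSC denotes multiplicative scatter correction, the usual meanings of these abbreviations in spectroscopy. The spectra, wavelength grid, window length, and polynomial order below are synthetic, illustrative choices and are not taken from the study.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_smooth(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (last axis)."""
    return savgol_filter(spectra, window_length=window_length,
                         polyorder=polyorder, axis=-1)


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum (the mean spectrum
    by default); the fitted offset is subtracted and the result is divided
    by the fitted slope.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Synthetic example (hypothetical data): five noisy copies of one absorption
# band, each with its own multiplicative gain and additive offset.
rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 1000, 200)
base = np.exp(-((wavelengths - 700) / 120) ** 2)
gains = rng.uniform(0.8, 1.2, size=(5, 1))
offsets = rng.uniform(-0.05, 0.05, size=(5, 1))
raw = gains * base + offsets + rng.normal(0, 0.01, size=(5, 200))

smoothed = sg_smooth(raw)   # reduce high-frequency noise
corrected = msc(smoothed)   # remove per-sample gain and offset
```

Smoothing before scatter correction is only one possible ordering; the summary does not state which order the authors used.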
Model performance was evaluated with 10-fold cross-validation, in which the dataset is divided into 10 subsets that take turns serving as the test set while the remaining subsets are used for training. The analysts used the root mean squared error (RMSE) to measure prediction accuracy and the coefficient of determination (R²) to measure how much of the variance in the data the model explains.
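As a rough sketch of this evaluation procedure, the example below runs 10-fold cross-validation with scikit-learn and reports the mean RMSE and R² across folds. A random forest stands in for the four algorithms compared in the study, and the feature matrix, target, and hyperparameters are synthetic assumptions made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold


def cross_validate_regressor(X, y, n_splits=10, seed=0):
    """Return the mean RMSE and mean R^2 over K shuffled folds."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rmses, r2s = [], []
    for train_idx, test_idx in kf.split(X):
        # A fresh model is trained on each fold's training portion.
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        rmses.append(np.sqrt(mean_squared_error(y[test_idx], pred)))
        r2s.append(r2_score(y[test_idx], pred))
    return float(np.mean(rmses)), float(np.mean(r2s))


# Synthetic stand-in for extracted spectral features and an SSC-like target
# (hypothetical data, not from the study).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 10 + 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(0, 0.1, size=200)

mean_rmse, mean_r2 = cross_validate_regressor(X, y)
print(f"RMSE: {mean_rmse:.3f}, R^2: {mean_r2:.3f}")
```

A lower RMSE and an R² closer to 1 indicate a better fit; swapping in another regressor with the same fit/predict interface would allow the same loop to compare algorithms.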
Fruit Quality Prediction
The spectral description and feature extraction process reveal distinct variations in spectral characteristics across different layers of peaches, apricots, and cherries. These variations are attributed to differences in organic acid distribution and cell structure, impacting the accuracy of quality characteristic analysis.
Feature extraction techniques like MLP and LGR highlight the wavelengths most relevant to predictive modeling. MLP shows advantages in SSC prediction, particularly in cherry fruit, which the analysis links to surface scattering properties. Furthermore, algorithms like XGBoost demonstrate superior predictive accuracy, likely due to their regularization mechanisms and robustness in handling complex datasets.
In predicting the fruit quality characteristics SSC and TA, the ML algorithms perform differently across fruit types. For SSC prediction, models combining MLP, SG processing, and XGBoost show the highest accuracy, while for TA prediction, models utilizing LGR, SG processing, and LightGBM demonstrate optimal performance. These results underscore the importance of algorithm selection and data preprocessing techniques in enhancing prediction accuracy for fruit quality attributes.
The analysis of fruit color's impact on prediction accuracy indicates that while fruit pigments absorb spectral intensities, their effect on models is relatively minor compared to other variables. Moreover, assessing prediction models using diverse spectral data suggests that models utilizing opposite spectra data demonstrate superior accuracy, underlining the importance of factors like measurement origin and illumination conditions during fruit development. These findings underscore the intricate relationship between spectral characteristics, data preprocessing techniques, algorithm choice, and the accuracy of fruit quality predictions.
Conclusion
To sum up, this study successfully developed prediction models for SSC and TA in Prunoideae fruit using ensemble ML algorithms and hyperspectral data. Comparing various smoothing, feature extraction, and modeling techniques identified the optimal models: MLP-SG-XGBoost for SSC and LGR-SG-LightGBM for TA.
Furthermore, altering the type of spectral data input improved model accuracy, with opposite spectra yielding the highest performance. However, it was noted that the measurement origin of opposite spectra can influence results, suggesting the need to consider factors like light conditions and fruit texture in future studies. Overall, this research advances the accurate prediction of fruit quality characteristics and provides a foundation for future hyperspectral-based fruit quality studies.