In an article recently published in the journal Scientific Reports, researchers compared and ranked multiple machine learning (ML) algorithms to investigate their effectiveness for weight prediction in sheep.
Background
Artificial intelligence (AI) is ushering in significant transformation across sectors by providing novel solutions to complex analytical problems. In the animal husbandry sector, these solutions can identify several crucial aspects of farm management that help improve productivity and reduce mortality.
ML techniques can handle data efficiently and reveal hidden patterns. Their modeling tolerance is significantly higher than that of statistical methodologies because ML does not require hypothesis testing or assumptions. Additionally, ML techniques can effectively handle imprecise, noisy, and non-linear data, and they are more flexible than conventional statistical models and data analysis because they are not limited by such conditions.
Studies have demonstrated that several AI techniques, including K-nearest neighbors (KNN), support vector machines (SVM), and artificial neural networks (ANNs), can address different problems in animal sciences more effectively than conventional approaches because they make use of substantial amounts of data.
Moreover, studies have also been performed to compare different ML techniques in animal sciences for predicting immunity, body weights, genetic merits, lactation, hatchability, performance, disease, transcriptomics, genetic selection, and ribonucleic acid (RNA) sequencing gene expression.
Accurate prediction of future performance is crucial for making decisions that improve both production and income from animal husbandry. However, few studies to date have compared the effectiveness of popular supervised learning algorithms for animal husbandry applications.
Comparison of AI Algorithms for Sheep Weight Prediction
In this study, researchers compared the most popular ML algorithms and ranked them based on their ability to make accurate predictions on sheep farm data. Researchers fine-tuned the assessed models to support the future development of deployable ML models for sheep weight prediction. Eleven years of data on the Corriedale breed were obtained from an organized sheep farm to predict body weight.
Over 37,200 data points were available for the study. The initial raw data included animal numbers, litter size, birth coat, parent record, weaning date, time of birth, monthly morphometric measurements up to weaning, treatment and disposal records, daily humidity and temperature, and body weights: monthly weights up to the 12th month, fortnightly weights up to the 6th fortnight, and weekly weights up to the 4th week.
Data imputation was iteratively performed using Bayesian ridge regression (BRR). Winsorization was performed for outlier removal, and the data were encoded appropriately, standardized, and then split into testing and training datasets.
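The preprocessing steps described above can be sketched with scikit-learn and NumPy. This is a minimal illustration on synthetic data, not the study's pipeline: the 5th and 95th percentile limits, the function name `preprocess`, and the demo matrix are all assumptions, and categorical encoding is omitted.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import StandardScaler


def preprocess(X, lower_pct=5.0, upper_pct=95.0, seed=0):
    """Impute, winsorize, and standardize a numeric feature matrix.

    The percentile limits are illustrative assumptions, not the study's values.
    """
    # Iterative imputation: each column with gaps is regressed on the others
    # using Bayesian ridge regression.
    imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=seed)
    X_imputed = imputer.fit_transform(X)

    # Winsorization: clip each column to its own lower and upper percentile.
    lo, hi = np.percentile(X_imputed, [lower_pct, upper_pct], axis=0)
    X_wins = np.clip(X_imputed, lo, hi)

    # Standardization: zero mean and unit variance per column.
    return StandardScaler().fit_transform(X_wins)


# Synthetic demo data (hypothetical): 200 records, 4 numeric features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
X_demo[:, 3] = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200)
X_demo[rng.random(X_demo.shape) < 0.1] = np.nan  # remove roughly 10% of values
X_clean = preprocess(X_demo)
```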
The optimal train-test split was determined heuristically, with 10% of the standardized dataset used for testing and 90% for training. A part of the training data was in turn held out for validation, with the validation set equal to 10% of the training data.
Feature selection (FS) and principal component analysis (PCA) were used to perform dimensionality reduction (DR), reducing the number of input variables and retaining those that contributed most to the variance. FS was performed both on the original dataset and on the features extracted by PCA. The input variables used across all ML models in this study were kept constant to eliminate the bias that an uneven number of input variables/features would introduce during training.
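The proportions above work out to roughly 81% training, 9% validation, and 10% testing of the full dataset. The following sketch shows the arithmetic using only the Python standard library; the record count of 37,200 is an illustrative figure taken from the "over 37,200" data points mentioned earlier, and the function name and seed are assumptions.

```python
import random


def split_indices(n_records, test_frac=0.10, val_frac=0.10, seed=42):
    """Return shuffled (train, validation, test) index lists.

    val_frac applies to the data that remains after the test set is removed.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)

    n_test = round(n_records * test_frac)
    test, remaining = indices[:n_test], indices[n_test:]

    n_val = round(len(remaining) * val_frac)
    val, train = remaining[:n_val], remaining[n_val:]
    return train, val, test


# 37,200 is an illustrative record count, not the exact size of the study's dataset.
train_idx, val_idx, test_idx = split_indices(37_200)
print(len(train_idx), len(val_idx), len(test_idx))  # 30132 3348 3720
```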
Subsequently, three datasets were created, including the PCA dataset, in which the PCA technique was utilized for DR, the FS dataset, where the F-test estimate of linear dependency degree between two numerical variables was utilized for DR, and the PCA+FS dataset, where both techniques were utilized to achieve DR.
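One way to build three such datasets is sketched below with scikit-learn, using `f_regression` as the F-test scorer. The data are synthetic, and the feature count `K`, the number of intermediate principal components, and the variable names are assumptions rather than values from the study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline

K = 5  # hypothetical number of inputs, kept the same for all three datasets

# Synthetic stand-in data: 12 features, target driven mainly by the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

# PCA dataset: keep the K leading principal components.
X_pca = PCA(n_components=K).fit_transform(X)

# FS dataset: keep the K original features with the highest F-statistic
# for linear dependency on the target.
selector = SelectKBest(score_func=f_regression, k=K).fit(X, y)
X_fs = selector.transform(X)

# PCA+FS dataset: extract components first, then select among them by F-test.
pca_fs = make_pipeline(
    PCA(n_components=10),
    SelectKBest(score_func=f_regression, k=K),
)
X_pca_fs = pca_fs.fit_transform(X, y)
```

Keeping `K` identical across the three variants mirrors the stated goal of giving every model the same number of inputs.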
Twelve reusable and deployable models were developed to predict sheep body weight at 12 months of age: BRR, ANN, SVM, random forests (RF), classification and regression trees (CART), gradient boosting, extreme gradient boosting (XGBoost), polynomial regression, KNN, multivariate adaptive regression splines (MARS), ridge regression, and genetic algorithms (GA).
ANNs were also used to predict sheep body weight from morphometric measurements alone; these measurements constituted the DM dataset used for weaning weight prediction. Weight parameters were additionally predicted using body measurements and earlier body weights as inputs to ANNs.
Random search and grid search algorithms were used for hyperparameter optimization, followed by heuristic tuning. Four scoring criteria, namely the correlation coefficient, coefficient of determination, mean absolute error (MAE), and mean squared error (MSE), were used to evaluate the AI algorithms.
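The four scoring criteria can be computed from true and predicted values as in the following NumPy sketch. The function name and the four example weights are hypothetical and serve only to show the formulas.

```python
import numpy as np


def score_predictions(y_true, y_pred):
    """Return correlation, coefficient of determination, MAE, and MSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    ss_res = np.sum(errors ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares

    return {
        "r": float(np.corrcoef(y_true, y_pred)[0, 1]),  # Pearson correlation
        "r2": float(1.0 - ss_res / ss_tot),             # coefficient of determination
        "mae": float(np.mean(np.abs(errors))),          # mean absolute error
        "mse": float(np.mean(errors ** 2)),             # mean squared error
    }


# Hypothetical body weights in kg, for illustration only.
scores = score_predictions([30.0, 32.0, 35.0, 40.0], [31.0, 31.0, 36.0, 39.0])
print(scores)
```

Correlation and the coefficient of determination reward agreement in trend, while MAE and MSE report error in the units (or squared units) of the target, so the four together give a fuller picture than any one alone.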
Significance of the Study
All models demonstrated high prediction ability, with tree-based algorithms displaying better performance in regression-based tasks. For body weights, the correlations between the true and predicted values for GA, KNN, polynomial regression, CART, ANNs, XGBoost, RF, gradient boosting, SVM, ridge regression, BRR, and MARS were 0.734, 0.949, 0.957, 0.984, 0.984, 0.99, 0.99, 0.991, 0.991, 0.991, 0.992, and 0.993, respectively.
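The reported correlations can be sorted to see how the algorithms compare on this one criterion. The sketch below ranks by correlation alone, which is a simplifying assumption: the study evaluated the models on four scoring criteria, and its exact ranking procedure is not described here.

```python
# Correlations between true and predicted body weights, as reported above.
correlations = {
    "GA": 0.734,
    "KNN": 0.949,
    "Polynomial regression": 0.957,
    "CART": 0.984,
    "ANN": 0.984,
    "XGBoost": 0.99,
    "RF": 0.99,
    "Gradient boosting": 0.991,
    "SVM": 0.991,
    "Ridge regression": 0.991,
    "BRR": 0.992,
    "MARS": 0.993,
}

# Sort from highest to lowest correlation; tied models keep their listed order.
ranked = sorted(correlations.items(), key=lambda item: item[1], reverse=True)
top_five = [name for name, _ in ranked[:5]]

for name, r in ranked:
    print(f"{name:<22} {r:.3f}")
```

Gradient boosting, SVM, and ridge regression are tied at 0.991, so their relative order within the top five is not determined by correlation alone.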
The top five algorithms showing the best performance for sheep body weight prediction were MARS, BRR, ridge regression, SVM, and gradient boosting algorithms. To summarize, the customization and deployment of these algorithms can effectively assist in making informed decisions beneficial for animal production, leading to greater economic prosperity and food security.
Journal reference:
- Hamadani, A., Ganai, N. A. (2023). Artificial intelligence algorithm comparison and ranking for weight prediction in sheep. Scientific Reports, 13(1), 1-13. https://doi.org/10.1038/s41598-023-40528-4, https://www.nature.com/articles/s41598-023-40528-4