In a paper published in the journal Scientific Reports, researchers employed deep neural networks and machine learning to predict facial landmarks and pain scores based on the Feline Grimace Scale (FGS). They annotated 3447 cat face images with 37 facial landmarks, trained convolutional neural networks (CNNs) to predict the landmark positions, and computed 35 geometric descriptors from them.
The best-performing CNN model achieved a 16.76% Normalized Root Mean Squared Error (NRMSE) in landmark prediction. The best eXtreme Gradient Boosting (XGBoost) model demonstrated excellent accuracy and a notably low Mean Squared Error (MSE) in predicting FGS scores. Together, these models reliably distinguished painful from non-painful cats, paving the way for an automated smartphone application for acute pain assessment in felines.
Background
Assessing pain in non-verbal individuals remains challenging, affecting both biomedical research and veterinary care. Current methods that rely on behavioral cues or pain scales often lack consistency and depth, prompting interest in automated approaches based on technologies such as artificial intelligence. Changes in facial expression are recognized indicators of pain and have led to the development of grimace scales, yet these manual assessments are labor-intensive and influenced by multiple factors. The absence of fully automated models capable of discerning pain states across diverse datasets underscores the need for advanced, unbiased models.
Automated Prediction of Feline Pain
This study, divided into Phases I and II, aimed to predict facial landmark positions and FGS scores using CNN models and geometric descriptors derived from facial images of domestic cats experiencing different degrees of naturally occurring pain. In Phase I, a dataset of 3447 facial images of cats from various sources, including research studies, a mobile phone application, and an open-access dataset, was used. The images captured cats with or without naturally occurring pain across different clinical trials and spanned diverse characteristics such as coat color, age, sex, and breed. Landmark annotations were conducted on these images, defining 37 facial landmarks based on the five action units (AUs) of the FGS. The reliability of these landmark annotations was confirmed across multiple raters in a trial involving semi-automatic annotation with specialized software.
CNN models were employed to predict the coordinates of these 37 facial landmarks. To enlarge the dataset for deep learning, random geometric and color-space transformations were applied to the original images as augmentation. The researchers explored and adapted several CNN-based models (NASNetMobile, EfficientNetB0, MobileNetV2, MobileNetV3, and ShuffleNetV2) for landmark prediction, making structural adjustments to each to optimize performance. Evaluation metrics such as model size, prediction time, and NRMSE were used to assess each model's suitability for integration into a smartphone application.
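The paper's training code is not reproduced here, but the sketch below illustrates this kind of setup: a pretrained MobileNetV2 backbone adapted in Keras to regress the 74 coordinates (37 x/y pairs), with color-space augmentation applied inside the model. The input size, head layers, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch only, not the authors' code; assumes TensorFlow/Keras.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_LANDMARKS = 37   # FGS facial landmarks, each with an (x, y) coordinate
IMG_SIZE = 224       # assumed input resolution

# Color-space augmentation can run inside the model because it leaves the
# landmark coordinates unchanged; geometric transforms (rotation, scaling)
# must also be applied to the target coordinates, so they belong in the
# data pipeline instead.
augment = tf.keras.Sequential([
    layers.RandomContrast(0.2),
    layers.RandomBrightness(0.1),
])

# Pretrained backbone with its classification head removed.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False, weights="imagenet")

model = models.Sequential([
    augment,
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_LANDMARKS * 2),   # 74 outputs: one (x, y) per landmark
])

model.compile(optimizer="adam", loss="mse")  # coordinates regressed with MSE
```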
The NRMSE served as a measure of the average normalized Euclidean distance between predicted and ground-truth landmarks. Phase II involved predicting FGS scores from geometric descriptors calculated from the facial landmarks, using a subset of 1188 images from the research studies dataset together with their corresponding FGS scores. From the 37 facial landmarks, the researchers defined three categories of geometric descriptors (angles, ratios of distances, and ratios of areas), 35 descriptors in total.
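As a concrete illustration of these quantities, the NumPy sketch below computes an NRMSE and one example from each descriptor category. The normalization term (the face bounding-box diagonal) is an assumption for illustration; the paper's exact normalization may differ.

```python
# NumPy sketch of the evaluation metric and descriptor types; the NRMSE
# normalization term here is an assumption, not the paper's definition.
import numpy as np

def nrmse(pred, true):
    """Mean Euclidean distance between predicted and ground-truth landmarks,
    normalized here by the face bounding-box diagonal (x 100 for percent)."""
    dists = np.linalg.norm(pred - true, axis=1)          # per-landmark error
    diag = np.linalg.norm(true.max(axis=0) - true.min(axis=0))
    return 100.0 * dists.mean() / diag

def angle(a, b, c):
    """First descriptor category: angle at vertex b (degrees) of a-b-c."""
    v1, v2 = a - b, c - b
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def distance_ratio(p1, p2, p3, p4):
    """Second category: ratio of the distance p1-p2 to the distance p3-p4."""
    return np.linalg.norm(p1 - p2) / np.linalg.norm(p3 - p4)

def triangle_area(a, b, c):
    """Helper for the third category (ratios of areas): shoelace formula."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1])
                     - (b[1] - a[1]) * (c[0] - a[0]))

lm = np.random.rand(37, 2)              # placeholder for one annotated face
print(nrmse(lm + 0.01, lm))             # small perturbation -> small NRMSE
print(angle(lm[0], lm[1], lm[2]))
print(triangle_area(lm[0], lm[1], lm[2]) / triangle_area(lm[3], lm[4], lm[5]))
```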
The researchers used XGBoost models to predict FGS scores, covering binary classification ('painful' or 'non-painful'), regression (total FGS scores), and ordinal classification (AU scores). Various aggregation functions were applied to condense multiple scores per image into a single representative score. Different combinations of geometric descriptors were evaluated, with feature selection algorithms and hyperparameter tuning used to avoid overfitting.
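A minimal sketch of the binary case follows, assuming the xgboost and scikit-learn packages; the random placeholder data and the small hyperparameter grid stand in for the study's actual descriptors and tuning procedure.

```python
# Sketch of the binary 'painful' vs 'non-painful' classifier; assumes the
# xgboost and scikit-learn packages. Data and grid are placeholders.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1188, 35))      # 35 geometric descriptors per image
y = rng.integers(0, 2, size=1188)    # 0 = non-painful, 1 = painful

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Small illustrative grid; tuning on cross-validated AUROC limits overfitting.
search = GridSearchCV(
    xgb.XGBClassifier(objective="binary:logistic", eval_metric="logloss"),
    param_grid={"max_depth": [3, 5],
                "n_estimators": [100, 300],
                "learning_rate": [0.05, 0.1]},
    scoring="roc_auc", cv=5)
search.fit(X_tr, y_tr)
print(search.best_params_, search.score(X_te, y_te))
```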
The models' performance was evaluated using accuracy, the area under the receiver operating characteristic curve (AUROC), and other relevant metrics on separate test sets to determine the most effective models for automated prediction of facial landmarks and FGS scores in cats experiencing pain. Principal component analysis (PCA) was used to visualize the relationship between the geometric descriptors and the pain categories in the binary classification models.
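Continuing the previous sketch, the snippet below computes accuracy and AUROC on the held-out split and projects the 35 descriptors onto two principal components colored by class; the plotting details are illustrative.

```python
# Continues the previous sketch: held-out metrics plus a 2-D PCA projection.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, roc_auc_score

proba = search.predict_proba(X_te)[:, 1]       # probability of 'painful'
print("accuracy:", accuracy_score(y_te, proba > 0.5))
print("AUROC:  ", roc_auc_score(y_te, proba))

# Project the 35 descriptors onto two principal components, colored by class.
pcs = PCA(n_components=2).fit_transform(X)
plt.scatter(pcs[:, 0], pcs[:, 1], c=y, cmap="coolwarm", s=8)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Geometric descriptors by pain class")
plt.show()
```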
Automated Feline Pain Assessment System
This study comprised two phases to create an automated system for assessing cat pain. Phase I focused on predicting facial landmarks using CNN models trained on a dataset of 3447 cat images exhibiting varying degrees of pain. Models such as ShuffleNetV2, EfficientNetB0, and MobileNetV3 displayed the best predictive performance for landmark positions. Structural adjustments, such as replacing the Global Average Pooling 2D (GAP2D) layer and introducing symmetric parallel convolutional layer blocks, notably improved the models' accuracy.
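The exact head modification is not detailed here; the sketch below shows one plausible reading, in which two symmetric parallel convolutional branches take the place of the single GAP2D layer, with their pooled outputs concatenated before the landmark regressor. This is an assumed interpretation, not the authors' architecture.

```python
# An assumed interpretation of the head modification; NOT the authors' design.
import tensorflow as tf
from tensorflow.keras import layers, models

inputs = tf.keras.Input(shape=(224, 224, 3))
features = tf.keras.applications.MobileNetV2(
    include_top=False, weights=None)(inputs)    # backbone feature map

# Two symmetric parallel convolutional branches in place of the single GAP2D
# layer; their pooled outputs are concatenated before the landmark regressor.
b1 = layers.Conv2D(128, 3, padding="same", activation="relu")(features)
b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(features)
merged = layers.Concatenate()([layers.GlobalAveragePooling2D()(b1),
                               layers.GlobalAveragePooling2D()(b2)])
outputs = layers.Dense(74)(merged)              # 37 (x, y) landmark pairs

model = models.Model(inputs, outputs)
```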
Phase II involved predicting FGS scores using geometric descriptors calculated from the facial landmarks. The system used XGBoost models for binary classification, regression, and ordinal classification, achieving high accuracy in differentiating painful from non-painful cats. This approach demonstrated promising results for automated pain assessment, combining accuracy and discriminatory ability with low error in predicting FGS scores.
The study aimed to create a comprehensive system integrating facial landmark prediction, geometric descriptor computation, and FGS score prediction. It successfully demonstrated the potential of smartphone application integration for assessing feline pain accurately and efficiently, addressing a critical need in cat pain assessment. Despite its promise, limitations such as sensitivity to image positioning may affect the system's performance, emphasizing the need for further development and refinement.
Conclusion
To sum up, this study advances cat pain assessment by leveraging deep-learning techniques to predict facial landmarks and FGS scores. With a diverse dataset and models such as ShuffleNetV2, EfficientNetB0, and MobileNetV3, the system accurately localizes facial landmarks in cats experiencing different pain levels.
Integrating the geometric descriptors into XGBoost models enables precise differentiation between painful and non-painful cats, showcasing the potential for an automated smartphone application. This innovation addresses the longstanding challenge of non-verbal pain assessment in felines. Still, limitations such as sensitivity to image positioning highlight the ongoing need for further development and optimization in this field.