In a paper published in the journal BioMedInformatics, researchers tackled the challenge of limited annotated medical data in artificial intelligence (AI) analysis of eye movements by introducing a novel gaze data augmentation library based on physiological principles.
Unlike traditional methods that may distort medical datasets, this innovative approach emulated natural head movements during gaze data collection, enhancing sample diversity while preserving data authenticity. Evaluations across convolutional neural networks (CNN) and hybrid architectures using various datasets demonstrated the library's effectiveness in stabilizing training and improving model generalization.
Their proposed augmentation technique, enhanced medical outdoor & laboratory eye-tracking data augmentation (EMULATE), achieved higher macro F1 scores than non-physiological augmentation methods on the evaluated datasets. This study represents a significant advancement in leveraging domain-specific insights to enhance the reliability and fidelity of deep learning (DL) models in medical applications.
Background
Previous research in DL has categorized data augmentation into implicit distribution learning and explicit transformation modeling. Implicit methods involve learning and sampling from data distributions using sequence-to-sequence algorithms and generative models for eye movement and electroencephalography (EEG) analysis tasks.
Popular in computer vision, explicit methods include mixup, which transforms existing data to enhance model robustness. In eye movement analysis, hierarchical temporal convolutions for eye movement (HTCE) have been effective in clinical screening for learning disorders and extended to multi-label classification.
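As a point of comparison, the mixup technique mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function name `mixup_pair`, the list-based signal format, and the default `alpha` value are assumptions made for the example. It shows how two recordings and their labels are blended into one synthetic sample, which is the kind of transformation that can blur class-specific features.

```python
import random

def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.2, lam=None, rng=random):
    """Blend two samples and their one-hot labels with a mixing weight lam.

    x_a, x_b: equal-length lists of signal values (hypothetical format).
    y_a, y_b: equal-length one-hot (or soft) label lists.
    lam: mixing weight in [0, 1]; drawn from Beta(alpha, alpha) if omitted.
    """
    if lam is None:
        lam = rng.betavariate(alpha, alpha)
    # Convex combination of the inputs and of the labels.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix

# Example: a fixed weight of 0.25 keeps 25% of sample A and 75% of sample B.
mixed_x, mixed_y = mixup_pair([0.0, 2.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], lam=0.25)
```

The blended signal corresponds to no recording that a person actually produced, which is the motivation for looking at transformations grounded in physiology instead.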
Physiological Gaze Data Augmentation
The study introduces a novel approach that uses physiologically based transformation techniques to augment eye movement gaze data. Two distinct strategies are proposed: static and dynamic augmentation. The static method mimics head movements occurring before data acquisition without compromising head stability during the acquisition process, utilizing nine parameters.
In contrast, the dynamic approach focuses on emulating head movements during data acquisition, allowing for more extensive dataset augmentation with 15 parameters. This methodological choice is motivated by observations using the real-time, optimal, mobile, biometric identification (REMOBI) system, which permits unrestricted head movement during data collection and incorporates accelerometer measurements to capture subtle variations in head position and movement.
The algorithmic framework of the proposed model involves several steps. The process begins with converting spherical coordinates into Cartesian coordinates, then translating the eye position vectors relative to the head center and applying rotation transformations across the three axes.
Each eye position vector is then translated back to its original coordinate system and converted back to spherical coordinates. These steps underpin both the static and dynamic augmentation strategies, enhancing dataset diversity while preserving physiological realism.
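The sequence of steps described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the EMULATE library itself: the function names, the azimuth/elevation convention, the rotation order, and the head-center offset are all hypothetical, and the sketch does not reproduce the nine or 15 parameters used by the actual static and dynamic methods. A static augmentation is approximated here by applying the same angles to every sample, and a dynamic one by letting the angles vary from sample to sample.

```python
import math

def spherical_to_cartesian(azimuth_deg, elevation_deg, radius):
    # Assumed convention: azimuth about the vertical (y) axis,
    # elevation above the horizontal plane, z pointing straight ahead.
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (radius * math.cos(el) * math.sin(az),
            radius * math.sin(el),
            radius * math.cos(el) * math.cos(az))

def cartesian_to_spherical(x, y, z):
    # Assumes a non-zero vector.
    radius = math.sqrt(x * x + y * y + z * z)
    ratio = max(-1.0, min(1.0, y / radius))  # guard against rounding
    return (math.degrees(math.atan2(x, z)), math.degrees(math.asin(ratio)), radius)

def rotate_xyz(point, pitch_deg, yaw_deg, roll_deg):
    # Rotate about the x axis, then the y axis, then the z axis.
    x, y, z = point
    a, b, c = (math.radians(v) for v in (pitch_deg, yaw_deg, roll_deg))
    y, z = y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a)
    x, z = x * math.cos(b) + z * math.sin(b), -x * math.sin(b) + z * math.cos(b)
    x, y = x * math.cos(c) - y * math.sin(c), x * math.sin(c) + y * math.cos(c)
    return (x, y, z)

def augment_sample(azimuth_deg, elevation_deg, radius, head_center, angles):
    # 1. Spherical -> Cartesian.
    p = spherical_to_cartesian(azimuth_deg, elevation_deg, radius)
    # 2. Translate so the (assumed) head center becomes the origin.
    p = tuple(pi - ci for pi, ci in zip(p, head_center))
    # 3. Rotate about the three axes.
    p = rotate_xyz(p, *angles)
    # 4. Translate back to the original coordinate system.
    p = tuple(pi + ci for pi, ci in zip(p, head_center))
    # 5. Cartesian -> spherical.
    return cartesian_to_spherical(*p)

def augment_recording(samples, head_center, angles_per_sample):
    # samples: list of (azimuth_deg, elevation_deg, radius) tuples.
    # angles_per_sample: one (pitch, yaw, roll) tuple per sample.
    return [augment_sample(az, el, r, head_center, ang)
            for (az, el, r), ang in zip(samples, angles_per_sample)]

# Static-style use: one fixed head pose for the whole recording.
recording = [(0.0, 0.0, 1.0), (5.0, -3.0, 1.0)]
static = augment_recording(recording, (0.0, 0.0, 0.0), [(0.0, 10.0, 0.0)] * 2)
# Dynamic-style use: the head pose drifts during the recording.
dynamic = augment_recording(recording, (0.0, 0.0, 0.0), [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5)])
```

Because every step is a rigid transformation, the gaze trajectory keeps its shape and timing, so only the apparent head pose changes.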
In experimental settings, the effectiveness of the proposed augmentation method, termed EMULATE, is evaluated across three variants of the HTCE. The study compares these variants trained with static, dynamic, and dynamic high augmentation setups against a baseline without augmentation and several non-physiological augmentation methods like dropout, cutmix, mixup, and cutout. The evaluation employs a stratified three-fold cross-validation approach to ensure robustness and fairness in comparing model performances across pathological conditions.
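A stratified three-fold split of the kind mentioned above can be sketched as follows. This is an illustrative simplification, assuming one class label per recording; the function name `stratified_k_fold` and the seed are hypothetical, and the study's actual splitting code may differ, for instance in how it handles multi-label samples or recordings from the same patient.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=3, seed=0):
    """Yield (train_indices, test_indices) with class proportions kept per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train_idx, test_idx

# Example: six recordings of one class and three of another.
labels = ["control"] * 6 + ["pathology"] * 3
splits = list(stratified_k_fold(labels, k=3))
```

Each held-out fold in this example contains two "control" recordings and one "pathology" recording, so rare classes are represented in every evaluation round.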
Model training and evaluation involve optimizing deep learning architectures on a single NVIDIA A100 80 GB graphics processing unit (GPU), with hyperparameters tailored to the specific requirements of the HTCE encoder and its variants. Training lasts 100 epochs using the AdamW optimizer, with additional focal loss optimization and class balancing techniques.
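To make the loss terms concrete, the following is a minimal sketch of a binary focal loss and of inverse-frequency class weights, written in plain Python. It is not the study's training code: the function names and the `gamma` and `alpha` defaults are assumptions chosen for illustration, and the article does not state the values the authors used.

```python
import math
from collections import Counter

def binary_focal_loss(prob, target, gamma=2.0, alpha=0.5, eps=1e-12):
    """Focal loss for one prediction: -alpha_t * (1 - p_t)**gamma * log(p_t).

    prob: predicted probability of the positive class.
    target: 1 for positive, 0 for negative.
    gamma: focusing strength; gamma = 0 reduces to weighted cross-entropy.
    """
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}

# A confidently correct prediction contributes far less than a hard one.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)
weights = inverse_frequency_weights([0, 0, 0, 1])
```

The focusing term down-weights examples the model already classifies well, while the class weights give rare pathologies more influence, two complementary ways of handling imbalanced screening data.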
Evaluation metrics focus on precision, recall (sensitivity), and specificity, emphasizing macro F1 scores to assess screening performance across multiple classes of pathologies. Results are aggregated over the three folds to provide comprehensive insights into the efficacy of EMULATE in improving model generalization and diagnostic accuracy in medical applications.
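For readers unfamiliar with the metric, a macro F1 score can be computed as in the sketch below. It assumes single-label predictions and a hypothetical function name `macro_f1`; the study's own evaluation code, and its handling of multi-label outputs, may differ.

```python
def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores, giving every class equal weight."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Example: class 0 has F1 = 2/3, class 1 has F1 = 4/5.
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

Because each class contributes equally regardless of how many samples it has, the macro average penalizes a model that performs well only on the most common condition.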
Enhancing Gaze Analysis
This study thoroughly evaluates the effectiveness of various data augmentation methods, including EMULATE, in enhancing the performance of DL models for gaze data analysis. The results demonstrate significant improvements across both saccade and vergence visual tasks when employing EMULATE, particularly its dynamic variants.
EMULATE consistently outperformed non-physiological augmentation methods on the saccade dataset, showing improvements ranging from 0.9 to 5.3 points in global F1 scores across different models. Specifically, the dynamic variant of EMULATE proved most effective for HTCE-MEAN and HTC sequence encode (HTCSE), showcasing the highest global F1 scores of 71.6% and 71.5%, respectively, compared to other augmentation techniques.
Similarly, on the vergence dataset, EMULATE enhanced the overall performance of the models, albeit with varying degrees of improvement compared to baseline methods. Cutout and dropout methods showed notable enhancements in global F1 scores across all architectures, with gains ranging from 1.3 to 2.5 points. The dynamic variant of EMULATE again demonstrated its effectiveness, achieving the best global F1 score of 69.1% when training HTCSE, surpassing the performance gains of non-physiological methods such as mixup.
These findings underscore the utility of physiologically based augmentation techniques like EMULATE in improving DL models' generalization and diagnostic accuracy for gaze data analysis. By simulating realistic head movements and variations during data acquisition, EMULATE enhances model robustness across different gaze tasks and outperforms traditional augmentation methods, advancing the state of the art in eye movement-based clinical screening and related applications.
Conclusion
In summary, the challenges posed by limited annotated medical data necessitated innovative solutions for effective data augmentation. Traditional mixing-based algorithms were deemed unsuitable due to their potential to introduce artifacts and alter pathological features.
EMULATE, a novel physiologically based gaze data augmentation library, was proposed to emulate natural head movements during data collection, enhancing sample diversity and authenticity. By incorporating physiological aspects, EMULATE efficiently generated transformed eye movement data and proved effective in regularizing training and improving the generalization of the hybrid architecture, surpassing CNN-based methods in eye movement classification.