In an article published in the AIS eLibrary (AISeL), researchers proposed a framework that leverages the Internet of Things (IoT) and wearable technology to enhance the adaptability of Augmented Reality (AR) glasses. They developed a multi-modal data processing system to classify worker performance in the aviation industry, with the aim of delivering customized, adaptive information through AR glasses.
Background
AR glasses are integral to various industries, yet their one-size-fits-all approach limits customization: current devices cannot adapt to diverse worker needs and environmental conditions. The present study addresses these gaps by integrating IoT and wearable technology to tailor the information AR glasses provide. It focused on a unified system for multi-modal data analysis that draws on IoT and wearables to offer insights into worker and environmental dynamics.
Notably, the research aimed to automate performance assessment, which is critical for tailoring information provision. In the aviation industry, where AR glasses assist in pre-flight inspections, existing protocols are labor-intensive and error-prone. While some firms have introduced AR glasses, their adaptability remains limited. The proposed framework leveraged IoT sensors in aircraft and the environment to refine AR information based on multi-modal data streams, promoting a more dynamic and adaptive approach.
Researchers of the present study collaborated with industry partners, applying Information System Design Theory (ISDT) to build a multi-modal data processing system for aviation. Analytical techniques included Lasso regression for feature selection, a Long Short-Term Memory (LSTM) network for extracting features from video sequences, and a Multi-Kernel Support Vector Machine (MK-SVM) for classification.
Kernel Theory-Based Design
Researchers employed Design Science Research, particularly ISDT, to develop a kernel theory-based framework for a wearable device-enhanced multi-modal data processing system. The research focused on performance classification in the aviation industry, addressing the need for more efficient and adaptable methods.
The ISDT framework comprises kernel theories, meta-design, meta-requirements, and testable hypotheses. Kernel theories, including Attention Allocation from Multiple Resource Theory (MRT) and Process Standardization from Total Quality Management (TQM), guide system design. Meta-requirements dictate support for computable dimensions representing attention allocation as well as process standardization.
The meta-design constructed computable features to represent these dimensions, resulting in an IT system capable of extracting features, predicting performance, and training and testing a model. The testable hypothesis posited that an IT system combining multi-modal data would outperform one relying on single-modal data for performance classification.
The kernel theories draw on MRT, explaining limited attentional capacities and the need for effective focus, and TQM, emphasizing consistent, repeatable processes to enhance efficiency and quality. Meta-requirements involve attention allocation and process standardization dimensions, gauged through eye-tracking metrics and video sequence analysis. The proposed metrics include fixation and saccade data for attention allocation and first-person point-of-view video features for process standardization.
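As a concrete illustration, the attention-allocation metrics named above could be computed from a raw gaze log along the following lines. This is a minimal Python sketch; the column names, the fixation/saccade labeling, and the pandas-based layout are assumptions rather than details from the paper.

    import pandas as pd

    def attention_features(gaze: pd.DataFrame) -> dict:
        """Illustrative attention-allocation metrics from a gaze log.

        Assumes the tracker labels each sample as fixation or saccade and
        records a timestamp and pupil diameter; all column names here are
        hypothetical, not taken from the paper.
        """
        # Group consecutive samples with the same label into events.
        event_id = (gaze["is_fixation"] != gaze["is_fixation"].shift()).cumsum()
        events = gaze.groupby(event_id).agg(
            is_fixation=("is_fixation", "first"),
            duration=("timestamp", lambda t: t.max() - t.min()),
        )
        fixations = events[events["is_fixation"]]
        saccades = events[~events["is_fixation"]]
        return {
            "n_saccades": len(saccades),
            "max_fixation_duration": fixations["duration"].max(),
            "pupil_diameter_variance": gaze["pupil_diameter"].var(),
        }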
The research aligned with Type VIII design science, creating artifacts that apply specific theories and contributing to Information Systems (IS) theory by integrating IoT, wearables, and AR glasses in aviation. The study's contributions included interdisciplinary exchange, a theoretical framework, insights into multi-modal information processing, and a holistic model for classifying performance. The testable hypothesis sets the foundation for evaluating the proposed system's ability to enhance performance classification.
Design Instantiation and Evaluation
The proposed performance classification system comprised three stages: data collection, feature extraction, and classifier construction. Wearable devices captured data during inspections, including eye-tracking metrics and demographic information. Lasso regression was employed for feature selection, retaining 31 features with non-zero coefficients that influence performance, including the number of saccades, maximum fixation duration, and pupil diameter variance.
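The feature-selection step can be sketched with scikit-learn's Lasso. The data, regularization strength, and feature count below are placeholders; only the idea of retaining non-zero-coefficient features is taken from the study.

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.preprocessing import StandardScaler

    # Hypothetical inputs: X holds wearable-derived features per inspection,
    # y the performance scores. Shapes and the alpha value are illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(167, 80))   # 167 inspection records, as in the study
    y = rng.normal(size=167)

    X_std = StandardScaler().fit_transform(X)
    lasso = Lasso(alpha=0.05).fit(X_std, y)

    # Keep only features whose coefficients survive the L1 penalty; the
    # paper reports 31 such non-zero-coefficient features.
    selected = np.flatnonzero(lasso.coef_)
    X_selected = X_std[:, selected]
    print(f"{selected.size} features retained")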
For first-person video data, the system used a Vision Transformer (ViT) and a four-layer LSTM model for feature extraction. The LSTM analyzed 22-frame clips produced with a sliding-window technique, achieving its best performance with a four-layer structure and the Adam optimizer. Data were collected from China Southern Airlines, yielding a structured dataset of 167 inspection records.
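A short PyTorch sketch of the sliding-window step may help. The 22-frame window, four-layer LSTM, and Adam optimizer follow the study's description; the embedding dimension, hidden size, prediction head, and learning rate are illustrative assumptions.

    import torch
    import torch.nn as nn

    EMBED_DIM, WINDOW, HIDDEN = 768, 22, 128   # embed/hidden sizes assumed

    class ClipLSTM(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(EMBED_DIM, HIDDEN, num_layers=4,
                                batch_first=True)
            self.head = nn.Linear(HIDDEN, 1)   # illustrative per-clip output

        def forward(self, clips):              # clips: (n_clips, 22, 768)
            _, (h_n, _) = self.lstm(clips)
            return self.head(h_n[-1])          # last layer's final hidden state

    frames = torch.randn(300, EMBED_DIM)       # stand-in ViT frame embeddings
    clips = frames.unfold(0, WINDOW, 1)        # sliding window, stride 1
    clips = clips.permute(0, 2, 1).contiguous()  # (n_clips, 22, 768)

    model = ClipLSTM()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, per study
    features = model(clips)
    loss = features.mean()                     # stand-in objective; labels omitted
    loss.backward()
    optimizer.step()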
Evaluation settings focused on landing gear, aircraft wings, and wheel wells, and the resulting dataset provided valuable multi-modal data for performance analysis. Feature construction drew attention-allocation features from eye-tracking data, emphasizing saccades, fixation duration, and pupil diameter variance, and process-standardization features from first-person video data via the segmented, four-layer LSTM approach described above. The study emphasized the significance of the optimization algorithm for effective model training.
Results
In the evaluation of the performance classification model, features collected from wearable devices were analyzed to identify instances of poor performance for safety assurance. Inspections scoring below the average (19) were labeled as worse performance, while those at or above the average were considered better performance. A flexible multi-kernel learning approach, the MK-SVM with multi-modal features, outperformed single-modal models, confirming the hypothesis that multi-modal fusion enhances classification performance.
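Scikit-learn offers no built-in MK-SVM, but the fusion idea can be approximated by combining one kernel per modality into a precomputed kernel for a standard SVC. The fixed equal weights, feature dimensions, and random data below are placeholders; a genuine MK-SVM learns the kernel weights from data.

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.svm import SVC

    # Hypothetical per-modality features and labels (cutoff of 19, per the paper).
    rng = np.random.default_rng(0)
    X_eye = rng.normal(size=(167, 31))     # eye-tracking features
    X_vid = rng.normal(size=(167, 16))     # LSTM-derived video features
    scores = rng.integers(10, 28, size=167)
    y = (scores >= 19).astype(int)         # 1 = better, 0 = worse performance

    # One kernel per modality, fused as a convex combination; a real MK-SVM
    # would learn these weights rather than fixing them at 0.5 each.
    K = 0.5 * rbf_kernel(X_eye) + 0.5 * rbf_kernel(X_vid)
    clf = SVC(kernel="precomputed").fit(K, y)
    print("training accuracy:", clf.score(K, y))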
The researchers conducted a robustness check by varying the classification cutoff among 18, 19, and 20, finding that the model's performance decreased as the threshold increased, which demonstrated its ability to identify outlier records with abysmal performance. Spectral clustering was employed for an extension analysis, uncovering a dominant cluster of samples adhering to inspection manual specifications.
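The clustering step can be mimicked with scikit-learn's SpectralClustering. The cluster count, affinity, and stand-in feature matrix are assumptions, as the paper's exact settings are not reproduced here.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    # Stand-in fused multi-modal features for the 167 inspections.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(167, 47))

    labels = SpectralClustering(n_clusters=3, affinity="rbf",
                                random_state=0).fit_predict(X)
    dominant = np.bincount(labels).argmax()
    print(f"dominant cluster {dominant}: {(labels == dominant).sum()} samples")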
Noteworthy observations included lower video prediction errors in the dominant cluster, longer inspection durations, and higher education and experience levels among workers. Workers with longer experience exhibited shorter maximum fixation durations, indicative of cognitive drift or "mind-wandering." The study emphasized the importance of incorporating individual characteristics when tailoring the information AR glasses provide, laying a foundation for optimizing performance in subsequent research.
Conclusion
In conclusion, the proposed multi-modal framework for classifying performance from wearable device data offered a significant advancement for industries such as aviation, manufacturing, and healthcare. Leveraging machine learning, the system ensured a more objective and consistent evaluation than single-modal approaches. The ultimate aim is to integrate this framework with IoT and wearable technology, specifically AR eye-tracking devices, for real-time performance evaluation, enhancing safety in the aviation industry through timely alerts that prevent errors or accidents.