Enhancing Smartphone Gaze Tracking through Machine Learning

In an article recently submitted to the arXiv* preprint server, researchers presented an open-source gaze-tracking solution for smartphones, underscoring the significance of eye tracking across diverse domains such as vision research and usability assessment. The primary goal was accurate tracking without the need for extra hardware. By harnessing machine learning techniques, the approach achieved precise eye tracking on smartphones, with accuracy comparable to that of far more expensive mobile eye trackers.

Study: Enhancing Smartphone Gaze Tracking through Machine Learning. Image credit: MaximP/Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Using the Massachusetts Institute of Technology (MIT) GazeCapture dataset, the method replicated key findings on ocular behavior and saliency analysis. Smartphone-based gaze tracking holds significant potential, particularly for addressing reading-comprehension challenges and broadening research participation. Such scalability not only contributes to advances in vision research but also extends benefits to areas such as accessibility and healthcare.

Background

In recent years, eye tracking has gained prominence across vision research, linguistics, and usability assessment. Yet, much of the focus has been on desktop displays with specialized and costly hardware, limiting accessibility and scalability. Meanwhile, smartphones have transformed human-computer interaction, prompting a need for understanding eye movement on these mobile devices. Despite extensive smartphone use, a gap exists in studying ocular motion patterns.

Proposed Method

The gaze-tracking model employs a multilayer feed-forward convolutional neural network (ConvNet) with distinctive components for precise gaze prediction. The process begins by extracting essential facial features from input images via MobileNets-based face detection. The base model is trained on the MIT GazeCapture dataset to predict gaze locations. Each eye region is cropped, scaled to 128 × 128 × 3 pixels, and processed by its own ConvNet tower, whose convolutional layers use kernels of decreasing size. Rectified Linear Units (ReLUs) introduce nonlinearity, and horizontal flipping of one eye ensures symmetry in learning. The tower outputs merge with fully connected layers that also handle the eye landmarks.
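At a high level, the architecture can be sketched in PyTorch as below. The layer widths, kernel sizes, strides, and landmark count are illustrative assumptions rather than values from the paper; the sketch only shows the dual-tower layout, the flipped eye crop, and the landmark branch feeding the merged fully connected layers.

```python
import torch
import torch.nn as nn

class EyeTower(nn.Module):
    """ConvNet tower for one 128x128x3 eye crop, with kernels of
    decreasing size; channel widths and strides are assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2),
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2),
            nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (N, 128)
        )

    def forward(self, x):
        return self.features(x)

class GazeNet(nn.Module):
    """Two eye towers plus an eye-landmark branch feeding a 2-D regression head."""
    def __init__(self, num_landmarks=8):  # landmark count is hypothetical
        super().__init__()
        self.left_tower = EyeTower()
        self.right_tower = EyeTower()
        self.landmark_fc = nn.Sequential(nn.Linear(num_landmarks * 2, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(128 + 128 + 64, 128), nn.ReLU(),
            nn.Linear(128, 2),  # predicted (x, y) screen coordinates
        )

    def forward(self, left_eye, right_eye, landmarks):
        # Flip the right eye horizontally so both towers learn a symmetric task.
        right_eye = torch.flip(right_eye, dims=[3])
        merged = torch.cat(
            [self.left_tower(left_eye),
             self.right_tower(right_eye),
             self.landmark_fc(landmarks)], dim=1)
        return self.head(merged)
```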

The final regression head predicts x and y screen coordinates. Fine-tuning on calibration data, together with a lightweight support vector regression (SVR) model, further refines the gaze predictions. The combined approach yields accurate gaze tracking that adheres to Google's methodology; the implementation includes meticulous data preparation, post-training quantization, adaptive learning-rate strategies, distinctive loss functions, and evaluation metrics. SVR personalization enhances accuracy, particularly in scenarios with varied gaze positions, underlining the value of tailored adjustments based on dataset characteristics.
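The personalization stage can be sketched with scikit-learn, fitting one support vector regressor per screen axis on the base model's penultimate-layer features. The kernel choice and regularization constant below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

def personalize(feats, targets):
    """Fit a lightweight per-user gaze corrector.

    feats:   (n, d) penultimate-layer activations for one user's
             calibration frames.
    targets: (n, 2) ground-truth on-screen gaze coordinates.
    """
    # One SVR per output dimension; kernel and C are illustrative.
    svr = MultiOutputRegressor(SVR(kernel="rbf", C=20.0))
    svr.fit(feats, targets)
    return svr
```

Fitting in feature space rather than on the raw (x, y) outputs gives the corrector richer per-user information than a purely output-space calibration would.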

Experimental Results

The PyTorch-trained models display promising predictive capability, with results that differ slightly from Google's reported figures but remain valid within the experimental context. SVR personalization, fitted on the output of the model's second-to-last layer, aims to sharpen gaze-tracking precision. Its impact is not uniformly positive, yet the outcomes are promising: SVR benefits from the larger calibration set available in the 70/30 split scenario, where it yields a significant improvement. Visual representations illustrate SVR's influence on the predictions, including its more nuanced cases. Overall, the approach underscores the potential of personalized gaze-tracking enhancement using SVR.
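Continuing the earlier sketch, the 70/30 scenario can be emulated per user with a simple split; the centimeter unit in the comment assumes GazeCapture-style on-screen coordinates:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# feats/targets are one user's calibration frames, as in the sketch above.
X_fit, X_eval, y_fit, y_eval = train_test_split(
    feats, targets, test_size=0.3, random_state=0)  # 70/30 split
svr = personalize(X_fit, y_fit)
# Euclidean gaze error per frame (in cm, if targets are in cm).
errors = np.linalg.norm(svr.predict(X_eval) - y_eval, axis=1)
print(f"mean gaze error: {errors.mean():.2f}")
```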

To further refine gaze-tracking accuracy, the researchers also explored affine transformations of the network's predictions. Drawing on insights from the network-generated forecasts, they applied corrections involving shifts, scales, and rotations. This approach significantly reduced the base model's error, demonstrating its potential for enhancement even though its impact was less pronounced than that of SVR training.
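One way to realize such a correction is an ordinary least-squares fit of a 2-D affine map, which covers shift, scale, and rotation (plus shear) in a single step. The sketch below illustrates the idea on calibration frames; it is not the paper's exact procedure:

```python
import numpy as np

def fit_affine(preds, targets):
    """Least-squares 2-D affine correction:
    targets ≈ [preds | 1] @ M, with M of shape (3, 2)."""
    P = np.hstack([preds, np.ones((len(preds), 1))])  # homogeneous coords
    M, *_ = np.linalg.lstsq(P, targets, rcond=None)
    return M

def apply_affine(preds, M):
    """Apply the fitted shift/scale/rotation to new predictions."""
    return np.hstack([preds, np.ones((len(preds), 1))]) @ M
```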

In parallel, enhancements were made to the PyTorch model itself, including adjustments to Batch Normalization's epsilon value and to the learning-rate schedule, changes intended to improve training dynamics, overall performance, and convergence speed. Visual comparisons between the current and previous implementations revealed subtle differences in output clustering, highlighting the balance between precision and generalization. The implementation was also extended to TensorFlow, achieving results comparable to the previous PyTorch models.
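In PyTorch terms, both adjustments are small configuration changes. The epsilon value, optimizer, and schedule parameters below are illustrative assumptions, reusing the GazeNet sketch from earlier:

```python
import torch

model = GazeNet()  # from the earlier sketch
# Adjust Batch Normalization's epsilon in every BN layer
# (1e-3 is an illustrative value, not the paper's).
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        m.eps = 1e-3

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Adaptive schedule: halve the learning rate when validation error plateaus;
# call scheduler.step(val_error) after each validation pass.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=5)
```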

Furthermore, evaluation was conducted on both the MIT and Google dataset splits, using several train-test ratios and variations. SVR again delivered substantial improvements in both split scenarios, although the error reductions varied from user to user in the individualized setups. Limited calibration data for some users remained a challenge, pointing to ongoing work on performance under varying conditions. Together, these analyses lay the foundation for improving gaze-tracking accuracy through a combination of model enhancements and personalized techniques.
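A per-user evaluation across different train-test ratios, in the spirit described above, might look like the following; the `users` iterable and the split fractions are hypothetical, and `personalize()` comes from the earlier sketch:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# `users` is a hypothetical list of (feats, targets) pairs, one per participant.
for test_frac in (0.3, 0.5, 0.7):
    errs = []
    for user_feats, user_targets in users:
        X_fit, X_eval, y_fit, y_eval = train_test_split(
            user_feats, user_targets, test_size=test_frac, random_state=0)
        svr = personalize(X_fit, y_fit)
        errs.append(np.linalg.norm(svr.predict(X_eval) - y_eval, axis=1).mean())
    print(f"test fraction {test_frac}: mean error {np.mean(errs):.2f}")
```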

Conclusion and Future Work

In summary, the study presents an open-source gaze-tracking solution for smartphones. The authors examined its interaction with Google's model binary, analyzed SVR behavior across different model versions, and improved performance by training with Google's normalization function. Rigorous testing on proprietary app data enabled a thorough comparison with the binary model's outputs, and comparative analysis against alternatives such as iTracker explored whether expanding the network improves efficacy. Instances of data leakage were identified during SVR fitting on the Google split version, a concern flagged for future work. Together, these steps provide a comprehensive evaluation of the model's performance and its potential enhancements.



Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

