Bridging the Perception Gap: DNNs and Human Peripheral Vision

In a conference paper published at the International Conference on Learning Representations (ICLR), researchers from the USA investigated how machine learning models, particularly deep neural networks (DNNs), compare with humans at detecting objects in images. They introduced a new dataset, COCO-Periph, designed to bridge the gap between humans and DNNs by simulating peripheral vision.

Study: Bridging the Perception Gap: DNNs and Human Peripheral Vision. Image credit: Kitreel/Shutterstock

Background

A DNN is a computational model composed of multiple layers of interconnected nodes, or neurons, designed to process and learn from complex data. Inspired by the structure of the human brain, DNNs use algorithms to iteratively adjust the connections between neurons, enabling them to recognize patterns, classify data, and make predictions. They have demonstrated remarkable capabilities in fields such as image and speech recognition, natural language processing, and autonomous driving.
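
To make this concrete, the sketch below builds a small DNN in PyTorch and runs a single training step. The architecture, data, and hyperparameters are purely illustrative and are not taken from the study.

```python
# A minimal sketch of a deep neural network: stacked layers of "neurons"
# whose connection weights are adjusted iteratively from data.
import torch
import torch.nn as nn

class SmallDNN(nn.Module):
    def __init__(self, in_features=784, hidden=128, num_classes=10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.layers(x)

model = SmallDNN()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# One training step on random data: the optimizer nudges the connection
# weights to reduce the classification error.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```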

DNNs have also shown significant promise in modeling human visual perception, predicting neural response patterns and task performance to a certain extent. However, notable differences remain in how DNNs and humans process information. To address this, researchers have explored biologically inspired object recognition models that offer benefits such as robustness to occlusion, generalization across scale, and adversarial robustness.

Adversarial training has emerged as a technique with the potential to enhance alignment between DNNs and human perception. Benchmarks such as BrainScore have played a key role in advancing research in this domain, facilitating a deeper understanding of DNN performance and guiding efforts to bring models closer to human perceptual capabilities.
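
As a generic illustration of this idea, the sketch below implements a simple form of adversarial training based on the fast gradient sign method (FGSM). It illustrates the technique in general and is not the specific procedure evaluated in the paper; the model, data, and epsilon value are assumptions.

```python
# A generic sketch of adversarial training with FGSM: perturb each batch in
# the direction that most increases the loss, then train on the perturbed
# inputs so the model learns to resist such perturbations.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def fgsm_perturb(x, y, epsilon=0.03):
    """Return a copy of x nudged in the direction that most increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

# One adversarial training step on random image-like data in [0, 1].
x, y = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))
x_adv = fgsm_perturb(x, y)
optimizer.zero_grad()
loss = loss_fn(model(x_adv), y)
loss.backward()
optimizer.step()
```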

About the Research

In the present paper, the authors provide a detailed examination of DNNs' object detection capabilities when constrained by human peripheral vision. To achieve this, they modified the existing texture tiling model (TTM) to ensure compatibility with DNNs and created the COCO-Periph dataset, comprising images transformed to capture the information available in human peripheral vision.
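
The modified TTM transformation itself is beyond a short example, but the sketch below shows how one might compare a pretrained object detector's confidence on an original image against a peripherally transformed counterpart such as a COCO-Periph image. The file paths are placeholders and the setup is an assumption, not the authors' evaluation code.

```python
# Hypothetical comparison: run a pretrained detector on an original image and
# on a version transformed to mimic peripheral vision, then compare the
# top detection confidences. The paths below are placeholders.
import torch
from PIL import Image
from torchvision.models.detection import (fasterrcnn_resnet50_fpn,
                                          FasterRCNN_ResNet50_FPN_Weights)
from torchvision.transforms.functional import to_tensor

detector = fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT).eval()

def top_scores(image_path, k=5):
    """Return the detector's top-k confidence scores for an image."""
    img = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        preds = detector([img])[0]
    return [round(s, 3) for s in preds["scores"][:k].tolist()]

# Lower scores on the transformed image would mirror the drop in
# detectability reported for DNNs under simulated peripheral vision.
print("original:  ", top_scores("coco_original.jpg"))
print("peripheral:", top_scores("coco_periph_10deg.jpg"))
```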

The study trained DNNs on the newly developed dataset, evaluated their performance on object detection tasks, and compared the results with those of a human psychophysics experiment involving 12 subjects. The subjects were seated with their heads in the chinrest of an EyeLink 1000 eye tracker in a tower-mount configuration, positioned 82 cm from a monitor. Throughout the experiment, their left eye position was tracked, and calibration was performed to ensure accurate tracking of eye movements.
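
As a rough illustration of the training step described above, the sketch below fine-tunes a pretrained torchvision Faster R-CNN on a COCO-format dataset. The directory layout, annotation file, and hyperparameters are assumptions, and the loop illustrates the general workflow rather than the authors' training setup.

```python
# Illustrative fine-tuning of a pretrained detector on peripherally
# transformed images, assuming COCO-Periph follows the standard COCO
# annotation format. Paths and hyperparameters are placeholders.
import torch
from torchvision.datasets import CocoDetection
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

def to_target(anns):
    """Convert COCO [x, y, w, h] annotations to the boxes/labels dict torchvision expects."""
    boxes = [[a["bbox"][0], a["bbox"][1],
              a["bbox"][0] + a["bbox"][2], a["bbox"][1] + a["bbox"][3]] for a in anns]
    return {"boxes": torch.tensor(boxes, dtype=torch.float32).reshape(-1, 4),
            "labels": torch.tensor([a["category_id"] for a in anns], dtype=torch.int64)}

dataset = CocoDetection("coco_periph/images", "coco_periph/annotations.json",
                        transform=to_tensor)  # placeholder paths
loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True,
                                     collate_fn=lambda batch: tuple(zip(*batch)))

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").train()
optimizer = torch.optim.SGD(model.parameters(), lr=5e-4, momentum=0.9)

for images, raw_targets in loader:            # one pass, purely illustrative
    targets = [to_target(t) for t in raw_targets]
    loss_dict = model(list(images), targets)  # training mode returns a dict of losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```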

The experiment employed a two-interval forced choice (2IFC) task, in which subjects had to determine whether a target object appeared or disappeared across a sequence of images. The target object's location was indicated by red arrows, and subjects fixated on a cross positioned 5, 10, 15, or 20 degrees away from the object. The experiment consisted of 1,040 trials, with breaks provided every 150 trials. Before the main experiment, subjects completed a practice round of 15 trials using very easy image pairs to familiarize themselves with the task; two subjects repeated the practice round to become comfortable with it. The practice images were larger than those in the main experiment to aid learning.
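
To show how such trial data might be summarized, the toy sketch below groups 2IFC trials by the eccentricity of the fixation cross and computes the proportion of correct responses at each distance. The trial records are invented for illustration.

```python
# Toy summary of 2IFC detection data: proportion correct per eccentricity.
from collections import defaultdict

trials = [
    {"eccentricity_deg": 5, "correct": True},
    {"eccentricity_deg": 20, "correct": False},
    {"eccentricity_deg": 10, "correct": True},
    # ... one record per trial (1,040 per subject in the experiment)
]

by_ecc = defaultdict(list)
for t in trials:
    by_ecc[t["eccentricity_deg"]].append(t["correct"])

for ecc in sorted(by_ecc):
    outcomes = by_ecc[ecc]
    print(f"{ecc} deg: {sum(outcomes) / len(outcomes):.2f} proportion correct")
```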

The authors selected images with present/absent objects that the models could confidently detect in the original image, an approach intended to give the models a fair chance of performing on par with humans. The participants included 4 males, 5 females, and 1 non-binary subject, ranging in age from 19 to 31; all had self-reported normal or corrected-to-normal visual acuity and no history of eye surgery. Through these experiments, the authors gained insight into the similarities and differences between human and machine perception in the context of peripheral vision.

Research Findings

The results showed that DNNs underperformed relative to humans in object detection tasks when peripheral vision was simulated using the modified TTM. Training the DNNs on the COCO-Periph dataset narrowed the performance gap to some extent and led to slight improvements in robustness. Despite these improvements, DNNs still did not match human performance in object detection or sensitivity to clutter.

This suggested that the observed behavior could not be attributed solely to a domain shift. The researchers employed the Algorithm 1 approach, which yielded the highest critical mean (µ) scores and showed performance trends similar to those of other approaches; however, for some models the psychometric fit did not converge, preventing a value of µ from being determined. The study also highlighted performance differences between humans and machine models, indicating that the stimuli alone could not explain the observed gaps.
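
For intuition, the sketch below fits a generic cumulative-Gaussian psychometric function to proportion-correct data as a function of eccentricity and extracts a midpoint parameter µ. The functional form, example data, and fitting choices are assumptions and may differ from the paper's Algorithm 1; as in the study, a poorly constrained fit can fail to converge and leave µ undefined.

```python
# Generic psychometric fit: 2IFC performance falls from near 1.0 toward
# chance (0.5) with eccentricity; mu is the midpoint of that decline.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(ecc, mu, sigma):
    return 0.5 + 0.5 * (1.0 - norm.cdf(ecc, loc=mu, scale=sigma))

eccentricities = np.array([5.0, 10.0, 15.0, 20.0])
prop_correct = np.array([0.95, 0.85, 0.70, 0.60])   # invented example data

(mu, sigma), _ = curve_fit(psychometric, eccentricities, prop_correct, p0=[12.0, 5.0])
print(f"fitted mu = {mu:.1f} deg, sigma = {sigma:.1f} deg")
```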

Conclusion

In summary, the paper comprehensively explored the performance of DNNs on object detection under the constraints of peripheral vision. It also emphasized the importance of task formulation and optimization in improving the alignment between DNNs and humans. The authors sought to enhance the performance and accuracy of DNN models using a novel dataset.

Moreover, they demonstrated that the newly designed dataset could be utilized in areas such as computer vision, autonomous systems, and artificial intelligence, where accurate object detection is crucial. By more accurately modeling human vision, practitioners can improve the accuracy and robustness of DNN models, enabling them to benefit from properties of human visual processing.


