In a paper published in the journal Scientific Reports, researchers examined how humans swiftly process scene information to navigate their immediate surroundings. They utilized electroencephalography (EEG) to record brain responses to visual scenes and correlated them with computational models representing three critical aspects of scene processing: 2D, 3D, and semantic information.
Additionally, they integrated a behavioral model capturing navigational affordances. The findings revealed a temporal processing hierarchy, indicating that navigational affordance is processed later than 2D, 3D, and semantic scene features. This sheds light on the sequential order in which the human brain computes complex scene details, suggesting a strategic use of these pieces of information for navigation planning.
Related Work
Past work has shown that humans rapidly extract multifaceted visual information from scenes, which is crucial for navigational planning. How the brain computes this information for route planning has sparked debate. Some studies suggest that navigational affordance is intertwined early with low-level visual features, implying parallel processing. Conversely, navigation entails complex computations that integrate various scene features, such as 3D and semantic properties. Research on object affordances likewise highlights the secondary role of affordances relative to perception.
DNN feature assessment
This study collected EEG data from 16 healthy volunteers, all with normal or corrected-to-normal vision. Participants provided informed consent and received monetary compensation. The ethics committee of Freie Universität Berlin approved the study, and the researchers maintained ethical standards throughout the experiment.
Stimuli comprised 50 color images of indoor environments with clearly discernible navigational paths, each measuring 1024 x 768 pixels. The researchers presented these images on a gray screen with a central fixation target to keep participants engaged during viewing.
The researchers designed the experimental paradigm to engage participants in explicitly processing navigational affordances. They tasked participants with imagining the directions of navigational paths relative to their viewpoint while viewing the stimuli. Each image was presented for 200 ms, followed by a randomized inter-trial interval of 600-800 ms. To ensure engagement, the researchers introduced catch trials that prompted participants to indicate whether an arrow on the screen pointed congruently or incongruently with the navigational path from the previous trial.
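For illustration, a minimal sketch of how such a trial timeline could be generated is shown below; the catch-trial proportion, random seed, and function names are assumptions, not details reported in the paper.

```python
# Illustrative trial-timeline generator for the described paradigm:
# 200 ms stimulus, 600-800 ms jittered inter-trial interval, with
# occasional catch trials. Catch probability and seed are assumptions.
import numpy as np

rng = np.random.default_rng(seed=1)

def build_timeline(n_trials, n_images=50, catch_prob=0.1):
    timeline = []
    for _ in range(n_trials):
        timeline.append({
            "image": int(rng.integers(n_images)),       # one of the 50 scenes
            "stim_ms": 200,                             # stimulus duration
            "iti_ms": float(rng.uniform(600, 800)),     # jittered ITI
            "catch": bool(rng.random() < catch_prob),   # arrow-judgment trial
        })
    return timeline
```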
The researchers recorded EEG and preprocessed the data using standard procedures, focusing on the 17 most relevant channels. Epochs were segmented relative to stimulus onset, baseline-corrected, and down-sampled. Artifacts were identified and removed before further analysis. Pairwise decoding accuracy was then calculated for each pair of images across epochs, indicating how well event-related potential (ERP) epochs differentiated between scene images at various time points.
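As a rough sketch of this time-resolved decoding step, assuming epoched data of shape (images, trials, channels, time points), pairwise accuracies could be computed as follows; the classifier choice and variable names are illustrative, not the authors' exact pipeline.

```python
# Sketch of time-resolved pairwise decoding. Assumes `epochs` has shape
# (n_images, n_trials, n_channels, n_times) with n_trials >= 5 per image;
# the classifier and cross-validation settings are illustrative.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def pairwise_decoding(epochs):
    n_images, n_trials, _, n_times = epochs.shape
    acc = np.zeros((n_images, n_images, n_times))
    y = np.array([0] * n_trials + [1] * n_trials)        # labels for each pair
    for i, j in combinations(range(n_images), 2):
        X_pair = np.concatenate([epochs[i], epochs[j]])  # (2*n_trials, ch, t)
        for t in range(n_times):
            score = cross_val_score(LinearSVC(), X_pair[:, :, t], y, cv=5).mean()
            acc[i, j, t] = acc[j, i, t] = score          # symmetric matrix
    return acc  # per-time-point accuracy matrices double as EEG RDMs
```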
Navigational affordance features were quantified using a model developed by Bonner and Epstein, based on participants' drawings of navigational paths through the same set of images. To assess the indoor scenes' low-, mid-, and high-level features, the researchers utilized activations from 18 pre-trained deep neural network (DNN) models. These models were grouped into 2D, 3D, and semantic tasks, aligning with previous research demonstrating their correlation with brain activations.
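The study's 18 task-specific networks are not reproduced here; the sketch below illustrates the general activation-extraction step with a generic torchvision ResNet-50 as a stand-in, with all variable names assumed.

```python
# Sketch of extracting DNN activations for scene images. A generic
# torchvision ResNet-50 stands in for the 18 task-specific (2D, 3D,
# semantic) networks used in the study.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
# Drop the classifier head to obtain penultimate-layer features.
extractor = torch.nn.Sequential(*list(model.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def scene_features(image_paths):
    feats = []
    with torch.no_grad():
        for path in image_paths:
            x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            feats.append(extractor(x).flatten())
    return torch.stack(feats)  # (n_images, n_features)
```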
The researchers compared EEG responses and DNN models using representational similarity analysis (RSA), which enables the integration of data from computational models, behavior, and neuroimaging. Variance partitioning analysis was employed to estimate the unique variance explained by each model at every time point, aiding in identifying feature representations as they emerge during visual processing.
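A minimal RSA sketch, comparing an EEG RDM time series against a single model RDM, might look like the following; the array shapes and names are assumptions.

```python
# Sketch of RSA: correlate the EEG RDM at each time point with a model RDM.
# Assumes `eeg_rdms` has shape (n_times, n_images, n_images) and `model_rdm`
# shape (n_images, n_images); names are illustrative.
import numpy as np
from scipy.stats import spearmanr

def rsa_timecourse(eeg_rdms, model_rdm):
    iu = np.triu_indices(model_rdm.shape[0], k=1)   # unique image pairs
    model_vec = model_rdm[iu]
    return np.array([spearmanr(rdm[iu], model_vec).correlation
                     for rdm in eeg_rdms])          # one value per time point
```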
Temporal dynamics analysis
This study recorded EEG responses from 16 healthy volunteers while they viewed 50 indoor scene images. Researchers instructed participants to assess navigational affordance by mentally planning potential exit paths through the scenes, indicating whether the paths led to the left, center, or right. Interspersed catch trials required participants to determine whether the displayed exit path corresponded to any from the previous trial. EEG data, DNN models, and behavioral data were analyzed using RSA to understand the temporal emergence of visual and navigational features in the human brain.
Transforming peri-stimulus EEG responses into representational dissimilarity matrices (RDMs) enabled the comparison of EEG data with DNN models and behavioral data. The researchers constructed RDMs for 2D, 3D, and semantic features from DNN activations, while they based the navigational affordance model RDM on participants' indicated exit routes. They performed variance partitioning via regression to determine how much of the variance in the EEG RDM at a given time point each model's RDM uniquely explained, revealing distinct temporal activation patterns.
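A hedged sketch of the unique-variance computation at a single time point, using the common leave-one-model-out regression scheme (the authors' exact regression setup may differ):

```python
# Sketch of variance partitioning at one time point: each model's unique
# variance is the drop in R^2 when that model is left out of the full
# regression. Vectorized RDMs (upper triangles) are assumed.
import numpy as np
from sklearn.linear_model import LinearRegression

def unique_variances(eeg_vec, model_vecs):
    """eeg_vec: (n_pairs,); model_vecs: dict of name -> (n_pairs,) arrays."""
    names = list(model_vecs)
    X_full = np.column_stack([model_vecs[n] for n in names])
    r2_full = LinearRegression().fit(X_full, eeg_vec).score(X_full, eeg_vec)
    unique = {}
    for n in names:
        X_red = np.column_stack([model_vecs[m] for m in names if m != n])
        r2_red = LinearRegression().fit(X_red, eeg_vec).score(X_red, eeg_vec)
        unique[n] = r2_full - r2_red   # variance only this model accounts for
    return unique
```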
All models explained unique variance in EEG, with the 2D DNN RDM explaining the most variance, followed by the semantic DNN RDM. The contributions of the navigational affordance model (NAM) and 3D DNN RDMs were lower, indicating that the experiment could uniquely track all feature representations. A temporal pattern in peak timings was observed, with the 2D DNN RDM peaking first, followed by the semantic and 3D DNN RDMs. Interestingly, the NAM RDM peaked significantly later, suggesting a hierarchy of scene feature processing leading up to navigational affordance representation.
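Given per-model time courses of unique variance, peak latencies such as those reported here can be read off directly, as in this toy sketch; the inputs and names are assumptions.

```python
# Toy sketch: read off each model's peak latency from its unique-variance
# time course. `timecourses` maps model name -> (n_times,) array; `times`
# holds the corresponding latencies in ms.
import numpy as np

def peak_latencies(timecourses, times):
    return {name: times[np.argmax(tc)] for name, tc in timecourses.items()}
```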
The study highlights the temporal sequence of 2D, 3D, and semantic features and navigational affordances, with prior research supporting the emergence of low-level 2D features before high-level semantic features. The researchers noted the parallel processing of 3D features alongside semantic features. They discussed the delayed emergence of the navigational affordance representation, along with potential study-design differences that could account for it. Acknowledging limitations in ecological validity, they suggested future research directions, namely incorporating additional computational models and exploring dynamic environments, to ultimately enhance understanding of the temporal dynamics of scene perception and their significance in navigation planning.
Conclusion
To sum up, the study elucidated the temporal dynamics of scene perception, highlighting the sequential emergence of 2D, 3D, and semantic features and navigational affordances. It underscored the precedence of low-level 2D features, followed by high-level semantic features, while noting the concurrent processing of 3D and semantic features. The researchers discussed the delayed navigational affordance representation and considerations around study-design differences.
Acknowledging limitations in ecological validity, the researchers proposed avenues for future research that incorporate additional computational models and explore dynamic environments. Overall, the findings deepened the understanding of the temporal dynamics of scene perception and their implications for navigation planning.
Journal reference:
- Dwivedi, K., Sadiya, S., Balode, M. P., Roig, G., & Cichy, R. M. (2024). Visual features are processed before navigational affordances in the human brain. Scientific Reports, 14(1), 5573. https://doi.org/10.1038/s41598-024-55652-y