Unveiling High-Risk Scenarios: Deep Embedded Clustering for Autonomous Vehicle Testing

By Dr. Sampath Lonka | Reviewed by Susha Cheriyedath, M.Sc. | Jul 26 2023

In a paper published in the journal Accident Analysis & Prevention, researchers proposed a novel method for identifying typical car-to-powered-two-wheeler (PTW) crash scenarios for autonomous vehicle (AV) safety testing. By utilizing real-world crash data from the China In-depth Mobility Safety Study-Traffic Accident (CIMSS-TA) database, the study introduces a high-quality AV testing database and employs an innovative unsupervised machine learning method.

The proposed approach combines dimensionality reduction with stacked autoencoders (SAE) and clustering with k-means to accurately describe and construct testing scenarios. The study emphasizes the importance of scenario-based safety testing, addressing the limitations of previous clustering methods and enhancing scenario reliability.

*Study: Unveiling High-Risk Scenarios: Deep Embedded Clustering for Autonomous Vehicle Testing. Image Credit: metamorworks/Shutterstock*

Background

The rise of AVs aims to reduce traffic accidents, necessitating robust safety validation procedures. Conventional vehicle testing standards prove insufficient for AVs, prompting the need for extensive road testing to outperform human drivers. However, resource-intensive road testing poses significant challenges. To mitigate these issues, virtual simulation testing has emerged as a preferred option due to its cost-effectiveness and risk-free nature. Nevertheless, constructing realistic testing scenarios requires access to real-world crash data.

This study introduces the CIMSS-TA database, which contains detailed motion information crucial for scenario-based AV safety testing. It proposes an unsupervised learning framework that combines dimensionality reduction with SAE and clustering with k-means to identify typical car-to-PTW crash scenarios. This approach offers a valuable contribution to AV safety testing.

Literature review

The literature review underscores the importance of scenario-based safety testing for AVs and emphasizes the necessity of standard scenario definitions. Both expert-based and data-driven approaches have been employed to construct typical crash scenarios, and clustering analysis has gained popularity among the data-driven methods. However, existing clustering techniques often lack comprehensive motion details of the crash-involved parties, leading to biased results.

Addressing this limitation, the study proposes a novel approach using an autoencoder method to reduce data dimensionality before applying the k-means clustering algorithm. This new method aims to enhance scenario-based AV safety testing by considering both static and dynamic information and overcoming the shortcomings of previous studies.

Method

Data: The study relies on the CIMSS-TA database, the product of a research project conducted by Central South University and Volkswagen Group China that focuses on road safety and AV safety testing. The database comprises 180 car-to-PTW crashes recorded between 2017 and 2021, featuring high-quality videos from on-road surveillance cameras or onboard video recorders. Crash features are categorized into static and dynamic variables, with special attention paid to transforming the PTW's trajectory so that it is described relative to the ego vehicle.
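The article does not spell out the transformation applied to the PTW trajectory. One common way to describe a trajectory relative to the ego vehicle is a rigid transform into the car's own coordinate frame, sketched below. The function names, the axis convention (x forward, y to the left), and the heading definition are all assumptions made for illustration, not details taken from the paper.

```python
import math


def to_ego_frame(ego_x, ego_y, ego_heading, ptw_x, ptw_y):
    """Express a PTW position in the ego car's frame (x forward, y left).

    ego_heading is the car's yaw in radians, measured counter-clockwise
    from the world x-axis. All names and conventions here are hypothetical.
    """
    dx, dy = ptw_x - ego_x, ptw_y - ego_y
    cos_h, sin_h = math.cos(ego_heading), math.sin(ego_heading)
    # Rotate the world-frame offset by -heading so the car points along +x.
    x_rel = cos_h * dx + sin_h * dy
    y_rel = -sin_h * dx + cos_h * dy
    return x_rel, y_rel


def relative_trajectory(ego_track, ptw_track):
    """ego_track: list of (x, y, heading); ptw_track: list of (x, y).

    Both tracks are assumed to be sampled at the same time steps.
    """
    return [to_ego_frame(ex, ey, eh, px, py)
            for (ex, ey, eh), (px, py) in zip(ego_track, ptw_track)]
```

With this convention, a PTW directly ahead of the car has a positive x and a y near zero regardless of which way the car faces in the world, which makes crashes recorded at different locations comparable.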

Deep embedded clustering method: To process the high-dimensional pre-crash matrix (PCM) data, the study employs a deep embedded clustering method that uses autoencoders (AE) for feature extraction. The stacked autoencoder (SAE) handles complex feature transformations, and the k-means algorithm partitions samples into clusters based on the learned features. The optimal number of clusters is determined using the elbow method, resulting in more accurate and fine-grained scenario descriptions for AV safety testing.
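The encode-then-cluster idea can be sketched in a few dozen lines. The example below is a simplified stand-in, not the authors' implementation: it trains a single-hidden-layer autoencoder (the paper stacks several, in MATLAB), runs plain k-means on the learned codes, and reports the sum of squared errors for several values of k so the elbow can be read off. The synthetic data, layer size, learning rate, and all function names are assumptions chosen for illustration.

```python
import math
import random


def train_autoencoder(data, hidden, epochs=200, lr=0.1, seed=0):
    """Fit a one-hidden-layer autoencoder by SGD; return (encode, losses)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in

    def encode(x):
        # Sigmoid hidden layer: the low-dimensional code used for clustering.
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
                for row, b in zip(w1, b1)]

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = encode(x)
            y = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
            err = [yi - xi for yi, xi in zip(y, x)]
            total += sum(e * e for e in err) / n_in
            # Gradient w.r.t. hidden pre-activations, taken before w2 changes.
            g = [sum(err[i] * w2[i][j] for i in range(n_in)) * h[j] * (1.0 - h[j])
                 for j in range(hidden)]
            for i in range(n_in):
                b2[i] -= lr * err[i]
                for j in range(hidden):
                    w2[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                b1[j] -= lr * g[j]
                for k in range(n_in):
                    w1[j][k] -= lr * g[j] * x[k]
        losses.append(total / len(data))  # mean reconstruction error this epoch
    return encode, losses


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns (labels, centers, sse)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


def elbow_curve(points, k_values):
    """SSE for each candidate k; the 'elbow' is where the drop flattens."""
    return [(k, kmeans(points, k)[2]) for k in k_values]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for flattened pre-crash records: two groups of 6-D samples.
    data = [[rng.gauss(m, 0.05) for _ in range(6)] for m in (0.2, 0.8) for _ in range(10)]
    encode, losses = train_autoencoder(data, hidden=2)
    codes = [encode(x) for x in data]
    for k, sse in elbow_curve(codes, range(1, 5)):
        print(k, round(sse, 4))
```

Clustering the codes rather than the raw records is the point of the design: distances in a compact learned space tend to be less dominated by noisy or redundant input dimensions than distances in the original high-dimensional space.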

Results

Parameters: The SAE model was trained using MATLAB 2021b on a laptop GPU, and the number of hidden layers was tuned by grid search. The hyperparameters were left at their MATLAB defaults, and sparsity regularization on the hidden layer encouraged sparse outputs. The k-means and k-medoids clustering algorithms were compared, and k = 6 was chosen as the optimal number of clusters for both algorithms based on three criteria: the sum of squared errors, the average silhouette coefficient, and the minimum number of samples in each cluster.
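The three selection criteria can each be computed directly from a set of points and their cluster labels. The sketch below shows one way to do so; the article does not say how the criteria were weighed against each other, so the code only computes them and leaves the trade-off to the reader. Function names are illustrative, and the silhouette function assumes at least two clusters.

```python
import math
from collections import Counter


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sum_squared_error(points, labels):
    """Total squared distance from each point to its cluster mean."""
    total = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum(dist(p, center) ** 2 for p in members)
    return total


def average_silhouette(points, labels):
    """Mean silhouette coefficient; needs >= 2 clusters, singletons score 0."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(other, []).append(dist(p, q))
        if lab not in by_cluster:  # p is alone in its cluster
            scores.append(0.0)
            continue
        a = sum(by_cluster[lab]) / len(by_cluster[lab])  # mean intra-cluster distance
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != lab)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def min_cluster_size(labels):
    """Size of the smallest cluster; very small clusters make weak scenarios."""
    return min(Counter(labels).values())
```

A lower sum of squared errors and a higher silhouette both favor a clustering, while the minimum cluster size guards against solutions that score well only by isolating a handful of samples.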

Static analysis: The clustering results are presented in terms of the static and dynamic elements of the typical scenarios. The static analysis covers the number of lanes, road type, and road surface conditions. Four-leg intersections are dominant in clusters one, two, and three, while straight roads, three-leg intersections, and ramps dominate clusters four, five, and six, respectively. Initial contact area analysis indicates that in clusters one, three, four, and five the contact is most often at the front of the car, while cluster two shows a higher proportion on the left side and cluster six on the right side.

Weather conditions are mostly clear, with cluster four having more nighttime scenarios with lighting. Most clusters are free from visual obstructions; the exception is cluster five, which has a visual obstruction rate of 59.1%.
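Per-cluster figures such as a dominant road type or an obstruction rate are shares of a categorical variable within each cluster. A minimal sketch of that tabulation follows; the function name and the sample values in the usage note are hypothetical and do not come from the CIMSS-TA data.

```python
from collections import Counter, defaultdict


def category_shares(labels, values):
    """Return {cluster: {category: share}} for one categorical variable.

    labels[i] is the cluster of crash i and values[i] its category,
    for example a road type or whether the view was obstructed.
    """
    counts = defaultdict(Counter)
    for lab, val in zip(labels, values):
        counts[lab][val] += 1
    return {lab: {val: n / sum(c.values()) for val, n in c.items()}
            for lab, c in counts.items()}
```

For example, calling `category_shares([1, 1, 1, 1, 5, 5], ["four-leg", "four-leg", "four-leg", "straight", "three-leg", "three-leg"])` on this made-up input gives a four-leg share of 0.75 in cluster 1 and a three-leg share of 1.0 in cluster 5.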

Dynamic analysis: The dynamic analysis involves calculating the average distances between PTWs and cars before the crash and illustrating their trajectories and speed distributions. The study identifies similarities and differences compared to existing studies and finds that including hard-to-collect variables significantly improves scenario accuracy and aligns the scenarios more closely with real-world traffic conditions. Furthermore, a unique scenario involving a six-lane, two-way road with an exit ramp is identified, contributing to the construction of AV safety testing scenarios.
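An average pre-crash distance profile for one cluster can be sketched as follows, assuming each crash provides PTW positions already expressed relative to the car and sampled on a common time grid ending at impact. The input layout and function name are assumptions made for illustration; the article does not describe the paper's exact procedure.

```python
import math


def mean_distance_profile(relative_tracks):
    """Average car-to-PTW distance at each time step across crashes.

    relative_tracks: list of trajectories, one per crash in the cluster,
    each a list of (x, y) PTW positions in the car's frame.
    """
    steps = len(relative_tracks[0])
    if any(len(track) != steps for track in relative_tracks):
        raise ValueError("tracks must share one sampling grid")
    return [sum(math.hypot(*track[i]) for track in relative_tracks) / len(relative_tracks)
            for i in range(steps)]
```

Plotting such a profile per cluster shows how quickly the two road users close in on each other, which is one way to compare the urgency of different scenario types.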

Conclusion

In summary, this study highlights the importance of in-depth crash data for constructing realistic, high-risk scenarios in autonomous driving testing. The novel unsupervised learning framework presented in this paper addresses the limitations of traditional clustering methods by effectively extracting relevant features and enhancing scenario reliability.

The study offers a high-quality AV safety test database with detailed motion and pre-crash video data, incorporating variables like lanes and acceleration to improve scenario accuracy. Nevertheless, challenges remain due to the rarity of traffic crashes and data collection constraints. While the study focuses on motion data and urban roadways, it provides valuable insights for AV safety testing scenarios.

Journal reference:

Zhou, R., Huang, H., Lee, J., Huang, X., Chen, J., & Zhou, H. (2023). Identifying typical pre-crash scenarios based on in-depth crash data with deep embedded clustering for autonomous vehicle safety testing. Accident Analysis & Prevention, 191, 107218. DOI: https://doi.org/10.1016/j.aap.2023.107218, https://www.sciencedirect.com/science/article/pii/S0001457523002658

Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.
