VALERIE22: Advancing Autonomous Driving with High-Performance Synthetic Data

In an article recently submitted to the arXiv preprint server, researchers discussed the VALERIE synthesis pipeline and assessed the quality of the VALERIE22 synthetic dataset.

Study: VALERIE22: Advancing Autonomous Driving with High-Performance Synthetic Data. Image credit: ART STOCK CREATIVE /Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Background

The VALERIE procedural tools pipeline is primarily a synthetic data generator built to improve understanding of the domain-specific factors that influence deep neural network (DNN) perception performance. The underlying DNN validation methodology was developed within the German collaborative research project KI Absicherung, which targeted pedestrian detection for automated driving in urban environments.

One of the critical objectives of the research project was to make the safety aspects of machine learning (ML)-based perception functions more predictable. The VALERIE tools pipeline can be utilized to improve data synthesis quality and to understand the factors determining the domain gap between real and synthetic datasets.

The robust VALERIE synthesis pipeline enables fully automated creation of complex urban scenes. The VALERIE22 dataset, generated with this pipeline, provides photorealistic sensor simulation rendered from automatically synthesized scenes.

Additionally, the dataset contains a uniquely rich set of metadata that allows the extraction of specific scenes and semantic features, such as pixel-accurate occlusion rates, angle and distance to the camera, and positions in the scene, enabling different tests on the data and facilitating a better understanding of DNN performance.
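For illustration, the sketch below shows how such per-object metadata could be used to extract a targeted test subset, for example all frames containing pedestrians that are less than 50% occluded and within 30 m of the camera. The JSON field names (`occlusion`, `distance_to_camera`, `class`) are assumptions for illustration and do not necessarily match the published VALERIE22 schema.

```python
import json
from pathlib import Path

# Hypothetical per-frame metadata files: one JSON per rendered frame,
# each listing the objects visible in that frame. Field names are
# illustrative assumptions, not the actual VALERIE22 schema.
def select_frames(metadata_dir, max_occlusion=0.5, max_distance=30.0):
    """Return frame IDs containing at least one pedestrian that is
    less than `max_occlusion` occluded and closer than `max_distance` metres."""
    selected = []
    for meta_file in sorted(Path(metadata_dir).glob("*.json")):
        objects = json.loads(meta_file.read_text()).get("objects", [])
        for obj in objects:
            if (obj.get("class") == "person"
                    and obj.get("occlusion", 1.0) < max_occlusion
                    and obj.get("distance_to_camera", float("inf")) < max_distance):
                selected.append(meta_file.stem)
                break
    return selected

# Example: build a low-occlusion, near-range pedestrian test split.
# frames = select_frames("valerie22/metadata", max_occlusion=0.5, max_distance=30.0)
```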

In this paper, the authors described several functionalities of the VALERIE synthesis pipeline and evaluated the quality of the synthetic data it generates by comparing the data with other synthetic datasets in the autonomous driving domain.

VALERIE data synthesis pipeline

Synthetic data computation: Computer graphics methods are used to generate the synthetic data, with commercially available and open-source software systems employed for color image rendering. Blender can be utilized as a base to import, edit, and render three-dimensional (3D) content.
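As a rough illustration of how Blender can serve as the rendering back end, the snippet below imports a glTF asset and renders a single frame through Blender's Python API (`bpy`). It is a minimal sketch assuming the pipeline drives Blender via scripting; the asset path, resolution, and output path are placeholders, and the actual VALERIE tooling is considerably more elaborate.

```python
import bpy  # Blender's Python API; run inside Blender or with the bpy module installed

# Import a 3D asset (e.g., a pedestrian model) into the current scene.
bpy.ops.import_scene.gltf(filepath="assets/pedestrian.glb")

# Configure the render engine, resolution, and output path.
scene = bpy.context.scene
scene.render.engine = "CYCLES"          # physically based path tracer
scene.render.resolution_x = 1920
scene.render.resolution_y = 1080
scene.render.filepath = "/tmp/frame_0001.png"

# Render a single still frame and write it to disk.
bpy.ops.render.render(write_still=True)
```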

Generating highly varied synthetic data involves multiple steps: creating a 3D scene model of a city using a street/terrain generator; a placement step that inserts 3D assets, such as road elements, pedestrians, vegetation, and cars, into the scene; and varying a set of scene parameters, including time of day and object/camera positions, before every rendering pass.

In 3D scene model generation, parameters such as pavement and street width, segment types such as suburban residential or tall houses, and materials for segments, sidewalks, and roads are generated based on the scene description. Additionally, the semantic information on the geometry and types of the segments is passed as input to the subsequent step.
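A scene description of this kind could look roughly like the dictionary below; the keys and values are illustrative assumptions rather than the actual VALERIE scene format.

```python
# Hypothetical scene description driving the street/terrain generator.
# Keys and values are illustrative; the real VALERIE format differs.
scene_description = {
    "segments": [
        {
            "type": "suburban_residential",   # segment type
            "street_width_m": 6.0,            # street width in metres
            "pavement_width_m": 2.0,          # sidewalk width in metres
            "road_material": "asphalt_worn",
            "sidewalk_material": "concrete_tiles",
        },
        {
            "type": "tall_houses",
            "street_width_m": 8.0,
            "pavement_width_m": 3.0,
            "road_material": "asphalt_new",
            "sidewalk_material": "cobblestone",
        },
    ],
}
# The generator consumes this description and passes the resulting segment
# geometry and semantics on to the asset placement step.
```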

In the placement step, object/asset insertion is driven by a per-segment density declaration and an asset list for the segment type, such as sidewalk or road. After this step, the complete 3D scene with all assets is realized.
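The placement logic can be sketched as sampling assets from a per-segment-type list at a declared density; the code below is a simplified illustration under those assumptions, not the actual implementation, and the asset names and densities are invented.

```python
import random

# Hypothetical per-segment-type asset lists and placement densities
# (assets per metre of segment length). Values are illustrative only.
ASSET_LISTS = {
    "sidewalk": ["pedestrian_adult", "pedestrian_child", "street_lamp", "bush"],
    "road": ["car_sedan", "car_van", "bicycle"],
}
DENSITY = {"sidewalk": 0.15, "road": 0.05}

def place_assets(segment_type, segment_length_m, rng=random):
    """Sample asset instances for one segment according to its density declaration."""
    count = int(DENSITY[segment_type] * segment_length_m)
    placements = []
    for _ in range(count):
        placements.append({
            "asset": rng.choice(ASSET_LISTS[segment_type]),
            "offset_m": rng.uniform(0.0, segment_length_m),  # position along the segment
            "rotation_deg": rng.uniform(0.0, 360.0),
        })
    return placements

# Example: populate a 100 m sidewalk segment.
# sidewalk_assets = place_assets("sidewalk", 100.0)
```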

Object and asset instances: In the asset database, each asset carries an identifier, a universally unique identifier (UUID), which is used in the scene description either in selection lists for the probabilistic scene generator or explicitly for static objects. The asset ID can also be used to identify objects in rendered frames.
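The sketch below illustrates how a UUID-keyed asset database could be used to trace an instance found in a rendered frame back to its source asset; the structure, UUIDs, and mapping values are assumptions for illustration only.

```python
# Hypothetical asset database keyed by UUID. In the real pipeline the UUIDs
# are fixed identifiers stored with the assets, not invented per run.
asset_db = {
    "6f1c2a9e-0d3b-4c47-9a1f-2b7e8c5d4a10": {"name": "pedestrian_adult_01", "category": "person"},
    "b2e7d4c1-88aa-4f02-bd35-6c9e0f1a2d3e": {"name": "car_sedan_03", "category": "vehicle"},
}

def resolve_instance(instance_to_asset, instance_id):
    """Map an instance ID found in a rendered frame back to its asset record."""
    asset_uuid = instance_to_asset[instance_id]
    return asset_db[asset_uuid]

# Example: instance 17 in a frame's instance-segmentation map was spawned
# from the pedestrian asset above (mapping values are illustrative).
instance_to_asset = {17: "6f1c2a9e-0d3b-4c47-9a1f-2b7e8c5d4a10"}
print(resolve_instance(instance_to_asset, 17)["name"])  # -> pedestrian_adult_01
```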

Metadata and ground truth: The VALERIE22 dataset contains a rich set of ground truth and metadata annotations, including pixel-aligned object instances, pixel-aligned class groups, object two-dimensional (2D) and 3D bounding boxes, scene parameters, camera parameters, object occlusion, and object orientation and position.
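As an illustration of the annotation types listed above, one frame's ground truth could be represented roughly as follows; the field names and structure are assumptions, not the published VALERIE22 schema.

```python
from dataclasses import dataclass, field

# Illustrative ground-truth record per object; field names are assumptions.
@dataclass
class ObjectAnnotation:
    instance_id: int               # links to the pixel-aligned instance mask
    class_name: str                # e.g. "person", "car"
    bbox_2d: tuple                 # (x_min, y_min, x_max, y_max) in pixels
    bbox_3d_center: tuple          # (x, y, z) in metres, scene coordinates
    bbox_3d_size: tuple            # (length, width, height) in metres
    yaw_deg: float                 # object orientation
    occlusion: float               # fraction of the object hidden, 0.0-1.0
    distance_to_camera_m: float

@dataclass
class FrameAnnotation:
    frame_id: str
    camera_intrinsics: dict        # e.g. focal length, principal point
    scene_parameters: dict         # e.g. time of day, weather preset
    objects: list = field(default_factory=list)
```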

Sensor simulation: A sensor model can be used to simulate real sensor behavior. The camera error model is simulated by applying automatic, histogram-based exposure control and sensor noise modeled as Gaussian noise, followed by a non-linear gamma correction.
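A simplified version of such a camera error model, assuming the described order of operations (histogram-based exposure control, additive Gaussian noise, then gamma correction) and linear RGB input in [0, 1], might look like this in NumPy; the parameter values are illustrative, not those used in the pipeline.

```python
import numpy as np

def simulate_camera(image, target_median=0.4, noise_std=0.01, gamma=2.2, rng=None):
    """Apply a simplified camera error model to a linear RGB image in [0, 1].

    Steps (a sketch of the described model):
      1. histogram-based automatic exposure (median luminance to a target),
      2. additive Gaussian sensor noise,
      3. non-linear gamma correction.
    """
    rng = rng or np.random.default_rng()
    img = image.astype(np.float64)

    # 1. Automatic exposure derived from the luminance histogram (median statistic).
    luminance = img.mean(axis=-1)
    exposure_gain = target_median / max(np.median(luminance), 1e-6)
    img = np.clip(img * exposure_gain, 0.0, 1.0)

    # 2. Sensor noise modeled as additive Gaussian noise.
    img = np.clip(img + rng.normal(0.0, noise_std, img.shape), 0.0, 1.0)

    # 3. Non-linear gamma correction (linear -> display-referred).
    return img ** (1.0 / gamma)

# Example: simulated = simulate_camera(rendered_linear_rgb)
```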

Evaluation of the VALERIE22 dataset quality

Several experiments were performed on the semantic segmentation task to evaluate the quality of the dataset generated using the VALERIE synthesis pipeline. The segmentation performance of a DeepLabV3+ model trained on the VALERIE22 dataset was compared with that of models trained on other synthetic datasets, including the Synthetic Pedestrian Dataset (SynPeDS), Grand Theft Auto V (GTAV), and Synscapes.

Subsequently, the performance of these segmentation models was evaluated on five real-world automotive segmentation datasets: Mapillary Vistas, the India Driving Dataset (IDD), Berkeley DeepDrive 100K (BDD100K), Cityscapes, and the Audi Autonomous Driving Dataset (A2D2).
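Cross-domain segmentation quality in such comparisons is typically reported as mean intersection over union (mIoU) over the shared class set. A minimal sketch of that metric, assuming integer label maps remapped to a common class list and an ignore label of 255, is shown below; it is not the authors' evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Compute per-class IoU and mIoU from integer label maps.

    `pred` and `target` are arrays of class indices with identical shape;
    pixels labeled `ignore_index` in the target are excluded.
    """
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix via bincount over combined (target, pred) indices.
    conf = np.bincount(
        num_classes * target.astype(np.int64) + pred.astype(np.int64),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    iou = intersection / np.maximum(union, 1)

    # Average only over classes that actually occur.
    return iou, float(iou[union > 0].mean())

# Example: per_class_iou, miou = mean_iou(pred_map, gt_map, num_classes=19)
```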

Additionally, the authors assessed segmentation performance on the person class of the CityPersons dataset when the DeepLabV3+ model was trained on subsets of the VALERIE22 dataset. Person class performance was also evaluated using models trained on subsets of the SynPeDS dataset.

Moreover, the authors investigated how model performance differed with the number of unique person assets used to prepare the datasets and their subsets.

Significance of the study

In the cross-domain segmentation evaluation of the Synscapes, GTAV, SynPeDS, and VALERIE22 synthetic datasets on five real-world datasets, the VALERIE22-trained model performed best on three of them, namely BDD100K, Cityscapes, and IDD, and only marginally worse than the SynPeDS-trained model on A2D2.

Overall, the VALERIE22 dataset performed significantly better than the Synscapes and GTAV synthetic datasets and outperformed SynPeDS on the Cityscapes dataset in the cross-domain evaluation. When the comparison was restricted to subsets with higher unique person counts, the VALERIE22 subset also outperformed the SynPeDS subset on Cityscapes.

To summarize, the findings of this study demonstrate that VALERIE22 is one of the best-performing openly available synthetic datasets for autonomous driving perception.


Journal reference:

Written by

Samudrapom Dam

Samudrapom Dam is a freelance scientific and business writer based in Kolkata, India. He has been writing articles related to business and scientific topics for more than one and a half years. He has extensive experience in writing about advanced technologies, information technology, machinery, metals and metal products, clean technologies, finance and banking, automotive, household products, and the aerospace industry. He is passionate about the latest developments in advanced technologies, the ways these developments can be implemented in a real-world situation, and how these developments can positively impact common people.
