In an article recently submitted to the arXiv* preprint server, researchers delved into the future of automated driving by introducing the Qualitative eXplainable Graph (QXG), a novel approach to qualitative spatiotemporal reasoning over long-term scenes that emphasizes robustness. Experiments conducted on real-world data provided a compelling demonstration: the QXG can be computed efficiently in real time while imposing minimal storage demands, an accomplishment that significantly reinforces trust in automated driving perception.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Background
The rapid development of Automated Vehicles (AVs) is significantly propelled by the integration of Artificial Intelligence (AI) methods and Deep Learning (DL) models. Notably, a recent milestone was reached in Europe with the introduction of the first Level 3 automated system, which enables hands-off driving. As AI gains widespread adoption, it has become imperative to elucidate the intricacies of automated perception and control in AVs. This is particularly crucial for understanding complex DL models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and transformers. The societal acceptance of AVs is intrinsically tied to these AI models' dependability, transparency, and credibility.
Related work
Past studies have categorized Explainable AI methods tailored to AVs into specific groupings. These include vision-based explanations, which revolve around pinpointing the particular image segments that influence the AV controller's decision-making. Another category encompasses feature-importance scores, where researchers quantify how individual inputs contribute to a model's predictions. Finally, there are text-based explanations, which aim to provide comprehensive insights in natural language, shedding light on the rationale behind the AV's decisions.
Proposed method
The present paper examines the potential of leveraging QXGs to improve the perception and accuracy of automated driving systems. The illustrative scene is extracted from the nuScenes dataset and comprises only four frames. The QXG captures the decisive spatiotemporal relationships among the objects present in the scene, while omitting quantitative specifics such as object distances or speeds. The scene contains an ego car (o1) and three other vehicles (o2, o3, and o4), each exhibiting distinct temporal behavior: some objects appear and disappear, while others remain present throughout the timeframe. The QXG records this information, including the different relationships between the objects in the scene; a minimal sketch of such a graph appears after the list below. The motivation behind computing QXGs for AD scenes is rooted in several objectives, four of which are discussed in the present paper, with experimental validation provided for the first:
Efficient Processing and Storage: QXGs offer clear advantages in data processing and storage. By abstracting extensive scenes into qualitative relationships, the data volume is reduced considerably compared with raw sensor data or pixel-based representations. For instance, QXGs condense the essential spatiotemporal relationship information of a pixel-based dataset such as nuScenes from potentially 40 GB to under 4 GB. This streamlined representation enables quicker processing, makes better use of computational resources, and clears the way for real-time operation, mitigating the computational load associated with raw sensor data.
Interpretability and Explainability of Long Scenes: QXGs support explanations of AV actions through their ability to capture the qualitative relationships among objects in intricate scenes. This interpretable form offers clear insights into how objects relate to one another, enhancing the understanding of factors such as pedestrian behavior, speed, and position. By exposing elements such as proximity, traffic rules, and pedestrian intentions, QXGs contribute to comprehending and justifying AV decisions, cultivating greater trust and accountability.
Learning and Mining from QXGs: QXGs enable learning and pattern discovery from AD scenes. By analyzing qualitative relationships, data mining techniques can uncover latent spatial and temporal patterns, correlations, and high-level knowledge about the objects in a scene. This structured and interpretable framework opens opportunities for advanced learning techniques that strengthen various aspects of AD, including behavior prediction and anomaly detection.
Enhanced Scene Description & Scenario Generation: QXGs, as symbolic representations, can enhance scene descriptions for large language models (LLMs). By including the qualitative relationships that unfold over time, an LLM can produce more contextually informed and accurate scene descriptions. For example, ChatGPT, when fed a QXG, can create a detailed description capturing object dynamics and relationships, enriching understanding. Moreover, QXGs readily translate into driving scenarios expressed in a Scenario Description Language (SDL).
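To make the representation concrete, the following minimal Python sketch shows how such a graph might be stored. The object identifiers (o1 to o4) and relation labels follow the example above; the dictionary layout and variable names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a QXG for the four-frame example scene.
# Object IDs (o1..o4) and labels ('p', 'm', 'o', ...) follow the article;
# the data layout itself is an illustrative assumption.

# Vertices: one per object observed anywhere in the scene.
objects = ["o1", "o2", "o3", "o4"]  # o1 is the ego car

# Edges: for each co-occurring object pair and frame, a single
# qualitative label instead of raw distances or speeds.
# Keys are (object_a, object_b, frame_index); values are relation labels.
qxg_edges = {
    ("o1", "o2", 0): "p",  # o1 'precedes' o2 in frame 0
    ("o1", "o2", 1): "m",  # o1 'meets' o2 in frame 1
    ("o1", "o3", 1): "o",  # o1 'overlaps' o3 in frame 1
    ("o2", "o4", 2): "p",
    # Objects may appear and disappear, so not every pair
    # carries an edge in every frame.
}

print(f"{len(objects)} vertices, {len(qxg_edges)} labelled edges")
```

Because each edge stores only a short label per object pair per frame, the structure stays at a few bytes per relationship, which is the intuition behind the 40 GB to 4 GB reduction cited above.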
Constructing the QXG: QXG Builder is a novel approach tailored to constructing spatiotemporal explainable graphs that faithfully represent intricate scenes. At its core lies the QXG, a tool for capturing and unraveling the qualitative relationships that underpin scene interpretation. The approach holds considerable potential for untangling the complexities inherent in scenes, particularly those marked by spatiotemporal dynamics.
Within the framework of the QXG, a diverse array of qualitative relationships is systematically encoded through a set of labels. Each label corresponds to a specific relationship, such as 'p' for 'precedes,' 'm' for 'meets,' and 'o' for 'overlaps.' Collectively, these relationships furnish a foundation for deciphering the intricate dynamics of objects within a given scene.
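These labels echo Allen's interval algebra, which distinguishes thirteen qualitative relations between two intervals. As an illustration (the functions below are a sketch, not code from the paper, and the paper's exact calculus may differ), the relation between two 1-D intervals can be computed directly from their endpoints; applying the test independently to the x- and y-extents of two bounding boxes yields the 13 × 13 = 169 combined relations referred to later.

```python
def allen_relation(a, b):
    """Return one of Allen's 13 qualitative relations between two
    closed intervals a = (a_lo, a_hi) and b = (b_lo, b_hi)."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    if a_hi < b_lo:  return "p"   # a precedes b
    if a_hi == b_lo: return "m"   # a meets b
    if b_hi < a_lo:  return "pi"  # a is preceded by b
    if b_hi == a_lo: return "mi"  # a is met by b
    if a_lo == b_lo and a_hi == b_hi:
        return "eq"                          # a equals b
    if a_lo == b_lo:
        return "s" if a_hi < b_hi else "si"  # starts / started by
    if a_hi == b_hi:
        return "f" if a_lo > b_lo else "fi"  # finishes / finished by
    if b_lo < a_lo and a_hi < b_hi:
        return "d"                           # a during b
    if a_lo < b_lo and b_hi < a_hi:
        return "di"                          # a contains b
    return "o" if a_lo < b_lo else "oi"      # overlaps / overlapped by

def rectangle_relation(box_a, box_b):
    """Combine per-axis Allen relations for two axis-aligned boxes
    given as ((x_lo, x_hi), (y_lo, y_hi)); 13 x 13 = 169 outcomes."""
    return (allen_relation(box_a[0], box_b[0]),
            allen_relation(box_a[1], box_b[1]))

# Hypothetical boxes: one vehicle strictly ahead of the other along x,
# sharing the same lateral extent along y.
print(rectangle_relation(((0, 2), (0, 1)), ((3, 5), (0, 1))))  # ('p', 'eq')
```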
The operationalization of the approach relies on two pivotal algorithms:
Algorithm 1: Graph-based Efficient Qualitative Constraint Acquisition (GEQCA): This algorithm forms the bedrock for efficiently acquiring qualitative constraints between objects within a graph. It operates based on a set of objects (X) and a predefined language (Γ) encapsulating the spectrum of qualitative relations. The outcome is a qualitative graph (G) that encapsulates relationships between pairs of objects.
Algorithm 2: QXG-BUILDER: This algorithm is responsible for constructing the QXG. It takes a sequence of frames (S) that represent a scene and a collection of possible qualitative relations (Φ) and crafts a QXG that adeptly captures the spatiotemporal relationships across frames.
The mechanism of Algorithm 2 entails iterating through each frame (fk) of the scene. For every pair of objects (oi, oj) within a frame, Algorithm 2 employs Algorithm 1 (GEQCA) to discern the relevant qualitative relation that defines the interaction between the objects. Consequently, the QXG is updated to represent these relationships accurately.
A key highlight is the automated oracle that drives the GEQCA procedure. This mechanism autonomously classifies qualitative queries rooted in object relations, removing the need for manual intervention. This automation significantly bolsters efficiency, especially when contrasted with an exhaustive approach necessitating the consideration of all 169 possible relations for every object pair.
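Under these descriptions, the control flow of the two algorithms can be sketched as follows. This is a deliberately simplified stand-in: the linear scan over candidate relations replaces GEQCA's more aggressive query-driven pruning, objects are assumed to be dictionaries with an 'id' key, and the oracle is any callable that answers a qualitative query (for instance by comparing tracked bounding boxes with the rectangle_relation helper sketched earlier).

```python
from itertools import combinations

# The 13 Allen labels; one label per axis gives the 169 rectangle relations.
ALLEN_LABELS = ["p", "m", "o", "s", "d", "f", "eq",
                "pi", "mi", "oi", "si", "di", "fi"]
ALL_RELATIONS = [(rx, ry) for rx in ALLEN_LABELS for ry in ALLEN_LABELS]

def geqca(obj_i, obj_j, oracle):
    """Simplified stand-in for Algorithm 1 (GEQCA): pose automated
    qualitative queries until the relation holding between the two
    objects is found, instead of checking all 169 exhaustively."""
    for rel in ALL_RELATIONS:
        if oracle(obj_i, obj_j, rel):  # automated query, no human in the loop
            return rel
    return None  # no relation established for this pair

def qxg_builder(frames, oracle):
    """Sketch of Algorithm 2 (QXG-BUILDER): walk the scene frame by
    frame and label every co-occurring object pair via GEQCA."""
    qxg = {}  # (id_i, id_j, frame_index) -> qualitative relation
    for k, objects_in_frame in enumerate(frames):
        # objects_in_frame is assumed to come from the detection-and-
        # tracking step (the DT term in the complexity analysis below).
        for obj_i, obj_j in combinations(objects_in_frame, 2):
            rel = geqca(obj_i, obj_j, oracle)
            if rel is not None:
                qxg[(obj_i["id"], obj_j["id"], k)] = rel
    return qxg
```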
Theorem 1: Computational Complexity: The computational complexity of the QXG Builder approach is established in the paper. For a scene encompassing n frames and m objects, QXG-BUILDER's time complexity hinges on both the frame count (n) and the object count (m): specifically, it is O(n × (DT + m²)), where DT denotes the time complexity of the object detection and tracking function. The space requirement is O(n × m²), reflecting the QXG's size in terms of vertices, edges, and potential relations.
In practical terms, the QXG Builder algorithm stands out for its efficiency: it runs in cubic time overall, which comfortably accommodates real-world constraints. Given that object detection and tracking routinely completes within 100 milliseconds, the approach remains pragmatically viable and effective, as the back-of-envelope estimate below illustrates.
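As a rough check (the pair-acquisition cost and object count below are illustrative assumptions, not measurements from the paper), the per-frame budget implied by the O(n × (DT + m²)) bound can be estimated as follows; with these numbers, the pairwise term is dwarfed by detection and tracking, which is why real-time operation hinges mainly on the tracker.

```python
# Rough per-frame budget under the O(n x (DT + m^2)) bound.
dt_ms = 100.0        # detection-and-tracking time per frame (cited above)
pair_cost_ms = 0.01  # assumed cost to acquire one pairwise relation
m = 40               # assumed number of objects in a busy frame

pairs = m * (m - 1) // 2  # the m^2-order term: object pairs per frame
frame_ms = dt_ms + pairs * pair_cost_ms
print(f"{pairs} pairs -> ~{frame_ms:.1f} ms per frame")  # 780 pairs -> ~107.8 ms
```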
Overall, the QXG Builder methodology emerges as a robust and efficient avenue for constructing the QXG. This graph, in turn, serves as a formidable asset for deciphering intricate relationships within scenes and elevating the transparency of automated systems.
Conclusion
The presented paper introduces QXG-BUILDER, an algorithm tailored to constructing QXGs from sensor data within AVs. By harnessing qualitative constraint acquisition through GEQCA, QXG-BUILDER enhances the interpretability of extended driving scenes while remaining operationally efficient. Its computational efficiency facilitates real-time processing, rendering it suitable for immediate AV responses.
The evaluation demonstrates QXG Builder's effectiveness across various sensors, showcasing its potential for real-time applications in autonomous vehicles. The approach significantly reduces memory usage and processing time, enhancing performance and safety in autonomous driving scenarios.
A noteworthy advantage lies in its capacity to curtail memory storage demands compared to storing raw sensor data, offering utility for resource-constrained AV systems. QXG-BUILDER's merits encompass structured scene comprehension, transparent decision-making, and insightful information extraction. Future avenues include assessing scalability, integration with other AV modules, and exploring applications in anomaly detection and predictive analysis to bolster safety and performance.