In a paper published in the journal PLOS ONE, researchers address the vulnerability of modern vehicles' controller area networks (CANs) to exploitation due to lacking security features. They introduce the first comprehensive guide to open CAN intrusion detection system (IDS) datasets, categorizing attacks and critically analyzing each dataset's quality and attributes.
Current datasets often fall short in representing real-world scenarios, which limits the comparability and reproducibility of results in CAN security research. The researchers contribute the Real Oak Ridge National Laboratory (ORNL) Automotive Dynamometer (ROAD) CAN IDS dataset, featuring over 3.5 hours of one vehicle's CAN data with diverse attacks, aiming to establish a benchmark for more effective and realistic testing in the field.
Related Work
Past research has extensively explored the vulnerabilities of modern vehicles, which increasingly rely on electronic control units (ECUs) and CANs for drive-by-wire functionality. Despite the widespread use of CANs for facilitating ECU communication through a standardized protocol, their security flaws, including the absence of authentication and encryption, have raised concerns.
Attack vectors on intra-vehicle CANs have expanded, especially with the growing connectivity options in vehicles. While previous studies have primarily focused on demonstrating the susceptibility of cars to hacking or proposing novel CAN IDSs, reproducing and comparing these methods remains a critical challenge because appropriate datasets are lacking.
Summary of Automotive IDS Datasets
The first dataset from the Hacking and Countermeasures Research Lab (HCRL) is the HCRL CAN Intrusion Dataset. It was created to evaluate an offset ratio and time interval-based IDS for CAN. The dataset includes real CAN data from a Kia Soul and features denial of service (DoS), fuzzing, and impersonation attacks. However, the dataset has drawbacks, including discrepancies in documentation, unclear injection intervals, and issues with how the impersonation attack is characterized. It is recommended primarily for testing IDSs that leverage remote frames.
Another dataset from HCRL is the HCRL Survival Analysis Dataset for automobile IDS. This dataset contains real CAN data from three vehicles (Hyundai Sonata, Kia Soul, and Chevrolet Spark). It includes flooding (DoS), fuzzing, and targeted ID attacks. The dataset is useful for testing timing-based IDSs on multiple vehicles and for illustrating adaptability to different bus load scenarios. However, drawbacks include the blatant nature of the attacks, the lack of information on target IDs, and potentially insufficient ambient data for robust training.
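To make the idea of a timing-based IDS concrete, the following is a minimal sketch, not the method of any of the papers discussed. It assumes frames are available as `(timestamp, arbitration ID)` pairs, learns the mean inter-arrival time of each ID from attack-free (ambient) traffic, and flags frames that arrive much sooner than expected. The function names and the `ratio` threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean


def learn_periods(frames):
    """Estimate the mean inter-arrival time per arbitration ID.

    frames: iterable of (timestamp_seconds, arbitration_id), sorted by time,
    taken from ambient (attack-free) traffic.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for ts, can_id in frames:
        if can_id in last_seen:
            gaps[can_id].append(ts - last_seen[can_id])
        last_seen[can_id] = ts
    return {can_id: mean(g) for can_id, g in gaps.items()}


def flag_fast_frames(frames, periods, ratio=0.5):
    """Return frames arriving sooner than ratio * learned period.

    Frames with an ID never seen in ambient traffic are flagged as well.
    The ratio of 0.5 is an illustrative assumption, not a tuned value.
    """
    last_seen = {}
    alerts = []
    for ts, can_id in frames:
        if can_id not in periods:
            alerts.append((ts, can_id))
        elif can_id in last_seen and ts - last_seen[can_id] < ratio * periods[can_id]:
            alerts.append((ts, can_id))
        last_seen[can_id] = ts
    return alerts


# Illustrative data: ID 0x100 sent every 10 ms, plus one injected frame.
ambient = [(i * 0.01, 0x100) for i in range(100)]
attacked = sorted([(i * 0.01, 0x100) for i in range(10)] + [(0.052, 0x100)])
alerts = flag_fast_frames(attacked, learn_periods(ambient))
print(alerts)
```

A detector this simple catches blatant injections such as flooding, which is why datasets containing only such attacks say little about how an IDS handles subtler ones.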
HCRL also used the Car Hacking Dataset to assess various IDSs. This dataset includes real CAN data from a Hyundai Sonata and covers DoS, fuzzing, and targeted ID attacks. Although it is widely used in the literature, it has drawbacks such as significant artifacts after the attacks end, attacks that are not stealthy, and differing formats for ambient and attack data.
Bosch SynCAN, created by Hanselmann et al. at Robert Bosch GmbH, is a synthetic dataset designed for training and testing their CAN IDS, CANet. It is the first signal-based dataset, providing timestamped signal values rather than raw binary data. The dataset includes simulated fabrication, suspension, and masquerade attacks. Its main benefit is the nuanced masquerade attacks; its drawbacks are the synthetic nature of the data and the inability to verify the real-world impact of the simulated attacks.
Finally, the Automotive CAN Bus Intrusion Dataset from Eindhoven University of Technology (Technische Universiteit Eindhoven, TU/e) includes real data from two cars and synthetic data from a CAN testbed. It covers various simulated attacks, including diagnostic protocol attacks and suspension attacks. Drawbacks include timestamps altered in post-processing, potentially unrealistic synthetic data, and unstructured attack labels.
ROAD Dataset: IDS Benchmarking Goldmine
The ROAD dataset is a valuable resource for testing IDS in CAN security. It comprises 33 attack captures and 12 ambient captures, offering a diverse range of scenarios for evaluation. The dataset includes syntactic metadata and signal-translated versions for specific captures, providing a comprehensive foundation for IDS development. The CAN-D algorithm translates raw CAN data into signals, enhancing the dataset's usability.
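Translating raw CAN data into signals means reading a bit field out of each frame's payload and, where applicable, applying a scale and offset. The sketch below illustrates only that decoding step and is not the CAN-D algorithm, which also has to infer where the signals are. The bit-numbering convention (bit 0 is the most significant bit of the first byte, big-endian order), the function name, and the example signal definition are all assumptions made for illustration.

```python
def extract_signal(payload: bytes, start_bit: int, length: int,
                   signed: bool = False, scale: float = 1.0,
                   offset: float = 0.0) -> float:
    """Decode one big-endian signal from a CAN payload.

    start_bit counts from the most significant bit of payload[0]
    (an assumed convention; real signal databases vary).
    """
    total_bits = len(payload) * 8
    if start_bit < 0 or length <= 0 or start_bit + length > total_bits:
        raise ValueError("signal does not fit in payload")
    as_int = int.from_bytes(payload, "big")
    # Shift the wanted field down to the low bits, then mask it off.
    shift = total_bits - (start_bit + length)
    raw = (as_int >> shift) & ((1 << length) - 1)
    # Interpret as two's complement when the signal is signed.
    if signed and raw >= 1 << (length - 1):
        raw -= 1 << length
    return raw * scale + offset


# Hypothetical signal: 16 bits starting at bit 8, scaled by 0.25.
payload = bytes.fromhex("8416690D00000000")
value = extract_signal(payload, start_bit=8, length=16, scale=0.25)
print(value)
```

Signal-level data of this kind is what allows an IDS to reason about values such as speed or pedal position rather than opaque bytes.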
CAN data was collected using SocketCAN software on a Linux computer connected through a Kvaser Leaf Light V2 to the vehicle's onboard diagnostics II (OBD-II) port. To preserve anonymity, the researchers do not disclose the make or model of the vehicle, which dates from the mid-2010s, and they obfuscated the data to protect privacy while keeping it relevant for IDS research. The dataset features fuzzing, targeted ID fabrication, masquerade attacks, and a unique accelerator attack. Physical verification of the attack effects adds realism to the dataset.
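Captures made with SocketCAN tools are commonly stored in the text format written by `candump -l`, one frame per line as `(timestamp) interface ID#DATA`. Assuming a capture in that format (the exact file layout of any given dataset should be checked against its documentation), a minimal parser might look like the following sketch. The `Frame` type and function name are hypothetical, and the sample line is made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    timestamp: float   # seconds
    interface: str     # e.g. "can0"
    can_id: int        # arbitration ID
    data: bytes        # payload, 0 to 8 bytes for classical CAN


# (timestamp) interface ID#DATA, with ID and DATA in hexadecimal.
_LINE = re.compile(
    r"^\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]{3,8})#([0-9A-Fa-f]*)$"
)


def parse_candump_line(line: str) -> Optional[Frame]:
    """Parse one candump-style log line into a Frame.

    Returns None for lines this sketch does not handle, such as
    remote frames (ID#R) or malformed payloads.
    """
    match = _LINE.match(line.strip())
    if match is None:
        return None
    ts, iface, can_id, data = match.groups()
    if len(data) % 2:  # a payload must be whole bytes
        return None
    return Frame(float(ts), iface, int(can_id, 16), bytes.fromhex(data))


# Made-up example line in the assumed format.
frame = parse_candump_line("(1000000000.000000) can0 0C9#8416690D00000000")
print(frame)
```

Parsed frames like these are the natural input for both timing-based and payload-based detectors.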
The ROAD dataset employs obfuscation techniques like timestamp shifts, ID replacement, reserved ID removal, and anonymization. Despite limitations, such as focusing on dynamometer data and reliance on simulation for masquerade attacks, ROAD's strengths lie in diverse dynamometer data for realistic testing. It features various blatant and subtle attacks, making it valuable for benchmarking and evaluating IDS techniques. Case studies highlight its role in comparing methods, testing architectures, and establishing a taxonomy for CAN data. Future studies plan to utilize ROAD for benchmarking and exploring emerging automotive network security technologies.
Conclusion
To sum up, this paper identifies two concerning trends in CAN IDS research: the lack of comparability among methods and the inability to assess IDS approaches against subtle, advanced attacks. Its comprehensive guide to publicly available CAN data addresses these challenges, providing a go-to resource for researchers seeking suitable datasets.
The newly introduced ROAD dataset further contributes by offering real CAN data featuring diverse attacks, facilitating the testing of various techniques found in the literature. While ROAD fills existing gaps in CAN IDS data, such as the absence of real masquerade attack data and the scarcity of real signal-translated CAN data with attacks, the paper itself does not test IDS methods on it; that work is demonstrated in related literature.
Journal reference:
- Verma, M. E., Bridges, R. A., Iannacone, M. D., Hollifield, S. C., Moriano, P., Hespeler, S., Kay, B., & Combs, F. L. (2024). A comprehensive guide to CAN IDS data and introduction of the ROAD dataset. PLOS ONE, 19(1), e0296879. https://doi.org/10.1371/journal.pone.0296879, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0296879