In an article published in the Nature Portfolio journal Scientific Reports, researchers addressed the challenge of low-altitude drone collision avoidance by developing an air-to-air drone detection model. They curated a dataset of 10,281 diverse drone images and enhanced You Only Look Once version 5 (YOLOv5) with a lightweight CF2-MC feature extraction network and an MG feature fusion module. The enhanced intersection over union (EIoU) loss function replaced complete IoU (CIoU) for improved accuracy.
Background
Unmanned aerial vehicles (UAVs), or drones, have seen widespread adoption in military, civilian, and research domains due to their cost-effectiveness, versatility, and adaptability. The proliferation of low-altitude UAVs has posed challenges in collision avoidance and necessitated air-to-air target detection. While existing research primarily focuses on ground-to-air UAV detection, air-to-air scenarios present unique challenges: complex, changing backgrounds; small-scale targets; and appearance variations caused by dynamic flight. Previous lightweight models for UAV detection have proven effective in specific scenarios but remain limited on embedded devices and in air-to-air target detection.
This paper introduced YOLOv5s-ngn, a novel approach for air-to-air UAV detection, addressing the limitations of existing models. The researchers curated a comprehensive dataset incorporating diverse backgrounds and varying time periods. They built YOLOv5s-ngn by enhancing the YOLOv5s backbone with a lightweight feature extraction network, adding a fusion module based on convolutional block attention (CBA), and adopting the EIoU loss function.
Extensive experiments demonstrated the model's superior performance on UAVfly and Det-Fly datasets, with real-time application capabilities on embedded hardware. The study highlighted the need for efficient air-to-air UAV detection models, providing a robust solution with YOLOv5s-ngn, contributing significantly to the evolving landscape of vision-based target detection in UAV domains.
UAVfly dataset
The construction of high-quality datasets was pivotal for training and evaluating object detection models, particularly for the challenging task of visually detecting UAVs from other UAVs. While existing datasets like Det-Fly provided valuable data, models trained on them remained suboptimal at detecting UAVs against complex backgrounds. Addressing this gap, this study introduced the novel "UAVfly" dataset, comprising 10,281 images collected across three Chinese provinces.
Unlike the Det-Fly dataset, UAVfly focused on intricate air-to-air scenarios, enhancing the dataset's generalization capabilities. The collection spanned diverse geographical environments, including urban, suburban, desert, field, lake, sky, and mountain settings. Images were captured throughout the day, ensuring a comprehensive representation of various time periods. For safety, collection strictly adhered to local UAV regulations, with a maximum flight altitude of 100 m and a minimum separation distance of 5 m between aircraft.
The dataset incorporated challenging factors such as varying lighting conditions and motion blur. Professional annotation with LabelMe ensured label accuracy. The dataset was partitioned into training and validation sets with a 70/30 split to maximize diversity and mitigate overfitting risks. The UAVfly dataset served as a valuable resource for advancing air-to-air UAV detection research.
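The article does not reproduce the authors' partitioning script; the snippet below is a minimal sketch of how such a reproducible 70/30 split could be done, with a hypothetical directory layout and file naming.

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, train_ratio: float = 0.7, seed: int = 42):
    """Reproducibly shuffle image paths and split them 70/30 into
    training and validation lists (illustrative sketch; the directory
    layout and naming are assumptions, not the authors' script)."""
    images = sorted(Path(image_dir).glob("*.jpg"))  # stable order before shuffling
    random.Random(seed).shuffle(images)             # fixed seed for reproducibility
    cut = int(len(images) * train_ratio)
    return images[:cut], images[cut:]               # (train, val)

train_imgs, val_imgs = split_dataset("UAVfly/images")
print(f"train: {len(train_imgs)}, val: {len(val_imgs)}")
```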
Methods
The authors outlined enhancements made to the YOLOv5s object detection model for efficient air-to-air UAV target detection. The YOLOv5s network was described with emphasis on its lightweight architecture and real-time capability. In the feature extraction network, the C3 module was replaced with CF2 and the downsampling segment was adapted to MC. The CF2 structure employed channel splitting to improve parallelism and feature reuse, reducing computational cost. The feature fusion module, MG, combined spatial and channel attention mechanisms to model high-level semantic information holistically and to strengthen the representation of global context.
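The summary names CBA-based attention in the MG module but does not give its implementation. The sketch below shows the widely used CBAM-style combination of channel and spatial attention that such a fusion module builds on; the class name and hyperparameters are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style attention sketch: channel attention followed by
    spatial attention (illustrative; not the paper's exact MG module)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight channels.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: 7x7 conv over pooled channel maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```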
Additionally, the authors introduced an improved loss function: the CIoU loss used for bounding box regression was replaced with the EIoU loss. EIoU quantified the overlap between predicted and ground truth boxes while directly penalizing discrepancies in predicted width and height. It decomposed into three components, overlap loss, center-distance loss, and width-height loss, providing a more refined bounding box regression objective and faster convergence. The methods were validated on the self-constructed UAVfly dataset, chosen for its complexity and diversity in air-to-air scenarios. The proposed YOLOv5s-ngn model demonstrated enhanced accuracy on the UAVfly and Det-Fly datasets, achieving state-of-the-art performance.
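Concretely, the published EIoU formulation adds to the IoU term a center-distance penalty normalized by the enclosing box diagonal, plus separate width and height penalties normalized by the enclosing box dimensions. A minimal PyTorch sketch of that formulation (not the authors' exact code) is:

```python
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor,
              eps: float = 1e-7) -> torch.Tensor:
    """EIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4).
    Sums the three components the paper describes: overlap loss,
    center-distance loss, and width-height loss."""
    # Intersection area.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    # Union area and IoU.
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Smallest enclosing box: width, height, and squared diagonal.
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    cw, ch = enc_rb[:, 0] - enc_lt[:, 0], enc_rb[:, 1] - enc_lt[:, 1]
    c2 = cw**2 + ch**2 + eps
    # Squared distance between box centers.
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=1)
    # Width/height discrepancies, normalized by the enclosing box.
    dw = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    dh = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])
    return 1 - iou + rho2 / c2 + dw**2 / (cw**2 + eps) + dh**2 / (ch**2 + eps)
```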
Results
Evaluation metrics included traditional detection metrics (precision, recall, and mean average precision) and lightweight-design metrics (parameter count, floating-point operations (FLOPs), and frames per second). YOLOv5s-ngn struck a balance between speed and accuracy, outperforming the original YOLOv5s in both detection accuracy and convergence speed. Comparative experiments with different backbone networks and ablation studies on feature fusion validated the effectiveness of the proposed modifications.
The model outperformed classical object detection networks on both the self-constructed UAVfly dataset and the Det-Fly dataset. Embedded experiments on the NVIDIA Jetson TX2 platform showcased real-time detection, with the proposed model reaching 69 frames per second. These results highlighted YOLOv5s-ngn's strong performance in UAV object detection, making it a compelling choice for real-time applications in unmanned aerial vehicle scenarios.
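Throughput figures like the 69 frames per second above are typically obtained by timing repeated forward passes after a warm-up phase. The harness below is a generic sketch of that procedure; the input size and iteration counts are assumptions, not the paper's TX2 benchmark setup.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model: torch.nn.Module, input_size=(1, 3, 640, 640),
                warmup: int = 20, iters: int = 100) -> float:
    """Rough frames-per-second estimate from repeated forward passes
    (generic harness; the authors' benchmark setup may differ)."""
    device = next(model.parameters()).device
    x = torch.randn(*input_size, device=device)
    for _ in range(warmup):            # warm-up to stabilize clocks/caches
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()       # ensure queued kernels have finished
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```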
Conclusion
The authors introduced YOLOv5s-ngn, a lightweight and efficient air-to-air UAV detection model addressing challenges in collision avoidance. Through innovations in feature extraction, a new fusion module, and adoption of the EIoU loss function, YOLOv5s-ngn achieved superior accuracy and real-time performance. Extensive experiments on diverse datasets and embedded hardware validated its effectiveness, demonstrating a strong balance between speed and accuracy. The model's deployment potential in UAV scenarios marks a notable advance in vision-based target detection for unmanned aerial vehicles.
Journal reference:
- Cheng, Q., Wang, Y., He, W., & Bai, Y. (2024). Lightweight air-to-air unmanned aerial vehicle target detection model. Scientific Reports, 14(1), 2609. https://doi.org/10.1038/s41598-024-53181-2