Lightweight Enhancements in YOLOv5 for Vehicle Detection

In a paper published in the journal Sensors, researchers introduced a lightweight enhancement to the YOLOv5 algorithm, leveraging integrated perceptual attention (IPA) and multiscale spatial channel reconstruction (MSCCR) modules. The proposed method reduced model parameters and boosted average accuracy (mAP@50) while maintaining computational efficiency, without increasing the number of floating-point operations (FLOPs). This improvement optimizes vehicle detection for intelligent traffic management systems, enhancing their efficiency and functionality.

Study: Lightweight Enhancements in YOLOv5 for Vehicle Detection. Image credit: carlos castilla/Shutterstock

In addition to reducing model parameters and improving accuracy, integrating IPA and MSCCR modules provided richer contextual information for enhanced vehicle detection in diverse traffic environments. The optimized algorithm promises to advance intelligent traffic management and control systems significantly.

Related Work

Previous research in vehicle detection algorithms, primarily centered on you only look once version 5 (YOLOv5), has focused on tackling challenges in intricate traffic environments. While the original YOLO and YOLO-tiny models offer different trade-offs between accuracy and computational complexity, recent enhancements have either improved accuracy at the cost of increased complexity or reduced parameters at the cost of lower accuracy. Integrating transformer encoders improved performance but added computational cost, while lightweight networks like MobileNet sacrificed accuracy for simplicity. These approaches therefore still struggle with increased complexity or with capturing detailed features in complex scenes.

YOLOv5 Enhancements and MSCCR Integration

To improve YOLOv5s, the researchers redesigned the backbone network around integrated perceptual attention (IPA) and a C3_MR structure. Inspired by the mobile vision transformer (MobileViT), the design combines convolution and self-attention, with C3_MR aggregating shallow features and IPA aggregating deep features. This arrangement reduced model parameters and facilitated hierarchical feature learning, enhancing the model's expressiveness.

The IPA module aimed to mitigate the high computational cost of transformer encoders. It adopted a parallel two-branch structure, using efficient attention to capture global information and convolutional attention to capture local information. By incorporating the idea of grouping, IPA reduced parameters and computational complexity while effectively aggregating information from the global and local branches.
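
The paper's exact implementation is not reproduced here, but the following PyTorch sketch illustrates the general idea of such a parallel two-branch block: one branch applies self-attention over the whole feature map for global context, the other applies a grouped convolution for local context, and a 1x1 convolution fuses the two. The module name, channel sizes, grouping factor, and layer choices are illustrative assumptions, not the authors' IPA code.

    # Illustrative sketch only: a parallel two-branch attention block in the spirit of
    # the IPA module described above (global branch + local convolutional branch,
    # with channel grouping to limit parameters). Names and layer choices are assumptions.
    import torch
    import torch.nn as nn

    class TwoBranchAttention(nn.Module):
        def __init__(self, channels: int, groups: int = 4, heads: int = 4):
            super().__init__()
            # Global branch: lightweight multi-head self-attention over spatial positions.
            self.norm = nn.LayerNorm(channels)
            self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
            # Local branch: grouped convolution to capture local context cheaply.
            self.local = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, groups=groups, bias=False),
                nn.BatchNorm2d(channels),
                nn.SiLU(),
            )
            # 1x1 convolution to fuse the two branches.
            self.fuse = nn.Conv2d(2 * channels, channels, 1, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, h, w = x.shape
            # Global branch: flatten spatial dims to a token sequence, apply self-attention.
            tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
            g, _ = self.attn(tokens, tokens, tokens)
            g = g.transpose(1, 2).reshape(b, c, h, w)
            # Local branch: grouped convolution on the original feature map.
            l = self.local(x)
            # Aggregate global and local information, keeping a residual connection.
            return x + self.fuse(torch.cat([g, l], dim=1))

    if __name__ == "__main__":
        block = TwoBranchAttention(64)
        out = block(torch.randn(1, 64, 20, 20))
        print(out.shape)  # torch.Size([1, 64, 20, 20])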

Furthermore, MSCCR builds on spatial and channel reconstruction convolution (SCConv) to reduce computational redundancy and facilitate representative feature learning. By employing SCConv, MSCCR effectively reduced the number of parameters to roughly a fifth of those of a standard convolution. Integrating efficient multiscale attention (EMA) into MSCCR enabled multiscale spatial information acquisition without adding parameters.
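
SCConv itself involves dedicated spatial and channel reconstruction units; as a rough illustration of why such factorizations save parameters, the toy comparison below contrasts a standard 3x3 convolution with a grouped 3x3 convolution followed by a 1x1 convolution, under assumed channel sizes. It is not the paper's SCConv, and the exact fivefold saving depends on the configuration.

    # Toy illustration (not the paper's SCConv): comparing parameter counts of a
    # standard 3x3 convolution against a cheaper grouped + pointwise factorization.
    # Channel sizes and the grouping factor are assumptions chosen for illustration.
    import torch.nn as nn

    def param_count(m: nn.Module) -> int:
        return sum(p.numel() for p in m.parameters())

    c = 64  # assumed number of input/output channels

    standard = nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False)

    # Factorized variant: a grouped 3x3 conv for spatial mixing followed by a
    # 1x1 conv for channel mixing.
    factorized = nn.Sequential(
        nn.Conv2d(c, c, kernel_size=3, padding=1, groups=8, bias=False),
        nn.Conv2d(c, c, kernel_size=1, bias=False),
    )

    print("standard 3x3 conv:", param_count(standard))    # 64*64*9   = 36864
    print("factorized conv:  ", param_count(factorized))  # 4608+4096 =  8704
    print("ratio:", param_count(standard) / param_count(factorized))  # roughly 4x fewer here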

To build C3_MR, researchers replaced the bottleneck residual structure in the YOLOv5 backbone network with MSCCR. This replacement aimed to address the loss of feature information while reducing parameters. Parameter comparisons showed MSCCR to be approximately 1.8 times smaller than the bottleneck residual structure, thus optimizing the backbone network's efficiency.

Advancements in Vehicle Detection

The study utilized the UA-DETRAC multi-object detection and tracking benchmark dataset, comprising surveillance videos captured at various locations and under various weather conditions, with 8250 vehicles and 1.21 million labeled objects. Researchers performed frame extraction to streamline the dataset and prevent redundancy, producing new training and validation sets.
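
The article does not state the exact sampling rule used for frame extraction; a common approach is simply to keep every Nth frame of each video. The OpenCV sketch below shows that idea, with the paths, stride, and file names being assumptions rather than the authors' preprocessing pipeline.

    # Illustrative frame-extraction sketch (paths, stride, and naming are assumptions,
    # not the authors' exact preprocessing): keep every Nth frame of each surveillance
    # video to reduce near-duplicate images before building train/val splits.
    import cv2
    from pathlib import Path

    def extract_frames(video_path: str, out_dir: str, stride: int = 10) -> int:
        """Save every `stride`-th frame of `video_path` into `out_dir`; return count saved."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        saved = frame_idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video
                break
            if frame_idx % stride == 0:
                cv2.imwrite(str(out / f"{Path(video_path).stem}_{frame_idx:06d}.jpg"), frame)
                saved += 1
            frame_idx += 1
        cap.release()
        return saved

    if __name__ == "__main__":
        n = extract_frames("MVI_20011.mp4", "frames/train", stride=10)  # hypothetical file name
        print(f"saved {n} frames")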

Experiments ran on an Ubuntu 20.04 long-term support (LTS) operating system with an Intel Xeon Gold 6330 central processing unit (CPU), 128 GB of random access memory (RAM), and an NVIDIA RTX 3090 graphics processing unit (GPU) with 24 GB of video memory. Researchers employed PyTorch 1.10.1 with CUDA 11.8 as the deep learning framework, training with a batch size of 32 for 100 epochs.
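
As a rough sketch of such a training run, the snippet below assumes the public ultralytics/yolov5 repository layout (its train.py exposes a run() helper mirroring the command-line flags) and a hypothetical ua_detrac.yaml dataset configuration; it is not the authors' training script.

    # Sketch of a training run matching the reported setup (batch size 32, 100 epochs,
    # YOLOv5s weights). Assumes the public ultralytics/yolov5 repository is checked out
    # and that "ua_detrac.yaml" (hypothetical name) points at the extracted frames/labels.
    # Equivalent CLI: python train.py --data ua_detrac.yaml --weights yolov5s.pt \
    #                 --img 640 --batch-size 32 --epochs 100 --device 0
    import train  # train.py from the ultralytics/yolov5 repository

    train.run(
        data="ua_detrac.yaml",   # hypothetical dataset config (image paths + class names)
        weights="yolov5s.pt",    # start from the YOLOv5s checkpoint
        imgsz=640,
        batch_size=32,
        epochs=100,
        device="0",              # single GPU, as in the reported setup
    )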

The evaluation focused on mean average precision (mAP@50) and model parameter count to assess the performance improvements over YOLOv5s. Researchers conducted a comparative analysis against the faster region-based convolutional neural network (Faster R-CNN) and the single-shot multibox detector (SSD), which indicated superior accuracy and fewer parameters for the enhanced YOLOv5s algorithm.
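
Outside the YOLOv5 tooling, mAP@50 and parameter count can be estimated with standard utilities; the sketch below uses torchmetrics with dummy boxes purely to show the mechanics, not UA-DETRAC results.

    # Illustrative evaluation sketch: computing COCO-style mAP (including mAP@50) with
    # torchmetrics and counting model parameters. Boxes/labels are dummy values.
    import torch
    from torchmetrics.detection.mean_ap import MeanAveragePrecision

    metric = MeanAveragePrecision()  # default IoU thresholds; "map_50" is the mAP@50 entry

    preds = [{
        "boxes": torch.tensor([[100., 120., 220., 260.]]),  # xyxy format
        "scores": torch.tensor([0.88]),
        "labels": torch.tensor([0]),
    }]
    targets = [{
        "boxes": torch.tensor([[105., 118., 215., 255.]]),
        "labels": torch.tensor([0]),
    }]

    metric.update(preds, targets)
    print("mAP@50:", metric.compute()["map_50"].item())

    # Parameter count of any torch model (e.g., a loaded YOLOv5 variant):
    # n_params = sum(p.numel() for p in model.parameters())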

Compared with popular networks such as MobileNet version 2 (MobileNetV2), MobileNetV3, and EfficientNet, the improved backbone network demonstrated higher mAP@50 and mAP@50:95 while maintaining a similar parameter count. Furthermore, comparisons with YOLOv3-tiny, YOLOv4-tiny, and the original YOLOv5s model showed improved accuracy and a reduced parameter count for the enhanced algorithm.

Visual results and gradient-weighted class activation mapping (Grad-CAM) visualizations depicted the enhanced model's superior adaptability and feature extraction capabilities, especially in complex environments. Ablation experiments further validated the effectiveness of the proposed improvements, highlighting enhanced accuracy without increasing model parameters.
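
The article does not specify which layer was visualized or which Grad-CAM implementation was used; the minimal hook-based sketch below, applied to a toy classifier, only illustrates the mechanism of weighting feature maps by their gradients to localize what drives a prediction.

    # Minimal Grad-CAM sketch (mechanism only, on a toy CNN; not the authors' code or
    # the exact layer they visualized): hooks capture the chosen layer's activations and
    # gradients, which are combined into a coarse heatmap over the input.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyNet(nn.Module):
        def __init__(self, n_classes: int = 4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            )
            self.head = nn.Linear(32, n_classes)

        def forward(self, x):
            f = self.features(x)
            return self.head(f.mean(dim=(2, 3)))  # global average pool, then classify

    def grad_cam(model: ToyNet, x: torch.Tensor, target_class: int) -> torch.Tensor:
        acts, grads = {}, {}
        layer = model.features[-2]  # last conv layer (assumed visualization target)

        def save_act(module, inp, out):
            acts["a"] = out            # feature maps of the hooked layer

        def save_grad(module, grad_in, grad_out):
            grads["g"] = grad_out[0]   # gradient of the score w.r.t. those feature maps

        h1 = layer.register_forward_hook(save_act)
        h2 = layer.register_full_backward_hook(save_grad)
        score = model(x)[0, target_class]
        model.zero_grad()
        score.backward()
        h1.remove(); h2.remove()
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)      # channel-wise gradient weights
        cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]

    if __name__ == "__main__":
        heatmap = grad_cam(ToyNet(), torch.randn(1, 3, 64, 64), target_class=0)
        print(heatmap.shape)  # torch.Size([1, 1, 64, 64])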

Conclusion

To sum up, the study integrated the IPA and MSCCR modules into the YOLOv5s framework to create a lightweight vehicle detection model, aiming to address the complexity and hardware demands of existing algorithms. Experimentally, the enhanced algorithm achieved a 3.1% increase in average precision on the UA-DETRAC dataset compared to YOLOv5s, and outperformed SSD and Faster R-CNN by 3.3% in mAP@50 each.

Additionally, it surpassed other backbone networks, achieving 5.6% to 6.7% higher mAP@50 than MobileNetV2, MobileNetV3, and EfficientNet, and 5.7% to 6.7% higher mAP@50 than YOLOv3-tiny and YOLOv4-tiny. The model proved effective across various scenarios, improving accuracy while reducing computational cost, which supports its potential deployment on resource-constrained devices. Future work could focus on practical implementations in embedded devices, further refining the algorithm for real-world applications.

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.


