In a paper published in the journal Applied Science, researchers introduced a new approach to improve object detection in low-illumination environments. Their method involves three modules: Low-Level Feature Attention (LFA), Feature Fusion Neck (FFN), and Context-Spatial Decoupling Head (CSDH). Experimental results show promising performance for end-to-end object detection in challenging conditions.
Introduction
Dimly lit environments are common in daily life. In recent years, deep learning has achieved remarkable success in visual and speech recognition, and object detection has become crucial in applications such as autonomous driving and video surveillance. Nevertheless, poor lighting makes object recognition difficult in the real world: image distortion, a low signal-to-noise ratio, and significant noise lead to more false positives and false negatives.
Earlier work on low-illumination object detection relied on infrared imaging, which is limited by cost and temperature sensitivity, and advanced tasks such as object detection in these conditions remain underexplored. Deep learning has produced successful detection algorithms, including two-stage methods (e.g., Fast Region-based Convolutional Neural Network (Fast R-CNN) and Faster R-CNN) and single-stage methods (e.g., You Only Look Once (YOLO) and Fully Convolutional One-Stage Object Detection (FCOS)). Despite this progress, low-illumination detection remains challenging.
Related work
Past studies have treated low-illumination enhancement as a pre-processing step. One approach employed deep adversarial networks with Faster R-CNN to simulate normal lighting, but evaluating such enhancement with detection metrics like mean average precision (mAP) is difficult because enhancement can introduce noise and discard features. Recent studies have instead optimized the entire detection network to mitigate these issues; for example, a context fusion and feature pyramid module was introduced to improve feature extraction for low-light images. Despite these developments, light intensity remains a significant factor in performance under severe underexposure.
In the context of improving low-illumination images, various methods, including those inspired by Retinex theory and deep learning, have focused on correcting contrast and brightness. Synthetic data is often used because ground truth is scarce. Object detection techniques themselves have evolved along one-stage and two-stage lines: one-stage approaches predict bounding boxes and classes directly, while two-stage methods first propose candidate regions and then classify them.
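To make the Retinex idea concrete, below is a minimal single-scale Retinex sketch: the illumination component is estimated with a Gaussian blur and removed in the log domain, leaving an approximate reflectance image. The sigma value and the final rescaling are illustrative assumptions, not parameters from any cited method.

```python
# Minimal single-scale Retinex sketch (illustrative only; sigma and the
# normalization choice are assumptions, not values from the paper).
import cv2
import numpy as np

def single_scale_retinex(img_bgr: np.ndarray, sigma: float = 80.0) -> np.ndarray:
    """Estimate illumination with a Gaussian blur and remove it in log space."""
    img = img_bgr.astype(np.float32) + 1.0                 # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)    # smooth illumination estimate
    reflectance = np.log(img) - np.log(illumination)
    # Rescale to [0, 255] for display.
    reflectance = (reflectance - reflectance.min()) / (reflectance.max() - reflectance.min() + 1e-6)
    return (reflectance * 255.0).astype(np.uint8)

# Usage: enhanced = single_scale_retinex(cv2.imread("dark_scene.jpg"))
```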
Techniques such as Faster R-CNN and CenterNet have played roles in improving both accuracy and efficiency. Attention mechanisms have also gained momentum within computer vision, allowing models to concentrate on informative spatial and channel-wise information. Attention modules such as the Convolutional Block Attention Module (CBAM), the Bottleneck Attention Module (BAM), Selective Kernel Networks (SKNet), and Frequency Channel Attention Networks (FcaNet) have proven effective at strengthening deep learning models.
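As a concrete illustration of such attention modules, the sketch below follows the CBAM pattern in PyTorch: a channel branch (a shared MLP over average- and max-pooled descriptors) followed by a spatial branch (a convolution over channel-wise average and max maps). The reduction ratio and kernel size are common defaults, not settings from the paper under discussion.

```python
# Minimal CBAM-style attention sketch (PyTorch); hyperparameters are
# typical defaults, not the paper's configuration.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))               # pooled descriptor (b, c)
        mx = self.mlp(x.amax(dim=(2, 3)))                # max descriptor (b, c)
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)                # (b, 1, h, w)
        mx, _ = x.max(dim=1, keepdim=True)               # (b, 1, h, w)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAMBlock(nn.Module):
    """Channel attention followed by spatial attention, as in CBAM."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(self.channel(x))

# Usage: out = CBAMBlock(256)(torch.randn(1, 256, 40, 40))
```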
Proposed method
The paper introduces three modules, LFA, FFN, and CSDH, all aimed at improving low-illumination object detection, and presents an improved model based on YOLOv5s. The framework consists of three main parts: the LFA in the backbone network, the FFN for feature fusion, and the CSDH for improved classification and localization. In low-illumination scenes, objects often blend with dark backgrounds, which complicates accurate detection.
The LFA mechanism prioritizes relevant feature information and addresses the low contrast of dark scenes. The FFN tackles limitations in feature extraction by combining high- and low-level features, improving the model's comprehension of the scene. The CSDH replaces the coupled detection head and treats classification and localization as distinct tasks: shallow features provide precise spatial detail, while deep features contribute semantic context, and leveraging both improves localization.
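The exact designs of the FFN and CSDH are not reproduced here, but a minimal sketch of the two underlying ideas, top-down fusion of a deep (semantic) feature map with a shallow (spatial) one and a decoupled head with separate classification and regression branches, might look as follows. The module names SimpleFuse and DecoupledHead, the channel widths, and the use of plain concatenation are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a simple top-down fusion step and a decoupled
# detection head. Names and channel sizes are assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFuse(nn.Module):
    """Upsample a deep (semantic) map and concatenate it with a shallow (spatial) map."""
    def __init__(self, deep_ch: int, shallow_ch: int, out_ch: int):
        super().__init__()
        self.merge = nn.Conv2d(deep_ch + shallow_ch, out_ch, kernel_size=1)

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        deep_up = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")
        return self.merge(torch.cat([deep_up, shallow], dim=1))

class DecoupledHead(nn.Module):
    """Separate convolutional branches for classification and box regression."""
    def __init__(self, in_ch: int, num_classes: int, num_anchors: int = 3):
        super().__init__()
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_ch, num_anchors * num_classes, 1),
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_ch, num_anchors * 4, 1),        # (x, y, w, h) per anchor
        )

    def forward(self, x: torch.Tensor):
        return self.cls_branch(x), self.reg_branch(x)

# Usage with dummy feature maps at two scales:
# fused = SimpleFuse(512, 256, 256)(torch.randn(1, 512, 20, 20), torch.randn(1, 256, 40, 40))
# cls_out, box_out = DecoupledHead(256, num_classes=12)(fused)
```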
Experimental results
The experiments use the ExDark dataset of low-illumination images. The proposed algorithm is evaluated against widely used object detectors with mAP at an IoU threshold of 0.5 as the metric, and the contribution of each component is studied. The proposed approach, based on YOLOv5s with LFA, FFN, and CSDH, achieves a 1.7% improvement in accuracy over the baseline, demonstrating its effectiveness, and a comparison with YOLO-series algorithms combined with image-enhancement pre-processing further highlights its advantage. Ablation experiments validate the impact of each module, with the LFA and the CSDH each contributing a 0.5% improvement. Overall, the algorithm performs well across the various object categories.
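For reference, mAP at 0.5 counts a prediction as correct when it matches a not-yet-claimed ground-truth box of the same class with IoU of at least 0.5. The minimal sketch below shows the IoU computation and the greedy matching step behind the metric; the full metric additionally accumulates precision/recall over confidence-ranked predictions and averages over classes, which is omitted here.

```python
# Minimal sketch of the IoU >= 0.5 matching rule behind mAP@0.5.
# Boxes are (x1, y1, x2, y2); AP averaging over precision/recall is omitted.
from typing import List, Tuple

Box = Tuple[float, float, float, float]

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_at_05(preds: List[Box], gts: List[Box]) -> List[bool]:
    """Greedily mark each prediction (assumed sorted by confidence) as TP or FP."""
    claimed = [False] * len(gts)
    results = []
    for p in preds:
        best, best_idx = 0.0, -1
        for i, g in enumerate(gts):
            score = iou(p, g)
            if not claimed[i] and score > best:
                best, best_idx = score, i
        if best >= 0.5:
            claimed[best_idx] = True
            results.append(True)       # true positive
        else:
            results.append(False)      # false positive
    return results
```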
Real-world performance was evaluated by capturing two sets of authentic low-illumination scene images with Xiaomi 12 and iPhone 11 devices and running both the baseline and enhanced models on them. The enhanced model clearly performs better in these real-world scenarios, significantly outperforming the baseline.
Contribution of the paper
The key contributions of this study can be summarized as follows:
- Improvements to the object-detection pipeline yield higher accuracy in low-light conditions compared with conventional methods, benefiting applications such as nighttime monitoring and driving.
- Introduction of the LFA, FFN, and CSDH modules: the LFA enhances feature extraction by combining channel information, the FFN facilitates mutual learning between feature maps, and the CSDH improves the classification and regression tasks.
- Experiments on the ExDark dataset show significant gains of 0.5% to 1.7% over the baseline. A comparison with detectors run on low-light-enhanced images shows that enhancement methods prioritize visual quality rather than object detection accuracy.
Conclusion
To sum up, this study presents an object detector designed for low-illumination environments that tackles the accuracy challenges of such scenes. The introduced LFA module strengthens the focus on low-level details, the modified neck combines semantic information across feature levels, and the revised detection head makes better use of spatial features. Experimental results on the ExDark dataset demonstrate performance gains of up to 1.7% over the baseline. Future work will expand the datasets, optimize the model for lightweight deployment, and further enhance the algorithm in low-illumination scenarios.