In a paper published in the journal Scientific Reports, researchers introduced a Full Stage Auxiliary (FSA) network detector to address marine ecological degradation and ocean pollution caused by underwater debris. The detector incorporates an auxiliary focal loss and a multi-attention module to improve object detection accuracy in complex underwater scenes.
By optimizing gradient combinations and leveraging attention mechanisms, their method improves the identification of small objects amid cluttered backgrounds. Furthermore, incorporating an auxiliary focal loss function addresses sample imbalances, ultimately elevating overall detection precision. Their experimental findings underscore the efficacy of this approach for real-time submarine garbage detection in challenging underwater environments.
Background
Submarine garbage pollution is a growing concern for marine ecosystems worldwide. Poor waste management practices have introduced large quantities of artificial waste into the marine environment, causing severe ocean pollution. This waste includes wood, fishing nets, glass, metals, and plastics. Many of these materials are durable and resist corrosion, so they persist in the ocean for long periods. Addressing submarine garbage pollution therefore requires efficient detection methods that can quickly identify and locate underwater debris. This information is essential for assessing the extent and distribution of the pollution and for formulating effective pollution control policies.
Related Work
Prior studies have used remote sensing technology to detect marine debris, especially floating plastic waste. However, research specifically on identifying submarine garbage is limited. Some researchers have used You Only Look Once (YOLO) v3 and YOLOv4 to detect underwater objects, with varying success. Deep learning-based object detection algorithms fall into one-stage and two-stage categories. YOLO models, which are one-stage detectors, have shown strong performance underwater. Even so, detecting submarine garbage remains challenging because underwater images are often blurry and objects frequently have incomplete contours.
Proposed Methodology
Architecture Overview: The FSA model's data augmentation process plays a crucial role in enhancing the model's robustness to diverse real-world scenarios. As the feature maps progress through the Full Stage Shortcut (FSS) modules, the model's ability to capture intricate details and context information is progressively refined, allowing it to excel in detecting objects of varying sizes and complexities.
The Criss-Cross Attention (CCA) module is a key element. It lets the model focus on small target regions while filtering out irrelevant background noise, which improves detection precision and helps the model adapt to challenging environments. Alongside the primary head, the model uses auxiliary heads for classification, localization, and confidence regression. These heads support accurate detection and make the model applicable to a range of object recognition and localization tasks.
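The article does not describe the CCA module's internals. The sketch below assumes it follows the original criss-cross attention design from CCNet. In that design, each position attends only to the positions in its own row and column (H + W - 1 positions instead of H × W), and the aggregated context is added back to the input through a residual connection. To keep the sketch short and dependency-free, the query, key, and value maps are passed in directly, whereas real CCA derives them with learned 1 × 1 convolutions. The residual scale `gamma` is likewise a fixed number rather than a learned parameter. The function name `criss_cross_attention` is hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def criss_cross_attention(q, k, v, x, gamma=1.0):
    """Hypothetical sketch. q, k, v, x are H x W grids of C-dim vectors (nested lists).
    Assumes identity projections and a fixed gamma (CCNet learns both)."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        row_out = []
        for j in range(w):
            # Criss-cross path: every position in row i, plus column j without repeating (i, j)
            path = [(i, c) for c in range(w)] + [(r, j) for r in range(h) if r != i]
            weights = softmax([dot(q[i][j], k[r][c]) for r, c in path])
            agg = [0.0] * len(x[i][j])
            for wgt, (r, c) in zip(weights, path):
                agg = [a + wgt * val for a, val in zip(agg, v[r][c])]
            # Residual connection: scaled context plus the original feature
            row_out.append([gamma * a + xi for a, xi in zip(agg, x[i][j])])
        out.append(row_out)
    return out

# Usage: a 3 x 4 feature map with 2 channels, using the input itself as q, k, and v
x = [[[float(i), float(j)] for j in range(4)] for i in range(3)]
y = criss_cross_attention(x, x, x, x, gamma=0.5)
print(len(y), len(y[0]), len(y[0][0]))  # 3 4 2
```

In CCNet, the module is applied twice in sequence so that every position can gather context from the whole image. Restricting each pass to a row and a column is what keeps memory use far below that of full self-attention.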
Attention & Focal Loss: The CCA module computes attention only along each pixel's row and column rather than over the whole feature map. This reduces GPU memory use and allows larger batch sizes. It also improves detection accuracy by emphasizing relevant areas while suppressing background regions. To combat category imbalance in one-stage object detection, the model uses the FSS module to generate auxiliary heads for training. These heads are trained alongside the primary head with different loss weights, so the coarse labels used by the auxiliary heads do not degrade the primary head's detection accuracy. The focal loss function is adapted to cover these auxiliary heads, which further improves overall accuracy. Together with the submarine garbage dataset, this approach improves submarine garbage detection and can aid pollution cleanup and recycling efforts.
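As a rough illustration of the loss side, the sketch below implements the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t). It then combines a primary-head term with a down-weighted auxiliary-head term. The defaults α = 0.25 and γ = 2 come from the original focal loss paper, not from the FSA paper. The auxiliary weight `aux_weight = 0.25` and all function names are hypothetical. The separate `aux_labels` argument stands in for the coarser targets that the auxiliary heads are trained on.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction.
    p: predicted probability of the positive class; y: ground-truth label (1 or 0)."""
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def mean_focal_loss(probs, labels, **kwargs):
    return sum(focal_loss(p, y, **kwargs) for p, y in zip(probs, labels)) / len(probs)

def total_loss(primary_probs, aux_probs, labels, aux_labels, aux_weight=0.25):
    """Hypothetical combination: primary-head loss plus a down-weighted auxiliary-head loss,
    so the auxiliary head's coarse targets cannot dominate training."""
    return (mean_focal_loss(primary_probs, labels)
            + aux_weight * mean_focal_loss(aux_probs, aux_labels))

# An easy positive (p = 0.9) versus a hard positive (p = 0.1)
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
print(f"{easy:.5f} {hard:.5f}")  # 0.00026 0.46627
```

The (1 - p_t)^γ factor is what counters imbalance: a confidently correct prediction contributes almost nothing, so the many easy background samples do not swamp the gradient from the rare hard ones.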
Experimental Results
The FSA network model demonstrates strong object detection capability in challenging underwater conditions. Model A, with Highway Separable (HS) and Full Separable (FS) modules, achieves 52.5% mean Average Precision (mAP) with 464 layers and 121.22 million parameters, outperforming the YOLOv7 series models in the accuracy-parameter trade-off. Model B uses FS modules in both the backbone and the head. This slightly increases the parameter count and the giga floating-point operations (GFLOPs), and it gains 0.2 percentage points of mAP over the HS modules.
Model C incorporates depthwise separable convolution, reducing complexity and parameter count while maintaining accuracy. Model D adds the FSS module, with a Residual Neural Network (ResNet)-like structure and shortcuts, and improves mAP by a substantial 1.6 percentage points. Models E and F add CCA and the auxiliary head, gaining 2.5 and 3.0 percentage points of mAP over Model A, respectively. Model F reaches 55.5% mAP on the submarine garbage dataset. Although some categories remain harder to detect than others, the attention mechanisms and architectural enhancements consistently deliver high detection accuracy.
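To see why depthwise separable convolution cuts parameters (Model C), compare weight counts. A standard k × k convolution from c_in to c_out channels has c_in · c_out · k² weights. A depthwise separable version uses one k × k filter per input channel followed by a 1 × 1 pointwise convolution, for c_in · k² + c_in · c_out weights. The channel sizes in the example below are hypothetical and are not taken from the FSA paper. Biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 256 -> 256 channels with a 3 x 3 kernel
std = standard_conv_params(256, 256, 3)
sep = depthwise_separable_params(256, 256, 3)
print(std, sep, round(std / sep, 2))  # 589824 67840 8.69
```

The ratio sep / std equals 1/c_out + 1/k². For 3 × 3 kernels, the saving therefore approaches roughly 9× as the channel count grows.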
Conclusion
In summary, the one-stage FSA object detection model excels at detecting dense small objects and uses various data augmentation techniques to improve generalization and combat overfitting. The CCA module emphasizes small-object features by effectively extracting deep abstract features. The auxiliary focal loss function improves overall detection accuracy, making FSA networks a state-of-the-art option for real-time object detection. However, the attention modules increase the parameter count and computational demands, and the effect of model width has not yet been examined. Further research is needed to reduce this computational overhead and to explore how model width affects performance. Overall, FSA networks are a valuable advancement for underwater object detection.