In an article published in the journal Scientific Reports, researchers introduced a real-time anomalous behavior recognition approach for marine life using a lightweight deep learning model (Lite3D), object detection, and multitarget tracking. The goal was to monitor and assess the ecological health of oceans by analyzing the anomalous behavior of underwater creatures, which can act as a biometer for the impact of global warming and pollution.
Background
Rising global temperatures and pollution pose severe threats to marine habitats and species. To address the ecological impact, understanding the behavior of underwater creatures is crucial. While previous studies used traditional algorithms for marine life behavior recognition, the present study employed deep learning, specifically Lite3D, a lightweight 3D convolutional neural network (CNN). Existing methods, like DCG-DTW, faced challenges in accurately recognizing individual fish behavior due to posture limitations and computationally intensive matching scores.
This study leveraged advancements in AI and deep learning to enhance behavior recognition. Lite3D, unlike its counterparts, extracted features automatically, improving accuracy and real-time processing. Traditional models like C3D and Two-Stream Convolutional Networks, designed for terrestrial action recognition, lacked the capacity to detect anomalous behavior of individual marine objects within an image frame. Lite3D addressed this gap by combining object detection and tracking for precise anomaly identification.
Previous AI-based marine behavior recognition models, such as C3D and BCS-YOLOv5, required manual feature extraction or explicit definition of behavior classes. In contrast, Lite3D employed a cut-paste-warp scheme, using regions-of-interest (ROIs) extracted from tracked fish, making it highly efficient and applicable to real-time edge computing.
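The article does not spell out the cut-paste-warp scheme, so the following is only a simplified sketch of one plausible reading: cut the tracked fish's bounding box out of each frame and warp it to a fixed size so that the ROIs form a uniform clip. The function names, the nearest-neighbor resampling, and the 4x4 output size are illustrative assumptions, not details taken from the paper, and the "paste" step is not modeled here.

```python
def crop(frame, box):
    """Cut the region (x0, y0, x1, y1) out of a frame stored as a list of pixel rows."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]


def warp(patch, out_h, out_w):
    """Resize a patch to out_h x out_w with nearest-neighbor sampling (assumed method)."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def roi_sequence(frames, boxes, size=(4, 4)):
    """Build a fixed-size ROI clip from per-frame boxes of one tracked target."""
    return [warp(crop(frame, box), *size) for frame, box in zip(frames, boxes)]


# Hypothetical data: two 6x8 grayscale frames and the tracked box in each frame.
frames = [[[r * 10 + c for c in range(8)] for r in range(6)] for _ in range(2)]
boxes = [(2, 1, 6, 4), (3, 2, 7, 6)]
clip = roi_sequence(frames, boxes)
print(len(clip), len(clip[0]), len(clip[0][0]))
```

Because every ROI ends up with the same spatial size, the clip can be stacked along the time axis and passed to a 3D CNN regardless of how large the fish appears in the original frame.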
This research stemmed from the urgent need to monitor and protect marine ecosystems, especially in regions experiencing severe coral bleaching. The researchers proposed Lite3D as an effective tool for AI-based behavior recognition, offering a promising solution to bridge the gaps in previous methodologies and contribute to the conservation and sustainable use of marine ecosystems.
Results
This study introduced a real-time anomalous behavior recognition approach for marine life, focusing on cobia and tilapia. The researchers used a private dataset with samples of behavior classes and employed the Lite3D lightweight deep learning model, incorporating object detection and multitarget tracking. Two loss functions - cross-entropy and focal loss - were tested during backpropagation training, with focal loss yielding better results, especially for the "grinding" class of tilapia.
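To make the difference between the two loss functions concrete, here is a minimal per-sample sketch. The focusing parameter gamma=2.0 and weight alpha=1.0 are commonly used defaults and are assumptions here; the article does not state the values used in the study, and the probabilities below are made up for illustration.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy for one sample: -log(p_t)."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t)."""
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical softmax outputs for two samples whose true class is 0.
easy = [0.9, 0.05, 0.05]  # confidently correct
hard = [0.2, 0.5, 0.3]    # poorly classified

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.2f}")
```

The factor (1 - p_t)**gamma shrinks the loss of well-classified samples far more than that of hard ones, which is why focal loss tends to help on difficult or under-represented classes.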
The necessity of the cut-paste-warp scheme for generating ROI-only frames was justified by comparing results with and without the scheme. The scheme significantly improved the performance for the "grinding" class, demonstrating its effectiveness. Lite3D outperformed other models, including 3DCNN, C3D, and BCS-YOLOv5, in terms of precision, recall, F1-score, frames per second (FPS), and the number of free parameters. BCS-YOLOv5, solely based on object detection, exhibited lower precision due to more false-positive predictions.
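The link between false positives and precision can be illustrated with the standard metric definitions. The counts below are hypothetical and are not results from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: same hits and misses, different numbers of false alarms.
many_false_alarms = precision_recall_f1(tp=80, fp=20, fn=10)
few_false_alarms = precision_recall_f1(tp=80, fp=5, fn=10)

for name, (p, r, f1) in (("many FP", many_false_alarms), ("few FP", few_false_alarms)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Recall is unaffected by false positives, so a detector that raises more false alarms loses precision, and therefore F1, even when it finds the same number of true anomalies.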
Visual results from a test video with multiple fish frames demonstrated Lite3D's ability to differentiate between normal and anomalous behaviors. Lite3D successfully detected a fish returning to a "normal" swimming posture, outperforming other models in this aspect.
In terms of speed, Lite3D proved to be 3 to 5 times faster than the YOLOv4-tiny object detector, achieving 98 FPS on an RTX 2080 and 39 FPS on a Jetson Xavier™ NX. This speed made Lite3D suitable for real-time edge-computing applications on remotely operated vehicles (ROVs) or autonomous underwater vehicles (AUVs).
Methodology
The Lite3D model was designed with four convolutional layers and three pooling layers, forming a lightweight architecture for real-time underwater behavior recognition. It employed a 3D CNN to handle spatiotemporal data efficiently. Fully connected (FC) layers were replaced with convolutional layers to reduce the number of trainable parameters, which made the model more resilient to overfitting and improved generalization, especially with limited training data. Lite3D incorporated a cut-paste-warp scheme to generate ROI sequences for training and employed focal loss for effective learning, which addressed class imbalance and focused training on hard-to-classify samples.
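A rough parameter count shows why dropping fully connected layers shrinks a model so sharply. The layer sizes below are hypothetical and chosen only for illustration; they are not the actual Lite3D or C3D configurations.

```python
def conv3d_params(c_in, c_out, kernel):
    """Weights plus biases of a 3D convolution with kernel (kt, kh, kw)."""
    kt, kh, kw = kernel
    return c_out * (c_in * kt * kh * kw + 1)


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_out * (n_in + 1)


# Hypothetical final feature map: 128 channels, 2 time steps, 7x7 spatial grid.
channels, t, h, w = 128, 2, 7, 7
num_classes = 4  # assumed number of behavior classes

# Option A: flatten the feature map and feed a 4096-unit FC layer, then classify.
fc_head = fc_params(channels * t * h * w, 4096) + fc_params(4096, num_classes)

# Option B: a 1x1x1 convolution producing class scores, followed by global
# average pooling, which has no trainable parameters.
conv_head = conv3d_params(channels, num_classes, (1, 1, 1))

print("FC head parameters:  ", fc_head)
print("Conv head parameters:", conv_head)
```

In this toy setting the FC head alone holds tens of millions of weights, while the convolutional head holds a few hundred, which is the kind of saving a fully convolutional design aims for.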
The architecture included a multi-thread tracking algorithm working with YOLOv4-tiny for ROI sequence generation. In the prediction phase, a linear predictor estimated the next position of each tracked target. New targets were added, and lost targets were removed, based on tracking performance.
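The tracking loop described above can be sketched as follows. This is a minimal illustration, not the authors' tracker: it assumes a constant-velocity linear predictor, greedy nearest-neighbor matching on box centers, and hypothetical thresholds (max_dist, max_missed), and it takes detections as plain (x, y) tuples rather than YOLOv4-tiny output.

```python
import math


class Track:
    """One tracked target with a constant-velocity (linear) predictor."""

    def __init__(self, track_id, pos):
        self.id = track_id
        self.pos = pos          # last known center (x, y)
        self.vel = (0.0, 0.0)   # displacement per frame
        self.missed = 0         # consecutive frames without a match

    def predict(self):
        return (self.pos[0] + self.vel[0], self.pos[1] + self.vel[1])

    def update(self, pos):
        self.vel = (pos[0] - self.pos[0], pos[1] - self.pos[1])
        self.pos = pos
        self.missed = 0


def step(tracks, detections, next_id, max_dist=50.0, max_missed=3):
    """Advance all tracks by one frame; return (tracks, next_id)."""
    unmatched = list(detections)
    for track in tracks:
        pred = track.predict()
        best = min(unmatched, key=lambda d: math.dist(pred, d), default=None)
        if best is not None and math.dist(pred, best) <= max_dist:
            track.update(best)
            unmatched.remove(best)
        else:
            track.pos = pred    # coast along the predicted path
            track.missed += 1
    # Remove targets lost for too long, then start tracks for new detections.
    tracks = [t for t in tracks if t.missed <= max_missed]
    for det in unmatched:
        tracks.append(Track(next_id, det))
        next_id += 1
    return tracks, next_id


tracks, next_id = step([], [(0.0, 0.0)], 0)
tracks, next_id = step(tracks, [(10.0, 0.0)], next_id)
print(tracks[0].id, tracks[0].predict())
```

Each surviving track supplies one bounding box per frame, which is what allows a per-target ROI sequence to be assembled for the behavior classifier.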
Conclusion
In conclusion, Lite3D, a compact model, outperformed 3D CNN and C3D in underwater behavior recognition with only 1/57 of the trainable weights. Its small size, 1/50 that of its counterparts, did not compromise performance. Lite3D preserved spatial and temporal information via warping and 3D convolutions. The method could aid disease detection in marine life and offered potential for aquaculture.
Future improvements may involve Squeeze-and-Excitation Networks and computational optimization. Tight integration with object detectors and trackers will also be crucial for robust performance in complex underwater conditions.
Journal reference:
- Wang, J.-H., Hsu, T.-H., Lai, Y.-C., Peng, Y.-T., Chen, Z.-Y., Lin, Y.-R., Huang, C.-W., & Chiang, C.-P. (2023). Anomalous behavior recognition of underwater creatures using lite 3D full-convolution network. Scientific Reports, 13(1), 20051. https://doi.org/10.1038/s41598-023-47128-2, https://www.nature.com/articles/s41598-023-47128-2