In an article published in the journal Drones, researchers discussed the development of an optimized rigorous advanced cutting-edge model for leveraging protection to ecosystems (ORACLE), a state-of-the-art computer vision model designed for automated detection and tracking of wild birds using drone footage.
They addressed challenges like high altitudes and dynamic movement, achieving 91.89% mean average precision (mAP) at 50% intersection over union (IoU). ORACLE leveraged object detection and multi-object tracking (MOT) techniques to enhance avian species identification and monitoring for wildlife conservation.
Background
Advanced technologies have transformed wildlife study and conservation, providing researchers with powerful tools to monitor animal behavior in natural habitats. Traditionally, avian surveillance in areas like the Amvrakikos Gulf relied on manual methods such as physical monitoring and telescopes, which risked stressing the birds and were limited by visibility constraints. Unmanned aerial vehicles (UAVs), or drones, now offer a versatile alternative by capturing high-resolution imagery and video from previously inaccessible perspectives.
This study introduced ORACLE, a cutting-edge computer vision model designed for automated bird detection and tracking using drone footage. ORACLE utilized deep learning to overcome challenges in remote surveillance, achieving precise identification of birds, including small- to medium-sized species, even in high-altitude and dynamic environments. This innovation promised significant advancements in wildlife conservation efforts.
An image depicting wildlife surveillance using drones over an islet, recorded at 40 m altitude.
Optimized Object Detection and Tracking Methodologies
The researchers focused on evaluating various you-only-look-once (YOLO) models for object detection tasks, specifically targeting small- to medium-sized objects in high-resolution drone footage. YOLO models were chosen for their efficiency and speed, which suit real-time applications. The authors evaluated multiple YOLO architectures (YOLOv5, YOLOv7, YOLOv8) using the Microsoft Common Objects in Context (COCO) dataset, which included annotations for small, medium, and large objects.
The evaluation showed that YOLOv5x6 performed best among YOLOv5 models, achieving the highest mAP on the COCO dataset. YOLOv7 models, particularly those with the P6 architecture, also performed strongly. YOLOv8 models, known for their efficiency in detecting small- to medium-sized objects, were selected for further analysis due to their performance in handling such specialized tasks.
ORACLE was a five-layered model designed for advanced object detection and tracking in high-resolution drone footage. It employed state-of-the-art computer vision models to extract valuable analytics, such as wildlife counts and behavioral statistics, from the environment. The model progressed through layers including pre-processing, object detection, post-detection processing, tracking, and post-processing to achieve these analytics.
Each layer of ORACLE contributed uniquely to its functionality.
- Pre-processing layer: Dynamically loaded models and enhanced frames with gamma correction.
- Object detection layer: Used model-specific inference methods to detect objects across frames.
- Post-detection processing layer: Merged detections that had been split during inference due to tiling.
- Tracking layer: Assigned identifiers to tracked objects across frames.
- Post-processing layer: Generated visualizations and extracted insightful information from the environment.
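The gamma correction mentioned for the pre-processing layer can be illustrated with a minimal sketch. The article does not give the gamma value or the implementation used in ORACLE, so the function names, the grayscale nested-list frame format, and the convention that gamma above 1 brightens the image are all assumptions made for illustration.

```python
def build_gamma_lut(gamma):
    """Return a 256-entry lookup table for 8-bit intensities.

    Assumed convention (not taken from the paper):
    out = 255 * (in / 255) ** (1 / gamma), so gamma > 1 brightens mid-tones.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    inv = 1.0 / gamma
    return [round(255 * ((i / 255) ** inv)) for i in range(256)]


def gamma_correct(frame, gamma):
    """Apply gamma correction to a grayscale frame given as rows of 0-255 ints."""
    lut = build_gamma_lut(gamma)  # computed once per frame, reused for every pixel
    return [[lut[pixel] for pixel in row] for row in frame]


# Hypothetical 2x3 frame; real drone frames would be 4K colour images.
frame = [[0, 64, 128], [192, 255, 32]]
brighter = gamma_correct(frame, gamma=2.0)
```

A lookup table keeps the per-pixel cost to a single index operation, which matters when every video frame passes through this step.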
To handle high-resolution images effectively, the researchers implemented an image-tiling approach. This technique segmented large images into smaller tiles, which were individually processed by the model. Post-inference, detections from these tiles were merged to reconstruct the complete image, ensuring no information loss during inference. The custom tiling algorithm addressed challenges associated with memory consumption and inference speed, particularly in handling large-scale images.
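As a rough illustration of the tile-then-merge idea, the sketch below splits an image into windows, maps tile-local boxes back to full-image coordinates, and unions boxes that overlap or touch. The authors' custom tiling algorithm is not described in the article, so the function names, the tile size, the flush-to-border placement of the last tile, and the naive union rule are assumptions rather than the method used in ORACLE.

```python
def make_tiles(width, height, tile, overlap=0):
    """Return (x0, y0, x1, y1) windows that together cover the whole image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # last tile sits flush with the border
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def to_global(box, window):
    """Shift a tile-local (x0, y0, x1, y1) box into full-image coordinates."""
    dx, dy = window[0], window[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def touches(a, b, gap=0):
    """True when two boxes overlap or lie within `gap` pixels of each other."""
    return (a[0] <= b[2] + gap and b[0] <= a[2] + gap and
            a[1] <= b[3] + gap and b[1] <= a[3] + gap)


def merge_split_boxes(boxes, gap=0):
    """Naively union boxes that touch, e.g. one bird cut in two by a tile seam."""
    merged = []
    for box in boxes:
        box = tuple(box)
        changed = True
        while changed:  # keep absorbing neighbours until nothing touches
            changed = False
            for other in merged:
                if touches(box, other, gap):
                    merged.remove(other)
                    box = (min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
        merged.append(box)
    return merged


# Hypothetical 200x100 image cut into two 100x100 tiles.
tiles = make_tiles(200, 100, tile=100)
detections = [
    to_global((90, 40, 100, 60), tiles[0]),  # left half of a bird on the seam
    to_global((20, 10, 30, 20), tiles[0]),   # a separate bird
    to_global((0, 40, 12, 60), tiles[1]),    # right half of the seam bird
]
merged = merge_split_boxes(detections)
```

This union rule would also fuse two distinct birds standing side by side, so a practical version would likely restrict merging to boxes that actually touch a tile seam.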
Fine-tuning of models, especially for small object detection in wildlife surveillance via drones, was critical for improving accuracy. The proposed approach involved transfer learning from pre-trained models, image augmentation techniques, and adjusting hyper-parameters during training. These methods were aimed at enhancing model performance and adapting it to the specific characteristics of the dataset.
For MOT, the authors employed an omni-scale network (OSNet) trained on the MSMT17 (multi-scene multi-time) person re-identification dataset. OSNet, coupled with Deep OC-SORT for MOT with re-identification, proved effective in tracking small- to medium-sized objects in dynamic environments.
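To make the notion of assigning identifiers across frames concrete, here is a deliberately simplified stand-in: a greedy nearest-centroid tracker. It is not the tracker used in the study and includes no appearance-based re-identification or motion model. The class name, the distance threshold, and the rule that unmatched tracks are dropped immediately are assumptions for illustration only.

```python
import math


class CentroidTracker:
    """Greedy nearest-centroid ID assignment (illustrative stand-in only)."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # assumed matching radius in pixels
        self.next_id = 0                  # also the number of IDs issued so far
        self.tracks = {}                  # track id -> last known centre (x, y)

    def update(self, boxes):
        """Take this frame's (x0, y0, x1, y1) boxes and return one ID per box."""
        centres = [((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for b in boxes]
        # Every candidate (distance, track id, detection index), closest first.
        pairs = sorted(
            (math.dist(centre, last), tid, i)
            for i, centre in enumerate(centres)
            for tid, last in self.tracks.items()
        )
        assigned, used_tracks = {}, set()
        for distance, tid, i in pairs:
            if distance > self.max_distance:
                break  # all remaining pairs are even farther apart
            if tid in used_tracks or i in assigned:
                continue
            assigned[i] = tid
            used_tracks.add(tid)
        for i in range(len(centres)):
            if i not in assigned:  # no nearby track: start a new one
                assigned[i] = self.next_id
                self.next_id += 1
        # Simplification: tracks without a match this frame are forgotten.
        self.tracks = {assigned[i]: centres[i] for i in range(len(centres))}
        return [assigned[i] for i in range(len(boxes))]


tracker = CentroidTracker(max_distance=50.0)
ids_frame1 = tracker.update([(0, 0, 10, 10), (100, 100, 110, 110)])
ids_frame2 = tracker.update([(102, 101, 112, 111), (3, 1, 13, 11)])
ids_frame3 = tracker.update([(104, 102, 114, 112), (500, 500, 510, 510)])
estimated_count = tracker.next_id  # distinct IDs issued across all frames
```

Because every lost track inflates `next_id`, a count derived this way tends to overestimate, which is one reason trackers with re-identification are preferred for the population estimates discussed in the article.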
The AMVRADIA Dataset
The AMVRADIA dataset, named after Greece's Amvrakikos Gulf, was developed to aid in wildlife protection through drone footage analysis. Initially comprising three videos, the dataset evolved to include numerous annotated images. The current dataset included 27,189 annotations across 1,104 4K resolution images, divided into "Initial" and "Augmented" sets.
The "Initial" dataset featured tiled images without augmentation, while the "Augmented" dataset employed zoom/crop techniques to enhance model performance. Despite potential blurriness and reduced accuracy from augmentation, these methods improved the model's overall effectiveness in detecting wildlife during inference.
Comprehensive Performance Analysis and Evaluation Results
The researchers analyzed the performance of various models on the AMVRADIA dataset, focusing on object detection and MOT of small- to medium-sized detections. During the training phase, the dataset was split into training, testing, and validation subsets in a 70-30% ratio. The evaluation methodology involved annotating videos with numerous objects and assessing both the tracker and object detector at multiple IoU thresholds. Correct tracks and changes in track identifications were counted to measure accuracy.
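The IoU thresholds used in the evaluation can be illustrated with a short sketch that computes IoU for two boxes and counts true positives, false positives, and false negatives at a chosen threshold. This is only a building block of mAP, not the full metric, and the function names and the greedy highest-score-first matching rule are assumptions rather than the authors' evaluation code.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_detections(preds, truths, threshold=0.5):
    """Count (tp, fp, fn) for scored predictions against ground-truth boxes.

    preds is a list of (score, box); truths is a list of boxes. Predictions
    are matched greedily, highest score first, and each ground-truth box
    can be claimed at most once.
    """
    unmatched = list(truths)
    tp = fp = 0
    for _score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)


# Hypothetical boxes, not taken from the AMVRADIA dataset.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (21, 21, 31, 31)), (0.3, (50, 50, 60, 60))]
loose = match_detections(preds, truths, threshold=0.5)
strict = match_detections(preds, truths, threshold=0.75)
```

Raising the threshold turns the slightly offset second prediction from a hit into a miss, which is why results reported at 50% IoU are more forgiving than those at stricter thresholds.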
Different models were trained and evaluated, with YOLOv7, YOLOv7x, YOLOv8, and YOLOv8x-p2 showing varying results. YOLOv8x-p2 trained on augmented data performed best, demonstrating significant improvements in mAP scores compared to other models. The study also indicated that a model's behavior can differ from one dataset to another.
For tracking evaluation, the best-performing models were used to assess the accuracy of object detection and tracking algorithms. YOLOv8x-p2 again showed superior performance, closely followed by YOLOv8x. The authors highlighted the challenge of false positives and missed tracks in estimating object counts, with YOLOv8x models providing the most accurate count estimates.
Visualization techniques were also implemented, displaying tracked objects in static boxes with zoom and additional information like estimated wildlife count, enhancing the understanding of model performance in real-world applications.
Conclusion
The ORACLE model represented a significant advancement in automated wildlife surveillance using drones and computer vision. By leveraging state-of-the-art object detection and MOT techniques, ORACLE identified and tracked wild birds under challenging conditions, reaching 91.89% mAP at 50% IoU.
The AMVRADIA dataset played a crucial role in training and evaluating the model, demonstrating its effectiveness in real-world applications. This study underscored the potential of advanced technologies in enhancing wildlife conservation efforts and set the stage for future developments, including expanded datasets and integration with additional artificial intelligence capabilities for comprehensive ecological monitoring.
Journal reference:
- Mpouziotas D, Karvelis P, Stylios C. Advanced Computer Vision Methods for Tracking Wild Birds from Drone Footage. Drones. 2024; 8(6):259. DOI: 10.3390/drones8060259, https://www.mdpi.com/2504-446X/8/6/259