In a paper published in the Journal of Imaging, researchers introduced an enhanced You Only Look Once (YOLO) model combined with edge detection and image segmentation techniques to tackle the challenge of detecting overlapping shoeprints in noisy environments.
Traditional convolutional neural networks (CNNs) struggle with this task because of complex textures and backgrounds. The new method demonstrated significant improvements in detection sensitivity and precision, and heatmaps drawn from the convolutional layers illustrated where the enhanced network concentrates its attention.
Background
Past work has shown that shoeprints are crucial for criminal detection, with accurate feature extraction vital for effective recognition. Edge detection, using methods like Sobel and Canny, plays a key role in identifying object boundaries and has evolved with deep learning techniques.
Recent advancements include using CNNs and attention mechanisms to improve edge detection accuracy and efficiency. YOLO, a leading object detection algorithm, has progressed through versions, with YOLOv8 introducing new modules to enhance performance.
Generating Overlapping Shoeprints
Due to the lack of a publicly available dataset for overlapping shoeprints, this study generated such samples from single-shoeprint images provided by the German state criminal police offices and forensity AG. The original dataset contained 300 crime scene images and 1,175 reference images, which were used to create overlapping shoeprints by simulating varying noise, positions, rotations, and overlap relationships. This process involved generating 200 unlabelled images by layering semi-transparent shoeprint images over noise-filled backgrounds, which were then split into training and validation sets.
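The paper does not reproduce its generation script, but the layering step it describes amounts to alpha compositing a semi-transparent print onto a noisy background. The sketch below illustrates that idea only; the function name, patch placement, and 0.5 opacity are assumptions, not the authors' code:

```python
import random

def overlay_shoeprint(background, shoeprint, top, left, alpha=0.5):
    """Alpha-blend a semi-transparent shoeprint patch onto a noisy background.

    background, shoeprint: 2D lists of grayscale values in [0, 255].
    (top, left): where the patch's top-left corner lands; alpha: print opacity.
    """
    out = [row[:] for row in background]  # copy so the background is untouched
    for i, row in enumerate(shoeprint):
        for j, px in enumerate(row):
            y, x = top + i, left + j
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                # Standard alpha compositing: alpha*foreground + (1-alpha)*background
                out[y][x] = round(alpha * px + (1 - alpha) * out[y][x])
    return out

# Noise-filled background with a hypothetical 2x2 dark print placed at (1, 1).
random.seed(0)
bg = [[random.randint(0, 255) for _ in range(4)] for _ in range(4)]
blended = overlay_shoeprint(bg, [[0, 0], [0, 0]], top=1, left=1, alpha=0.5)
```

Overlapping pairs would then follow by compositing a second print whose patch intersects the first.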
The image labeling process utilized the labelme software to annotate shoeprints with bounding boxes, which were then converted to YOLO format for model compatibility. The researchers divided the dataset into 160 training images and 40 validation images. Edge detection was performed using the Canny algorithm, which applies Gaussian filtering followed by gradient computation to identify edges in the images. The team used the resulting edge-detected images alongside the original images to train the YOLO model.
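The labelme-to-YOLO conversion mentioned above amounts to normalising corner coordinates into centre/size form. A minimal sketch of that arithmetic (the helper name and sample box are illustrative assumptions, not the authors' tooling):

```python
def labelme_to_yolo(box, img_w, img_h, class_id=0):
    """Convert a labelme-style corner box [[x1, y1], [x2, y2]] to a YOLO
    label line: class id, then centre x/y and width/height, all
    normalised to [0, 1] by the image dimensions."""
    (x1, y1), (x2, y2) = box
    xc = (x1 + x2) / 2 / img_w   # box centre, as a fraction of image width
    yc = (y1 + y2) / 2 / img_h   # box centre, as a fraction of image height
    w = abs(x2 - x1) / img_w     # box width fraction
    h = abs(y2 - y1) / img_h     # box height fraction
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A hypothetical shoeprint box in a 640 x 640 image.
line = labelme_to_yolo([[100, 200], [300, 600]], img_w=640, img_h=640)
# -> "0 0.312500 0.625000 0.312500 0.625000"
```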
With 168 layers and over 11 million parameters, the YOLOv8 model uses modules such as the cross-stage partial bottleneck with two convolutions (C2f) and spatial pyramid pooling fast (SPPF) to enhance gradient flow and processing speed. YOLOv8's architecture includes upsampling mechanisms to detect small targets and integrates deeper and shallower features for improved sensitivity in complex tasks.
Evaluation metrics included precision, recall, and mean average precision (mAP), with mAP50 and mAP50–95 providing insights into model performance at various overlap thresholds. The team used class activation mapping (CAM) heatmaps to visualize activated regions within the network, offering insights into how the model focuses on different features during detection.
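These metrics rest on intersection over union (IoU) between predicted and ground-truth boxes: mAP50 counts a detection as correct at IoU ≥ 0.50, while mAP50–95 averages over thresholds from 0.50 to 0.95. The sketch below shows how precision and recall follow from IoU matching; the greedy one-to-one matching is a simplification of the full mAP computation, not the evaluation code used in the paper:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(preds, gts, thr=0.5):
    """Greedily match predictions to ground truths at an IoU threshold,
    then report precision = TP/(TP+FP) and recall = TP/(TP+FN)."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= thr:
                matched.add(i)
                tp += 1
                break
    fp = len(preds) - tp  # unmatched predictions are false positives
    fn = len(gts) - tp    # unmatched ground truths are missed detections
    return tp / (tp + fp), tp / (tp + fn)

# One correct detection plus one spurious box: precision 0.5, recall 1.0.
p, r = precision_recall([(0, 0, 10, 10), (20, 20, 30, 30)], [(0, 0, 10, 10)])
```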
Experiment Results Overview
Three experiments were conducted on the Google Colab platform using a GPU for training. In the first experiment (E1), the dataset was employed with various hyperparameters to identify effective values. The researchers set the image size to 640 × 640 and the batch size to 16, disabled dropout, and kept the learning rate constant at 0.01. They set momentum to 0.937 and weight decay to 0.0005.
The training was scheduled for 1000 epochs but was halted early at epoch 410 because of no improvement over the preceding 50 epochs. This early stopping was governed by a hyperparameter named ‘patience,’ set to 50 and aimed at reducing training time and cost. The total training duration was 0.2 hours.
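The ‘patience’ mechanism can be sketched in a few lines: track the best validation fitness seen so far and stop once it has not improved for the given number of epochs. The fitness curve below is synthetic, chosen so the run halts at epoch 410 to mirror the behaviour described; it is not the authors' actual metric values:

```python
def train_with_patience(fitness_per_epoch, patience=50):
    """Return the 1-indexed epoch at which training halts: either the
    final epoch, or the first epoch `patience` epochs past the best one."""
    best, best_epoch = float("-inf"), 0
    for epoch, fitness in enumerate(fitness_per_epoch, start=1):
        if fitness > best:
            best, best_epoch = fitness, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs -> stop early
    return len(fitness_per_epoch)

# Synthetic fitness curve: improves until epoch 360, then plateaus,
# so patience=50 halts the scheduled 1000-epoch run at epoch 410.
curve = [min(e, 360) for e in range(1, 1001)]
stop_epoch = train_with_patience(curve, patience=50)
```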
In the second experiment (E2), the effectiveness of the ‘patience’ hyperparameter was evaluated. With it removed, training continued for the full 1000 epochs and lasted 0.5 hours, allowing a comparison between the number of training epochs and the results. The performance metrics showed significant fluctuations in precision and recall, especially before 450 epochs, and the model’s convergence rate decreased after 500 epochs.
The third experiment (E3) focused on the impact of edge detection technology on target recognition. Using the models and hyperparameters from E1 and E2, the researchers tested newly generated samples. The YOLOv8 model achieved accuracy rates exceeding 85% for minor overlaps and over 70% for almost complete overlaps. Analysis of the log curves revealed fluctuations in precision and recall due to varying sample difficulty. The model converged quickly in the 410-epoch scenario but exhibited instability in learning rates.
Results from edge detection were mixed. Although edge detection reduced training epochs from 340 to 230, it led to a decrease in evaluation parameters. mAP50 dropped from 0.966 to 0.957, and mAP50–95 decreased from 0.673 to 0.589. Precision and recall also fell.
The confusion matrices revealed that edge detection technology compromised image information, leading to more false positives and missed detections. While edge detection provided some insights, it also introduced challenges that affected model performance and accuracy.
Conclusion
To sum up, this study pioneered using a fully supervised neural network for detecting partially covered shoeprints in noisy environments, achieving over 85% confidence for partially obscured samples and over 70% for nearly fully covered samples. The study also created and publicly released a simulated dataset for future research. Despite limitations, such as the lack of variation in shoeprint scale and the ineffectiveness of edge detection, the research laid a foundation for further improvements and comparisons with other neural network models. Future work should explore complex network structures, diverse sample sources, and real-world validations.
Journal reference:
- Li, C., et al. (2024). Overlapping Shoeprint Detection by Edge Detection and Deep Learning. Journal of Imaging, 10(8), 186. DOI: 10.3390/jimaging10080186, https://www.mdpi.com/2313-433X/10/8/186