In an article published in the Decision Analytics Journal, researchers presented a deep learning model based on the You Only Look Once (YOLO) algorithm to assist visually impaired individuals in detecting potholes from real-time camera data.
The model, integrated into an application, provided auditory or haptic feedback, enabling safer navigation. Achieving 82.7% accuracy and 30 frames per second (FPS) in live video, the model enhanced mobility and safety for visually impaired users by detecting nearby potholes.
Background
Object recognition systems have evolved significantly, driven by the need for high-speed and precise identification in various applications. The YOLO algorithm is highly effective at real-time object detection and is widely used in applications such as autonomous driving and security systems.
Yet visually impaired individuals face significant obstacles, particularly in identifying potholes, which are hazards that are difficult to spot because of their unpredictable shapes and sizes. Traditional methods such as edge detection and template matching have proven largely ineffective for this problem.
Research has shown that deep learning models, particularly those based on YOLO, outperform traditional machine learning methods in detection accuracy. However, despite their advanced capabilities, many of these systems are either not easily portable or rely on internet access, which limits their practicality for individuals with visual impairments.
This paper addressed these gaps by proposing a YOLOv5-based pothole detection system designed for real-time use on mobile devices, providing auditory or haptic feedback to enhance safety and independence for visually impaired travelers.
Methodology for Accessible Pothole Detection
The researchers introduced a mobile application designed to detect potholes on roads, specifically tailored for visually impaired users. This app leveraged the YOLOv5 algorithm for real-time object detection, integrating with Google Text-to-Speech (gTTS) to alert users of nearby potholes.
The application was unique in that it operated without a graphical user interface (GUI) and required no user prompts; instead, it automatically began recording video, guided by voice assistance, as soon as the app was activated. When a pothole was detected, the user received an audible warning, enhancing road safety.
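As a rough illustration of such a detect-and-alert pipeline (a sketch rather than the authors' implementation), the snippet below assumes the public YOLOv5 PyTorch Hub interface, OpenCV for frame capture, and the gTTS library; the weights file name and the "pothole" class label are hypothetical.

```python
# Sketch of a detect-and-alert loop: YOLOv5 via PyTorch Hub, OpenCV for
# camera frames, gTTS for spoken warnings. Weights path and class name
# are illustrative, not taken from the paper.
import cv2
import torch
from gtts import gTTS

# Load a custom-trained YOLOv5 model (hypothetical weights file).
model = torch.hub.load("ultralytics/yolov5", "custom", path="pothole_best.pt")

def speak(text, filename="alert.mp3"):
    """Convert a warning message to speech and save it for playback."""
    gTTS(text=text, lang="en").save(filename)
    # Playback would be handled by the phone's audio layer in the real app.

cap = cv2.VideoCapture(0)  # default camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)                 # run detection on the frame
    detections = results.pandas().xyxy[0]  # bounding boxes as a DataFrame
    if (detections["name"] == "pothole").any():
        speak("Warning: pothole ahead")
cap.release()
```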
YOLOv5 was chosen for its efficiency and ability to process custom datasets quickly. It comprised three key components: the backbone, neck, and head. The backbone, based on Cross Stage Partial Darknet53 (CSPDarknet53), extracted image features. The neck, utilizing a path aggregation network (PANet), enhanced feature representation.
Finally, the head generated the final predictions, including bounding box coordinates and class probabilities. Key processes included data augmentation, model training, and post-processing with techniques like non-maximum suppression to ensure accurate and robust pothole detection. The application did not require user registration and functioned effectively with standard mobile cameras, offering an accessible solution for road safety.
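To illustrate the non-maximum suppression step mentioned above, the sketch below uses torchvision's built-in implementation; the confidence and IoU thresholds are illustrative defaults rather than values reported in the paper.

```python
# Illustration of non-maximum suppression (NMS) post-processing using
# torchvision. Score and IoU thresholds are illustrative values only.
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, score_thresh=0.25, iou_thresh=0.45):
    """Drop low-confidence boxes, then suppress overlapping duplicates.

    boxes:  (N, 4) tensor in (x1, y1, x2, y2) format
    scores: (N,) tensor of detection confidence
    """
    keep = scores > score_thresh          # filter low-confidence predictions
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thresh)  # suppress heavily overlapping boxes
    return boxes[idx], scores[idx]

# Example: two heavily overlapping pothole boxes collapse to one.
boxes = torch.tensor([[100., 100., 200., 200.],
                      [105., 102., 205., 198.],
                      [400., 300., 460., 360.]])
scores = torch.tensor([0.90, 0.75, 0.60])
print(postprocess(boxes, scores))
```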
Results and Analysis
The experimental analysis focused on implementing the YOLOv5 model for pothole detection. The model was initially pre-trained on the Common Objects in Context (COCO) dataset and later fine-tuned on a specialized dataset containing 9,240 images of potholes under various conditions. The dataset was divided into three subsets: 6,091 images for training, 2,094 for validation, and 1,055 for testing. The training process used text-format (.txt) annotation labels to help the model learn to detect and localize potholes accurately.
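In the standard YOLO annotation scheme, each .txt label line holds a class index and a normalized bounding box (center coordinates, width, and height). The short sketch below shows how such a line maps back to pixel coordinates; the sample values are illustrative, not taken from the paper's dataset.

```python
# Sketch of how a YOLO-format .txt label line maps to pixel coordinates.
# Each line stores: class_id x_center y_center width height, all normalized
# to [0, 1]. The sample line and image size below are illustrative.
def yolo_label_to_pixels(line, img_w, img_h):
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    x1, y1 = xc - w / 2, yc - h / 2  # top-left corner
    x2, y2 = xc + w / 2, yc + h / 2  # bottom-right corner
    return int(cls), (x1, y1, x2, y2)

# Example label line for a pothole (class 0) in a 1280x720 frame.
print(yolo_label_to_pixels("0 0.5 0.6 0.2 0.15", 1280, 720))
```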
Data preprocessing and augmentation were crucial to improving the model's performance. To increase robustness under varying lighting and orientation conditions, techniques such as auto-orientation, contrast adjustment, image flipping, rotation, and saturation adjustment were applied.
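The article does not name the augmentation tooling, so the sketch below shows comparable transforms using the albumentations library purely as an illustration; the parameters are placeholder values, and YOLOv5 can also apply similar augmentations internally.

```python
# Sketch of comparable augmentations with the albumentations library.
# Parameters are illustrative, not the paper's settings.
import albumentations as A

augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),            # image flipping
        A.Rotate(limit=15, p=0.5),          # small rotations
        A.RandomBrightnessContrast(p=0.5),  # lighting/contrast changes
        A.HueSaturationValue(p=0.5),        # saturation adjustment
    ],
    # Keep bounding boxes consistent with the transformed image.
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage: augment(image=image, bboxes=bboxes, class_labels=labels)
# returns the transformed image with correspondingly adjusted boxes.
```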
The YOLOv5 model was trained on Google Colab, employing the Adam optimizer along with a blend of loss functions, including binary cross-entropy, focal loss, and generalized intersection over union (GIoU) loss. The trained model was assessed using precision, recall, and mean average precision (mAP) metrics, resulting in a precision of 86.2%, a recall of 75.9%, and an mAP at 0.5 IoU of 82.5%. When tested on live video, the model detected potholes at a rate of 30 FPS with a resolution of 1280 × 720.
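As a worked illustration of one of these loss components, the function below computes the generalized IoU term in its standard form (the training loss uses 1 - GIoU); it is a sketch of the definition rather than the paper's implementation.

```python
# Sketch of the generalized IoU (GIoU) term used in box-regression losses.
# Boxes are (x1, y1, x2, y2); the loss applied during training is 1 - GIoU.
def giou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection and union of the two boxes.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest box enclosing both; penalizes distant, non-overlapping boxes.
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c_area = (cx2 - cx1) * (cy2 - cy1)

    return iou - (c_area - union) / c_area

# Example: partially overlapping prediction and ground-truth boxes.
print(giou((0, 0, 100, 100), (50, 50, 150, 150)))  # GIoU in (-1, 1]
# Corresponding loss contribution: 1 - giou(...)
```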
Conclusion
The researchers successfully developed a pothole detection model using the YOLO algorithm, achieving 82.7% accuracy and 30 FPS on live video. Integrated into a mobile application, the system provided auditory or haptic feedback to visually impaired users, enhancing their road safety.
While the current model was limited to detecting potholes, future improvements aim to expand its capabilities to identify additional hazards and enhance its performance across diverse road conditions, ensuring greater safety and independence for visually impaired individuals. Further development will also focus on optimizing the model for mobile devices and improving the detection range.
Journal reference:
- Paramarthalingam, A., Sivaraman, J., Theerthagiri, P., Vijayakumar, B., & Baskaran, V. (2024). A deep learning model to assist visually impaired in pothole detection using computer vision. Decision Analytics Journal, 100507. DOI: 10.1016/j.dajour.2024.100507. https://www.sciencedirect.com/science/article/pii/S2772662224001115