In a paper recently published in the journal Sustainability, researchers investigated the feasibility of using a deep learning (DL)-based approach for quick and accurate personal protective equipment (PPE) detection in hazardous work environments.
Background
PPE is primarily used to increase the protection level of workers at chemical, construction, and other hazardous sites. The equipment reduces the severity and probability of fatal accidents or injuries, improving worker safety. However, many workers temporarily fail to comply with PPE-wearing regulations at their workplaces due to negligence or lack of awareness, leading to both non-fatal and fatal injuries.
Manual monitoring of workers is error-prone and laborious, necessitating the development of intelligent monitoring systems that can detect workers' PPE compliance accurately and autonomously in real time during working hours.
The proposed DL-based approach
In this paper, researchers investigated the feasibility of using a two-stage detector based on the Faster Region-based Convolutional Neural Network (Faster R-CNN) model for accurate, real-time PPE detection. They trained and evaluated the proposed model on the four-colored hardhats, vest, and safety glass (CHVG) dataset, which contains 1699 annotated images.
The CHVG dataset consisted of eight classes: a person's head, body (person), vest, safety glass, and hardhats in four colors (yellow, red, blue, and white). After data preprocessing, the 1189 annotated objects were distributed as follows: person 40%, vest 18.25%, glass 4.28%, head 6.05%, red 10%, yellow 12.53%, blue 4.54%, and white 4.28%.
Additionally, the dataset was divided into training, validation, and test subsets of 430, 172, and 115 images, respectively. The training dataset was utilized during the training of the proposed model, the test dataset served as the basis for model evaluation, and the validation dataset was used as a check to ensure the model training process progressed as planned.
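The paper does not describe the exact splitting procedure, but the reported subset sizes can be reproduced with a simple seeded shuffle-and-slice. The sketch below is an illustrative assumption, not the authors' code; the counts mirror those reported above.

```python
import random

def split_dataset(image_ids, n_train=430, n_val=172, n_test=115, seed=42):
    """Shuffle image IDs reproducibly and carve out disjoint
    train/validation/test subsets. The subset sizes follow the paper;
    the shuffling strategy and seed are illustrative assumptions."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# Example: 430 + 172 + 115 = 717 images in total.
train, val, test = split_dataset(range(717))
```

Seeding the shuffle keeps the split reproducible across runs, which matters when comparing two models (here, Faster R-CNN and YOLOv5) on a common dataset.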
The data, collected from several public repositories and open sources, were preprocessed and unified to provide coherent input for the proposed model. Because the collected images contained different PPE features, image filtering, scaling, denoising, and augmentation were used to generate a homogeneous set of images.
Researchers used the Albumentations library for data augmentation to enhance model performance, applying several augmentation techniques such as hue-saturation-value (HSV) alteration, mosaic, and image flipping.
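To make the two image-level augmentations concrete, the following is a minimal NumPy-only sketch of a horizontal flip and an HSV-style alteration. It is not the authors' Albumentations pipeline; the shift magnitudes and the OpenCV-style hue range (0-179) are assumptions for illustration.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror an H x W x C image left-to-right (one of the augmentations used)."""
    return img[:, ::-1]

def shift_hue_saturation(img_hsv, d_hue=10, d_sat=20):
    """Apply an HSV-style alteration to an image already in HSV layout.

    Hue wraps around (0-179, OpenCV convention); saturation is clipped to
    [0, 255]. The shift magnitudes are illustrative, not the paper's settings.
    """
    out = img_hsv.astype(np.int16)          # avoid uint8 overflow during the shift
    out[..., 0] = (out[..., 0] + d_hue) % 180
    out[..., 1] = np.clip(out[..., 1] + d_sat, 0, 255)
    return out.astype(np.uint8)

img = np.zeros((4, 4, 3), dtype=np.uint8)
img[0, 0] = [170, 250, 100]                 # one colored pixel to track
flipped = horizontal_flip(img)              # pixel moves to the opposite column
shifted = shift_hue_saturation(img)         # hue wraps: (170 + 10) % 180 = 0
```

Augmentations like these effectively enlarge a small dataset (here, under 1700 images) without collecting new photographs, which typically improves detector generalization.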
Model training, validation, and evaluation
Researchers investigated both a two-stage detector, Faster R-CNN with a ResNet50 backbone, and a single-stage detector, You Only Look Once version 5 (YOLOv5), to identify the most suitable architecture for PPE detection.
The YOLOv5 single-stage detector comprised a CSPDarknet53 backbone, a feature pyramid network (FPN), and a YOLOv3 detection head, while the proposed Faster R-CNN two-stage detector consisted of a region proposal network (RPN) and a region-based detector. The RPN generates candidate object regions by assessing image regions at various scales and aspect ratios; the region-based detector then refines and classifies these candidates.
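At the heart of the RPN step is labeling candidate anchor boxes by their overlap with ground-truth objects. The sketch below illustrates that idea with plain-Python intersection-over-union (IoU); the 0.7/0.3 thresholds are the standard Faster R-CNN defaults, assumed here rather than taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Label each anchor positive (1), negative (0), or ignored (-1) by its
    best IoU with any ground-truth box, using the standard Faster R-CNN
    thresholds (0.7 / 0.3)."""
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        labels.append(1 if best >= pos_thr else (0 if best < neg_thr else -1))
    return labels

# One ground-truth box; anchors that overlap fully, not at all, and partially.
labels = label_anchors(
    anchors=[(0, 0, 10, 10), (100, 100, 110, 110), (0, 0, 10, 6)],
    gt_boxes=[(0, 0, 10, 10)],
)
```

Anchors in the ambiguous middle band (IoU between 0.3 and 0.7) are ignored during RPN training so they do not send conflicting gradient signals; the region-based detector later refines and classifies only the surviving positive proposals.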
In the proposed model, the ResNet50 network extracted features from the input data and passed them to the RPN and the region-based detector of the Faster R-CNN network. This architecture can thus deliver accurate detection results at improved speed.
Researchers used the Google Colab platform with an Nvidia T4 Tensor Core GPU for this study. The entire process of training and evaluating the model was repeated until the best performance was achieved. In the evaluation process, the validation data were assessed using mean average precision (mAP50), precision, and recall metrics, with mAP50 serving as the benchmark metric. Additionally, the speed of the model was determined by measuring the inference time in seconds.
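Precision and recall for a detector are computed by matching predicted boxes to ground-truth boxes at an IoU threshold (0.5 for mAP50). The sketch below uses a simplified greedy matching and ignores confidence ranking, so it illustrates the metrics rather than reproducing a full mAP50 implementation.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def precision_recall(detections, gt_boxes, iou_thr=0.5):
    """Greedily match detections to ground truth at IoU >= iou_thr and
    return (precision, recall). Matching order and tie-breaking are
    simplified relative to a full mAP50 computation."""
    matched, tp = set(), 0
    for det in detections:
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(gt_boxes):
            if i in matched:
                continue                      # each ground truth matches once
            v = iou(det, gt)
            if v > best_iou:
                best_iou, best_gt = v, i
        if best_iou >= iou_thr:
            matched.add(best_gt)
            tp += 1
    fp = len(detections) - tp                 # unmatched detections
    fn = len(gt_boxes) - tp                   # missed ground truths
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if gt_boxes else 0.0
    return precision, recall

# One correct detection, one false positive, one missed object.
p, r = precision_recall(
    detections=[(0, 0, 10, 10), (50, 50, 60, 60)],
    gt_boxes=[(0, 0, 10, 10), (100, 100, 110, 110)],
)
```

mAP50 extends this idea by sweeping the confidence threshold, averaging precision over recall levels per class, and then averaging across classes, which is why it is the preferred benchmark metric for detectors.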
Eventually, the best-performing architecture, the final Faster R-CNN model, was tested on different images in various environments, and its results were compared with those of the YOLOv5 model to assess performance in practical use cases. Both models were trained on a common dataset using the same dataset classes and hyperparameters.
Significance of the study
The proposed Faster R-CNN model significantly outperformed the YOLOv5 model in PPE detection, achieving 96% mAP50, 68% precision, and 78% recall, compared with 63.9% mAP50, 62.8% precision, and 55.3% recall for YOLOv5.
The model also displayed a significantly improved mAP of 96% and an inference time of 0.17 s, compared to the 89.84% mAP and 0.99 s inference time of YOLOX-m, the best-performing model in the literature. Moreover, the trained Faster R-CNN model attained an overall 96% mAP50 and over 50% recall and precision across the eight classes (person, head, vest, glass, red, yellow, blue, and white).
To summarize, the findings of this study demonstrated the feasibility of using the proposed Faster R-CNN model to identify and localize PPE accurately, with short detection times, in real-time environments. Moreover, the model was consistent and stable across different confidence thresholds.
Journal reference:
- Ahmed, M. I., Saraireh, L., Rahman, A., Mhran, A., AlKhulaifi, D., Youldash, M., & Gollapalli, M. (2023). Personal Protective Equipment Detection: A Deep-Learning-Based Sustainable Approach. Sustainability, 15(18), 13990. https://doi.org/10.3390/su151813990, https://www.mdpi.com/2071-1050/15/18/13990