Enhancing Face Detection with Lightweight Precision: LAFD Algorithm

In an article recently submitted to the arXiv* server, researchers introduced Light and Accurate Face Detection (LAFD), a precise and lightweight face detection algorithm. LAFD builds on Retinaface and uses a modified MobileNetV3 backbone.

Study: Enhancing Face Detection with Lightweight Precision: LAFD Algorithm. Image credit: metamorworks/Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

The paper's main contributions are adjusting the convolutional kernel size and channel expansion of the backbone and integrating the Squeeze-and-Excitation (SE) attention mechanism. The authors also incorporated the Deformable Convolution Network (DCN) and the focal loss function. Tests on the WIDERFACE dataset showed substantial accuracy improvements over Retinaface and LFFD, with gains reaching up to 8.3%, while LAFD remained lightweight at only 10.2 MB.

Background

Face recognition plays a pivotal role in daily life, and it has evolved through three stages: early algorithms, the Adaptive Boosting (AdaBoost) framework, and the deep learning era. The initial template matching approach compared candidate regions against template images to identify faces, while the AdaBoost framework and the Viola-Jones algorithm significantly improved accuracy and speed.

The advent of deep learning introduced techniques like Convolutional Neural Networks (CNNs), leading to breakthroughs like Faceness-Net, a deep convolutional network-based algorithm that achieved substantial detection improvements. In 2022, YOLOv7 models, including YOLOv7-tiny and YOLOv7-lite-s, delivered lightweight and accurate face detection. The Retinaface model aimed for fast detection but struggled with accuracy on complex faces.

Related work

Retinaface is a lightweight single-stage face detection network that, with MobileNetV1 as its backbone, demonstrated notable performance on the validation subsets of the WIDERFACE dataset. The algorithm feeds training images into the MobileNetV1 backbone to generate feature maps, fuses those features, and then passes the resulting feature pyramid through a context module. Each feature pyramid layer captures face information at a different scale.

In Retinaface, the Single Stage Headless Face Detector (SSH) was employed as the context module, enlarging the model's receptive field to improve the detection of small faces. Furthermore, Retinaface's architecture pre-defined multiple prior boxes, allowing for the detection of faces across the image. Each pixel in the feature maps corresponded to two sizes of prior boxes, establishing a dense grid of prior boxes designed to accommodate face detection at various positions.
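The prior-box layout described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_prior_boxes`, the strides (8, 16, 32), and the box sizes are assumed values chosen for the example, and a square input image is assumed for brevity.

```python
from itertools import product

def make_prior_boxes(image_size, strides=(8, 16, 32),
                     sizes_per_level=((16, 32), (64, 128), (256, 512))):
    """Return normalized (cx, cy, w, h) prior boxes for a square image.

    strides and sizes_per_level are assumed values for illustration only.
    """
    priors = []
    for stride, sizes in zip(strides, sizes_per_level):
        cells = -(-image_size // stride)  # ceiling division: feature-map side
        for row, col in product(range(cells), repeat=2):
            # Center of this feature-map cell, normalized to [0, 1]
            cx = (col + 0.5) * stride / image_size
            cy = (row + 0.5) * stride / image_size
            for size in sizes:  # two prior-box sizes per feature-map cell
                priors.append((cx, cy, size / image_size, size / image_size))
    return priors

priors = make_prior_boxes(640)
print(len(priors))  # two boxes for every cell on all three pyramid levels
```

Under these assumptions, a 640-pixel image already yields many thousands of prior boxes, which is why overlapping predictions have to be pruned afterwards.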

With four types of head predictions (face classification, face box regression, face key point regression, and 3D dense point regression), Retinaface refined its predictions and used Smooth-L1 loss functions for precise estimation. However, the numerous prior boxes produced an abundance of overlapping face boxes. To address this, Retinaface applied non-maximum suppression (NMS), which keeps only the highest-scoring box among heavily overlapping candidates. This multi-faceted approach formed the foundation of Retinaface's robust and efficient face detection mechanism.
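As a rough illustration of the NMS step, here is a minimal greedy sketch in plain Python. The function names `iou` and `nms` and the sample boxes are hypothetical; the 0.4 IoU threshold mirrors the value the article reports for testing, and real pipelines typically use an optimized library routine instead.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.4):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In this toy input the first two boxes cover nearly the same face, so only the higher-scoring one survives alongside the distant third box.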

Proposed method

The LAFD algorithm improves the model in three key areas: the backbone network, the context module, and the loss function. In the MobileNetV3 backbone, the channel expansion multiplier is increased so that more image information can be extracted. A 7x7 convolutional kernel widens the receptive field, and the SE attention mechanism enhances feature extraction across various stages.
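To make the SE attention idea concrete, the following is a minimal plain-Python sketch of one squeeze-and-excitation step: global average pooling per channel, a small two-layer gate, and channel-wise rescaling. The function names, the bias-free weights, and the toy inputs are assumptions for illustration; the hard-sigmoid gate follows the MobileNetV3 convention and may differ from what LAFD uses.

```python
def hard_sigmoid(x):
    """Piecewise-linear sigmoid approximation: ReLU6(x + 3) / 6."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

def squeeze_excite(feature_map, w_reduce, w_expand):
    """Apply SE gating to a feature map.

    feature_map: list of C channels, each a flat list of spatial values.
    w_reduce:    R x C weight matrix (channel reduction, bias omitted).
    w_expand:    C x R weight matrix (channel expansion, bias omitted).
    """
    # Squeeze: global average pooling gives one value per channel
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: reduce + ReLU, then expand + hard sigmoid
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    gates = [hard_sigmoid(sum(w * h for w, h in zip(row, hidden)))
             for row in w_expand]
    # Scale: reweight every channel by its gate in [0, 1]
    return [[g * v for v in ch] for g, ch in zip(gates, feature_map)]

features = [[1.0, 3.0], [2.0, 2.0]]   # two channels, two pixels each
out = squeeze_excite(features, w_reduce=[[0.5, 0.5]],
                     w_expand=[[3.0], [-3.0]])
print(out)
```

With these toy weights the first channel passes through unchanged while the second is fully suppressed, which shows how the gate lets the network emphasize informative channels.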

The method also introduces the DCN to recognize irregularly shaped targets more effectively. Combining the Cross-Entropy Loss with the Focal Loss function further improves model accuracy, particularly for small faces. However, this produces an excessive number of candidate prior boxes during post-processing, which can slow down NMS.

To mitigate this, the algorithm employs Cross-Entropy Loss in early training epochs and transitions to Focal Loss Function in later epochs, maintaining NMS efficiency while improving accuracy. This nuanced interplay between mitigating false recognition and enhancing small-face recognition highlights the method's intricacies, ultimately leading to heightened recall and average accuracy at the cost of reduced precision.
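The two-phase loss schedule can be sketched for a single face/background prediction as follows. This is an illustration under stated assumptions, not the authors' training code: `alpha=0.25` and `gamma=2.0` are the commonly used defaults from the original focal loss formulation rather than values reported for LAFD, and `switch_epoch=50` is a purely hypothetical transition point.

```python
import math

EPS = 1e-7  # keeps log() finite when a probability hits 0 or 1

def _prob_of_target(p, target):
    """Probability the model assigns to the true class (target is 0 or 1)."""
    p = min(max(p, EPS), 1.0 - EPS)
    return p if target == 1 else 1.0 - p

def cross_entropy(p, target):
    """Binary cross-entropy for predicted face probability p."""
    return -math.log(_prob_of_target(p, target))

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy, well-classified examples."""
    p_t = _prob_of_target(p, target)
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def classification_loss(p, target, epoch, switch_epoch=50):
    """Cross-entropy in early epochs, focal loss afterwards.

    switch_epoch is a hypothetical value chosen for this example.
    """
    if epoch < switch_epoch:
        return cross_entropy(p, target)
    return focal_loss(p, target)

easy, hard = 0.9, 0.1  # confident vs. poorly classified face
print(focal_loss(easy, 1) / cross_entropy(easy, 1))
print(focal_loss(hard, 1) / cross_entropy(hard, 1))
```

The two printed ratios show the intended effect: the focal term shrinks the loss of the easy example far more than that of the hard one, so later epochs concentrate on difficult cases such as small faces.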

Experimental results

The WIDERFACE dataset, which covers diverse and challenging facial variations, was employed. The training process used the PyTorch framework. Before testing, images were scaled up proportionally to a maximum length of 1560 pixels or a maximum width of 1200 pixels. Model outputs were filtered with a detection threshold of 0.5, and NMS suppressed boxes with an Intersection over Union (IoU) greater than 0.4.
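A minimal sketch of the test-time preprocessing and score filtering follows. It assumes that "length" refers to the longer image side and "width" to the shorter one, which the article does not state explicitly; the function names and the `(box, score)` detection format are hypothetical.

```python
def resize_scale(height, width, max_long=1560, max_short=1200):
    """Largest uniform scale that keeps the longer side <= max_long
    and the shorter side <= max_short (aspect ratio is preserved)."""
    long_side, short_side = max(height, width), min(height, width)
    return min(max_long / long_side, max_short / short_side)

def filter_detections(detections, score_threshold=0.5):
    """Keep (box, score) pairs whose score reaches the threshold."""
    return [d for d in detections if d[1] >= score_threshold]

scale = resize_scale(768, 1024)
new_size = (round(768 * scale), round(1024 * scale))
print(scale, new_size)

detections = [((10, 10, 50, 50), 0.92), ((30, 40, 60, 80), 0.31)]
print(filter_detections(detections))
```

Enlarging the input this way makes small faces cover more pixels, which is consistent with the accuracy gain the article attributes to image scaling.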

The LAFD model showcased substantial improvements with an average accuracy of 94.3%, 92.6%, and 86.2% on the WIDERFACE validation subsets, outperforming Retinaface by 3.6%, 4.4%, and 12.4%, respectively. Traditional methods, like Viola-Jones (V-J) and Deformable Part-based Model (DPM), exhibited lower accuracy due to limited feature extraction capabilities. Faceness-Net and ScaleFace had drawbacks in multi-size feature extraction and attention mechanism, respectively.
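The implied Retinaface baseline can be recovered with simple arithmetic. The sketch below assumes the reported gains are percentage-point differences and that the three figures follow the standard WIDERFACE Easy, Medium, and Hard ordering, neither of which the article states explicitly.

```python
# Figures quoted in the article (percent)
lafd = {"easy": 94.3, "medium": 92.6, "hard": 86.2}
gain_over_retinaface = {"easy": 3.6, "medium": 4.4, "hard": 12.4}

# Assumption: gains are percentage points, so baseline = LAFD - gain
retinaface = {subset: round(lafd[subset] - gain_over_retinaface[subset], 1)
              for subset in lafd}

for subset, value in retinaface.items():
    print(f"{subset}: Retinaface {value}% -> LAFD {lafd[subset]}%")
```

Read this way, the largest gain falls on the third subset, where the baseline is weakest.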

Single-stage detectors SSH and Single Shot MultiBox Detector (SSD) showed better results. Larger models like FANet and TinaFace were less suitable for embedded scenarios, unlike LAFD. The lightweight YOLOv7-tiny performed comparably but with a smaller model size.

Furthermore, ablation experiments were conducted, revealing the impact of each incorporated module on the deep convolutional network. The new backbone network, Focal Loss Function, deformable convolution, and resizing of test images all contributed positively to Retinaface. The integration of DCN and Focal Loss showed contradictory effects, favoring DCN for its higher accuracy improvement. The model employed a modified MobileNetV3 backbone network with DCN, resulting in a remarkable improvement of 3.3%, 3.9%, and 8.5% across subsets compared to Retinaface. Scaling input images to specific dimensions further boosted accuracy by 3.3%, 4.1%, and 12.4%, respectively, relative to Retinaface.

Conclusion

This work enhances the Retinaface single-stage lightweight face detection network by refining its MobileNetV3 backbone. Modifications include the SE attention mechanism, the Inverted Residuals Block's channel expansion multiplier, and convolution kernel size adjustments for better face detection performance. The Deformable Convolution Network replaces the original SSH layer convolution, and the Cross-Entropy loss function is substituted with the Focal Loss function. Input images are preprocessed by resizing them proportionally to a maximum of 1560px in length or 1200px in width. Future work will explore the use of Generalized Intersection over Union (GIoU), Distance Intersection over Union (DIoU), and other 2D loss functions, investigate the interplay between the Focal Loss function and DCN, and further optimize the MobileNetV3 backbone network parameters.



Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2023, August 21). Enhancing Face Detection with Lightweight Precision: LAFD Algorithm. AZoAi. Retrieved on July 06, 2024 from https://www.azoai.com/news/20230811/Enhancing-Face-Detection-with-Lightweight-Precision-LAFD-Algorithm.aspx.

  • MLA

    Chandrasekar, Silpaja. "Enhancing Face Detection with Lightweight Precision: LAFD Algorithm". AZoAi. 06 July 2024. <https://www.azoai.com/news/20230811/Enhancing-Face-Detection-with-Lightweight-Precision-LAFD-Algorithm.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Enhancing Face Detection with Lightweight Precision: LAFD Algorithm". AZoAi. https://www.azoai.com/news/20230811/Enhancing-Face-Detection-with-Lightweight-Precision-LAFD-Algorithm.aspx. (accessed July 06, 2024).

  • Harvard

    Chandrasekar, Silpaja. 2023. Enhancing Face Detection with Lightweight Precision: LAFD Algorithm. AZoAi, viewed 06 July 2024, https://www.azoai.com/news/20230811/Enhancing-Face-Detection-with-Lightweight-Precision-LAFD-Algorithm.aspx.
