In a paper published in the journal Scientific Reports, researchers introduced SCB-YOLOv5, a variant of you only look once version 5 (YOLOv5) designed to detect the standardized movements of aerobic gymnasts.
The model enhances YOLOv5 by integrating structures from shuffle network version 2 (ShuffleNet V2) and convolutional block attention modules (CBAM). This integration significantly reduced model weight while maintaining high detection precision, recall, and mean average precision (mAP). Extensive experiments demonstrated the effectiveness of SCB-YOLOv5 for on-site athlete action detection.
Background
Previous work has emphasized the importance of standardizing the movements of aerobic gymnasts to ensure their safety and has highlighted the potential of digital sports development, which brings together the "digital" and "sports" realms. Deep learning (DL) for human action recognition has been applied extensively across diverse fields, presenting promising opportunities for recognizing athletes' actions. However, existing object detection methods are often computationally complex and demand powerful hardware, making them less feasible for practical sports education.
Model Overview: SCB-YOLOv5
The SCB-YOLOv5 model is tailored for detecting standardized movements of aerobic gymnasts and was developed using a dataset comprising 216 individuals, including 19 teachers and 197 students. The model comprises five key components: input, backbone, neck, head, and predict.
Notably, it uses ShuffleNet V2 as its backbone for a lightweight design and integrates the CBAM at the base layer to capture additional feature information. The neck component incorporates a bidirectional feature pyramid network (BiFPN) to enable multi-scale feature fusion by integrating semantic information from the deep layers of the network into the shallow layers.
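To illustrate what keeps such a backbone lightweight: ShuffleNet-style networks rely on a channel shuffle step that mixes information between channel groups at almost no computational cost. The sketch below shows the idea on a plain list of channel labels. The function name `channel_shuffle` and the list-based representation are illustrative assumptions, not code from the paper.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape -> transpose -> flatten).

    `channels` is a flat list standing in for the channel axis of a
    feature map; a real network would apply the same reindexing to a tensor.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # "Reshape" into `groups` rows of `per_group` channels each.
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # "Transpose and flatten": take the i-th channel of every group in turn.
    return [grouped[g][i] for i in range(per_group) for g in range(groups)]


# Two groups of three channels: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
print(shuffled)
```

After the shuffle, each consecutive pair of channels contains one channel from every original group, so a following grouped operation sees information from both groups.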
The Mosaic-9 technique synthesizes each new training image from nine randomly selected images in the training set. This augmentation strategy effectively enlarges the training data and improves the ability of the model to recognize objects in complex backgrounds. Moreover, the enhanced ShuffleNet V2 architecture integrates a feature blending step before global average pooling, enhancing network capacity and overall performance.
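The bookkeeping behind mosaic-style augmentation can be sketched as follows. This is a deliberately simplified version that tiles images on a regular grid and shifts bounding boxes accordingly; actual YOLOv5 mosaics use random placement, scaling, and cropping. The names `mosaic_layout` and `shift_box`, the tile size, and the seed are hypothetical. The grid size is left as a parameter: `grid=3` yields a nine-tile mosaic and `grid=2` a four-tile one.

```python
import random


def mosaic_layout(num_images, grid, tile_w, tile_h, seed=None):
    """Pick grid*grid distinct image indices and assign each a tile offset.

    Returns a list of (image_index, x_offset, y_offset) tuples in
    row-major order over the mosaic canvas.
    """
    rng = random.Random(seed)
    chosen = rng.sample(range(num_images), grid * grid)
    placements = []
    for slot, image_index in enumerate(chosen):
        row, col = divmod(slot, grid)
        placements.append((image_index, col * tile_w, row * tile_h))
    return placements


def shift_box(box, x_offset, y_offset):
    """Move an (x1, y1, x2, y2) box from tile coordinates to canvas coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + x_offset, y1 + y_offset, x2 + x_offset, y2 + y_offset)


# Nine tiles of 100x80 pixels drawn from a hypothetical 20-image training set.
layout = mosaic_layout(num_images=20, grid=3, tile_w=100, tile_h=80, seed=0)
_, x_off, y_off = layout[4]  # centre tile of the 3x3 grid
print(shift_box((10, 20, 30, 40), x_off, y_off))
```

Each label must be shifted by the same offset as its image, otherwise the boxes no longer line up with the objects in the combined picture.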
The integration of CBAM after ShuffleNet V2 further enhances the model by preserving high-level semantic information through focused attention on significant aspects of the image. Additionally, using the BiFPN in the neck addresses the need to analyze images at multiple scales. The BiFPN aggregates features at different resolutions, ensuring a thorough examination of images containing diverse information. By implementing a weighted feature fusion method, the BiFPN enables effective multi-scale feature fusion.
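One common form of weighted feature fusion is the "fast normalized fusion" used in the original BiFPN design, where each input feature map gets a learnable non-negative weight and the weights are normalized to sum to roughly one. The sketch below assumes that form; whether SCB-YOLOv5 uses exactly this variant is an assumption, and the function name and flat-list feature representation are illustrative only.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-sized feature vectors as sum(w_i * f_i) / (eps + sum(w_i)).

    Negative weights are clipped to zero (a ReLU), which keeps the
    normalized weights non-negative. `features` is a list of equal-length
    lists standing in for feature maps already resized to one resolution.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped) + eps
    length = len(features[0])
    return [
        sum(w * f[k] for w, f in zip(clipped, features)) / total
        for k in range(length)
    ]


# Two feature vectors with equal weights: the result is close to their average.
shallow = [1.0, 2.0]
deep = [3.0, 4.0]
print(fast_normalized_fusion([shallow, deep], weights=[1.0, 1.0]))
```

In a trained network the weights would be learned, letting the model decide how much each resolution contributes to the fused feature.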
In summary, the SCB-YOLOv5 model represents a comprehensive approach to detecting standardized movements of aerobic gymnasts. By combining a lightweight design with advanced feature integration strategies, the model achieves high detection accuracy while minimizing computational cost. The Mosaic-9 data augmentation technique enhances the robustness of the model by diversifying the training dataset, thereby improving its ability to generalize to various real-world scenarios.
Furthermore, successfully integrating enhanced ShuffleNet V2, CBAM, and BiFPN components underscores the adaptability of the model to complex visual data. These components work synergistically to capture and fuse multi-scale features effectively, enabling the model to accurately detect intricate movements and gestures exhibited by gymnasts. Overall, SCB-YOLOv5 represents a significant advancement in sports action recognition, offering practical solutions for real-world applications such as sports education, performance analysis, and athlete training.
Experimental Enhancement Analysis
Experiments were conducted on a Windows 10 system using the PyTorch DL framework. The hardware configuration included an Intel Core i5-10400F central processing unit (CPU) and an NVIDIA GeForce GTX 1650 graphics card. The dataset was divided into training and validation sets in an 8:2 ratio. Model training encompassed 100 epochs with a batch size of 2 and an initial learning rate of 0.01.
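The reported setup can be written down as a small configuration plus an 8:2 split. The sketch below is only an illustration: the article does not say how the split was drawn, so the seeded random shuffle, the function name `split_dataset`, the dictionary keys, and the placeholder file names are all assumptions.

```python
import random

# Hyperparameters as reported in the article; key names are hypothetical.
TRAIN_CONFIG = {"epochs": 100, "batch_size": 2, "initial_lr": 0.01}


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle `items` reproducibly and split them into (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Placeholder file names standing in for the real image set.
images = [f"frame_{i:04d}.jpg" for i in range(100)]
train_set, val_set = split_dataset(images)
print(len(train_set), len(val_set))
```

Fixing the seed keeps the validation set identical across runs, which matters when comparing model variants such as those in an ablation study.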
Performance evaluation metrics included average precision (AP) and mAP for target detection assessment. AP summarizes detection precision for a single category, while mAP is the mean of the AP values across all categories. Additionally, the F1-score, the harmonic mean of precision and recall, served as a balanced evaluation criterion for overall performance.
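These metrics can be made concrete with a short sketch. The article does not specify how AP was integrated, so the all-point interpolation of the precision-recall curve used here is an assumption, as are the function names and the toy detections. Each detection is given as a confidence score plus a flag saying whether it matched a ground-truth object.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_precision(scored_hits, num_ground_truth):
    """Area under the precision-recall curve for one category.

    `scored_hits` is a list of (confidence, is_true_positive) pairs.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_hit in ranked:
        if is_hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-category AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Toy example: three detections for a category with two ground-truth objects.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
print(round(ap, 4))
```

Because AP depends on the ranking of detections by confidence, two models with the same precision and recall at one threshold can still differ in mAP.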
Comparative analysis with other detectors highlighted the superior performance of SCB-YOLOv5 in terms of mAP, showing an improvement of 3.53 percentage points over YOLOv5. Detection demos illustrated the accuracy of SCB-YOLOv5 in detecting standardized movements of aerobic gymnasts compared to alternative algorithms. The authors also documented the impact of each adjustment on precision, recall, and mAP.
An ablation study assessed the impact of the individual improvements: the ShuffleNet V2 backbone, the attention mechanism, and the BiFPN. Together, these enhancements raised the mAP value by 3.53 percentage points compared to the original model, underscoring their importance in achieving superior detection results.
Conclusion
To sum up, this study introduced a dataset for detecting aerobic athletes' actions and proposed a lightweight algorithm, SCB-YOLOv5, to recognize and regulate these actions, aiming to innovate digital sports teaching processes. The results of multiple experiments showed that the enhanced model was more effective at recognizing irregular hand and leg movements of athletes, outperforming other detectors. According to the study, this finding matters for promoting the sustainable and healthy development of "internet + education."