In a paper published in the journal Scientific Reports, researchers presented TCN-Attention-HAR, a time convolutional network with an attention mechanism designed to enhance human activity recognition using wearable sensors.
By addressing challenges like insufficient feature extraction and gradient explosion, the model effectively identified vital features and employed attention mechanisms to prioritize relevant information. Evaluations on the wireless sensor data mining (WISDM), physical activity monitoring using pervasive accelerometers (PAMAP2), and University of Southern California human activity dataset (USC-HAD) datasets showed significant performance improvements compared to other models. Additionally, knowledge distillation yielded a highly efficient student model with improved accuracy, highlighting the model's potential for human activity recognition.
Related Work
HAR has seen a surge in research interest, driven by the widespread adoption of wearable sensor devices. It involves extracting activity features from sensor-generated time series and finds applications in domains such as healthcare, smart homes, and human-computer interaction.
Traditional HAR methods, which rely on machine learning techniques such as k-nearest neighbors and naive Bayes, often struggle with feature extraction because they depend on manually engineered features and lack deep representations. With the rise of deep learning, methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have significantly improved feature extraction. However, capturing the diverse and temporal nature of human movements remains a challenge.
Activity Recognition Framework
The recognition process of human activities using a network model typically involves four key steps: data acquisition, data processing, model training, and model evaluation. Data acquisition entails collecting acceleration, angular velocity, and gravity signals through sensors during human activities.
Researchers address the time-series classification problem inherent in sensor-based human activity recognition by employing a sliding window method to segment the input signal into windows, with the window width and step size determined experimentally.
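For illustration, the segmentation step can be sketched as follows; the window width of 128 samples and 50 percent overlap used here are placeholders, since the paper determines both values experimentally.

```python
import numpy as np

def sliding_windows(signal, labels, window_size=128, step=64):
    """Segment a multichannel sensor stream into fixed-width windows.

    signal: array of shape (num_samples, num_channels)
    labels: array of shape (num_samples,) with one activity label per sample
    Returns windows of shape (num_windows, window_size, num_channels) and one
    label per window (the most frequent label inside that window).
    """
    windows, window_labels = [], []
    for start in range(0, len(signal) - window_size + 1, step):
        end = start + window_size
        windows.append(signal[start:end])
        # Assign the most frequent label inside the window to the whole window.
        values, counts = np.unique(labels[start:end], return_counts=True)
        window_labels.append(values[np.argmax(counts)])
    return np.stack(windows), np.array(window_labels)
```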
The processed data are then fed into the TCN-Attention-HAR model for training. The model uses a time convolutional network to extract time-dependent information from the preprocessed data at different scales. The feature representations from the individual channels are combined into a tensor, undergo feature fusion, and pass through the attention layer. The attention mechanism strengthens temporal correlation within the TCN network, focusing on pertinent features while suppressing irrelevant information. A global average pooling layer then aggregates the local information, and a Softmax function performs the activity classification.
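A minimal TensorFlow/Keras sketch of this fusion, attention, and classification head is shown below. It assumes the TCN branches already produce per-branch feature sequences, and the layer sizes, kernel sizes, and six-class output are illustrative placeholders rather than the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def attention_classification_head(feature_seqs, num_classes):
    """Fuse per-branch feature sequences, apply temporal attention,
    pool, and classify with Softmax."""
    # Feature fusion: combine the branch outputs along the channel axis.
    fused = layers.Concatenate(axis=-1)(feature_seqs)

    # Temporal attention: score each time step, normalize the scores over
    # time with Softmax, and reweight the sequence so informative steps
    # dominate while irrelevant ones are suppressed.
    scores = layers.Dense(1)(fused)            # (batch, time, 1)
    weights = layers.Softmax(axis=1)(scores)   # attention weights over time
    attended = fused * weights                 # broadcast over channels

    # Global average pooling summarizes the local information; a final
    # Softmax layer outputs the activity probabilities.
    pooled = layers.GlobalAveragePooling1D()(attended)
    return layers.Dense(num_classes, activation="softmax")(pooled)

# Example usage with three dummy convolutional branches (shapes are placeholders).
inputs = layers.Input(shape=(128, 3))          # 128-sample window, 3 sensor axes
branches = [layers.Conv1D(64, k, padding="causal", activation="relu")(inputs)
            for k in (3, 5, 7)]
model = tf.keras.Model(inputs, attention_classification_head(branches, num_classes=6))
```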
The proposed model's TCN module consists of three layers with different convolutional kernel sizes, which preserves temporal causality while enlarging the receptive field. Causal convolution strictly follows the temporal order of the data, while dilated convolution expands the receptive field. Residual connections, dropout, and layer normalization mitigate gradient vanishing within the TCN. The attention mechanism, initially used in machine translation, enhances the model's focus on relevant features.
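Under common TCN conventions, one such block can be sketched as follows; the filter counts, kernel size, and dropout rate are placeholders, since the paper's exact hyperparameters are not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def tcn_residual_block(x, filters, kernel_size, dilation_rate, dropout_rate=0.2):
    """One TCN block: two causal, dilated 1-D convolutions with layer
    normalization and dropout, wrapped in a residual connection."""
    shortcut = x
    for _ in range(2):
        # 'causal' padding preserves temporal order (each output step sees only
        # current and past inputs); dilation widens the receptive field.
        x = layers.Conv1D(filters, kernel_size, padding="causal",
                          dilation_rate=dilation_rate)(x)
        x = layers.LayerNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.Dropout(dropout_rate)(x)
    # A 1x1 convolution matches channel counts so the shortcut can be added,
    # helping gradients flow through deep stacks.
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
    return layers.Add()([shortcut, x])
```

Stacking such blocks with increasing dilation rates (1, 2, 4, and so on) grows the receptive field exponentially while keeping the parameter count modest.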
Knowledge distillation further improves model accuracy by transferring knowledge from a teacher model to a student model, with a distillation loss coefficient controlling the transfer process. Overall, the proposed model integrates these components to recognize human activities effectively, demonstrating advances in both feature extraction and model optimization.
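As a rough illustration, the standard soft-label formulation of this idea looks like the sketch below; here alpha stands in for the distillation loss coefficient, and the temperature value is a placeholder rather than the paper's setting.

```python
import tensorflow as tf

def distillation_loss(y_true, teacher_logits, student_logits,
                      alpha=0.5, temperature=4.0):
    """Blend the ordinary supervised loss with a soft-label loss that
    transfers the teacher's knowledge to the student; alpha controls how
    strongly the teacher's predictions influence training."""
    # Hard loss: student predictions vs. ground-truth labels.
    hard = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, tf.nn.softmax(student_logits))
    # Soft loss: student vs. temperature-softened teacher predictions.
    soft = tf.keras.losses.kullback_leibler_divergence(
        tf.nn.softmax(teacher_logits / temperature),
        tf.nn.softmax(student_logits / temperature))
    return (1.0 - alpha) * hard + alpha * (temperature ** 2) * soft
```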
Research Methodology Summary
The experiments section elaborates on the methodology and findings of the proposed model's evaluation using the WISDM, PAMAP2, and USC-HAD datasets, all representing real-world scenarios. Researchers have split this section into four primary segments: dataset introduction, data preprocessing, evaluation metrics, and results and discussions.
Researchers, working on a Windows 11 system with an i7-11800H CPU and 64 GB of memory, used TensorFlow 2.x for model training and testing. They employed three datasets—WISDM, PAMAP2, and USC-HAD—to validate the model's effectiveness.
WISDM comprises acceleration data from smartphones worn by 36 participants performing various movements. PAMAP2 focuses on physical activity monitoring, with recordings from accelerometers and gyroscopes during exercises performed by nine subjects. USC-HAD uses motion node sensors worn by 14 participants to capture signals during different activities.
Data preprocessing involved data cleaning to address noise and errors, followed by normalization to handle variations in sensor values. The data were then segmented with a sliding window method before being divided into training and test sets, with the window size and overlap selected according to the data frequency and activity patterns. Evaluation metrics included accuracy, precision, recall, and F1 score, providing insight into model performance. Researchers compared the proposed model with state-of-the-art methods on the three datasets.
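A brief sketch of how these metrics can be computed is given below; macro averaging is assumed here as one common choice for multi-class HAR and may differ from the paper's exact averaging scheme.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_pred):
    """Compute the four reported metrics for predicted activity labels."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall":    recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1":        f1_score(y_true, y_pred, average="macro", zero_division=0),
    }
```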
Notably, the TCN-Attention-HAR teacher model (TAHAR-Teacher) demonstrated state-of-the-art performance across all datasets, attributed to its robust feature extraction and temporal correlation capabilities. The impact of the TCN and attention mechanisms on model performance, as well as the effectiveness of knowledge distillation, was also analyzed and discussed in detail. Overall, the experiments underscored the effectiveness of the proposed model in human activity recognition, showcasing advances in feature extraction and optimization techniques.
Conclusion
To sum up, this paper presented a deep learning model for human activity recognition using wearable sensing data. Researchers constructed the TCN-Attention-HAR model by combining a TCN with an attention mechanism, and used knowledge distillation to reduce the number of model parameters while maintaining competitive performance.
Experimental comparisons with other models on three public datasets demonstrated that the proposed TCN-Attention-HAR model exhibited favorable classification and recognition performance. This research held significant practical value for human activity recognition and provided valuable insights for future work.