In a paper published in the journal Sensors, researchers explored recent developments in facial emotion recognition (FER), focusing on neural network models. The study traces the evolution of effective architectures, favoring convolutional neural networks (CNNs) over alternatives such as recurrent neural networks (RNNs) and generative adversarial networks (GANs).
Background
In recent years, FER has gained significant attention as an automated process. FER systems aim to identify emotions and their intensities and to distinguish genuine from simulated expressions. Employing various artificial neural networks (ANNs), FER has outperformed traditional methods that pair handcrafted features such as local binary patterns (LBP) or histograms of oriented gradients (HOG) with classifiers such as support vector machines (SVMs), random forests, or k-nearest neighbors (KNN). A classical baseline of this kind is sketched below.
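As a rough illustration only (not code from the paper), the sketch below pairs HOG descriptors with a linear SVM; the `images` array of grayscale face crops and the `labels` array of emotion ids are assumed inputs:

```python
# A minimal sketch of the classical HOG + SVM baseline, under the assumption
# that `images` holds 48x48 grayscale face crops and `labels` their emotion ids.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

def extract_hog(img):
    # Turn one face crop into a fixed-length gradient-orientation descriptor.
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

X = np.stack([extract_hog(img) for img in images])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)
clf = LinearSVC().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```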
ANNs, including deep neural networks (DNNs), enable accurate subject-independent emotion detection by analyzing diverse training data, even incorporating cues such as skeletal movements. FER's practical applications span healthcare, business, security, education, and manufacturing. While Ekman and Friesen's six fundamental emotions can be recognized, discerning their authenticity and voluntary control remains challenging. Ekman and Friesen's Facial Action Coding System defines 46 action units (AUs) linked to facial muscles, whose combinations form expressions. Neural network-based FER involves face detection, feature extraction, and emotion prediction, exploiting deep learning's capacity for automatic feature extraction; a minimal sketch of this pipeline follows.
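The hedged sketch below illustrates the three stages in Python, using an OpenCV Haar cascade for face detection and a small PyTorch CNN for feature extraction and emotion prediction; the `TinyFERNet` architecture and the seven-emotion label set are illustrative assumptions, not the models surveyed in the paper:

```python
# Illustrative FER pipeline: detect faces, extract features, predict emotion.
import cv2
import torch
import torch.nn as nn

# Assumed label set: Ekman and Friesen's six basic emotions plus neutral.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

class TinyFERNet(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(                 # feature extraction
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 12 * 12, n_classes)  # emotion prediction

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def predict_emotions(frame_bgr, model):
    # Face detection stage: OpenCV's bundled frontal-face Haar cascade.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        crop = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        tensor = torch.from_numpy(crop).float().div(255).view(1, 1, 48, 48)
        logits = model(tensor)  # model weights would come from prior training
        yield EMOTIONS[logits.argmax(1).item()]
```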
Databases for advancing FER systems
FER system advancements rely heavily on facial expression databases. A comprehensive dataset is essential for automated systems targeting specific emotion classes. Although high classification rates have been achieved, maximizing accuracy demands larger training datasets that capture the wide spectrum of emotions a person can experience. As the variety of target emotions grows, neural networks require correspondingly diverse training data to prevent bias and improve performance.
Medical conditions that cause facial muscle paralysis can confound recognition, potentially leading to misdiagnoses. Notably, different databases can yield varied classification rates with the same neural network architecture. Numerous databases now support emotion recognition, varying in image size, posture, lighting, and subject count. Controlled environments capture simulated expressions, while natural settings capture real-world dynamics.
Cultural norms affect expressions, influencing recognition accuracy across skin colors and ethnicities. Emotion recognition spans spontaneous and in-the-wild datasets, with micro-expressions posing a particular challenge: fleeting and often concealed, they demand precise motion tracking and recognition. Databases now address micro-expression recognition, which is vital for understanding human behavior and offers insights into emotional states and reactions.
Neural network dynamics: Empowering FER
Neural networks have pervaded diverse domains such as computer vision and natural language processing, contributing to advances in deep learning and artificial intelligence (AI). These networks strike a balance between processing time and classification accuracy, bolstered by complex architectures adept at identifying specific features.
A neural network's lifecycle encompasses three key phases: training, in which weights are updated via backpropagation; validation, an unbiased evaluation of the model during development; and testing, which uses forward propagation alone. Notably, in computer vision, neural networks excel at image classification, face identification, and emotion recognition, with utility extending to medical diagnosis, user interaction, and beyond. Neural network types vary for face identification and emotion recognition, with CNNs being prominent; Inception networks, Visual Geometry Group (VGG) architectures, residual neural networks (ResNets), EfficientNet, NASNet-Large, and capsule networks (CapsNets) all contribute significantly. The sketch after this paragraph outlines the three phases.
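As a minimal illustration of those three phases, assuming a generic PyTorch `model`, `loss_fn`, `optimizer`, and three DataLoaders (all hypothetical names, not anything from the paper):

```python
# One helper covers all three phases: with an optimizer it trains
# (backpropagation); without one it only evaluates (forward propagation).
import torch

def run_epoch(model, loader, loss_fn, optimizer=None):
    training = optimizer is not None
    model.train(training)
    total, correct = 0, 0
    with torch.set_grad_enabled(training):
        for x, y in loader:
            logits = model(x)                  # forward propagation
            if training:
                loss = loss_fn(logits, y)
                optimizer.zero_grad()
                loss.backward()                # backpropagation
                optimizer.step()
            correct += (logits.argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

# for epoch in range(n_epochs):
#     run_epoch(model, train_loader, loss_fn, optimizer)  # training
#     run_epoch(model, val_loader, loss_fn)               # validation
# run_epoch(model, test_loader, loss_fn)                  # final testing
```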
Transfer learning accelerates development, with CNNs proving the most efficient. GANs enhance neural networks' ability to simulate cognition, while RNNs, especially long short-term memory (LSTM) networks, handle sequence-based emotion recognition. Despite individual variation and contextual challenges, FER systems play a vital role in social interactions, friend-enemy differentiation, and the enhancement of human-computer interaction. A minimal transfer-learning sketch follows.
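For example, transfer learning in a CNN can be as simple as freezing a pretrained backbone and retraining only a new classification head; the ResNet-18 backbone and seven-class head below are illustrative assumptions, not the paper's setup:

```python
# A hedged transfer-learning sketch in PyTorch/torchvision.
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on ImageNet (an assumed choice for illustration).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                # freeze the pretrained features
model.fc = nn.Linear(model.fc.in_features, 7)  # new head for 7 basic emotions
# Only model.fc.parameters() are then optimized on the FER dataset.
```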
In FER advancements, a critical criterion for evaluating real-world solutions is the authenticity of emotions, whether spontaneous or staged. While certain systems boast favorable recognition percentages, their efficacy often diminishes outside controlled settings. The aim of technological progress in FER systems is to enhance human-to-human and human-to-environment interaction, paralleling human emotional intelligence. Integrating emotional intelligence within AI systems facilitates nuanced comprehension of emotional input and proportional responses, fueling adoption in healthcare, education, the social Internet of Things (IoT), and standalone applications such as driver assistance.
Practical FER applications typically share traits such as employing multiple databases, recognizing basic emotions, and enabling real-time functionality. To date, however, most efforts in automatic emotion recognition have been directed at general databases and standard emotions, and current models, though advancing, remain imperfect and necessitate ongoing research to ensure responsible usage. The valence-arousal emotion model, which places emotions on continuous scales of pleasantness (valence) and physiological intensity (arousal), has also gained traction in a limited number of studies; a sketch of this formulation follows.
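Under the valence-arousal formulation, a network regresses two continuous scores instead of classifying discrete emotions; the minimal head below, with an assumed 512-dimensional feature input, illustrates the idea:

```python
# A hedged sketch of a valence-arousal regression head (illustrative only).
import torch
import torch.nn as nn

class ValenceArousalHead(nn.Module):
    def __init__(self, in_features=512):   # feature size is an assumption
        super().__init__()
        self.regressor = nn.Linear(in_features, 2)  # -> [valence, arousal]

    def forward(self, features):
        # tanh bounds both outputs to [-1, 1]: valence measures pleasantness,
        # arousal measures physiological intensity.
        return torch.tanh(self.regressor(features))
```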
FER systems: Challenges, architectures, and applications
The current study examines FER systems built on neural networks, analyzing challenges, architectures, and applications, and notes that existing reviews miss certain network types and innovations. Advancements such as patient emotion monitoring through neural networks are highlighted. Deep learning models, including CNNs, GANs, graph neural networks (GNNs), and RNNs, are explored; while CNNs dominate, GNNs and RNNs show potential.
Although FER's practical uses extend across fields and the discipline is shifting toward multimodal approaches, challenges of accuracy, database diversity, and cultural context persist. Neural networks aim for natural AI interaction, but ethical and cultural concerns require regulation.
Conclusion
In summary, the review explores recent trends in FER using neural networks for image analysis, examining current datasets, deep learning models, and research in the field. While AI still lacks advanced empathy and contextual understanding of human feelings, integrating emotional intelligence into solutions is key to their success.
To optimize real-time applications, researchers are exploring new techniques and working to overcome training challenges. The development of real-time multimodal emotion recognition systems is predicted to capture researchers' interest. Despite progress, technical limitations persist in FER systems, and continued technological refinement has the potential to revolutionize emotion science by accurately tracking people's facial movements in context.