In an article published in the journal Scientific Data, researchers from China presented a novel facial dataset, the Shenzhen University Emotion and Age Dataset (SZU-EmoDage), containing faces of Chinese individuals across different ages, expressions, and dynamic emotions. The dataset was synthesized using the deep learning model StyleGAN and was validated by human participants, who rated the emotional categories, dimensions, and authenticity of the faces.
Background
Facial stimuli are widely used in psychological research to explore cognitive and emotional processing in healthy individuals and in clinical populations. However, existing facial datasets have limitations concerning the authenticity of facial expressions, the diversity of facial ages, and the availability of dynamic facial expressions. Moreover, most facial datasets are based on Caucasian faces, which limits their applicability to other populations, since emotion recognition differs across cultures. Therefore, there is a need for a more natural and standardized facial dataset that captures variations in facial expressions and ages.
About the Research
This paper uses StyleGAN, a generative adversarial network (GAN)-based deep learning model, to generate a facial image dataset of Chinese people. StyleGAN is a state-of-the-art model that can manipulate facial features and generate high-quality, realistic, and diverse face images. The study introduced the concept of facial action units (AUs) into StyleGAN to achieve expression editing. AUs correspond to the contraction or relaxation of one or more facial muscles, and a facial expression can be decomposed into a combination of multiple AUs.
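As a toy illustration of this decomposition, an expression can be represented as an activation vector over AUs. The AU numbers below follow the Facial Action Coding System (FACS), and the happiness and sadness mappings are standard textbook examples, not necessarily the paper's exact encoding:

```python
# Toy AU-vector representation: an expression as a set of activated
# Action Units (FACS numbering). Happiness is classically described
# as AU6 (cheek raiser) + AU12 (lip corner puller); sadness as
# AU1 + AU4 + AU15.
AU_NAMES = {1: "inner brow raiser", 4: "brow lowerer", 6: "cheek raiser",
            12: "lip corner puller", 15: "lip corner depressor"}

def expression_vector(active_aus: set[int],
                      all_aus=tuple(sorted(AU_NAMES))) -> list[float]:
    """Binary activation vector over a fixed AU ordering."""
    return [1.0 if au in active_aus else 0.0 for au in all_aus]

happiness = expression_vector({6, 12})
sadness = expression_vector({1, 4, 15})
print(happiness, sadness)
```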
The researchers used the following techniques to generate the facial dataset (a runnable toy sketch of the underlying latent-space operations follows this list):
StyleGAN-based AU Editing: They used a StyleGAN encoder to extract the latent vector of a neutral face image and an AU encoder to extract the AU vector of a target expression image. Then, they fused the style features of the neutral face with the expression features of the target face and obtained a new latent vector that can generate a face image with the desired expression.
Latent Vector Interpolation: Linear interpolation between the latent vectors of two different expressions or ages was performed to obtain intermediate latent vectors that generate faces with intermediate expressions or ages.
Style-based Age Manipulation (SAM) model: This model was used to generate faces at ages ranging from 10 to 70 years by mapping a neutral face image to a latent vector and then transforming it into another latent vector that generates a face image at the desired age.
GAN Prior Embedded Network (GPEN) model: GPEN was used to increase the resolution and quality of the generated facial images.
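The sketch below shows the latent-space arithmetic these techniques rely on. It is a minimal, runnable illustration, not the authors' implementation: every encoder and model here is a hypothetical stand-in (random projections and simple vector operations). In the actual pipeline these roles would be filled by a pretrained StyleGAN encoder/generator, an AU encoder, and SAM, with GPEN applied to the generated images as a final enhancement step (omitted here):

```python
# Minimal, runnable sketch of the latent-space operations described above.
# All models are stand-ins; function names are hypothetical.
import numpy as np

LATENT_DIM = 512                                 # StyleGAN conventionally uses 512-D latents
rng = np.random.default_rng(0)
AGE_DIRECTION = rng.standard_normal(LATENT_DIM)  # stand-in for SAM's learned age mapping

def encode_style(image: np.ndarray) -> np.ndarray:
    """Stand-in for the StyleGAN encoder: image -> identity/style latent."""
    return rng.standard_normal(LATENT_DIM)

def encode_au(image: np.ndarray) -> np.ndarray:
    """Stand-in for the AU encoder: image -> expression (AU) features."""
    return rng.standard_normal(LATENT_DIM)

def fuse(style: np.ndarray, au: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Fuse identity features with expression features. A weighted sum is
    one simple fusion rule; the paper's exact rule may differ."""
    return (1 - alpha) * style + alpha * au

def lerp(w_a: np.ndarray, w_b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation: t in [0, 1] sweeps between two expressions or ages."""
    return (1 - t) * w_a + t * w_b

def age_transform(w: np.ndarray, target_age: int, base_age: int = 30) -> np.ndarray:
    """Stand-in for SAM: shift the latent along a fixed 'age' direction."""
    return w + (target_age - base_age) / 40.0 * AGE_DIRECTION

neutral = np.zeros((1024, 1024, 3))              # placeholder input images
target_expression = np.zeros((1024, 1024, 3))

w_neutral = encode_style(neutral)                # identity/style latent
w_happy = fuse(w_neutral, encode_au(target_expression))  # same face, new expression

# Dynamic expression: a sequence of intermediate latents (one per frame).
frames = [lerp(w_neutral, w_happy, t) for t in np.linspace(0.0, 1.0, 30)]
# Aging series: the same face at 10, 20, ..., 70 years.
ages = [age_transform(w_neutral, a) for a in range(10, 71, 10)]
print(len(frames), "expression frames;", len(ages), "age variants")
```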
First AI-based Facial Dataset
The facial dataset developed in this study contains facial images of 120 individuals (60 women and 60 men) with the six basic emotions (happiness, anger, fear, sadness, disgust, and surprise), at ages of 10, 20, 30, 40, 50, 60, and 70 years, and with dynamic emotions. The dataset comprises 840 static emotional faces, 840 aging faces, and 720 dynamic emotional faces. It is the first facial dataset of its kind synthesized using artificial intelligence (AI) technologies for face perception research, and it surpasses existing face datasets in the diversity of faces and the authenticity of expressions across age groups.
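These counts add up if one assumes the static set pairs each of the 120 identities with the six emotions plus a neutral expression, while the dynamic set covers the six emotions only, a quick check:

```python
# Consistency check on the reported stimulus counts.
identities = 120           # 60 women + 60 men
expressions = 6 + 1        # six basic emotions plus neutral (assumed)
ages = 7                   # 10, 20, 30, 40, 50, 60, 70 years

assert identities * expressions == 840   # static emotional faces
assert identities * ages == 840          # aging faces
assert identities * 6 == 720             # dynamic emotional sequences
print("Reported counts are internally consistent.")
```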
The authors show that their dataset creation techniques generate high-quality, natural facial expressions with minimal artifacts and distortions, and produce diverse, realistic faces of different ages with smooth, gradual transitions.
The facial dataset was evaluated by comparing the intended expression categories of the faces with the categories labeled by recruited human participants. The matching rates exceeded 70% for most emotions, except anger and fear. The reliability and efficacy of the dataset were further validated through comparative analyses with other facial expression editing methods and existing Chinese facial datasets.
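To make the metric concrete: a per-emotion matching rate is simply the fraction of raters' labels that agree with the intended emotion. The sketch below illustrates the computation on hypothetical ratings, not the authors' exact analysis or data:

```python
# Illustrative per-emotion matching rate from (intended, rated) label pairs.
from collections import Counter

# Hypothetical validation ratings for illustration only.
ratings = [
    ("happiness", "happiness"), ("happiness", "happiness"),
    ("anger", "disgust"), ("anger", "anger"),
    ("fear", "surprise"), ("fear", "fear"),
]

totals, matches = Counter(), Counter()
for intended, rated in ratings:
    totals[intended] += 1
    matches[intended] += intended == rated   # bool counts as 0/1

for emotion in totals:
    print(f"{emotion}: {matches[emotion] / totals[emotion]:.0%} matching rate")
```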
The SZU-EmoDage dataset and the proposed method have various applications. The dataset is a valuable resource for exploring the cognitive and emotional processing of faces, especially in cross-cultural, dynamic, and age-related contexts. It can also serve as a tool for detecting individual differences and mental disorders via facial expression recognition.
Furthermore, the dataset can advance face perception technologies such as face recognition, face editing, and face animation. The method can also be extended to generate new facial datasets for social attributes such as facial attractiveness, trustworthiness, and dominance.
Limitations and Conclusion
The authors acknowledge the following limitations in their dataset:
Lack of Diversity: The dataset contains only Chinese faces and excludes other ethnicities and cultures, which may restrict its cross-cultural applicability and comparability.
Basic Emotion Model: The dataset assumes six basic emotions that are universally recognized. However, this model has been challenged by alternative theories proposing more complex and context-dependent emotional expressions.
In summary, the SZU-EmoDage dataset is the first facial dataset synthesized using AI techniques for face perception research. The dataset contains the faces of 120 Chinese individuals with different expressions, ages, and dynamic emotions. It was synthesized using a StyleGAN-based editing model, which can manipulate facial features and generate realistic and diverse faces. Moreover, the dataset was validated by human participants, who rated the emotional categories, dimensions, and authenticity of the faces.