In an article published in the journal Scientific Reports, researchers addressed equity concerns in autoregressive language models used for human-artificial intelligence (AI) communication. They introduced a novel framework grounded in deliberative democracy and science communication studies. The authors analyzed more than 20,000 dialogues involving 3,290 participants with diverse backgrounds and opinions on climate change and the Black Lives Matter (BLM) movement.
Background
Intelligent assistants, including conversational AI systems like Alexa and Siri, have become integral to daily life, aiding in information retrieval, decision-making, and education. As these systems play an increasingly crucial role in various sectors, questions arise about their design and impact on user experiences, diversity, equity, and inclusion (DEI). Previous research has delved into AI fairness and performance disparities, particularly in recognizing speech from different demographic groups. Yet, the assessment of conversational AI systems, particularly regarding user experiences, learning outcomes, and dialogue styles, has been a relatively underexplored domain.
This paper addressed these gaps by introducing a comprehensive framework, drawing on deliberative democracy and science communication studies, to assess equity in conversational AI. The study centered on OpenAI's Generative Pre-trained Transformer 3 (GPT-3), a prominent large language model (LLM), and implemented an algorithmic audit to address three specific research questions. It investigated how user experiences and learning outcomes varied among different social groups when engaging with GPT-3 on contentious science and social issues such as climate change and the BLM movement.
Furthermore, researchers explored how GPT-3 conversed with diverse social groups on these issues and examined the correlation between conversational differences and user experiences. By operationalizing this theoretical framework, the study pioneered an audit of GPT-3's conversations with various social groups, aiming to provide empirical evidence and insights into the equity of conversational AI systems in addressing crucial societal topics.
Methods
The researchers employed an algorithm auditing approach to investigate the equity of human-AI dialogues using OpenAI's GPT-3, an LLM, as the conversational AI system. Data collection spanned from December 2021 to February 2022, involving 3,290 participants engaged in dialogues on crucial topics like climate change and the BLM movement.
The research methodology involved three main phases: an initial pre-dialogue survey, interactive dialogues, and a subsequent post-dialogue survey. The pre-dialogue survey collected participant demographics and attitudes toward the assigned topic. Participants were then directed to an interface for organic dialogues with GPT-3, generating 26,211 rounds of conversations. The post-dialogue survey assessed user experiences, covering aspects such as chatbot ratings, satisfaction, learning experiences, and intentions to continue or recommend the interaction.
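To make the three-phase design concrete, below is a minimal sketch of how one participant's record might be organized for analysis. The field names and values are hypothetical illustrations, not the authors' actual data schema.

```python
# Hypothetical record layout for one participant; field names are
# illustrative, not taken from the study's materials.
from dataclasses import dataclass, field

@dataclass
class DialogueRound:
    user_message: str
    gpt3_response: str

@dataclass
class ParticipantRecord:
    participant_id: str
    topic: str                      # "climate change" or "BLM"
    # Phase 1: pre-dialogue survey (demographics and prior attitude)
    education: str
    prior_attitude: int             # e.g., 1-7 support scale
    # Phase 2: interactive dialogue with GPT-3
    rounds: list = field(default_factory=list)
    # Phase 3: post-dialogue survey (user experience measures)
    chatbot_rating: int = 0
    satisfaction: int = 0
    would_continue: bool = False
    post_attitude: int = 0

record = ParticipantRecord("p001", "climate change", "high school", 4)
record.rounds.append(DialogueRound("Is warming human-caused?", "Largely, yes..."))
record.chatbot_rating, record.satisfaction, record.post_attitude = 4, 3, 5
```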
The analysis employed ordinary least squares (OLS) regressions to explore the connections among demographics, attitudes, and user experiences, with quantile regressions serving as a robustness check on the OLS results because the user experience data were bimodally distributed. To characterize GPT-3's conversational attributes, the researchers applied structural topic modeling (STM) and linguistic analyses using Linguistic Inquiry and Word Count (LIWC) software, revealing the topics GPT-3 raised and the linguistic style it adopted during interactions. Stance detection analyses measured whether participants developed more supportive attitudes toward each issue after the dialogues.
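For readers who want a concrete picture of this regression setup, here is a minimal sketch using Python's statsmodels on synthetic data; all variable names and values are invented for illustration and do not come from the study.

```python
# Illustrative only: column names and data are synthetic, not the
# study's variables or replication data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "opinion_minority": rng.integers(0, 2, n),   # 1 = holds the minority view
    "low_education": rng.integers(0, 2, n),      # 1 = lower educational attainment
    "satisfaction": rng.integers(1, 6, n),       # post-dialogue rating, 1-5
})

formula = "satisfaction ~ opinion_minority + low_education"

# OLS: average association between group membership and satisfaction
ols_fit = smf.ols(formula, data=df).fit()
print(ols_fit.params)

# Quantile (median) regression as a robustness check, since the real
# user-experience measures were bimodally distributed
median_fit = smf.quantreg(formula, data=df).fit(q=0.5)
print(median_fit.params)
```

Median regression estimates the conditional median rather than the mean, so it is less distorted by a bimodal outcome, which is why it serves as the robustness check here.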
The study addressed three research questions: how user experiences varied across diverse social groups; how GPT-3 engaged with different groups; and how conversational features related to those experiences.
Results
The study revealed significant disparities in user experiences with GPT-3 based on demographic attributes, particularly for opinion and education minority groups. In conversations about climate change and BLM, opinion minorities and individuals with lower education levels reported worse experiences, lower satisfaction, and decreased intention to continue chatting. However, despite the negative experiences, both minority groups exhibited positive attitudinal changes toward the discussed issues post-chat.
In climate change discussions, GPT-3 provided more scientific justifications to education minorities, while in BLM conversations, it tended to give preference-based responses to opinion and education minorities. Moreover, GPT-3 expressed less positive sentiment when conversing with these minority groups, potentially reflecting both intrinsic model biases and extrinsic factors introduced by user inputs.
GPT-3's conversational style was also linked to user experience. More positive emotion words and higher word counts in GPT-3's responses were associated with better user experiences, higher satisfaction, and improved learning outcomes, whereas negative words were associated with poorer learning experiences and a lower willingness to continue or recommend the interaction. These results underscore the importance of tailoring conversational AI responses to improve user engagement and overall experience.
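As a sketch of the kind of feature-to-experience link described above: LIWC is commercial software built on large validated dictionaries, so the snippet below substitutes tiny hand-picked word lists and toy data purely for illustration; none of it reflects the study's actual dictionaries or measurements.

```python
# Rough, illustrative stand-in for LIWC-style linguistic features;
# real LIWC uses large validated dictionaries, not these tiny word lists.
from scipy.stats import pearsonr

POSITIVE = {"good", "great", "hope", "glad", "support", "helpful"}
NEGATIVE = {"bad", "wrong", "worse", "harm", "angry", "unfair"}

def linguistic_features(response: str) -> dict:
    """Word count and sentiment-word shares for one GPT-3 response."""
    tokens = [t.strip(".,!?") for t in response.lower().split()]
    n = max(len(tokens), 1)
    return {
        "word_count": len(tokens),
        "positive_share": sum(t in POSITIVE for t in tokens) / n,
        "negative_share": sum(t in NEGATIVE for t in tokens) / n,
    }

# Toy example: correlate positive-word share with satisfaction ratings
responses = [
    "There is great hope if we support clean energy together.",
    "Climate policy is complicated and outcomes are uncertain.",
    "That view is wrong and causes real harm.",
    "Glad you asked; research offers helpful, encouraging answers.",
]
ratings = [5, 3, 1, 4]
pos_shares = [linguistic_features(r)["positive_share"] for r in responses]
r, p = pearsonr(pos_shares, ratings)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```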
Discussion
The authors proposed an analytical framework to evaluate equity in conversational AI, emphasizing the need to assess how LLMs respond to diverse opinions, extending beyond traditional demographic factors. Examining GPT-3's dialogues on climate change and BLM, the researchers uncovered a trade-off between unsatisfying user experiences and positive attitudinal changes, suggesting a potential dilemma in AI design.
Participants from opinion and education minority groups reported worse experiences but exhibited the largest attitudinal changes post-chat. The authors urged AI designers to balance discomfort in user experience with positive educational outcomes. Additionally, findings emphasized the importance of considering nuanced response styles for different social groups, contributing to the ongoing discourse on equity in conversational AI and urging further exploration as new models emerge.
Conclusion
In conclusion, researchers introduced a novel framework rooted in deliberative democracy and science communication to assess equity in conversational AI, using GPT-3. They identified disparities in user experiences among opinion and education minority groups, highlighting a trade-off with positive attitudinal changes. The authors emphasized the need for AI designers to balance user satisfaction and educational impact. GPT-3's differential responses to diverse social groups underscored the importance of nuanced conversational styles. The findings contributed to ongoing discussions on equity in conversational AI, urging further exploration and consideration of diverse perspectives in system design.
Journal reference:
- Chen, K., Shao, A., Burapacheep, J., & Li, Y. (2024). Conversational AI and equity through assessing GPT-3's communication with diverse social groups on contentious topics. Scientific Reports, 14(1), 1561. https://doi.org/10.1038/s41598-024-51969-w