In a paper published in the journal Proceedings of the National Academy of Sciences of the United States of America, researchers detailed how chatbots built on chat generative pre-trained transformer 4 (ChatGPT-4) exhibited human-like behavioral and personality traits in Turing test scenarios and classic behavioral games.
The chatbots adapted their behavior based on experience and context and often displayed altruistic and cooperative tendencies. Where their behavior differed from typical human behavior, the findings suggested that the chatbots acted as if they were maximizing both their own and their partner's payoffs.
Related Work
Past work has delved into the emulation of human behavior by artificial intelligence (AI) through the lens of the Turing test. Traditional assessments have focused on AI's ability to mimic human responses in tasks like essay writing or factual question answering. However, recent advancements have shifted towards evaluating AI's behavioral tendencies and personality traits through interactive games and psychological surveys.
Additionally, previous research has highlighted challenges in accurately assessing AI behavior, particularly in determining how AI can replicate nuanced human emotions and social cues. Furthermore, the ethical implications of AI's ability to emulate human behavior, such as privacy, manipulation, and bias, have also been the subject of concern in past studies.
AI Behavior Assessment Study
The study designed interactive sessions to assess the behavioral tendencies of AI chatbots. It employed classic behavioral economics games and survey questions akin to those given to human participants. Researchers aimed to compare the chatbots' behavior with that of humans and to determine which payoff function best predicted the chatbots' actions.
This evaluation focused on the ChatGPT chatbot developed by OpenAI, specifically examining two versions: the application programming interface (API) version known as GPT-3.5-Turbo (referred to as ChatGPT-3) and the API version based on GPT-4 (denoted as ChatGPT-4). Additionally, researchers included the subscription-based web version (Plus) and the freely available web version (Free) for comparative analysis.
Researchers utilized a public Big Five test response database and the mobile lab (MobLab) classroom economics experiment platform to gather human subject data, encompassing a diverse pool of 108,314 subjects from more than 50 countries. These datasets, spanning multiple years and primarily comprising college and high school students, provided valuable insights into human behavior across various demographics.
Researchers administered the openness, conscientiousness, extraversion, agreeableness, and neuroticism (OCEAN) Big Five questionnaire to each chatbot for personality profiling. Subsequently, they presented each chatbot with six interactive games to illuminate different behavioral traits, including altruism, fairness, trust, reciprocity, risk aversion, cooperation, and strategic reasoning.
Each chatbot participated in individual sessions, answering survey questions and playing each role in each game 30 times. Because researchers could not provide monetary compensation to the chatbots, they instead elicited how each chatbot would behave in each game role. The supplementary materials offer detailed procedures for collecting chatbots' responses.
Personality Profiles and Behaviors
The study delves into the personality profiles and behavioral tendencies of AI chatbots, focusing on ChatGPT-3 and ChatGPT-4. It compares the chatbots' Big Five personality traits with human subjects, revealing similarities and differences across various dimensions. ChatGPT-4 demonstrates substantial alignment with human respondents, while ChatGPT-3 exhibits slightly lower openness scores. The study provides further details on the personality profiles of both chatbots, offering insights into how their distinct personalities relate to human behavior.
The research employs a formal Turing test methodology to assess the chatbots' behavior in classic behavioral economics games. Results indicate that ChatGPT-4's choices are often judged to be as human-like as, or more human-like than, those of human players, while ChatGPT-3's choices are less likely to be judged human-like. Detailed analyses across individual games shed light on specific performance differences between the chatbots, highlighting areas where each excels or falls short compared to human players.
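The comparison behind such a behavioral Turing test can be sketched in a few lines of Python. The article does not spell out the procedure, so this sketch assumes one simple reading: pair every chatbot action with every human action and count which of the two is more common in the human data. The function name `turing_comparison` and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import product


def turing_comparison(ai_actions, human_actions):
    """Compare AI and human actions by how typical each is of the human data.

    Returns the shares of (AI action, human action) pairs in which the AI
    action is more common, equally common, or less common among humans.
    Assumption: "more human-like" means "more frequent in the human sample".
    """
    human_freq = Counter(human_actions)  # how often each action occurs among humans
    wins = ties = losses = 0
    for ai_action, human_action in product(ai_actions, human_actions):
        ai_count = human_freq[ai_action]      # 0 if no human ever chose it
        human_count = human_freq[human_action]
        if ai_count > human_count:
            wins += 1
        elif ai_count == human_count:
            ties += 1
        else:
            losses += 1
    total = len(ai_actions) * len(human_actions)
    return wins / total, ties / total, losses / total


# Toy data (made up): amounts offered in a hypothetical sharing game.
humans = [50] * 6 + [40] * 3 + [0]
chatbot = [50, 50, 40]
print(turing_comparison(chatbot, humans))
```

On this toy data the chatbot's action is the more typical one in 30% of pairs, tied in 50%, and less typical in 20%. A real analysis would run this per game and per role on the actual human and chatbot response distributions.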
Comparative analyses delve into various dimensions of behavior, including altruism, fairness, trust, cooperation, and risk aversion. Both chatbots demonstrate tendencies toward altruistic behavior, fairness, and collaboration, with ChatGPT-4 generally exhibiting higher levels of altruism and cooperation than ChatGPT-3. Differences in risk preferences are also observed, with ChatGPT-4 displaying a consistent and neutral risk preference, while ChatGPT-3 tends towards risk aversion, particularly in unexpected contexts.
The study explores how framing and context influence AI behavior, similar to their impact on human decision-making. Prompting chatbots to explain their decisions or assume specific roles can significantly alter their behavior in strategic settings. Moreover, the research investigates how experience in different game roles influences the chatbots' decisions, akin to learning from past experiences. These findings provide valuable insights into the adaptability and responsiveness of AI chatbots in various scenarios.
The analyses extend beyond individual game performances, considering broader implications for understanding AI decision-making and predicting behavior in new settings. The study offers a systematic approach to inferring preferences and rationalizing AI actions by estimating utility functions that best predict chatbots' behaviors. The research contributes to a deeper understanding of AI behavior and its implications for human-computer interaction and decision-making contexts.
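To make the idea of estimating a utility function concrete, here is a minimal Python sketch. It is not the authors' estimation procedure. It assumes a hypothetical sharing game with an endowment of 100, a utility of the form b * own^r + (1 - b) * partner^r with the curvature r fixed at 0.5, and a grid search over the weight b that minimizes squared prediction error. All function names and the sample data are illustrative.

```python
def utility(own, partner, b, r=0.5):
    """Weighted utility over own and partner payoffs (assumed functional form)."""
    return b * own**r + (1 - b) * partner**r


def best_keep(b, endowment=100, r=0.5):
    """Amount a player with weight b would keep when splitting the endowment."""
    return max(
        range(endowment + 1),
        key=lambda keep: utility(keep, endowment - keep, b, r),
    )


def estimate_weight(observed_keeps, endowment=100, r=0.5, steps=100):
    """Grid-search the weight b whose predicted choice best fits the observations."""
    grid = [i / steps for i in range(steps + 1)]  # candidate weights 0.00 .. 1.00

    def mean_squared_error(b):
        predicted = best_keep(b, endowment, r)
        return sum((k - predicted) ** 2 for k in observed_keeps) / len(observed_keeps)

    return min(grid, key=mean_squared_error)


# Made-up observations: a player who splits the endowment evenly in 30 sessions.
observed = [50] * 30
print(estimate_weight(observed))
```

In this sketch, a weight near 0.5 corresponds to valuing one's own and the partner's payoff equally, while a weight near 1 corresponds to purely self-interested play. The fixed curvature is a design choice made here so that interior splits can be optimal; a purely linear utility would predict only all-or-nothing choices in this game.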
Conclusion
In conclusion, the study comprehensively examined the personality profiles and behavioral tendencies of AI chatbots, notably ChatGPT-3 and ChatGPT-4. Through detailed analyses and formal Turing tests, researchers assessed the chatbots' abilities to mimic human behavior in classic behavioral economics games. Both chatbots demonstrated distinct tendencies in dimensions such as altruism, fairness, trust, cooperation, and risk aversion, with ChatGPT-4 generally displaying higher levels of altruism and cooperation.
The research also investigated how framing, context, and experience influenced AI behavior, offering insights into the chatbots' adaptability and responsiveness in diverse scenarios. These findings contribute to understanding AI behavior and its implications for human-computer interaction and decision-making contexts.