AI Chatbots Shine in Creativity, But Humans Still Hold Their Ground

In a paper published in the journal Scientific Reports, researchers compared human creativity to that of three artificial intelligence (AI) chatbots using the alternate uses task (AUT). The study had 256 human participants generate creative uses for everyday objects. On average, the AI chatbots consistently exceeded humans in generating creative responses; however, the best human ideas still equaled or surpassed those of the chatbots. These findings highlight the potential of AI to enhance creativity while underscoring the complexity of human creativity, which may be difficult for AI to fully replicate or exceed. The study offers insights into the evolving relationship between human and machine creativity as AI reshapes creative work.

Study: AI Chatbots Shine in Creativity, But Humans Still Hold Their Ground. Image credit: khunkornStudio/Shutterstock

Background

The rise of generative AI tools has raised questions about their impact on society and human creativity. This includes concerns about employment, education, legal issues, and the nature of creativity itself. AI has shown promise in areas like chess and art by challenging conventional ideas about creativity.

Creativity is traditionally defined as the ability to generate original and useful ideas that are often evaluated through tasks measuring divergent thinking. Divergent thinking involves producing many ideas that are assessed by criteria like fluency, flexibility, originality, and elaboration. This study compares human and AI chatbot performance in a divergent thinking task to explore whether AI's vast memory and quick database access enhance originality. It highlights the importance of associative thinking and executive control in creativity.

Proposed Method

Participants: Data from human participants for the AUT were gathered through Prolific; 279 participants were included in the study after successfully passing attention checks. The participants' average age was 30.4 years, and they came from the United Kingdom, the United States, Canada, and Ireland. None of the participants reported a history of head injuries, current medication use, or ongoing mental health issues. The study followed ethical guidelines, with approval from the Ethics Committee for Human Sciences at the University of Turku.

AI Chatbots: Three AI chatbots, namely, ChatGPT3.5 (referred to as ChatGPT3), ChatGPT4, and Copy.Ai, were tested using the AUT. The chatbots underwent testing a total of 11 times, with each session involving four different object prompts, resulting in a total of 132 observations.

Procedure: The AUT comprised tasks involving four object probes: rope, box, pencil, and candle. Participants were instructed to prioritize quality over quantity and to generate original and creative uses for these objects. Each object was presented for 30 seconds, during which participants entered their ideas. The AI chatbots were instructed to generate a specific number of ideas and to limit their responses to 1-3 words to match the human responses.

Scoring: The semantic distance between object names and responses was computed using five semantic models. Six human raters provided subjective creativity/originality ratings on a 5-point scale, with high inter-rater reliability. Separate linear mixed-effect analyses compared human and AI performance, considering factors such as group (human vs. AI), object, and fluency (number of responses). Post-hoc pairwise comparisons were adjusted for multiple comparisons.
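As a rough illustration (not the study's actual pipeline), semantic-distance scoring of this kind can be sketched as the cosine distance between an object name and a response in a word-embedding space; the three-dimensional vectors and tiny vocabulary below are hypothetical stand-ins for the five trained semantic models the study used.

```python
import math

# Toy embedding vectors (hypothetical; the study scored responses with
# five trained semantic models over real word-embedding spaces)
embeddings = {
    "rope":  [0.9, 0.1, 0.0],
    "climb": [0.7, 0.3, 0.1],   # a common use of a rope
    "paint": [0.1, 0.2, 0.9],   # a semantically remote response
}

def semantic_distance(prompt: str, response: str) -> float:
    """Cosine distance between object name and response; higher = more original."""
    a, b = embeddings[prompt], embeddings[response]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(semantic_distance("rope", "climb"))  # small distance: a typical use
print(semantic_distance("rope", "paint"))  # larger distance: a more original use
```

The intuition is that responses lying far from the object in semantic space are scored as more original, which is why this metric correlates with human originality ratings.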

Statistical Analyses: The analyses used linear mixed-effect models with fixed effects (Group, Object, and their interaction) and covariates such as Fluency. Type III analysis of variance results were obtained, and post-hoc pairwise comparisons were adjusted for multiple comparisons using the multivariate t-distribution (mvt) method.
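A linear mixed-effect model of this general form can be sketched with the statsmodels formula API; the synthetic dataset, column names, and effect sizes below are illustrative assumptions, not the study's data, and the mvt-adjusted post-hoc step is omitted.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
objects = ["rope", "box", "pencil", "candle"]
rows = []
for pid in range(40):                          # hypothetical participants/sessions
    group = "AI" if pid < 12 else "human"      # illustrative group sizes
    subject_effect = rng.normal(0, 0.03)       # per-subject random intercept
    for obj in objects:
        fluency = int(rng.integers(3, 12))     # number of responses given
        base = 0.8 if group == "AI" else 0.7   # assumed group difference
        rows.append({"pid": pid, "group": group, "object": obj, "fluency": fluency,
                     "score": base + subject_effect + 0.01 * fluency
                              + rng.normal(0, 0.05)})
df = pd.DataFrame(rows)

# Fixed effects: Group, Object, and their interaction; Fluency as covariate;
# random intercept per participant/session via the groups argument
model = smf.mixedlm("score ~ C(group) * C(object) + fluency", df, groups=df["pid"])
result = model.fit()
print(result.summary())
```

The `groups` argument captures the repeated measurements per participant (or per chatbot session), which is what distinguishes this from an ordinary linear regression on pooled responses.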

Study Results

Descriptive Statistics and Correlations: Descriptive statistics are presented for humans and AI chatbots, averaged across all four object prompts. Semantic distance and human subjective ratings showed a moderate correlation in both mean and max scores.

Overall Human and AI Performance: AI outperformed humans in both semantic distance and subjective rating mean scores, while fluency had a negative effect on mean scores and a positive effect on max scores. Compared with some human responses, the AI chatbots consistently provided responses that were both more unusual and more logical.

Differentiating Performance Between AI Chatbots and Objects: ChatGPT3 and ChatGPT4 obtained higher mean semantic distance scores than humans, but there were no significant differences among the AI chatbots. There were no statistically significant differences between humans and AI chatbots in max scores. In human subjective ratings, ChatGPT4 outperformed humans across most objects, while ChatGPT3 and Copy.Ai performed similarly to each other and better than humans; however, this superiority did not extend to responses for the pencil and candle.

Conclusion

To summarize, this study indicates that AI chatbots have achieved creative capabilities at least on par with the average human in the commonly used AUT test of divergent thinking. Although AI generally outperformed humans, the top-performing humans could still compete. It is important to note that AI technology is advancing rapidly, and these results may change over time. The primary weakness in human performance was a higher prevalence of poor-quality ideas, likely reflecting variation in ability and motivation, that were absent from the chatbot responses. The study focused on divergent thinking as measured by the AUT while recognizing that creativity is a multifaceted concept.

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2023, September 17). AI Chatbots Shine in Creativity, But Humans Still Hold Their Ground. AZoAi. Retrieved on September 18, 2024 from https://www.azoai.com/news/20230917/AI-Chatbots-Shine-in-Creativity-But-Humans-Still-Hold-Their-Ground.aspx.