AI Chatbots Shine in Creativity, But Humans Still Hold Their Ground

In a paper published in the journal Scientific Reports, researchers compared human creativity to that of three artificial intelligence (AI) chatbots using the alternate uses task (AUT). The study had 256 human participants generate creative uses for everyday objects. On average, the AI chatbots consistently exceeded humans in generating creative responses; however, the best human ideas still equaled or surpassed those of the chatbots. These findings highlight the potential of AI to enhance creativity while underscoring the complexity of human creativity, which may be difficult for AI to fully replicate or exceed. The study offers insights into the evolving relationship between human and machine creativity as AI reshapes creative work.

Study: AI Chatbots Shine in Creativity, But Humans Still Hold Their Ground. Image credit: khunkornStudio/Shutterstock

Background

The rise of generative AI tools has raised questions about their impact on society and human creativity. This includes concerns about employment, education, legal issues, and the nature of creativity itself. AI has shown promise in areas like chess and art by challenging conventional ideas about creativity.

Creativity is traditionally defined as the ability to generate original and useful ideas that are often evaluated through tasks measuring divergent thinking. Divergent thinking involves producing many ideas that are assessed by criteria like fluency, flexibility, originality, and elaboration. This study compares human and AI chatbot performance in a divergent thinking task to explore whether AI's vast memory and quick database access enhance originality. It highlights the importance of associative thinking and executive control in creativity.

Proposed Method

Participants: Data from human participants for the AUT were gathered through Prolific; 279 participants were included in the study after successfully passing attention checks. The participants' average age was 30.4 years, and they came from the United Kingdom, the United States, Canada, and Ireland. None of the participants reported a history of head injuries, current medication use, or ongoing mental health issues. The study followed ethical guidelines, with approval from the Ethics Committee for Human Sciences at the University of Turku.

AI Chatbots: Three AI chatbots, namely, ChatGPT3.5 (referred to as ChatGPT3), ChatGPT4, and Copy.Ai, were tested using the AUT. The chatbots underwent testing a total of 11 times, with each session involving four different object prompts, resulting in a total of 132 observations.

Procedure: The AUT comprised tasks involving four object probes: rope, box, pencil, and candle. Participants were instructed to prioritize quality over quantity and to generate original and creative uses for these objects. Each object was presented for 30 seconds, during which participants entered their ideas. The AI chatbots were instructed to generate a specific number of ideas and to limit their responses to 1-3 words to match the human responses.

Scoring: The semantic distance between object names and responses was computed using five semantic models. Six human raters provided subjective creativity/originality ratings on a 5-point scale, with high inter-rater reliability. Separate linear mixed-effect analyses compared human and AI performance, considering factors such as group (human vs. AI), object, and fluency (number of responses). Post-hoc pairwise comparisons were adjusted for multiple comparisons.
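As a rough illustration (not the study's actual pipeline), semantic-distance scoring of this kind can be sketched as the cosine distance between an object name and a response in a word-embedding space; the three-dimensional vectors and tiny vocabulary below are hypothetical stand-ins for the five trained semantic models the study used.

```python
import math

# Toy embedding vectors (hypothetical; the study scored responses with
# five trained semantic models over real word-embedding spaces)
embeddings = {
    "rope":  [0.9, 0.1, 0.0],
    "climb": [0.7, 0.3, 0.1],   # a common use of a rope
    "paint": [0.1, 0.2, 0.9],   # a semantically remote response
}

def semantic_distance(prompt: str, response: str) -> float:
    """Cosine distance between object name and response; higher = more original."""
    a, b = embeddings[prompt], embeddings[response]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(semantic_distance("rope", "climb"))  # small distance: a typical use
print(semantic_distance("rope", "paint"))  # larger distance: a more original use
```

The intuition is that responses lying far from the object in semantic space are scored as more original, which is why this metric correlates with human originality ratings.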

Statistical Analyses: The analyses used linear mixed-effect models with fixed effects (Group, Object, and their interaction) and covariates such as Fluency. Type III analysis of variance results were obtained, and post-hoc pairwise comparisons were adjusted for multiple comparisons using the multivariate t-distribution (mvt) method.
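A linear mixed-effect model of this general form can be sketched with the statsmodels formula API; the synthetic dataset, column names, and effect sizes below are illustrative assumptions, not the study's data, and the mvt-adjusted post-hoc step is omitted.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
objects = ["rope", "box", "pencil", "candle"]
rows = []
for pid in range(40):                          # hypothetical participants/sessions
    group = "AI" if pid < 12 else "human"      # illustrative group sizes
    subject_effect = rng.normal(0, 0.03)       # per-subject random intercept
    for obj in objects:
        fluency = int(rng.integers(3, 12))     # number of responses given
        base = 0.8 if group == "AI" else 0.7   # assumed group difference
        rows.append({"pid": pid, "group": group, "object": obj, "fluency": fluency,
                     "score": base + subject_effect + 0.01 * fluency
                              + rng.normal(0, 0.05)})
df = pd.DataFrame(rows)

# Fixed effects: Group, Object, and their interaction; Fluency as covariate;
# random intercept per participant/session via the groups argument
model = smf.mixedlm("score ~ C(group) * C(object) + fluency", df, groups=df["pid"])
result = model.fit()
print(result.summary())
```

The `groups` argument captures the repeated measurements per participant (or per chatbot session), which is what distinguishes this from an ordinary linear regression on pooled responses.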

Study Results

Descriptive Statistics and Correlations: Descriptive statistics are presented for humans and AI chatbots, averaged across all four object prompts. Semantic distance and human subjective ratings showed a moderate correlation in both mean and max scores.

Overall Human and AI Performance: AI outperformed humans in both semantic distance and subjective rating mean scores, while fluency had a negative effect on mean scores and a positive effect on max scores. Compared with some human responses, the AI chatbots consistently provided responses that were both more unusual and more logical.

Differentiating Performance Between AI Chatbots and Objects: ChatGPT3 and ChatGPT4 obtained higher mean semantic distance scores than humans, but there were no significant differences among the AI chatbots. There were no statistically significant differences between humans and AI chatbots in max scores. In human subjective ratings, ChatGPT4 outperformed humans across most objects, while ChatGPT3 and Copy.Ai performed similarly to each other and better than humans; however, this superiority did not extend to responses for the pencil and candle.

Conclusion

To summarize, this study indicates that AI chatbots have achieved creative capabilities at least on par with the average human in the commonly used AUT test of divergent thinking. Although AI generally outperformed humans, the top-performing humans could still compete. It is important to note that AI technology is advancing rapidly, and these results may change over time. The primary weakness in human performance was a higher prevalence of poor-quality ideas, likely reflecting variation in ability and motivation, that were absent from the chatbot responses. The study focused on divergent thinking as measured by the AUT while recognizing that creativity is a multifaceted concept.

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2023, September 17). AI Chatbots Shine in Creativity, But Humans Still Hold Their Ground. AZoAi. Retrieved on September 18, 2024 from https://www.azoai.com/news/20230917/AI-Chatbots-Shine-in-Creativity-But-Humans-Still-Hold-Their-Ground.aspx.