Large Language Models demonstrate creative abilities comparable to humans, excelling in problem-solving and divergent thinking, while showing potential for collaborative innovation across industries.
Research: Large Language Models show both individual and collective creativity comparable to humans. Image Credit: Shutterstock AI
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
In a paper recently posted on the arXiv preprint* server, researchers comprehensively investigated the creative potential of large language models (LLMs) and how they compare to human creativity. Their goal was to assess both the individual creativity of LLMs and their collective creativity when multiple outputs are generated for a single task. This research is significant because it highlights the potential impact of LLMs in various creative fields, providing insights into their future applications in the workforce.
Advancement in Artificial Intelligence Techniques
LLMs like Generative Pre-trained Transformer 3.5 (GPT-3.5) by OpenAI and Claude by Anthropic represent a significant advancement in artificial intelligence (AI) technology. These models use transformer-based deep learning architectures to generate human-like text. Trained on large datasets containing diverse linguistic patterns, LLMs excel in a variety of language tasks, from generating text to solving complex problems. Their increasing use in industries such as content creation, customer service, and education shows their potential to automate routine tasks and support creative functions traditionally performed by humans. However, the study found that LLMs still lag behind humans in producing diverse responses, a factor crucial for certain creative fields.
While creativity has traditionally been viewed as a uniquely human quality, advancements in AI have led researchers to explore how LLMs can participate in creative processes. This study explored how LLMs can generate new ideas, solve problems, and create narratives, highlighting their potential roles in creative tasks. The authors also noted trade-offs between novelty and usefulness in LLM-generated ideas, which underscores the complexity of evaluating creativity.
Investigating AI’s Capabilities in Various Creative Tasks
In this paper, the authors evaluated the creativity of several LLMs, including GPT-3.5, GPT-4, Claude, Qwen, and SparkDesk, under controlled conditions and compared their performance to humans. They assessed creativity in three domains: problem-solving, divergent thinking, and creative writing. Thirteen tasks were designed to measure novelty, usefulness, originality, and flexibility. A group of 467 human participants completed these tasks as part of their master’s degree admission process, while responses from the LLMs were collected using OpenAI’s API and other platforms to ensure fairness.
The study used the consensual assessment technique to validate the results: human judges rated the responses according to predefined criteria. It also explored collective creativity by instructing LLMs to generate multiple responses for each task, mimicking real-world scenarios where users often seek diverse AI-generated ideas. Additionally, the researchers minimized bias by designing the tasks to avoid favoring specific prompts.
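As an illustration of the rating step described above, the following minimal Python sketch averages each response's ratings across several judges. It is not the authors' scoring procedure: the function name, the judge and response labels, and the rating values are all hypothetical, and simple averaging is an assumption about how judge ratings might be combined.

```python
from statistics import mean

def aggregate_ratings(ratings_by_judge):
    """Average each response's rating across judges.

    ratings_by_judge maps a judge name to a dict of
    {response_id: rating}. Only responses rated by every
    judge are scored. (Hypothetical helper for illustration.)
    """
    rated_by_all = set.intersection(
        *(set(ratings) for ratings in ratings_by_judge.values())
    )
    return {
        response_id: mean(
            ratings[response_id] for ratings in ratings_by_judge.values()
        )
        for response_id in sorted(rated_by_all)
    }

# Hypothetical ratings on a 1-5 scale; not data from the study.
ratings = {
    "judge_a": {"r1": 4, "r2": 2},
    "judge_b": {"r1": 5, "r2": 3},
    "judge_c": {"r1": 3, "r2": 1},
}
scores = aggregate_ratings(ratings)
print(scores)
```

In practice, a study using this technique would typically also check how closely the judges agree before relying on the averaged scores.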
Key Findings of Benchmarking LLMs' Responses Against Humans
The results showed that, on average, LLMs ranked in the 46th percentile compared to human participants across all creative tasks. However, Claude and GPT-4 performed notably better, ranking in the 52nd percentile, which places them slightly above the median human participant. GPT-3.5, on the other hand, performed poorly, ranking in the 37th percentile. The authors indicated that while LLMs excelled in problem-solving and divergent thinking, they struggled with creative writing, typically falling below the 50th percentile in this area.
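To make the percentile figures concrete, here is a minimal Python sketch of one common way to rank a single score against a set of human scores. The article does not describe the authors' exact ranking method, so the definition used here (share of human scores below the LLM score, with ties counted as half) is an assumption, and the scores are invented for illustration.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(human_scores, llm_score):
    """Percent of human scores below llm_score, counting ties as half."""
    ordered = sorted(human_scores)
    below = bisect_left(ordered, llm_score)
    ties = bisect_right(ordered, llm_score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)

# Hypothetical creativity scores for ten human participants.
human_scores = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]

# Five of the ten human scores fall below 4.6.
print(percentile_rank(human_scores, 4.6))  # 50.0
```

Under this definition, a model at the 52nd percentile scores higher than roughly half of the human participants.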
In terms of collective creativity, the study found that when an LLM was asked ten times for a response to the same task, its combined output was equivalent to that of a group of 8 to 10 humans. Beyond that point, the authors noted that every two additional responses generated by an LLM equated to the contribution of one extra human in brainstorming scenarios. This suggests that LLMs can act as effective collaborators in creative processes, particularly in environments where brainstorming and idea generation are crucial.
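The arithmetic behind this equivalence can be sketched in a few lines of Python. The function below is a hypothetical helper, not something from the paper: it takes the reported "8 to 10 humans" range as a baseline, treats the baseline of 10 responses as a parameter, and assumes the "two responses per extra human" rate extends linearly, which the study may not claim for large response counts.

```python
def equivalent_humans(
    num_responses,
    baseline_responses=10,    # assumed number of responses at the baseline
    baseline_humans=(8, 10),  # reported equivalent group size at the baseline
    responses_per_human=2,    # two extra responses ~ one extra human
):
    """Estimate the range of human group sizes matched by num_responses."""
    extra_responses = max(0, num_responses - baseline_responses)
    extra_humans = extra_responses / responses_per_human
    low, high = baseline_humans
    return (low + extra_humans, high + extra_humans)

print(equivalent_humans(10))  # (8.0, 10.0)
print(equivalent_humans(20))  # (13.0, 15.0)
```

The sketch does not extrapolate below the baseline: any request for fewer than the baseline number of responses simply returns the baseline range.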
Additionally, the researchers highlighted performance variability across tasks. For example, LLMs produced an average of 8.85 valid ideas in divergent thinking tasks, significantly surpassing the human average of 3.68 ideas. While LLMs produced more ideas, the novelty and usefulness of these ideas varied, reflecting both the potential and limitations of LLMs in creative applications. However, their outputs showed less diversity compared to humans, which may limit their effectiveness in domains requiring a wide range of perspectives.
Applications Across Industries
This research has significant implications for industries that depend on creativity and innovation. LLMs can be valuable for content creators, marketers, and educators, helping them generate ideas and solutions more efficiently. As the technology improves, it could change how teams collaborate by assisting in brainstorming sessions, generating creative content, and solving complex problems. For example, in advertising, LLMs could help develop campaigns or create slogans that resonate with target audiences. Nevertheless, the limited diversity in LLM outputs highlights the need for human oversight to ensure a broad range of ideas is considered.
The authors also suggest that LLMs can enhance human creativity in collaborative settings. LLMs can make brainstorming sessions more productive by contributing a large number of ideas for human participants to evaluate and build on. This capability is especially relevant in corporate environments where innovation is key to staying competitive.
Conclusion and Future Directions
In conclusion, this study provides valuable insights into LLMs' creative abilities and their potential to complement human creativity across various domains. While LLMs demonstrate impressive capabilities in generating ideas and solving problems, they still rely on human judgment to evaluate the novelty and usefulness of their outputs. The study also highlights the importance of balancing AI-generated creativity with human oversight, particularly in areas requiring diverse perspectives, so that both can be effectively leveraged in the future of work.
The outcomes emphasize the importance of continuing AI-human collaboration, paving the way for advancements in creativity and productivity. As LLMs continue to advance, further research is needed to explore the best ways to integrate AI into creative workflows. Understanding how to optimize LLMs in collaborative environments will be essential for maximizing their potential while preserving the unique contributions of human creativity.
Journal reference:
- Preliminary scientific report.
Sun, L., Yuan, Y., Yao, Y., Li, Y., Zhang, H., Xie, X., Wang, X., Luo, F., & Stillwell, D. (2024). Large Language Models show both individual and collective creativity comparable to humans. arXiv. https://arxiv.org/abs/2412.03151