In an article published in the journal PLOS One, researchers examined the ability of ChatGPT (Chat Generative Pre-trained Transformer) 3.5 to produce humor compared to humans. In two studies, human participants rated the funniness of jokes and satirical headlines generated by ChatGPT 3.5 and by humans. The findings indicated that ChatGPT 3.5's humor was rated as funny as, or funnier than, that produced by humans, regardless of the comedic task or the expertise of the human comedy writers.
Study: Is ChatGPT 3.5 Funnier than Humans?
Background
The ability of large language models (LLMs) like OpenAI’s ChatGPT to generate humor is an intriguing and underexplored area. While humor necessitates a delicate balance of surprise and benignity, LLMs lack emotional perception, raising questions about their capacity for humor production. Existing research highlights ChatGPT’s wide-ranging competencies but also notes its tendency to present false information as fact. This issue is less critical in comedy, where accuracy is secondary to entertainment value.
Past studies have offered mixed, largely anecdotal evidence of ChatGPT's humor capabilities and have lacked comprehensive, comparative evaluations. This paper addressed these gaps by systematically comparing the quality of jokes produced by ChatGPT 3.5 to those created by humans. Employing standardized comedic tasks and assessing humor through laypeople's evaluations, the study aimed to provide empirical insights into how LLM-generated humor stacks up against human creativity, potentially informing the entertainment industry and our understanding of artificial creativity.
ChatGPT 3.5 and Laypeople in Humor Production
In this study, ChatGPT 3.5's humor production abilities were compared to those of laypeople using three diverse tasks: acronym completion, fill-in-the-blank, and roast jokes. Participants from Amazon Mechanical Turk (MTurk) were recruited via CloudResearch.com. A total of 123 people initially participated, though 18 were excluded for using external sources, resulting in a final sample of 105. Each participant generated humorous responses to nine prompts across the three tasks, yielding 945 human-produced jokes.
ChatGPT 3.5 was given the same tasks, producing 20 humorous responses per prompt, resulting in 180 artificial intelligence (AI)-generated jokes. The study then recruited 200 additional MTurk workers to rate the funniness of these responses. Each rater evaluated 54 jokes, 27 human-produced and 27 AI-produced, on a 7-point Likert scale, ensuring unbiased assessments by not disclosing the source of each joke.
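For readers curious about the generation step, the sketch below illustrates how 20 humorous completions per prompt could be requested from a GPT-3.5-class model through the OpenAI Python client. The prompt wording, model name, and sampling parameters here are illustrative assumptions, not the exact settings reported in the paper.

```python
# Illustrative sketch only: the paper's exact prompts and settings are not reproduced here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_jokes(task_prompt: str, n_responses: int = 20) -> list[str]:
    """Request several humorous completions for one comedic prompt."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",          # assumed stand-in for "ChatGPT 3.5"
        messages=[{"role": "user", "content": task_prompt}],
        n=n_responses,                  # 20 responses per prompt, mirroring the study design
        temperature=1.0,                # assumed sampling temperature
    )
    return [choice.message.content for choice in response.choices]

# Example: a hypothetical roast-joke prompt in the spirit of the study's tasks
jokes = generate_jokes("Write a funny roast of a friend who is always late.")
```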
The tasks and rating procedures were pre-registered and approved by the University of Southern California Institutional Review Board, with all data and materials available on the Open Science Framework (OSF). The goal was to provide systematic, empirical insight into how ChatGPT 3.5's humor production compares with human creativity.
Comparative Analysis
AI-generated responses were rated funnier than human responses, with significant differences across all tasks. ChatGPT outperformed 73% of humans in the acronym task, 63% in fill-in-the-blank, and 87% in roast jokes. Additionally, 69.5% of participants preferred AI-generated humor. Variance analysis showed less agreement on AI-generated roast jokes, indicating mixed reactions.
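As a rough illustration of what "outperformed X% of humans" can mean, the sketch below compares each human writer's mean funniness rating with ChatGPT's mean rating on the same task. The numbers and the exact scoring rule are assumptions for illustration, not the authors' analysis code.

```python
# Hedged sketch: assumes per-writer mean ratings; not the study's actual analysis.
import numpy as np

def percent_outperformed(human_means: np.ndarray, ai_mean: float) -> float:
    """Share of human writers whose mean funniness rating falls below the AI's mean."""
    return 100.0 * np.mean(human_means < ai_mean)

# Hypothetical mean ratings on the study's 7-point scale
human_means = np.array([2.8, 3.1, 3.5, 4.0, 2.5, 3.9, 3.2])
ai_mean = 3.7
print(f"ChatGPT outperformed {percent_outperformed(human_means, ai_mean):.0f}% of human writers")
```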
Demographic factors did not significantly influence preferences, though right-leaning participants produced slightly less funny jokes. Despite lacking emotions, ChatGPT excelled at humor production, especially aggressive roast jokes, challenging expectations that AI would be limited in generating potentially offensive content.
ChatGPT 3.5 and The Onion in Satirical Humor
In this study, the authors compared ChatGPT 3.5's ability to produce satirical news headlines with that of professional comedy writers at The Onion, focusing on the local news section to ensure timeless and comparable topics.
A total of 217 students from the University of Southern California participated in the study. Participants rated the funniness of 10 headlines, five from The Onion and five from ChatGPT, on a seven-point scale, without knowing the source of each headline to prevent bias. ChatGPT was prompted to generate 20 new headlines in the style of The Onion’s ‘Local’ section.
The study was pre-registered, and materials, including the collected data and pre-registration details, were made available on the Open Science Framework. The study was approved by the University of Southern California Institutional Review Board. It aimed to benchmark ChatGPT's humor against professional standards within the comedy industry, providing insights into the AI's capability to produce satirical content.
Comparative Analysis
Participants found no significant difference in funniness between ChatGPT 3.5's satirical headlines and those written by professional writers at The Onion. The top four headlines included two from each source, with ChatGPT producing the highest-rated one. Differences in the variance of the funniness ratings between the two sources were also not statistically significant.
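One straightforward way to check both claims (no difference in mean funniness, no difference in rating spread) on blinded ratings would be a paired t-test on participant-level means plus a Levene test for equality of variances. The sketch below uses fabricated numbers and assumed test choices; it is not the authors' analysis pipeline.

```python
# Illustrative only: fabricated ratings, assumed tests (paired t-test, Levene's test).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_participants = 217

# Hypothetical per-participant mean ratings (1-7 scale) for each headline source
onion_means = rng.normal(loc=3.8, scale=0.9, size=n_participants).clip(1, 7)
chatgpt_means = rng.normal(loc=3.8, scale=0.9, size=n_participants).clip(1, 7)

# Mean difference: paired t-test across participants
t_stat, p_mean = stats.ttest_rel(onion_means, chatgpt_means)

# Spread of ratings: Levene's test for equality of variances
w_stat, p_var = stats.levene(onion_means, chatgpt_means)

print(f"mean difference p = {p_mean:.3f}, variance difference p = {p_var:.3f}")
```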
Participants who sought out comedy and read satirical news rated headlines as funnier overall, regardless of the source. In head-to-head preferences, 48.8% preferred The Onion's headlines, 36.9% favored ChatGPT's, and 14.3% showed no preference. No evidence indicated that ChatGPT reproduced existing headlines. This study highlighted that ChatGPT's humor is comparable to professional standards, suggesting significant economic implications for comedy writing. Future research should explore the use of LLMs in other comedy formats, such as script writing and meme generation.
Conclusion
In conclusion, the researchers found that ChatGPT 3.5 produced humor that was as funny or funnier than jokes from laypeople and professional comedy writers. This challenged the notion that emotional perception is necessary for humor creation. The findings highlighted ChatGPT's potential in the comedy industry and suggested future research on AI's understanding of humor and its practical applications in personal and professional contexts.