In a surprising study, AI-generated therapy responses were rated as more effective and empathic than those from human therapists—raising big questions about the future of mental health care.
Research: When ELIZA meets therapists: A Turing test for the heart and mind.
Responses written by ChatGPT were generally rated higher than those written by psychotherapists, according to a study published on February 12, 2025, in the open-access journal PLOS Mental Health by S. Gabe Hatch, H. Dorian Hatch, and colleagues from multiple institutions, including The Ohio State University and Hatch Data and Mental Health.
Given the potential benefits of working with generative artificial intelligence (AI), the question of whether machines could serve as therapists has received increased attention. Previous research has found that humans can struggle to tell the difference between machine-written and human-written responses, and recent findings suggest that AI can write empathically: generated content is rated highly by both mental health professionals and voluntary service users, and it is often favored over content written by professionals.
In their new study of over 830 participants, Hatch and colleagues showed that, although differences in language patterns were noticed, individuals could rarely identify whether responses to 18 couples' therapy vignettes were written by ChatGPT or by therapists. This finding echoes Alan Turing's prediction that humans would not be able to tell the difference between a machine's responses and a human's. However, the study also revealed an attribution bias: responses believed to be from therapists were rated higher, even when they were actually generated by ChatGPT. In addition, the responses written by ChatGPT were generally rated higher on the five "common factors" of psychotherapy: therapeutic alliance, empathy, expectations, cultural competence, and therapist effects.
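One way to picture the attribution-bias finding is to compare mean ratings grouped by who participants believed wrote a response versus who actually wrote it. The Python sketch below uses pandas with hypothetical column names and made-up values purely to illustrate that comparison; it is not the study's analysis code.

```python
# Illustrative sketch only: hypothetical column names and invented ratings,
# not data or code from the study.
import pandas as pd

ratings = pd.DataFrame({
    "actual_author":    ["therapist", "chatgpt", "chatgpt", "therapist"],
    "perceived_author": ["therapist", "therapist", "chatgpt", "chatgpt"],
    "empathy_rating":   [4.2, 4.6, 3.9, 3.8],  # made-up values on a 1-5 scale
})

# Ratings grouped by actual authorship: did ChatGPT-written responses
# score higher overall?
print(ratings.groupby("actual_author")["empathy_rating"].mean())

# Ratings grouped by perceived authorship: were responses labeled as coming
# from a therapist rated higher, regardless of who actually wrote them?
print(ratings.groupby("perceived_author")["empathy_rating"].mean())
```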
Further analysis revealed that the responses generated by ChatGPT were generally longer than those written by the therapists. Even after controlling for length, ChatGPT used more nouns and adjectives than the therapists did. Because nouns describe people, places, and things, and adjectives supply additional context, this could mean that ChatGPT provides greater contextualization than the therapists. That heightened contextualization, rather than length alone, may have led respondents to rate the ChatGPT responses higher on the common factors of therapy (the components shared across therapy modalities that contribute to desired outcomes).
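As a rough illustration of the kind of part-of-speech comparison described above, the Python sketch below counts nouns and adjectives per response and normalizes by response length, so that two sets of responses can be compared after controlling for word count. It assumes spaCy with the small English model (en_core_web_sm) is installed, and the sample replies are invented; it does not reproduce the authors' actual linguistic analysis.

```python
# Minimal sketch, assuming spaCy and its "en_core_web_sm" model are installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def pos_rates(text: str) -> dict:
    """Share of non-punctuation tokens tagged as nouns and adjectives."""
    tokens = [t for t in nlp(text) if not t.is_punct]
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        "noun_rate": sum(t.pos_ == "NOUN" for t in tokens) / n,
        "adj_rate": sum(t.pos_ == "ADJ" for t in tokens) / n,
        "length": n,
    }

# Invented example replies, purely for illustration.
therapist_reply = "It sounds like you both feel unheard when money comes up."
chatgpt_reply = ("It sounds like recurring financial disagreements leave both "
                 "partners feeling anxious, unheard, and disconnected at home.")

print(pos_rates(therapist_reply))
print(pos_rates(chatgpt_reply))
```

Normalizing by token count is what "controlling for length" amounts to in this simple form: it lets a longer response be compared with a shorter one on the density of nouns and adjectives rather than their raw counts.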
According to the authors, these results may be an early indication that ChatGPT has the potential to improve psychotherapeutic processes. In particular, this work may lead to new methods of testing and creating psychotherapeutic interventions. However, the study also highlights important ethical concerns, including the need for professional oversight when integrating AI into mental health care. Given the mounting evidence that generative AI can be helpful in therapeutic settings, and the likelihood that it will be integrated into such settings sooner rather than later, the authors call for mental health experts to expand their technical literacy so that AI models are carefully trained and supervised by responsible professionals, improving both the quality and accessibility of care while mitigating potential risks.
The authors add: "Since the invention of ELIZA nearly sixty years ago, researchers have debated whether AI could play the role of a therapist. Although many important lingering questions remain, our findings indicate the answer may be 'Yes.' We hope our work galvanizes both the public and mental health practitioners to critically assess not only the feasibility but also the ethics and long-term implications of integrating AI into mental health treatment."