Generative Chatbots Amplify False Memories in Witness Interviews, Posing New Ethical Risks

Generative chatbots not only create and reinforce false memories but also increase users' confidence in these distorted recollections, raising critical concerns about their potential misuse in law enforcement and the need for immediate ethical guidelines in AI applications.

Study: Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews

In an article recently posted to the arXiv preprint* server, researchers at the Massachusetts Institute of Technology and the University of California, Irvine, found that participants who interacted with a generative chatbot during simulated crime witness interviews were significantly more prone to false memories, with the chatbot inducing approximately three times as many false memories as the control condition. Even after one week, the number of these false memories remained constant, and participants' confidence in them stayed significantly elevated.

The study highlighted the ethical risks of using advanced artificial intelligence (AI) in sensitive contexts like police interviews.

Background

Past work on false memories has demonstrated that human recollections are reconstructive and susceptible to external influences, such as suggestive questioning, with significant implications for legal settings.

Studies by Loftus and others have shown how question-wording and misinformation can distort memory, while neuroimaging has revealed that true and false memories activate similar brain regions.

More recently, research has raised concerns about AI's potential to induce false memories, especially as AI systems, including large language models, are increasingly integrated into daily life and human interactions. However, the specific impact of AI-driven dialogue systems on memory formation remains an emerging area of study, warranting further exploration.

Two-phase Memory Experiment

The study followed a two-phase experimental procedure. Participants watched a two-and-a-half-minute silent video of a robbery in the first phase. They then rated their emotional state using a self-assessment manikin (SAM) scale and completed a filler activity (a brief Pac-Man game).

Participants were randomly assigned to one of four conditions: control, survey-based, pre-scripted chatbot, or generative chatbot. The experimental conditions involved misleading questions aimed at inducing false memories.

After completing their assigned condition, participants assessed their cognitive load using the NASA Task Load Index (NASA-TLX) and then took a memory test with questions about the video. The first session lasted 30 to 45 minutes.

The second phase occurred one week later, with participants completing an online survey, recalling the video, and answering the same follow-up questions. This session lasted 10 to 20 minutes, and responses were compared with those from the first phase to assess the persistence of false memories and any changes in participants' confidence levels.

Both the pre-scripted and generative chatbots were implemented through a web interface powered by Generative Pre-trained Transformer 4 (GPT-4). The generative chatbot provided feedback based on participants' answers, reinforcing or correcting responses using specific prompts.

These prompts aimed to simulate a more authentic and dynamic interaction, occasionally introducing additional details to influence the user's recall of the video.
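
The paper's implementation is not reproduced in this article, but a minimal sketch of how such GPT-4-driven feedback could be generated, assuming the OpenAI Python client, is shown below; the system prompt, helper name, and sample question are illustrative assumptions, not the authors' materials.

```python
# Hypothetical sketch of a GPT-4-backed interviewer that comments on each
# answer. Prompt wording and names are illustrative, not the study's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are interviewing a witness about a video they just watched. "
    "After each answer, give brief, confident feedback on that answer."
)

def feedback_for(question: str, answer: str) -> str:
    """Return model-generated feedback for one question/answer pair."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

print(feedback_for("Did the robbers arrive by car?", "Yes, I think so."))
```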

Participants were recruited through Prolific, pre-screened for fluency in English, and balanced by gender. Of the 200 participants recruited, six did not complete the second phase, and 39 others were excluded due to failed attention checks. Statistical analyses, including Kruskal-Wallis and Wilcoxon tests, were used to evaluate false memories and confidence scores.
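
As a rough illustration of this kind of analysis, the sketch below runs a Kruskal–Wallis test across conditions and a Wilcoxon signed-rank test within a condition using SciPy; the per-participant counts and confidence ratings are placeholder values, not data from the study.

```python
# Illustrative only: the numbers below are placeholders, not study data.
from scipy.stats import kruskal, wilcoxon

# False-memory counts per participant in each condition (fabricated examples).
control     = [0, 1, 0, 2, 1]
survey      = [1, 2, 1, 2, 2]
prescripted = [1, 1, 2, 2, 3]
generative  = [3, 2, 4, 3, 3]

# Kruskal-Wallis: do the four conditions differ in immediate false memories?
h_stat, p_between = kruskal(control, survey, prescripted, generative)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_between:.4f}")

# Wilcoxon signed-rank: does confidence change between the immediate session
# and the one-week follow-up for the same participants?
week0_confidence = [4.1, 3.8, 4.5, 3.9, 4.2]
week1_confidence = [4.0, 3.9, 4.4, 4.1, 4.3]
w_stat, p_within = wilcoxon(week0_confidence, week1_confidence)
print(f"Wilcoxon W = {w_stat:.2f}, p = {p_within:.4f}")
```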

Generative Chatbot Misinformation

The study's results indicated that short-term interactions with generative chatbots significantly increased the occurrence of false memories and elevated users' confidence in these false memories compared to other methods. The generative chatbot had the most substantial misinformation effect, misleading 36.4% of participants.

The survey-based intervention also caused false memories in 21.6% of participants. Users who were less familiar with chatbots but more knowledgeable about AI and interested in crime investigations were particularly prone to false memories.

A one-way Kruskal–Wallis test revealed that the generative chatbot produced significantly more immediate false memories than the control, survey-based, and pre-scripted chatbot interventions.

While all interventions led to more false memories than the control, the generative chatbot induced almost three times as many false memories as the control group. However, no significant differences were found between the survey-based and pre-scripted chatbot conditions.

Regarding confidence in false memories, the intervention conditions, including the generative chatbot, significantly increased participants' confidence compared to the control.

However, confidence in true memories did not differ significantly between conditions. Notably, false memories induced by the generative chatbot remained stable after one week, unlike the control and survey-based conditions, which showed an increase in false memories over time.

Further analysis revealed that familiarity with AI technology and interest in crime investigations were key moderating factors in false memory formation. Participants unfamiliar with chatbots but experienced with AI were more likely to develop false memories. In contrast, variables such as age, gender, and cognitive workload did not show a significant impact, as indicated by the study's mixed-effects regression model.
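
For readers interested in how such a moderator analysis might be specified, the sketch below fits a mixed-effects regression with statsmodels on entirely synthetic data; the predictors, rating scales, and model structure are assumptions for illustration, not the authors' actual specification.

```python
# Hypothetical moderator analysis on synthetic data; variable names, scales,
# and model structure are assumptions, not the study's dataset or model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for pid in range(40):                          # 40 synthetic participants
    ai_familiarity = rng.integers(1, 8)        # assumed 1-7 self-report scale
    chatbot_familiarity = rng.integers(1, 8)
    age = rng.integers(18, 65)
    for session in (0, 1):                     # immediate and one-week sessions
        false_memories = rng.poisson(1 + 0.2 * ai_familiarity)
        rows.append(dict(participant=pid, session=session, age=age,
                         ai_familiarity=ai_familiarity,
                         chatbot_familiarity=chatbot_familiarity,
                         false_memories=false_memories))
df = pd.DataFrame(rows)

# Mixed-effects model with a random intercept for each participant.
model = smf.mixedlm(
    "false_memories ~ ai_familiarity + chatbot_familiarity + age + session",
    df, groups=df["participant"],
)
print(model.fit().summary())
```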

Conclusion

The study provided empirical evidence of the influence of AI, and of generative chatbots in particular, on false memory formation. As AI systems become more sophisticated and widely used, it is crucial to consider their impact on cognitive processes.

The findings underscored the need for caution and the development of ethical guidelines for AI applications in sensitive contexts. This research highlights the importance of balancing AI technology's benefits with preserving human memory and decision-making integrity. Further research is necessary to address these concerns.

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Journal reference:
  • Preliminary scientific report. Chan, S., et al. (2024). Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews. arXiv. DOI: 10.48550/arXiv.2408.04681, https://arxiv.org/abs/2408.04681

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.
