In a recent submission to the arXiv* server, researchers comprehensively examined the detection of misinformation generated by large language models (LLMs).
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Background
The emergence of LLMs such as the Chat Generative Pre-Trained Transformer (ChatGPT) and Meta's Llama has marked a significant milestone in computational social science (CSS). While LLMs have opened doors to extensive studies of human language and behavior, concerns about their potential misuse for disinformation have arisen. As these models become better at generating highly convincing, human-like content, the risk of their exploitation to create misleading information at scale becomes evident. Recent research has highlighted this concern, acknowledging that AI-generated disinformation is both inexpensive to produce and effective.
Existing models for disinformation detection
In text generation, there has been a transition from small language models (SLMs) to LLMs with billions of parameters, resulting in significant advancements. Models such as the Language Model for Dialogue Applications (LaMDA), BLOOM, the Pathways Language Model (PaLM), and the generative pre-trained transformer (GPT) family have demonstrated the ability to produce human-level responses. However, the format of input prompts can influence performance, and advanced prompt-engineering techniques are crucial for guiding LLMs toward more accurate, higher-quality responses.
Before the rise of LLMs, disinformation detection was primarily centered on SLMs such as bidirectional encoder representations from transformers (BERT), GPT-2, and the Text-to-Text Transfer Transformer (T5). Deep learning has played a pivotal role in detecting disinformation, with models such as CSI (a hybrid deep model for fake news detection) and FakeBERT employing neural networks to identify textual features indicative of disinformation. The introduction of LLMs, with their vast parameter counts, has significantly complicated disinformation detection, given their ability to produce natural, human-like text. This shift raises critical questions about the effectiveness of existing detection methods designed around SLMs.
The current study aims to detect LLM-generated misinformation by addressing three research questions (RQs):
- RQ1: Are current disinformation detection techniques suitable for LLM-generated disinformation?
- RQ2: If not, can LLMs themselves be adapted for detection?
- RQ3: If both approaches fall short, what alternative solutions can be explored?
Dataset for disinformation detection
Researchers created a human-written fake news benchmark for disinformation detection, known as Dhuman, comprising 21,417 real news articles from Reuters and 23,525 fake news articles from unreliable sources flagged by fact-checking websites.
From Dhuman, three LLM-generated fake news datasets are constructed using different zero-shot prompt techniques: Dgpt std, Dgpt mix, and Dgpt cot. The dataset Dgpt std involves minimal modifications to human-written disinformation, maintaining its original content while enhancing its tone and vocabulary. Dgpt mix combines true and fake news to create more complex disinformation, and Dgpt cot employs chain-of-thought (CoT) prompts to guide ChatGPT in generating disinformation that mimics human cognitive processes. These new datasets are introduced as valuable resources for future research in LLM-generated disinformation detection.
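For illustration, the sketch below shows how zero-shot prompt templates for these three generation strategies might be organized. The template wording and the helper function are assumptions made for clarity, not the authors' exact prompts.

```python
# Illustrative zero-shot prompt templates for the three generation strategies.
# The exact prompts used by the authors are not reproduced here; these template
# strings are assumptions meant only to convey the idea of each strategy.

STD_PROMPT = (
    "Rewrite the following fake news article, keeping its claims unchanged "
    "but improving its tone and vocabulary:\n\n{fake_article}"
)

MIX_PROMPT = (
    "Given a real news article and a fake news article on a related topic, "
    "write a single article that weaves the false claims into the factual "
    "reporting so the result reads as ordinary news:\n\n"
    "REAL:\n{real_article}\n\nFAKE:\n{fake_article}"
)

COT_PROMPT = (
    "Think step by step about how a writer would build a persuasive but "
    "false news story from the claims below, then write that story:\n\n"
    "{fake_article}"
)

def build_prompt(strategy: str, fake_article: str, real_article: str = "") -> str:
    """Fill the template for the chosen generation strategy (std, mix, or cot)."""
    if strategy == "std":
        return STD_PROMPT.format(fake_article=fake_article)
    if strategy == "mix":
        return MIX_PROMPT.format(real_article=real_article, fake_article=fake_article)
    if strategy == "cot":
        return COT_PROMPT.format(fake_article=fake_article)
    raise ValueError(f"unknown strategy: {strategy}")
```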
To validate the generated disinformation, a comparative analysis between samples from Dhuman and Dgpt std is conducted. Linguistic and semantic similarities are examined using Linguistic Inquiry and Word Count (LIWC) and t-distributed stochastic neighbor embedding (t-SNE). The linguistic analysis shows that LLM-generated disinformation exhibits more prosocial language, political and ethical themes, and logical coherence, with less emotional language, profanity, and colloquialism. Semantically, human-written and ChatGPT-generated disinformation overlap substantially, indicating similar semantic meanings.
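As a rough illustration of the semantic comparison, the sketch below embeds samples from both sources and projects them with t-SNE. The embedding model, the placeholder sample lists, and the plotting choices are assumptions; LIWC is commercial software and is not reproduced here.

```python
# Sketch of a semantic-overlap check between human-written and LLM-generated
# disinformation, assuming sentence-transformers embeddings and scikit-learn's
# t-SNE. The model name and sample lists are placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

human_texts = [...]  # a few hundred samples from the human-written fake news set
llm_texts = [...]    # the corresponding LLM-rewritten samples

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embeddings = model.encode(human_texts + llm_texts)

# Project the joint embedding space to 2-D; heavy overlap between the two
# clusters suggests the rewritten text keeps the original semantics.
coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
n = len(human_texts)
plt.scatter(coords[:n, 0], coords[:n, 1], s=8, label="human-written")
plt.scatter(coords[n:, 0], coords[n:, 1], s=8, label="LLM-generated")
plt.legend()
plt.show()
```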
Experiments and results
The researchers evaluated disinformation detection on the collected datasets, assessing how well detection models distinguish human-written from LLM-generated disinformation.
Existing Techniques (RQ1): The study reveals that current state-of-the-art disinformation detection models struggle when faced with LLM-generated disinformation. A variant of the BERT model (RoBERTa) is employed for detection, initially tested on human-written disinformation from Dhuman and later challenged with LLM-generated disinformation. On Dhuman, the model performs very well, with a minimal misclassification rate. However, when tested on the LLM-generated datasets (Dgpt std, Dgpt mix, and Dgpt cot), its accuracy degrades markedly; Dgpt mix and Dgpt cot, which involve more complex generation strategies, exhibit particularly high misclassification rates. The model also shows political bias in its classifications, most notably misclassifying center-leaning disinformation as true news.
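A minimal sketch of how such a fine-tuned RoBERTa detector might be evaluated on LLM-generated articles is shown below, using the Hugging Face transformers library. The checkpoint path, label mapping, and truncation settings are assumptions, not the authors' exact setup.

```python
# Sketch of evaluating a fine-tuned RoBERTa fake-news classifier on an
# LLM-generated test split. Checkpoint path and label mapping are assumed.
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
# Assumed path to a checkpoint previously fine-tuned on the human-written set.
model = RobertaForSequenceClassification.from_pretrained("path/to/finetuned-roberta")
model.eval()

def classify(article: str) -> int:
    """Return the predicted label for one article (assumed 0 = real, 1 = fake)."""
    inputs = tokenizer(article, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

# Misclassification rate on an LLM-generated fake-news split (true label: fake).
llm_fake_articles = [...]  # e.g., articles from the mixed or CoT-generated set
errors = sum(classify(a) != 1 for a in llm_fake_articles)
print(f"Misclassification rate: {errors / len(llm_fake_articles):.2%}")
```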
LLMs (RQ2): Researchers demonstrated that LLMs struggle to effectively identify disinformation they themselves generated. They conducted experiments using ChatGPT to assess how proficiently LLMs detect LLM-generated disinformation. ChatGPT's responses vary in length and complexity when prompted to flag misleading information. The study finds that GPT-4 performs slightly better than GPT-3.5 at identifying LLM-generated disinformation, and that prompting ChatGPT to provide detailed explanations improves its performance. Even so, ChatGPT generally remains inferior to the fine-tuned RoBERTa model at detecting disinformation.
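The sketch below shows one way such zero-shot detection with ChatGPT could be implemented via the OpenAI chat API. The prompt wording, system message, and the option to request an explanation are illustrative assumptions rather than the study's exact protocol.

```python
# Sketch of zero-shot disinformation detection with the OpenAI chat API.
# Prompt wording and output handling are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def detect(article: str, model: str = "gpt-4", explain: bool = True) -> str:
    instruction = (
        "Decide whether the following news article contains disinformation. "
        "Answer 'REAL' or 'FAKE'."
    )
    if explain:
        # Requesting a brief explanation was reported to improve accuracy.
        instruction += " Briefly explain your reasoning before the final answer."
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a careful fact-checking assistant."},
            {"role": "user", "content": f"{instruction}\n\nArticle:\n{article}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```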
Proposed Solution (RQ3): Researchers introduced a novel approach to detect LLM-generated disinformation, focusing on complex disinformation blending genuine and misleading content. A structured CoT prompt is designed to guide ChatGPT step by step in analyzing and fact-checking key content elements. An ablation study assesses the impact of contextual elements on detection performance.
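An illustrative rendering of such a structured CoT detection prompt is shown below. The specific steps and the contextual elements listed (events, people, places, dates) are assumptions based on the description above, not the authors' exact prompt; a filled template could be sent through the same chat call as in the earlier sketch.

```python
# Illustrative structured chain-of-thought prompt for fact-checking
# disinformation that mixes true and false content. Step wording is assumed.
COT_DETECTION_PROMPT = """\
You will fact-check a news article step by step.

Step 1: List the key elements of the article: the main events, the people and
organizations involved, the locations, and the dates or time references.
Step 2: For each element, state what you know about it and whether the
article's claim about it is consistent with that knowledge.
Step 3: Note any claims that mix accurate background with unsupported or
contradictory details.
Step 4: Based on Steps 1-3, answer 'REAL' or 'FAKE' and give a one-sentence
justification.

Article:
{article}
"""
```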
Results indicate that GPT-4 consistently performs better than GPT-3.5 across various configurations of the CoT prompts. The study highlights the significance of contextual elements such as events and time for disinformation detection. Overall, advanced prompts paired with LLMs show promise for effectively countering LLM-generated disinformation.
Conclusion
In summary, researchers examined the detection of LLM-generated disinformation using ChatGPT to create three distinct datasets. The research shows that existing techniques, including LLMs, struggle to consistently identify this disinformation. To address this challenge, advanced prompts are introduced, significantly improving detection.
Future research may explore other LLM-generated disinformation types, such as false connections and manipulated content, while advanced prompting methods, such as CoT-self-consistency, offer promising avenues for further improvement.