Factuality Fine-Tuning: Mitigating Hallucinations in Large Pre-trained Language Models

In an article recently submitted to the arXiv* preprint server, researchers addressed a crucial issue prevalent in large pre-trained language models (LLMs): while highly fluent and creative, these models often produce factually inaccurate information, referred to as 'hallucinations,' posing risks of spreading misinformation. To counter this, the researchers focused on fine-tuning LLMs for improved factual accuracy without relying on labor-intensive human labeling.

Study: Factuality Fine-Tuning: Mitigating Hallucinations in Large Pre-trained Language Models. Image credit: Sippapas somboonkarn/Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

They capitalized on two recent advancements in natural language processing (NLP): first, methods for evaluating text factuality by gauging consistency with external knowledge bases or by using model confidence scores, and second, the direct preference optimization (DPO) algorithm, which fine-tunes language models from preference rankings over different responses rather than relying solely on supervised learning. Their approach, which generates factuality preference rankings automatically via retrieval systems or their novel retrieval-free method, notably enhanced the factuality of Llama-2 in diverse open-ended settings.

Background

Recent advancements in LLMs, especially those utilizing reinforcement learning from human feedback (RLHF), have created sophisticated dialogue agents. Despite being trained on vast datasets and fine-tuned for various tasks, these models often produce confidently inaccurate statements. Even high-performing models like Generative Pre-trained Transformer 3.5 (GPT-3.5) frequently generate false citations when asked about study authors.

Interestingly, these models display uncertainty in more straightforward question-answering scenarios, showing that they recognize the limits of their own knowledge. An essential challenge is defining an objective that reliably captures factual accuracy. Traditional objectives, such as maximum likelihood, may inadvertently spread probability mass across multiple responses, including incorrect ones, leading to factual errors that are particularly evident when the model handles unfamiliar queries.

Factuality Enhancement via Preference-Based Tuning

To enhance factuality directly, the researchers adopt a reinforcement learning (RL) framework that fine-tunes language models using preferences over candidate responses. Understanding this approach requires a brief look at RL in the context of language models and at the specific algorithm employed, DPO.

RL has successfully refined language models by extracting intricate behaviors from their pre-trained weights. In RL, a language model policy produces a distribution over responses given an input query. The objective is to maximize the average reward of outputs, typically determined by a reward function evaluating the desirability of input-output pairs. However, solely optimizing for reward can lead to overoptimization, where models exploit reward function nuances not aligned with intended behavior. Common RL objectives aim to maximize rewards while imposing a penalty that discourages substantial deviation from the pre-trained reference model.
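
In its common form, this objective can be written as follows (a standard formulation from the RLHF literature rather than notation quoted from the paper):

```latex
\max_{\pi_\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\big[ r(x, y) \big]
\;-\; \beta \, \mathbb{D}_{\mathrm{KL}}\!\big[ \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big]
```

Here, π_θ is the policy being fine-tuned, π_ref is the frozen pre-trained reference model, r(x, y) scores the desirability of response y to prompt x, and β controls how strongly deviation from the reference model is penalized.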

DPO simplifies RL for language models by leveraging preference pairs rather than explicit reward functions. These pairs involve prompts and candidate responses, signifying a preference for one response. DPO operates by optimizing a classification loss directly on these preference pairs, facilitating stable learning from fixed preference datasets without the need for explicit reward function fitting or policy sampling loops during training.
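
As a concrete illustration, the core of this classification loss can be sketched in a few lines of PyTorch. This is a minimal sketch following the published DPO formulation, not code from the paper; the tensor names and the default value of beta are illustrative assumptions.

```python
# Minimal sketch of the DPO loss, assuming per-sequence log-probabilities have
# already been computed for the preferred (w) and dispreferred (l) responses
# under both the policy being trained and the frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w: torch.Tensor, policy_logp_l: torch.Tensor,
             ref_logp_w: torch.Tensor, ref_logp_l: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Binary classification loss over preference pairs."""
    # Implicit rewards: how much more the policy favors each response than the
    # reference model does, scaled by beta.
    chosen_rewards = beta * (policy_logp_w - ref_logp_w)
    rejected_rewards = beta * (policy_logp_l - ref_logp_l)
    # Maximize the log-sigmoid of the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```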

Constructing preferences encouraging factuality in long-form text poses a challenge. Researchers propose two approaches: reference-based truthfulness estimation and reference-free confidence-based truthfulness estimation. The reference-based method evaluates text consistency with reliable external references, while the reference-free approach utilizes model confidence as a proxy for truthfulness. These methods assess the truthfulness of generated claims and prefer the response that exhibits higher truthfulness.

The reference-based approach uses measures like FactScore, which extracts atomic claims from text and assesses their alignment with reference texts such as Wikipedia. However, this method relies on access to high-quality reference texts, limiting its applicability in domains that lack clear ground-truth documents. The reference-free method, on the other hand, leverages the well-calibrated nature of language models, using the model's confidence in its generated answers to estimate truthfulness without relying on external knowledge.
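
The sketch below illustrates the spirit of the reference-free estimator under simplifying assumptions: atomic claims are presumed to have already been extracted and rephrased as questions upstream (the paper relies on an auxiliary model for those steps), and get_answer_logprobs is a hypothetical hook that returns the token log-probabilities of the model's answer to each question.

```python
# Illustrative sketch of reference-free (confidence-based) truthfulness scoring.
# Claim extraction and question conversion are assumed to happen upstream;
# `get_answer_logprobs` is a hypothetical hook into the language model.
import math
from typing import Callable, List

def confidence_truthfulness(claim_questions: List[str],
                            get_answer_logprobs: Callable[[str], List[float]]) -> float:
    """Average the model's confidence in its own answers across a response's claims."""
    confidences = []
    for question in claim_questions:
        logprobs = get_answer_logprobs(question)
        # Exponentiated mean log-probability (a length-normalized sequence
        # probability) acts as a proxy for confidence that the claim is true.
        confidences.append(math.exp(sum(logprobs) / len(logprobs)))
    return sum(confidences) / len(confidences) if confidences else 0.0

# Toy usage with a stand-in scorer that returns fixed log-probabilities.
if __name__ == "__main__":
    fake_scorer = lambda question: [-0.05, -0.20, -0.10]
    print(confidence_truthfulness(["Where was the subject born?"], fake_scorer))
```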

The researchers use these truthfulness estimators to generate preference datasets for factuality tuning from unlabeled prompts. They sample multiple candidate responses for each prompt, score each with the chosen estimator, and designate the response with the higher truthfulness score in each comparison as preferred. The resulting preference dataset is then fed into the DPO pipeline to fine-tune the model for improved response factuality.
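
A schematic version of this data-generation step might look as follows. The sample_responses generator and the truthfulness scorer are placeholders for whichever sampling procedure and estimator (reference-based or confidence-based) is used, and comparing all candidate pairs is one plausible pairing strategy shown purely for illustration.

```python
# Sketch of building a DPO preference dataset from unlabeled prompts.
# `sample_responses` and `truthfulness` are placeholder callables; neither is
# taken verbatim from the paper's implementation.
from itertools import combinations
from typing import Callable, Dict, List

def build_preference_pairs(prompts: List[str],
                           sample_responses: Callable[[str, int], List[str]],
                           truthfulness: Callable[[str], float],
                           n: int = 6) -> List[Dict[str, str]]:
    pairs = []
    for prompt in prompts:
        candidates = sample_responses(prompt, n)           # n model samples per prompt
        scored = [(truthfulness(c), c) for c in candidates]
        for (score_a, resp_a), (score_b, resp_b) in combinations(scored, 2):
            if score_a == score_b:
                continue                                   # ties carry no preference signal
            chosen, rejected = (resp_a, resp_b) if score_a > score_b else (resp_b, resp_a)
            pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

The resulting (prompt, chosen, rejected) triples are the kind of preference pairs the DPO loss sketched earlier operates on once log-probabilities are computed.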

Results

The study conducts thorough experiments to understand how factuality is acquired through preference-based reinforcement learning. It examines two model variants built on Llama-1-7b: FactTune-FS, fine-tuned with the reference-based FactScore metric, and FactTune-MC, fine-tuned with a model-confidence-based score. The experiments span two tasks, biography generation and medical question answering, each with its own dataset, entities, and number of responses per prompt. The evaluation compares the models' factual accuracy against baselines using consistent metrics to verify that the factuality gains hold across diverse domains.

The evaluation of factuality enhancements, mainly through FactTune-FS and FactTune-MC, reveals promising results across biographical and medical question-answering tasks. FactTune-FS consistently demonstrates a substantial increase in factual accuracy compared to RLHF models and decoding-based baselines, showing remarkable improvements while maintaining or marginally altering the volume of correct information generated. Similarly, FactTune-MC showcases significant error reduction and factual improvement in RLHF models without any external reference information, highlighting its effectiveness as a reference-free method for factuality enhancement.

Furthermore, the study looks beyond quantitative metrics, exploring qualitative changes in model generations after factuality fine-tuning. Notably, samples from FactTune-FS and FactTune-MC tend to contain more objective, direct sentences, departing from the conversational style of the supervised fine-tuning (SFT) model. These samples also show simpler sentence structures and fewer casual phrases, indicating a shift toward factual accuracy at the expense of conversational tone, a trade-off central to improving response factuality.

Conclusion

To summarize, this paper introduces a practical approach to refining language models' factual accuracy in generating long-form text. It explores two methods for truthfulness assessment, one leveraging external references and another innovatively using the model's uncertainty.

Both approaches notably reduce factual errors without heavy reliance on reference texts when applied in fine-tuning. The study suggests new benchmark tasks, explores ways to enhance RLHF-tuned dialogue models, and advocates for combining factuality tuning with existing methods. Scaling this strategy to larger models and datasets is a promising avenue for further mitigating factual errors.

Journal reference:
Tian, K., Mitchell, E., Yao, H., Manning, C. D., & Finn, C. (2023). Fine-tuning Language Models for Factuality. arXiv preprint arXiv:2311.08401.

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.
