Researchers Boost Large Language Model Factual Accuracy With Novel Integrative Decoding Approach

A groundbreaking method called integrative decoding elevates the factual accuracy of AI-generated content by combining multiple sampled responses into a cohesive, reliable output—pushing the boundaries of language model performance on open-ended tasks.

Research: Integrative Decoding: Improve Factuality via Implicit Self-consistency. Image Credit: a-image / Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

In an article recently submitted to the arXiv preprint* server, researchers presented a novel approach called integrative decoding (ID) to enhance the factual accuracy of large language models.

They highlighted that self-consistency-based approaches have typically faced strict constraints on task formats, limiting their applicability. The ID method overcomes these limitations by constructing a set of inputs, each formed by concatenating a previously sampled response with the original prompt, processing them concurrently, and selecting each next token from the aggregated predictions.

The extensive evaluation demonstrated substantial factuality improvements: +11.2% on TruthfulQA, +15.4% on Biographies, and +8.5% on LongFact, with gains that grew as more responses were sampled, indicating ID's scalability with repeated sampling.

Related Work

Past work highlighted that while large language models (LLMs) excel in various tasks, they often produce plausible yet factually incorrect statements, known as hallucinations.

Researchers have identified issues such as inaccurate self-assessment and overconfidence in responses. Previous self-consistency methods imposed strict task constraints, limiting their effectiveness in open-ended generation tasks. Different studies have proposed methods for hallucination detection and improving factuality, with self-consistency-driven approaches showing significant promise.

Recent efforts have adopted self-consistency methods for open-ended tasks while exploring alternative decoding strategies, such as contrastive and integrative decoding, to enhance factual accuracy in language models.

Enhancing Factual Consistency

This study demonstrated that self-consistency among responses generated by LLMs can effectively indicate factuality and assist in detecting hallucinations. The factuality score was quantified by averaging the consistency of each statement with multiple sampled responses, following a formal mathematical objective that combines factuality and coherence.

The researchers proposed a decoding objective that combines factuality and coherence, weighted by a constant (λ), driving the generation of new outputs. By sampling various responses and processing them concurrently, the integrative decoding method selected the next token based on the integrated predictions, enhancing factual accuracy and contextual coherence in the generated text.
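Concretely, the token-selection rule described here can be written as follows. The notation below is a paraphrase for illustration rather than the paper's exact formulation, with x the original prompt, ŷ(i) the i-th of k sampled responses, and x_i the input formed by concatenating ŷ(i) with x:

```latex
% Sketch (paraphrased notation) of integrative token selection:
% average the next-token log-probabilities across the k concatenated
% inputs and pick the highest-scoring token from the vocabulary V.
y_t = \arg\max_{v \in \mathcal{V}} \; \frac{1}{k} \sum_{i=1}^{k} \log p_\theta\!\left(v \mid x_i,\, y_{<t}\right)
```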

To formalize this, the factuality score for a given statement is computed by averaging, over the other responses sampled for the same prompt, the probability that each response supports the statement; averaging these statement-level scores in turn yields an overall factuality score for the response, providing a comprehensive measure of its reliability.
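In sketch form, again paraphrasing the notation rather than quoting the paper, with s_j the j-th of m statements in a response y and ŷ(i) the i-th of k sampled responses:

```latex
% Hedged sketch of the consistency-based factuality scores:
% a statement's score averages support across the sampled responses,
% and the response-level score averages over its statements.
F(s_j) = \frac{1}{k} \sum_{i=1}^{k} \Pr\!\left[\hat{y}^{(i)} \text{ supports } s_j\right],
\qquad
F(y) = \frac{1}{m} \sum_{j=1}^{m} F(s_j)
```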

The decoding objective builds on these self-consistency insights to generate a new output that aligns closely with the multiple sampled responses while maintaining coherence. The study formalizes this as a dual objective, combining a measure of factuality and a standard coherence metric into a single function that the decoding process maximizes.
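One plausible reading of that single function, in paraphrased notation with F(y) the factuality score sketched above, log p(y|x) standing in for the coherence (likelihood) term, and λ the weighting constant mentioned earlier:

```latex
% Hedged sketch of the dual decoding objective:
% factuality plus lambda-weighted coherence, maximized jointly.
y^{*} = \arg\max_{y} \; \Bigl[ F(y) + \lambda \log p_\theta(y \mid x) \Bigr]
```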

The proposed method involves sampling multiple responses and using them to create new inputs. These inputs are processed concurrently to select the next token based on the integrated predictions of all sampled responses. By using this integrative approach, ID enhances factual consistency while maintaining contextual relevance, making it applicable to a wide range of generation tasks.
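The procedure lends itself to a compact implementation. The sketch below is illustrative only, assuming a Hugging Face causal language model; the "gpt2" checkpoint, the sampling settings, and the simple response-plus-prompt concatenation template are placeholder assumptions, not the paper's configuration:

```python
# Minimal sketch of integrative decoding with Hugging Face transformers.
# Assumptions (not from the paper): the "gpt2" checkpoint, the sampling
# settings, and the "<sample>\n<prompt>" concatenation template are
# illustrative placeholders only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # left-pad so position -1 is the next-token slot

def integrative_decode(prompt: str, k: int = 4, max_new_tokens: int = 64) -> str:
    # Step 1: draw k responses from the model with ordinary sampling.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        samples = model.generate(
            prompt_ids, do_sample=True, top_p=0.95,
            max_new_tokens=max_new_tokens, num_return_sequences=k,
            pad_token_id=tokenizer.eos_token_id,
        )
    responses = [
        tokenizer.decode(s[prompt_ids.shape[1]:], skip_special_tokens=True)
        for s in samples
    ]

    # Step 2: build k inputs, each a sampled response concatenated with
    # the original prompt, and process them concurrently as one batch.
    batch = tokenizer([r + "\n" + prompt for r in responses],
                      return_tensors="pt", padding=True)
    ids, mask = batch.input_ids, batch.attention_mask

    # Step 3: at each step, average next-token log-probs across the k
    # inputs and pick the token with the highest aggregated score.
    generated = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(input_ids=ids, attention_mask=mask).logits[:, -1, :]
            avg_logprobs = torch.log_softmax(logits, dim=-1).mean(dim=0)
            next_id = int(avg_logprobs.argmax())
            if next_id == tokenizer.eos_token_id:
                break
            generated.append(next_id)
            col = torch.full((k, 1), next_id, dtype=torch.long)
            ids = torch.cat([ids, col], dim=1)
            mask = torch.cat([mask, torch.ones((k, 1), dtype=torch.long)], dim=1)
    return tokenizer.decode(generated, skip_special_tokens=True)

print(integrative_decode("Give a one-sentence biography of Alan Turing."))
```

Because every candidate continuation is scored against all k sampled responses at once, tokens that contradict the consensus of the samples receive low aggregated probability, which is how the method folds self-consistency into ordinary left-to-right decoding.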

Integrative Decoding Improves Accuracy

The experiments used various metrics to evaluate the performance of different LLMs on open-ended generation benchmarks, specifically truthful question answering (TruthfulQA), biographies, and LongFact.

TruthfulQA consists of 817 questions designed so that common misconceptions lead humans to answer falsely; the evaluation employed generative pre-trained transformer 4 (GPT-4) to assess truthfulness and informativeness, with the product of these scores (T*I) as a key metric.
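On one common reading of this metric (a hedged restatement, not necessarily the paper's exact aggregation), each response's truthfulness score T_n and informativeness score I_n are combined multiplicatively and averaged over the benchmark's N questions:

```latex
% Hedged restatement of the combined TruthfulQA metric.
T{\times}I = \frac{1}{N} \sum_{n=1}^{N} T_n \cdot I_n
```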

Biographies required generating bullet-point achievements for computer scientists, and factual accuracy was evaluated using Wikipedia references, with metrics including accuracy percentage and the number of correct statements.

LongFact involved generating detailed object descriptions exceeding a thousand tokens, where the evaluation assessed the truthfulness of split atomic facts, using LLaMA3.1-70B-Instruct for fact division and GPT-4 for assessment.

The methods compared included greedy decoding and Decoding by Contrasting Layers (DoLa), alongside ensemble-based methods such as universal self-consistency (USC) and self-reflection (SR).

The results indicated that integrative decoding led to significant factuality improvements across all LLMs tested, with the most pronounced gains observed in LLaMA3 and Gemma2.

Integrative decoding maintained a strong balance between factuality and informativeness, especially in the LongFact benchmark, where it notably improved recall metrics without sacrificing accuracy.

The findings highlighted the challenges of enhancing factuality in long-form generation tasks. Existing baseline methods such as DoLa, USC, and SR demonstrated limited effectiveness, often degrading performance for certain LLMs.

In contrast, integrative decoding proved robust across various benchmarks. It showcased greater generality in long-form tasks, significantly outperforming other methods and establishing itself as a superior approach for enhancing factuality in open-ended generation tasks.

Conclusion

To sum up, the paper introduced ID, a decoding algorithm that incorporated self-consistency into its objective, achieving notable improvements in factuality across six series of LLMs on three open-ended generation benchmarks.

ID demonstrated potential for continuous enhancement with an increasing number of sampled responses, suggesting the possibility of "inference-time scaling laws" for improving LLM performance.

While ID incurred higher computational costs during inference, its overhead remained comparable to that of other self-consistency-based methods.

Future work aimed to enhance efficiency by integrating speculative decoding with ID, focusing on "difficult" decoding steps.

The implementation of ID involved making locally optimal decisions at each step, with the potential to explore more accurate approximations, such as beam search.


Journal reference:
  • Preliminary scientific report. Cheng, Y., Liang, X., Gong, Y., Xiao, W., Wang, S., Zhang, Y., Hou, W., Xu, K., Liu, W., Li, W., Jiao, J., Chen, Q., Cheng, P., & Xiong, W. (2024). Integrative Decoding: Improve Factuality via Implicit Self-consistency. arXiv. https://arxiv.org/abs/2410.01556

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.
