Using Semantic Entropy to Detect Hallucinations

In a paper published in the journal Nature, researchers discussed how large language model (LLM) systems, such as ChatGPT and Gemini, often generated false and unsubstantiated answers, posing risks in fields such as law, journalism, and medicine. Despite efforts to improve truthfulness, these systems remained unreliable.

Study: Using Semantic Entropy to Detect Hallucinations. Image Credit: Ball SivaPhoto/Shutterstock

The researchers proposed an entropy-based uncertainty estimator to detect confabulations, a subset of hallucinations in which outputs are arbitrary and incorrect, by assessing uncertainty over meanings rather than over specific word sequences. The method worked across datasets and tasks without requiring prior knowledge of the task or task-specific data, helping users recognize when extra caution was needed with LLMs and thereby enhancing their reliability and broadening their potential applications.

Related Work

Past work has highlighted that ‘hallucinations’ are a critical problem for natural language generation systems built on LLMs such as ChatGPT and Gemini, because users cannot trust their outputs. Hallucinations, defined as content that is nonsensical or unfaithful to the provided source, encompass various failures of faithfulness and factuality. They erode user trust, which hinders adoption in fields such as law, journalism, and medicine, and inaccurate or fabricated information can lead to serious harm.

Semantic Entropy Overview

Semantic entropy builds on probabilistic tools for uncertainty estimation and can be applied directly to any LLM or similar foundation model without requiring architectural modifications. The 'discrete' variant of semantic entropy can be used even when the predicted probabilities for the generations are unavailable, for example, because access to the model's internals is limited. The method applies uncertainty estimation to confabulation detection, based on the principle that a model will be uncertain about generations whose content is arbitrary.

One measure of uncertainty is the predictive entropy of the output distribution, which quantifies how uncertain the output is given the input. A low predictive entropy indicates a heavily concentrated output distribution, whereas a high one indicates that many possible outputs are similarly likely. The analysis does not distinguish between aleatoric and epistemic uncertainty. Because generative LLMs produce sequences of tokens, entropies must be computed from the joint probabilities of those token sequences, and length normalization is used when comparing the log probabilities of generated sequences of different lengths.
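To make these quantities concrete, the following is a minimal Python sketch, not the authors' code, of a Monte Carlo estimate of predictive entropy. It assumes the per-token log-probabilities of N sampled answers are already available; the function names are hypothetical. Each sequence's joint log-probability is the sum of its token log-probabilities, optionally divided by its length (length normalization), and the entropy estimate is the negative mean over the samples.

```python
from typing import Sequence


def sequence_log_prob(token_log_probs: Sequence[float], length_normalize: bool = True) -> float:
    """Joint log-probability of one generated sequence.

    log p(s | x) is the sum of the per-token log-probabilities; with length
    normalization, the mean per-token log-probability is returned instead so
    that long and short answers can be compared.
    """
    total = sum(token_log_probs)
    return total / len(token_log_probs) if length_normalize else total


def predictive_entropy(samples: Sequence[Sequence[float]], length_normalize: bool = True) -> float:
    """Monte Carlo estimate of H(Y | x) = -E[log p(y | x)] from N sampled sequences."""
    return -sum(sequence_log_prob(s, length_normalize) for s in samples) / len(samples)


# Two hypothetical sampled answers, each given as per-token log-probabilities.
samples = [[-0.1, -0.2], [-0.5, -0.5, -0.2]]
print(predictive_entropy(samples))                          # length-normalized
print(predictive_entropy(samples, length_normalize=False))  # raw joint log-probs
```

This naive estimate treats differently worded answers with the same meaning as distinct outcomes, which is the limitation semantic entropy is meant to address.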

Detecting confabulations depends on the LLM's uncertainty about meanings rather than about specific word choices. Semantic uncertainty captures the model's uncertainty about the meaning of its outputs: sampled sequences are clustered by bidirectional entailment (two answers share a meaning if each entails the other), and entropy is then estimated from the probabilities of these shared-meaning clusters.
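The clustering step can be sketched as follows. This is a hedged illustration assuming a greedy scheme in which each answer is compared with the first member of every existing cluster. In practice the entailment check would come from a natural language inference model or an LLM; here `toy_entails` is a hypothetical stand-in that maps a few surface forms to a canonical meaning so that the example runs without any model.

```python
from typing import Callable, Dict, List

EntailFn = Callable[[str, str], bool]


def cluster_by_meaning(answers: List[str], entails: EntailFn) -> List[List[int]]:
    """Group answer indices so that members of a cluster mutually entail each other.

    Answer i joins the first cluster whose representative (its first member)
    both entails it and is entailed by it; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for i, answer in enumerate(answers):
        for cluster in clusters:
            rep = answers[cluster[0]]
            if entails(answer, rep) and entails(rep, answer):  # bidirectional entailment
                cluster.append(i)
                break
        else:  # no existing cluster matched
            clusters.append([i])
    return clusters


# Hypothetical stand-in for an entailment model, for illustration only.
CANONICAL: Dict[str, str] = {
    "paris": "paris",
    "it's paris.": "paris",
    "the capital is paris": "paris",
    "rome": "rome",
}


def _meaning(text: str) -> str:
    key = text.strip().lower()
    return CANONICAL.get(key, key)


def toy_entails(premise: str, hypothesis: str) -> bool:
    return _meaning(premise) == _meaning(hypothesis)


answers = ["Paris", "It's Paris.", "Rome", "The capital is Paris"]
print(cluster_by_meaning(answers, toy_entails))  # [[0, 1, 3], [2]]
```

Requiring entailment in both directions keeps a more specific answer from being merged with a vaguer one that it merely implies.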

Once the team identifies the classes of generated sequences that convey equivalent meanings, they estimate the probability that a sequence generated by the LLM belongs to a given class by summing the probabilities of all token sequences that could express that meaning. Because not every possible meaning class is accessible, the estimate instead relies on samples from the sequence-generating distribution induced by the model. For scenarios in which sequence probabilities are unavailable, 'discrete' semantic entropy approximates each class probability by the proportion of sampled answers belonging to that cluster, which amounts to treating every sampled output as equally probable; as the number of samples increases, this estimate converges to the same result as the probability-weighted one.
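Given the clusters, both estimators can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes a list of clusters of sample indices (as produced by a clustering step) and, for the probability-weighted version, one sequence log-probability per sampled answer. Renormalizing the sampled probabilities over the observed clusters is an assumption of this sketch, reflecting that only sampled sequences, not all possible ones, are available.

```python
import math
from typing import List, Sequence


def _entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def semantic_entropy(clusters: List[List[int]], seq_log_probs: Sequence[float]) -> float:
    """Entropy over meaning clusters, weighting each sample by its sequence probability.

    p(cluster | x) is estimated by summing the probabilities of the sampled
    sequences in that cluster and renormalizing over all sampled sequences.
    """
    top = max(seq_log_probs)  # shift before exp() for numerical stability
    weights = [math.exp(lp - top) for lp in seq_log_probs]
    total = sum(weights)
    return _entropy([sum(weights[i] for i in c) / total for c in clusters])


def discrete_semantic_entropy(clusters: List[List[int]]) -> float:
    """Variant for when probabilities are unavailable: every sample counts equally,
    so p(cluster | x) is the fraction of sampled answers in that cluster."""
    n = sum(len(c) for c in clusters)
    return _entropy([len(c) / n for c in clusters])


clusters = [[0, 1, 3], [2]]  # e.g. three answers meaning "Paris", one meaning "Rome"
print(discrete_semantic_entropy(clusters))
print(semantic_entropy(clusters, [-0.2, -0.3, -2.5, -0.4]))
```

When all sequence probabilities are equal, the two functions return the same value, mirroring the equal-probability assumption behind the discrete variant.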

Semantic entropy is designed to detect confabulations, and it can improve the accuracy of the answers a model does give by refusing to answer questions when semantic uncertainty is high. The evaluation applies semantic entropy to a range of datasets in which LLMs produce free-form sentences as answers. Performance is measured with the area under the receiver operating characteristic curve (AUROC), rejection accuracy, and the area under the rejection accuracy curve (AURAC), with the correctness of sentence-length generations judged automatically by GPT-4.
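The evaluation metrics can also be sketched. The following Python is a minimal illustration under stated assumptions, not the paper's evaluation code: uncertainty scores are treated as predictors of an incorrect answer, rejection accuracy is the accuracy on the answers kept after refusing the most uncertain fraction, and AURAC is approximated by averaging rejection accuracy over evenly spaced rejection fractions. The ten-step grid is an arbitrary choice for the example.

```python
from typing import Sequence


def auroc(uncertainties: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a randomly chosen incorrect answer receives higher
    uncertainty than a randomly chosen correct one (ties count as 1/2)."""
    wrong = [u for u, c in zip(uncertainties, correct) if not c]
    right = [u for u, c in zip(uncertainties, correct) if c]
    if not wrong or not right:
        raise ValueError("need both correct and incorrect answers")
    wins = sum(1.0 if w > r else 0.5 if w == r else 0.0 for w in wrong for r in right)
    return wins / (len(wrong) * len(right))


def rejection_accuracy(uncertainties: Sequence[float], correct: Sequence[bool],
                       reject_fraction: float) -> float:
    """Accuracy on the answers kept after refusing the most uncertain fraction."""
    order = sorted(range(len(correct)), key=lambda i: uncertainties[i])  # most confident first
    keep = max(1, round(len(order) * (1 - reject_fraction)))
    return sum(correct[i] for i in order[:keep]) / keep


def aurac(uncertainties: Sequence[float], correct: Sequence[bool], steps: int = 10) -> float:
    """Approximate area under the rejection-accuracy curve by averaging
    rejection accuracy over rejection fractions 0, 1/steps, ..., (steps-1)/steps."""
    fractions = [k / steps for k in range(steps)]
    return sum(rejection_accuracy(uncertainties, correct, f) for f in fractions) / steps


# Hypothetical scores: the two wrong answers have the highest uncertainty.
uncert = [0.1, 0.2, 0.9, 0.8]
is_correct = [True, True, False, False]
print(auroc(uncert, is_correct), aurac(uncert, is_correct))
```

An uncertainty measure that ranks every wrong answer above every right one reaches an AUROC of 1.0, and rejecting its most uncertain answers raises the accuracy of those that remain.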

Detecting Confabulations Effectively

Semantic entropy and its discrete variant demonstrate superior performance over existing baselines in detecting confabulations during sentence-length and paragraph-length generations. Results across multiple datasets illustrate how semantic entropy outperforms naive entropy and supervised embedding regression methods in identifying errors indicative of confabulations.

The AUROC and AURAC results show consistent improvements across model sizes and datasets, indicating that semantic entropy can identify when model outputs are likely to be incorrect. Applied to paragraph-length biographies generated by GPT-4, the discrete variant of semantic entropy continues to excel, achieving higher AUROC and AURAC scores than the self-check and adapted P(True) baselines, particularly in rejecting answers prone to confabulation while maintaining answer accuracy.

Conclusion

To sum up, the approach effectively leveraged entropy-based uncertainty estimators to detect confabulations in LLMs such as ChatGPT and Gemini. The method enhanced reliability across diverse datasets and tasks by focusing on semantic meaning rather than specific word sequences. This capability addressed the challenge of hallucinatory outputs and opened new avenues for using language models with greater confidence in real-world settings.

Journal reference:
Silpaja Chandrasekar

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2024, June 27). Using Semantic Entropy to Detect Hallucinations. AZoAi. Retrieved on December 11, 2024 from https://www.azoai.com/news/20240627/Using-Semantic-Entropy-to-Detect-Hallucinations.aspx.

  • MLA

    Chandrasekar, Silpaja. "Using Semantic Entropy to Detect Hallucinations". AZoAi. 11 December 2024. <https://www.azoai.com/news/20240627/Using-Semantic-Entropy-to-Detect-Hallucinations.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Using Semantic Entropy to Detect Hallucinations". AZoAi. https://www.azoai.com/news/20240627/Using-Semantic-Entropy-to-Detect-Hallucinations.aspx. (accessed December 11, 2024).

  • Harvard

    Chandrasekar, Silpaja. 2024. Using Semantic Entropy to Detect Hallucinations. AZoAi, viewed 11 December 2024, https://www.azoai.com/news/20240627/Using-Semantic-Entropy-to-Detect-Hallucinations.aspx.
