Turning AI's complex outputs into simple, understandable insights, EXPLINGO empowers users to trust predictions and make informed decisions.
[Figure] Sample NARRATOR inputs and outputs: the items in blue make up the prompt passed to the NARRATOR to transform ML explanations into narratives for a house-pricing example; the item in green is a narrative the NARRATOR LLM may generate based on this prompt. Some components, like the output instructions, are provided by DSPy.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Machine-learning models can make mistakes and be challenging to use, so scientists have developed explanation methods to help users understand when and how they should trust a model's predictions.
However, these explanations are often complex, perhaps containing information about hundreds of model features. They are sometimes presented as multifaceted visualizations that can be difficult for users without machine-learning expertise to comprehend fully.
To help people understand AI explanations, MIT researchers used large language models (LLMs) to transform plot-based explanations into plain language. The researchers designed their system to build upon existing explanation methods, such as SHAP, ensuring theoretical grounding while minimizing inaccuracies.
They developed a two-part system that converts a machine-learning explanation into a paragraph of human-readable text and then automatically evaluates the quality of the narrative, so an end user knows whether to trust it.
By prompting the system with a few example explanations, the researchers can customize its narrative descriptions to meet users' preferences or the requirements of specific applications. This ability to adapt narratives based on user-provided examples is a cornerstone of their approach, enabling EXPLINGO to be tailored to diverse use cases.
In the long run, the researchers hope to build upon this technique by enabling users to ask model follow-up questions about how it made predictions in real-world settings. Such advancements could empower users to critically assess model predictions in high-stakes scenarios.
"Our goal with this research was to take the first step toward allowing users to have full-blown conversations with machine-learning models about the reasons they made certain predictions, so they can make better decisions about whether to listen to the model," says Alexandra Zytek, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.
She is joined on the paper by Sara Pido, an MIT postdoc; Sarah Alnegheimish, an EECS graduate student; Laure Berti-Équille, a research director at the French National Research Institute for Sustainable Development; and senior author Kalyan Veeramachaneni, a principal research scientist in the Laboratory for Information and Decision Systems. The research will be presented at the IEEE Big Data Conference.
Elucidating explanations
The researchers focused on a popular type of machine-learning explanation called SHAP. In a SHAP explanation, a value is assigned to every feature the model uses to make a prediction. For instance, if a model predicts house prices, one feature might be the house's location. The location would be assigned a positive or negative value representing how much that feature modified the model's overall prediction.
SHAP explanations are often presented as bar plots showing which features are most or least important. However, that bar plot quickly becomes unwieldy for a model with more than 100 features.
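The core idea of a SHAP explanation can be illustrated with a toy sketch (not the authors' code): each feature gets a signed contribution, and the contributions plus the model's base value reconstruct the prediction. The feature names and numbers below are invented for the house-pricing example.

```python
# Toy SHAP-style attribution for a house-price prediction.
base_value = 300_000          # model's average prediction over the dataset
contributions = {
    "location": 45_000,       # pushes the predicted price up
    "square_footage": 20_000, # pushes the predicted price up
    "year_built": -12_000,    # pushes the predicted price down
}

# SHAP values are additive: base value + contributions = prediction.
prediction = base_value + sum(contributions.values())
print(prediction)  # 353000

# Rank features by absolute impact, as a SHAP bar plot would.
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
print(ranked[0][0])  # "location" is the most influential feature here
```

With hundreds of features, this ranked list (and the corresponding bar plot) becomes unwieldy, which motivates the narrative approach.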
"As researchers, we have to make a lot of choices about what we are going to present visually. If we choose to show only the top 10, people might wonder what happened to another feature that isn't in the plot. Using natural language unburdens us from having to make those choices," Veeramachaneni says.
However, rather than utilizing a large language model to generate an explanation in natural language, the researchers use the LLM to transform an existing SHAP explanation into a readable narrative. This ensures that the theoretical integrity of the original explanation is preserved while benefiting from LLMs’ natural language capabilities.
Zytek explains that having the LLM handle only the natural language part of the process limits the opportunity to introduce inaccuracies into the explanation.
Their system, called EXPLINGO, is divided into two pieces that work together.
The first component, called NARRATOR, uses an LLM to create narrative descriptions of SHAP explanations that meet user preferences. NARRATOR is first fed three to five written examples of narrative explanations, and the LLM mimics that style when generating text.
"Rather than having the user try to define what type of explanation they are looking for, it is easier to just have them write what they want to see," says Zytek.
This allows NARRATOR to be easily customized for new use cases by showing it a different set of manually written examples.
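A minimal sketch of how such a few-shot prompt might be assembled; the real system builds its prompts through DSPy, and the example narratives and SHAP summary below are invented for illustration.

```python
# Three to five user-written example narratives define the target style.
examples = [
    "The house's location raised the predicted price the most.",
    "A recent renovation pushed the estimate up, while the small lot lowered it.",
    "The predicted price is driven mainly by square footage.",
]

# A text summary of the SHAP explanation to be narrated (invented values).
shap_summary = "location: +45000, square_footage: +20000, year_built: -12000"

# Assemble a few-shot prompt: instructions, style examples, then the input.
prompt = (
    "Rewrite the SHAP explanation below as a short narrative, "
    "matching the style of these examples:\n\n"
    + "\n".join(f"- {ex}" for ex in examples)
    + f"\n\nSHAP explanation: {shap_summary}\nNarrative:"
)
print(prompt)
```

Swapping in a different set of examples retargets the style without any other changes, which is the customization mechanism the researchers describe.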
After NARRATOR creates a plain-language explanation, the second component, GRADER, uses an LLM to rate the narrative on four metrics: conciseness, accuracy, completeness, and fluency. GRADER automatically prompts the LLM with the text from NARRATOR and the SHAP explanation it describes. This automated grading serves as a safeguard to filter out low-quality narratives, particularly in high-stakes settings.
"We find that, even when an LLM makes a mistake doing a task, it often won't make a mistake when checking or validating that task," she says.
Users can also customize GRADER to give different weights to each metric. For instance, applications in safety-critical industries might prioritize accuracy and completeness over fluency.
"You could imagine, in a high-stakes case, weighting accuracy and completeness much higher than fluency, for example," she adds.
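The weighting idea can be sketched as a simple weighted average over GRADER's four metrics. The scores and weights below are invented; in the actual system, the per-metric scores come from prompting an LLM with the narrative and its SHAP explanation.

```python
# Hypothetical per-metric scores for one narrative (0 to 1 scale assumed).
scores = {"conciseness": 0.9, "accuracy": 0.7, "completeness": 0.8, "fluency": 0.95}

# A high-stakes application might weight accuracy and completeness heavily.
weights = {"conciseness": 0.1, "accuracy": 0.4, "completeness": 0.4, "fluency": 0.1}

# Overall quality is the weight-adjusted sum of the metric scores.
overall = sum(scores[m] * weights[m] for m in scores)
print(round(overall, 3))  # 0.785
```

A narrative falling below some threshold on this weighted score could then be filtered out before it ever reaches the end user.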
Analyzing narratives
One of the biggest challenges for Zytek and her colleagues was adjusting the LLM so it generated natural-sounding narratives. The more guidelines they added to control style, the more likely the LLM was to introduce errors into the explanation.
"A lot of prompt tuning went into finding and fixing each mistake one at a time," she says.
To test their system, the researchers took nine machine-learning datasets with explanations and had different users write narratives for each dataset. These datasets ranged from housing prices to mushroom toxicity, showcasing EXPLINGO’s versatility. This allowed them to evaluate NARRATOR's ability to mimic unique styles. They used GRADER to score each narrative explanation on all four metrics.
Ultimately, the researchers found that their system could generate high-quality narrative explanations and effectively mimic different writing styles.
Their results show that providing a few manually written example explanations significantly improves the narrative style. However, those examples must be written carefully: including comparative words, like "larger," can cause GRADER to mark accurate explanations as incorrect.
Building on these results, the researchers want to explore techniques that could help their system better handle comparative words. Additionally, they aim to provide richer contextual information in narratives, such as statistical benchmarks or rationalizations, to enhance their usefulness.
In the long run, they hope to use this work as a stepping stone toward an interactive system where the user can ask a model follow-up questions about an explanation.
"That would help with decision-making in a lot of ways. Suppose people disagree with a model's prediction. In that case, we want them to be able to quickly figure out if their intuition is correct, or if the model's intuition is correct, and where that difference is coming from," Zytek says.
Journal reference:
- Preliminary scientific report. Zytek, A., Pido, S., Alnegheimish, S., & Veeramachaneni, K. (2024). Explingo: Explaining AI Predictions using Large Language Models. arXiv. https://arxiv.org/abs/2412.05145