A new AI method from PNNL lets researchers determine when to trust a model's predictions and when to retrain it, enabling faster, more confident materials discovery.
Image caption: All weights from the MACE-MP-0 interaction head were frozen during training; only weights from the readout layers were updated.
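For readers who want a concrete picture of that fine-tuning setup, the minimal PyTorch sketch below shows one generic way to freeze a pretrained potential's interaction layers while leaving its readout layers trainable. The model loader, module names, and optimizer settings are illustrative assumptions, not the actual MACE-MP-0 training code.

```python
# Illustrative only: freeze everything except the readout layers of a pretrained model.
# The naming convention ("readout" in the parameter name) is an assumption.
import torch

def freeze_all_but_readout(model: torch.nn.Module) -> None:
    """Freeze every parameter whose name does not contain 'readout'."""
    for name, param in model.named_parameters():
        param.requires_grad = "readout" in name  # only readout weights stay trainable

def trainable_parameters(model: torch.nn.Module):
    """Collect the parameters that will actually be updated during fine-tuning."""
    return [p for p in model.parameters() if p.requires_grad]

# Hypothetical usage:
# model = load_pretrained_potential()                       # placeholder loader
# freeze_all_but_readout(model)
# optimizer = torch.optim.Adam(trainable_parameters(model), lr=1e-4)
```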
It's obvious when a dog has been poorly trained. It doesn't respond appropriately to commands. It pushes boundaries and behaves unpredictably.
The same is true of a poorly trained artificial intelligence (AI) model. With AI, though, it is not always easy to identify what went wrong during training.
Research scientists worldwide are working with AI models trained on experimental and theoretical data. The goal is to predict a material's properties before incurring the time and expense of creating and testing it. Researchers are using AI to design better medicines and industrial chemicals in a fraction of the time that experimental trial and error requires.
But how can they trust the answers that AI models provide? It's not just an academic question. Millions of investment dollars can ride on whether AI model predictions are reliable.
Now, a research team from the Department of Energy's Pacific Northwest National Laboratory has developed a method to assess how well a class of AI models known as neural network potentials has been trained. The method can also flag predictions that fall outside the boundaries of the model's training and identify where additional training is needed, a process called active learning.
The research team, led by PNNL data scientists Jenna Bilbrey Pope and Sutanay Choudhury, describes how the new uncertainty quantification method works in a research article published in npj Computational Materials.
The team is also making the method publicly available on GitHub, as part of its larger repository called Scalable Neural Network Atomic Potentials (SNAP), for anyone who wishes to apply it to their own work.
"We noticed that some uncertainty models tend to be overconfident, even when the actual error in prediction is high," said Bilbrey Pope. "This is common for most deep neural networks. But a model trained with SNAP gives a metric that mitigates this overconfidence. Ideally, you'd want to look at both prediction uncertainty and training data uncertainty to assess your overall model performance."
Instilling trust in AI model training to speed discovery
Research scientists want to take advantage of the speed of AI predictions, but today there is a tradeoff between speed and accuracy. An AI model can make a prediction in seconds that would take a supercomputer 12 hours to compute with traditional, computationally intensive methods. Yet chemists and materials scientists still see AI as a black box.
The PNNL data science team's uncertainty measurement gives those researchers a way to understand how much they should trust an AI prediction.
"AI should be able to accurately detect its knowledge boundaries," said Choudhury. "We want our AI models to come with a confidence guarantee. We want to be able to make statements such as 'This prediction provides 85% confidence that catalyst A is better than catalyst B, based on your requirements.'"
In their published study, the researchers benchmarked their uncertainty method on one of the most advanced foundation models for atomistic materials chemistry, called MACE. The researchers measured how well the model is trained to predict the energy of specific families of materials. These calculations are essential to understanding how well the AI model can approximate the more time- and energy-intensive methods that run on supercomputers. The results indicate the types of simulations for which the model's answers can confidently be called accurate.
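The kind of single-point evaluation being benchmarked looks roughly like the sketch below, which queries a publicly released MACE foundation model through an ASE calculator. It assumes the mace-torch and ase packages are installed; the model name and keyword arguments may differ between releases, and this is not the paper's benchmarking code.

```python
# Illustrative sketch: a foundation model predicting the energy of a structure in
# seconds, in place of a far more expensive density functional theory calculation.
from ase.build import bulk
from mace.calculators import mace_mp  # pretrained MACE-MP foundation model

atoms = bulk("Cu", "fcc", a=3.6)                   # simple copper test structure
atoms.calc = mace_mp(model="medium", device="cpu")
energy = atoms.get_potential_energy()              # predicted potential energy in eV
print(f"MACE-MP predicted energy: {energy:.3f} eV")
```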
This kind of trust and confidence in predictions is crucial for realizing the potential of incorporating AI workflows into everyday laboratory work and creating autonomous laboratories where AI becomes a trusted lab assistant, the researchers added.
"We have worked to make it possible to 'wrap' any neural network potentials for chemistry into our framework," said Choudhury. "Then in a SNAP, they suddenly have the power of being uncertainty aware."
Now, if only puppies could be trained in a snap.
In addition to Bilbrey and Choudhury, PNNL data scientists Jesun S. Firoz and Mal-Soon Lee contributed to the study. This work was supported by the "Transferring Exascale Computational Chemistry to Cloud Computing Environment and Emerging Hardware Technologies" (TEC4) project, which is funded by the U.S. Department of Energy's Office of Science, Office of Basic Energy Sciences.
Journal reference:
- Bilbrey, J. A., Firoz, J. S., Lee, M.-S., & Choudhury, S. (2025). Uncertainty quantification for neural network potential foundation models. npj Computational Materials, 11(1), 1-8. DOI: 10.1038/s41524-025-01572-y, https://www.nature.com/articles/s41524-025-01572-y