Los Alamos AI Breakthrough Neutralizes Adversarial Attacks and Restores Trust in Neural Networks

Los Alamos scientists unveil LoRID, a cutting-edge AI defense that wipes out adversarial threats without compromising data integrity—setting a new gold standard for secure and trustworthy neural networks.

Research: LoRID: Low-Rank Iterative Diffusion for Adversarial Purification. Image Credit: Shutterstock AI

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Neural networks, a type of artificial intelligence modeled on the connectivity of the human brain, are driving critical breakthroughs across a wide range of scientific domains. However, these models face significant threats from adversarial attacks, which can derail predictions and produce incorrect information. Los Alamos National Laboratory researchers have pioneered a novel purification strategy that counteracts adversarial assaults and preserves neural networks' robust performance.

"Adversarial attacks to AI systems can take the form of tiny, near invisible tweaks to input images, subtle modifications that can steer the model toward the outcome an attacker wants," said Manish Bhattarai, Los Alamos computer scientist. "Such vulnerabilities allow malicious actors to flood digital channels with deceptive or harmful content under the guise of genuine outputs, posing a direct threat to trust and reliability in AI-driven technologies."

The Low-Rank Iterative Diffusion (LoRID) method removes adversarial interventions from input data by harnessing the power of generative denoising diffusion processes in tandem with advanced tensor decomposition techniques. In a series of tests on benchmark datasets, LoRID achieved state-of-the-art accuracy in neutralizing adversarial noise across attack scenarios, potentially advancing a more secure, reliable AI capability.

Defeating dangerous noise

Diffusion is a technique for training AI models by adding noise to data and then teaching the models to remove it. By learning to clean up the noise, the AI model effectively learns the underlying structure of the data, enabling it to generate realistic samples on its own. In diffusion-based purification, the model leverages its learned representation of "clean" data to identify and eliminate any adversarial interference introduced into the input. 
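
As a rough illustration of that purification idea, the following sketch noises an input partway along a diffusion schedule and then walks it back with a learned denoiser. The denoiser callable and noise schedule are stand-ins for a trained diffusion model; this is a conceptual sketch, not the LoRID code.

```python
# Conceptual sketch of diffusion-based purification (not the LoRID implementation).
# `denoise_step(x, t)` stands in for one learned reverse-diffusion update, and
# `alphas_cumprod` is the model's cumulative noise schedule; both are assumptions here.
import numpy as np

def purify(x, denoise_step, alphas_cumprod, t_star):
    """Noise the (possibly attacked) input up to timestep `t_star`, then denoise it back."""
    rng = np.random.default_rng()
    a_bar = alphas_cumprod[t_star]
    # Forward diffusion: blending the input with Gaussian noise washes out the
    # small, structured perturbations an attacker may have added.
    x_t = np.sqrt(a_bar) * x + np.sqrt(1.0 - a_bar) * rng.standard_normal(x.shape)
    # Reverse diffusion: the learned denoiser walks the sample back toward the
    # clean-data distribution it was trained on.
    for t in range(t_star, 0, -1):
        x_t = denoise_step(x_t, t)
    return x_t
```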

Unfortunately, applying too many noise-purifying steps can strip away essential details from the data - imagine scrubbing a photo so aggressively that it loses clarity - while too few steps leave room for harmful perturbations to linger. The LoRID method navigates this trade-off by employing multiple rounds of denoising at the earlier phases of the diffusion process, helping the model eliminate precisely the right amount of noise without compromising the meaningful content of the data, thereby fortifying the model against attacks.
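
One way to picture those repeated early-phase rounds is the loop below, which reuses the purification sketch above. The number of rounds and the shallow timestep are illustrative assumptions, not the schedule reported in the paper.

```python
# Sketch of repeated shallow purification passes, reusing purify() from above.
# The round count and the early-timestep window are illustrative assumptions;
# LoRID's actual schedule is described in the paper.
def iterative_purify(x, denoise_step, alphas_cumprod, t_early=50, rounds=4):
    """Run several light noise-and-denoise passes instead of one deep, destructive one."""
    for _ in range(rounds):
        # A small t_early keeps the injected noise mild, preserving image content,
        # while repeating the pass chips away at residual adversarial perturbations.
        x = purify(x, denoise_step, alphas_cumprod, t_early)
    return x
```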

Crucially, adversarial inputs often reveal subtle "low-rank" signatures - patterns that can slip past complex defenses. By weaving in a technique called tensor factorization, LoRID pinpoints these low-rank components, bolstering the model's defenses even under heavy adversarial attack.
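
The snippet below conveys the low-rank idea in its simplest form, using a truncated singular value decomposition on a single two-dimensional array. LoRID itself relies on tensor factorizations over higher-dimensional data, so this matrix version is only an analogy.

```python
# Matrix analogue of the low-rank idea using a truncated SVD (illustrative only;
# LoRID applies tensor factorizations, of which this 2-D case is the simplest cousin).
import numpy as np

def low_rank_component(image_2d, rank=8):
    """Project a 2-D array onto its top-`rank` singular directions."""
    u, s, vt = np.linalg.svd(image_2d, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

def low_rank_residual(image_2d, rank=8):
    # What remains after removing the dominant low-rank structure; structured,
    # low-rank adversarial signatures tend to stand out in one of these two parts.
    return image_2d - low_rank_component(image_2d, rank)
```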

The team tested LoRID using widely recognized benchmark datasets such as CIFAR-10, CIFAR-100, CelebA-HQ, and ImageNet, evaluating its performance against state-of-the-art black-box and white-box adversarial attacks. In white-box attacks, adversaries have full knowledge of the AI model's architecture and parameters. In black-box attacks, they only see inputs and outputs, with the model's internal workings hidden. Across every test, LoRID consistently outperformed other methods, particularly in terms of robust accuracy - the key indicator of a model's reliability when under adversarial threat.
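
For readers unfamiliar with the metric, robust accuracy is ordinarily computed by attacking each input, applying the defense, and measuring how often the model still predicts the true label. The hedged sketch below shows that evaluation pattern with placeholder attack, purifier, and data-loader objects; it is not the evaluation harness used in the study.

```python
# Hedged sketch of a robust-accuracy evaluation loop: attack each batch, purify,
# classify, and count correct predictions. `attack`, `purifier`, `model`, and
# `loader` are placeholder interfaces, not the study's actual harness.
import torch

@torch.no_grad()
def robust_accuracy(model, purifier, attack, loader, device="cpu"):
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        with torch.enable_grad():  # white-box attacks need gradients through the model
            adversarial = attack(model, images, labels)
        predictions = model(purifier(adversarial)).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.numel()
    return correct / total
```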

Venado helps unlock efficiency, results 

The team ran the LoRID models on Venado, the Lab's newest AI-capable supercomputer, to test various state-of-the-art vision models against black-box and white-box adversarial attacks. 

By harnessing multiple Venado nodes for several weeks - an ambitious effort given the massive compute requirements - they became the first group to undertake such a comprehensive analysis. Venado's power turned months of simulation into mere hours, slashing the total development timeline from years to just one month and significantly reducing computational costs.

Robust purification methods can enhance AI security wherever neural networks and machine learning are applied, including potentially in the Laboratory's national security mission.

"Our method has set a new benchmark in state-of-the-art performance across renowned datasets, excelling under both white-box and black-box attack scenarios," said Minh Vu, Los Alamos AI researcher. "This achievement means we can now purify the data - whether sourced privately or publicly - before using it to train foundational models, ensuring their safety and integrity while consistently delivering accurate results."

The team presented their work and results at the prestigious AAAI Conference on Artificial Intelligence, known as AAAI-2025, hosted by the Association for the Advancement of Artificial Intelligence. 

Funding: This work was supported by the Laboratory Directed Research and Development program at Los Alamos.

Journal reference:
  • Preliminary scientific report. Zollicoffer, G., Vu, M., Nebgen, B., Castorena, J., Alexandrov, B., & Bhattarai, M. (2024). LoRID: Low-Rank Iterative Diffusion for Adversarial Purification. arXiv. https://arxiv.org/abs/2409.08255
