CYBERSECEVAL 3 Security Benchmark Evaluates Risks in LLMs

In an article recently posted to the Meta Research website, researchers introduced cybersecurity evaluation (CYBERSECEVAL 3), a new set of security benchmarks for evaluating large language models (LLMs). This update assessed eight risks in two main categories: risks to third parties, developers, and end users of applications.

The paper highlighted new focus areas on offensive security capabilities, including automated social engineering and scaling autonomous cyber operations. The benchmarks were applied to LLM meta-artificial intelligence 3 (Llama 3) and other state-of-the-art LLMs to contextualize risks with and without mitigations.

Study: CYBERSECEVAL 3 Security Benchmark Evaluates Risks in LLMs. Image Credit: Pungu x/Shutterstock.com
Study: CYBERSECEVAL 3 Security Benchmark Evaluates Risks in LLMs. Image Credit: Pungu x/Shutterstock.com

Background

Previous work has established methods for assessing LLMs' security capabilities, focusing on risks to third parties and application developers. Studies have explored LLMs' potential for aiding spear-phishing attacks, enhancing manual cyber operations, and performing autonomous cyber operations. Notable contributions include evaluating prompt injection vulnerabilities and assessing malicious code execution risks.

LLM Risk Assessment

The analysts assessed four risks to third parties from LLMs: automated social engineering, scaling manual offensive cyber operations, autonomous offensive cyber operations, and autonomous software vulnerability discovery and exploitation. The Llama 3 405b evaluation for spear-phishing showed it could automate convincing phishing content but was less effective than models like generative pre-trained transformer 4 (GPT-4) Turbo and Qwen 2-72b-instruct.

Llama 3 achieved moderate scores in phishing simulations, indicating it could scale phishing efforts but is unlikely to pose a higher risk than other models. Additionally, the role of Llama 3 405b in scaling manual cyber operations was examined, revealing no significant improvement in attacker performance compared to traditional methods.

In a capture-the-flag simulation with 62 volunteers, Llama 3 405b did not significantly enhance the capabilities of novice or expert attackers. Despite some reported benefits, such as reduced mental effort, overall performance improvements were negligible. Llama Guard 3 has been released, which can identify and block misuse of Llama 3 models in cyberattacks, helping to mitigate potential threats while maintaining model safety.

Autonomous Cyber Capabilities

The assessment of Llama 3 70b and 405b models for autonomous offensive cyber operations revealed limited effectiveness. In simulations of ransomware attacks, these models performed poorly in exploit execution and maintaining access despite managing reconnaissance and vulnerability identification. Results showed that Llama 3 70b completed over half of low-sophistication challenges but struggled with more complex tasks.

The potential for autonomous software vulnerability discovery and exploitation by LLMs, including Llama 3, remains constrained due to limited program reasoning capabilities and complex program structures. Testing of Llama 3 405b demonstrated some success in specific vulnerability challenges, outperforming GPT-4 Turbo in certain tasks, but it did not show breakthrough capabilities. To mitigate misuse, deploying Llama Guard 3 is recommended for detecting and blocking cyberattack aid requests.

Llama 3 Cybersecurity Risks

The assessment of Llama 3 models in the context of cybersecurity risks revealed several key concerns for application developers and end-users. These risks include prompt injection attacks, where malicious inputs alter the model's behavior; the potential for models to execute harmful code in attached interpreters; the generation of insecure code; and the risk of models facilitating cyberattacks.

Testing demonstrated that Llama 3, particularly in its 70b and 405b versions, performs comparable to GPT-4 in prompt injection attacks but can still be vulnerable to certain exploitation techniques. The models also tend to generate insecure code, though introducing guardrails such as prompt guards and code shields reduces these risks.

Researchers highly recommend deploying Llama Guard 3 to mitigate these vulnerabilities. This guardrail system detects and blocks malicious inputs, prevents insecure code generation, and limits the models' ability to facilitate cyberattacks. Despite their effectiveness, developers must use them alongside secure coding practices and robust sandboxing techniques to ensure comprehensive protection against potential misuse.

Cybersecurity Guardrails Overview

Several guardrails are recommended to mitigate cybersecurity risks associated with Llama 3. Prompt guard helps reduce the risk of prompt injection attacks by classifying inputs as jailbreak, injection, or benign. It achieves a 97.5% recall rate for detecting jailbreak prompts and a 71.4% detection rate for indirect injections with minimal false positives. Code shield is an inference-time filtering tool that prevents insecure code from entering production systems.

It uses the insecure code detector (ICD) to analyze code patterns across various languages, achieving a 96% precision and 79% recall, with most scans completed in under 70ms. Llama Guard, a fine-tuned version of Llama 3, focuses on preventing compliance with prompts that could facilitate malicious activities. It significantly reduces safety violations but may increase false refusal rates, particularly when used as input and output filters. Together, these tools enhance the security of Llama 3 applications by addressing prompt injections, insecure code, and compliance with potentially harmful prompts.

Conclusion

To summarize, CYBERSECEVAL 3, a new benchmark suite for assessing cybersecurity risks from LLMs, was released, extending CYBERSECEVAL 1 and CYBERSECEVAL 2. The effectiveness of CYBERSECEVAL was demonstrated by evaluating Llama 3 and a select set of contemporary state-of-the-art models against a broad range of cybersecurity risks. The released mitigations could improve multiple risks for Llama 3 and other models.

Journal reference:
Silpaja Chandrasekar

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2024, August 05). CYBERSECEVAL 3 Security Benchmark Evaluates Risks in LLMs. AZoAi. Retrieved on January 15, 2025 from https://www.azoai.com/news/20240805/CYBERSECEVAL-3-Security-Benchmark-Evaluates-Risks-in-LLMs.aspx.

  • MLA

    Chandrasekar, Silpaja. "CYBERSECEVAL 3 Security Benchmark Evaluates Risks in LLMs". AZoAi. 15 January 2025. <https://www.azoai.com/news/20240805/CYBERSECEVAL-3-Security-Benchmark-Evaluates-Risks-in-LLMs.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "CYBERSECEVAL 3 Security Benchmark Evaluates Risks in LLMs". AZoAi. https://www.azoai.com/news/20240805/CYBERSECEVAL-3-Security-Benchmark-Evaluates-Risks-in-LLMs.aspx. (accessed January 15, 2025).

  • Harvard

    Chandrasekar, Silpaja. 2024. CYBERSECEVAL 3 Security Benchmark Evaluates Risks in LLMs. AZoAi, viewed 15 January 2025, https://www.azoai.com/news/20240805/CYBERSECEVAL-3-Security-Benchmark-Evaluates-Risks-in-LLMs.aspx.

Comments

The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of AZoAi.
Post a new comment
Post

While we only use edited and approved content for Azthena answers, it may on occasions provide incorrect responses. Please confirm any data provided with the related suppliers or authors. We do not provide medical advice, if you search for medical information you must always consult a medical professional before acting on any information provided.

Your questions, but not your email details will be shared with OpenAI and retained for 30 days in accordance with their privacy principles.

Please do not ask questions that use sensitive or confidential information.

Read the full Terms & Conditions.

You might also like...
MIT Researchers Transform AI Fairness with Targeted Data Debiasing