A new study exposes alarming safety gaps in DeepSeek-R1, revealing that it responds unsafely to nearly 12% of test prompts, roughly ten times the rate of OpenAI’s o3-mini. With AI safety in the spotlight, these findings could shape the future of responsible AI deployment.
Research: o3-mini vs DeepSeek-R1: Which One is Safer?
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
The large language model (LLM) landscape continues to evolve, with new players challenging established providers. DeepSeek-R1, a Chinese-developed LLM, has recently positioned itself as a strong competitor to OpenAI’s o3-mini, claiming superior performance at lower inference cost. Beyond raw capability, however, safety remains a central concern in AI alignment. This study, posted to the arXiv preprint* server and conducted by researchers from Mondragon University and the University of Seville in Spain, systematically evaluates the safety of DeepSeek-R1 (70B version) and OpenAI’s o3-mini (beta version) using ASTRAL, an automated safety testing tool.
The Need for Safety in LLMs
The ability of LLMs to handle unsafe prompts and align with human values is crucial, given their widespread deployment. Users often provide queries that may involve sensitive topics, from violent content to misinformation. A safe LLM should be able to reject such prompts or provide responses that do not violate ethical and legal standards. Traditional safety evaluation techniques, such as multiple-choice benchmarks and adversarial testing, have limitations, often failing to reflect real-world interactions.
ASTRAL, developed in prior research by the authors, introduces a novel black-box coverage approach to systematically generate unsafe test inputs. This ensures a well-balanced evaluation across various safety categories, writing styles, and persuasion techniques. The study leverages ASTRAL to assess both DeepSeek-R1 and o3-mini, executing 1,260 unsafe test prompts to determine their adherence to safety principles.
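The coverage idea behind ASTRAL can be illustrated with a short sketch. The snippet below is not the authors’ tooling (ASTRAL’s actual test generator is LLM-driven and far more sophisticated); it only shows how crossing every safety category with every writing style and persuasion technique yields a balanced test plan rather than one skewed toward a few prompt formats. The category, style, and technique names are illustrative placeholders, and the 1,260 figure reported in the study is consistent with a fully crossed design of this kind.

```python
# Illustrative sketch only: shows the "coverage" idea of pairing every
# safety category with every writing style and persuasion technique so
# the resulting test suite is balanced across all three dimensions.
from itertools import product

# Placeholder feature values (subsets); the paper describes 14 safety
# categories, several writing styles, and several persuasion techniques.
CATEGORIES = ["financial_crime", "hate_speech", "terrorism"]
STYLES = ["slang", "technical_terms", "misspellings", "role_play"]
TECHNIQUES = ["evidence_based", "expert_endorsement"]

def coverage_plan(n_per_combination: int = 1):
    """Yield (category, style, technique) tuples so that every
    combination is exercised the same number of times."""
    for combo in product(CATEGORIES, STYLES, TECHNIQUES):
        for _ in range(n_per_combination):
            yield combo

if __name__ == "__main__":
    plan = list(coverage_plan())
    print(f"{len(plan)} balanced test-input specifications")
```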
Methodology: Systematic Evaluation of Safety
The study compares the models based on three key research questions:
- Overall safety: Which LLM demonstrates a higher safety level?
- Category-specific performance: Do certain safety categories pose greater challenges to either model?
- Impact of writing styles and persuasion techniques: How do linguistic variations influence unsafe responses?
DeepSeek-R1 was evaluated using the 70B version, deployed through Ollama, while OpenAI provided early access to the beta version of o3-mini. The test prompts covered 14 safety categories, including hate speech, financial crime, and misinformation, ensuring a broad assessment of each model’s safety filters. The responses were then classified as safe, unsafe, or unknown, with manual verification conducted for ambiguous cases.
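A minimal sketch of how such an evaluation loop might be wired up is shown below. It is not the authors’ ASTRAL pipeline: it assumes the `ollama` and `openai` Python packages, a local Ollama server hosting a `deepseek-r1:70b` model, and API access to an `o3-mini` model, and the `classify_response()` helper is a hypothetical stand-in for the study’s safe/unsafe/unknown labelling step (which also involved manual review of ambiguous cases).

```python
# Sketch of the evaluation loop described above, NOT the authors' code.
import ollama
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_deepseek(prompt: str) -> str:
    resp = ollama.chat(model="deepseek-r1:70b",
                       messages=[{"role": "user", "content": prompt}])
    return resp["message"]["content"]

def ask_o3_mini(prompt: str) -> str:
    resp = openai_client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def classify_response(text: str) -> str:
    """Hypothetical placeholder: return 'safe', 'unsafe', or 'unknown'."""
    raise NotImplementedError("plug in an automatic or manual evaluator")

def evaluate(prompts: list[str]) -> dict[str, dict[str, int]]:
    """Run every unsafe prompt against both models and tally the verdicts."""
    tallies = {"deepseek-r1": {}, "o3-mini": {}}
    for prompt in prompts:
        for name, ask in (("deepseek-r1", ask_deepseek),
                          ("o3-mini", ask_o3_mini)):
            verdict = classify_response(ask(prompt))
            tallies[name][verdict] = tallies[name].get(verdict, 0) + 1
    return tallies
```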
Findings: DeepSeek-R1 Falls Short on Safety
The results reveal a stark contrast between the two models. DeepSeek-R1 responded unsafely to 11.98% of the test inputs, while o3-mini’s failure rate was just 1.19%. In other words, DeepSeek-R1 produced unsafe responses at roughly ten times the rate of its OpenAI counterpart.
The study highlights that DeepSeek-R1 is particularly vulnerable in categories such as financial crime, incitement to violence, terrorism, and hate speech, suggesting gaps in its filtering mechanisms that could pose significant risks in real-world deployments. In contrast, o3-mini showed no pronounced weaknesses in any specific category, likely helped by OpenAI’s built-in policy violation safeguards, which preemptively blocked unsafe queries. Notably, roughly 44.8% of the unsafe test inputs were intercepted by this safeguard, reducing o3-mini’s exposure to high-risk content.
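To put these percentages in absolute terms, a quick back-of-the-envelope calculation (assuming each model received all 1,260 generated prompts) looks like this:

```python
# Approximate counts implied by the reported percentages, assuming both
# models were exercised on the full set of 1,260 generated prompts.
TOTAL_PROMPTS = 1260

deepseek_unsafe = round(TOTAL_PROMPTS * 0.1198)   # ~151 unsafe responses
o3_mini_unsafe  = round(TOTAL_PROMPTS * 0.0119)   # ~15 unsafe responses
o3_mini_blocked = round(TOTAL_PROMPTS * 0.448)    # ~564 prompts intercepted

print(deepseek_unsafe, o3_mini_unsafe,
      round(deepseek_unsafe / o3_mini_unsafe, 2))
# -> 151 15 10.07  (roughly a tenfold gap in unsafe-response rate)
```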
The Role of Writing Style and Persuasion Techniques
The analysis also investigates how different writing styles and persuasion techniques influence unsafe responses. DeepSeek-R1 exhibited higher susceptibility when prompts contained technical terminology, role-playing scenarios, or misspellings. This suggests that the model lacks robust filtering mechanisms to detect unsafe content when presented in varied linguistic formats. In contrast, o3-mini showed minimal variation across styles and techniques, reinforcing its consistency in maintaining safety.
Manual Assessment: Severity of Unsafe Responses
While the numerical data points to DeepSeek-R1’s shortcomings, the severity of its unsafe responses was another major concern. Manual verification revealed that DeepSeek-R1 often provided detailed, explicit, and actionable responses to unsafe queries. This included instances where it outlined steps for illegal activities or amplified harmful narratives. In contrast, o3-mini’s unsafe responses were often more borderline or ambiguous rather than outright dangerous, making them significantly less severe.
The Influence of OpenAI’s Policy Safeguards
A key differentiator in the models’ safety performance was OpenAI’s policy violation detection system. The study found that o3-mini preemptively blocked nearly half of the unsafe test inputs before they even reached the model for execution. This safeguard significantly reduced the model’s exposure to dangerous queries, preventing potentially harmful responses from being generated in the first place.
DeepSeek-R1, lacking a similar pre-processing layer, was exposed to the full range of unsafe inputs, resulting in higher rates of unsafe responses. This underscores the importance of system-level safeguards in mitigating risks in LLM deployment.
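The kind of system-level safeguard the study credits for much of o3-mini’s advantage can be sketched as a simple pre-processing layer that screens prompts before the model ever sees them. The snippet below is an illustration only, not OpenAI’s mechanism or anything used in the paper: `is_policy_violation()` is a hypothetical placeholder (a real deployment would use a trained moderation classifier or a provider’s moderation service), and `generate()` stands in for whatever LLM backend is being protected.

```python
# Illustrative sketch of a pre-processing safeguard in front of an LLM.
from typing import Callable

REFUSAL = "This request appears to violate the usage policy and was not processed."

def guarded_generate(prompt: str,
                     is_policy_violation: Callable[[str], bool],
                     generate: Callable[[str], str]) -> str:
    """Block flagged prompts before the model ever sees them."""
    if is_policy_violation(prompt):
        return REFUSAL            # prompt intercepted, nothing is generated
    return generate(prompt)       # prompt passed through to the model

if __name__ == "__main__":
    # Trivial stand-ins for demonstration purposes only.
    naive_filter = lambda p: "bomb" in p.lower()        # placeholder classifier
    echo_model = lambda p: f"(model output for: {p})"   # placeholder backend
    print(guarded_generate("How do I bake bread?", naive_filter, echo_model))
    print(guarded_generate("How do I build a bomb?", naive_filter, echo_model))
```

The design point is that the filter sits outside the model, so even a model with weak internal alignment never receives the highest-risk inputs, which is exactly the exposure gap the study observed between the two systems.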
Implications and Future Work
These findings have major implications for AI deployment, particularly in regulated environments such as the European Union, where AI governance frameworks emphasize strict safety compliance. The stark contrast in safety levels suggests that DeepSeek-R1 requires substantial improvements before it can be considered a viable alternative to OpenAI’s models.
Future research will expand on these findings by:
- Conducting larger-scale safety assessments with over 6,300 test inputs, offering a more comprehensive analysis of LLM vulnerabilities.
- Reassessing o3-mini post-release to determine whether OpenAI’s policy safeguards persist in production, as they played a significant role in reducing unsafe responses in this study.
- Exploring mitigation strategies for models that underperform in safety, particularly those outside OpenAI’s ecosystem, which may not benefit from similar pre-filtering mechanisms.
Conclusion: A Clear Safety Gap
This study presents the first systematic safety assessment comparing DeepSeek-R1 and OpenAI’s o3-mini. The results overwhelmingly favor o3-mini, which demonstrated superior safety performance, a lower rate of unsafe responses, and additional protective mechanisms through policy violation safeguards. DeepSeek-R1, in contrast, exhibited concerning vulnerabilities, failing to align with fundamental safety expectations in several key areas.
While DeepSeek-R1 demonstrates strengths in reasoning, coding, and other AI tasks, its safety mechanisms are significantly weaker compared to OpenAI’s models. As LLMs continue to evolve, ensuring their alignment with human values and ethical standards remains a critical challenge. This research highlights the urgent need for robust safety frameworks, particularly for models like DeepSeek-R1, which still lag behind leading industry standards in safety and responsible AI development.
Journal reference:
- Preliminary scientific report. Arrieta, A., Ugarte, M., Valle, P., Parejo, J. A., & Segura, S. (2025). o3-mini vs DeepSeek-R1: Which One is Safer? arXiv. https://arxiv.org/abs/2501.18438