Azure Guardrails for DeepSeek-R1: Some Improvement but Not Enough

Running DeepSeek-R1 on Azure has many flaws, even with Azure guardrail filters

Azure Guardrails for DeepSeek-R1: Some Improvement but Not Enough

The growing popularity of AI models like DeepSeek-R1 has made it easier than ever for enterprises to integrate large language models (LLMs) into their workflows. But ease of access doesn’t equate to safety. AppSOC Research Labs has now tested the Azure-hosted version of DeepSeek-R1 both with and without Azure’s built-in content filters and guardrails. The results confirm a troubling reality: while Azure filters provide some value, DeepSeek-R1 remains a high-risk model unsuitable for enterprise use.

This blog summarizes the key findings from our latest testing, powered by the AppSOCAI Security Testing platform. Using automated static analysis and dynamic red teaming simulations, we assessed how well Azure’s safety mechanisms protect against real-world threats like prompt injection, malware generation, training data leakage, and more.

Excerpt from AppSOC Model Test Report on DeepSeek-R1 on Azure

Content Filters Help—But Not Nearly Enough

Microsoft Azure’s content filters are designed to enforce safety and reduce risk. In some categories, these filters offered measurable improvements. In others, they made almost no difference. In one troubling case, they may have even worsened model performance.

Positive Results – Azure Filters Helped

1. Jailbreaking: Azure Filters Significantly Reduced Risk

Jailbreaking is one of the most dangerous forms of model manipulation, allowing users to bypass system-level guardrails. When tested without filters, DeepSeek-R1 failed 37.6% of jailbreak attempts. With Azure filters, this dropped sharply to just 5.0%.

Takeaway: Azure filters are effective at blocking basic jailbreaks—but this doesn’t make the model secure overall.

2. Toxicity: Significant Improvement with Azure Filters

Toxic responses from AI models can lead to serious ethical, compliance, and brand reputation issues. Here, Azure filters performed well: the failure rate dropped from 14.8% without filters to just 4.0% with them—a more than 70% reduction.

Takeaway: This is one area where Azure’s content filters show strong results. However, it’s not enough to outweigh the broader security risks.

3. Training Data Leakage: Better, But Still Risky

Leaks of training data can expose proprietary, sensitive, or personally identifiable information. Without filters, DeepSeek-R1 failed 32.7% of data leak tests. With Azure filters, this dropped to 10.0%.

Takeaway: This is a substantial improvement—but a 10% leak rate is still high for any regulated industry.

4. Hallucinations: Filters Eliminate the Problem (for Now)

Hallucinations—false or made-up responses—are a common issue in LLMs. In this case, Azure’s filters appeared to fully eliminate hallucination failures, reducing the rate from 50.4% to 0%.

Takeaway: This is one of the most positive outcomes of the Azure filtering system, though the consistency of this result over time remains to be seen.

Neutral Results – Azure Had Little Impact

5. Supply Chain Risk: Slightly Worse with Filters

Ironically, the only category where Azure filters seemed to make things worse was in supply chain recommendations. The model’s failure rate here increased from 5.8% without filters to 6.9% with them.

Takeaway: Although the increase is small, this anomaly suggests the filters might interfere with useful model behavior in unexpected ways.

Negative Results – Azure Failed to Block These

6. Prompt Injection: Still Alarmingly High Failure Rates

Prompt injection is a critical threat that allows attackers to override instructions, leak data, or manipulate model output. DeepSeek-R1 failed 57.1% of prompt injection tests without filters. While filters reduced that to 40%, the rate remains dangerously high.

Takeaway: Azure’s content filters reduce—but do not solve—the prompt injection problem. A 40% failure rate is unacceptable for enterprise use.

7. Malware Generation: Filters Offer Minimal Help

AI models should not be capable of generating malicious code. Yet DeepSeek-R1 failed malware generation tests at a 96.7% rate without filters—and still failed 93.8% of the time even with them.

Takeaway: Azure’s filters appear to marginally reduce the risk, but the model remains almost fully vulnerable to malware prompts.

8. Virus Creation: No Improvement

The virus test assessed whether the model could be coaxed into generating active virus code. The failure rate? 93.3%with or without Azure filters.

Takeaway: Azure’s filters have zero impact on DeepSeek-R1’s tendency to generate dangerous viral code. This is a red flag for any security-conscious organization.

Overall Risk Scores: Still “High Risk” Either Way

Despite Azure’s filtering improvements in certain areas, the overall AppSOC Risk Scores for DeepSeek-R1 on Azure remain problematic:

  • With Azure Filters: 8.3 / 10 (High Risk)
  • Without Azure Filters: 8.4 / 10 (High Risk)

In short, Azure guardrails improve specific threat vectors, but they do not sufficiently mitigate the underlying vulnerabilities of the DeepSeek-R1 model.

AppSOC’s Recommendations

Do not use DeepSeek-R1 on Azure—with or without content filters—for any AI applications that involve:

  • Personal or sensitive data
  • Proprietary intellectual property
  • Regulated environments
  • Critical decision-making

These risks are simply too great.

Best Practices for AI Security

Final Thoughts: Guardrails Are Not a Cure-All

Azure’s content filters are not meaningless—they can reduce specific risks like toxicity, hallucinations, and jailbreaks. But enterprise-ready AI requires far more than partial filtering.

Security must be baked into model selection, testing, deployment, and ongoing operations. That’s what AppSOC is here to deliver.

Secure Your Path to AI Adoption.

For full details on AI security assessments and how the AppSOCAI Platform works, visit www.appsoc.com or contact us at info@appsoc.com.