Something like this? The detailed version is here.
You want a “query validation” layer that does two different jobs:
- Content safety: detect toxic, hateful, sexual, violent, self-harm content.
- Security: detect prompt injection, jailbreaks, indirect injection via RAG documents, data exfiltration attempts, and unsafe tool use.
Most products do one of these well. Fewer do both. The clean way to think about it is: detectors produce signals, and your app enforces policy (especially around tools and data access). OWASP explicitly recommends validating all inputs (user + external content), separating instructions from data, and monitoring outputs. (OWASP Cheat Sheet Series)
The UK NCSC also stresses that prompt injection is structurally different from SQLi and you should focus on reducing likelihood and impact, not expecting a perfect filter. (NCSC)
What “validate user queries” actually needs to include
If you only scan the user’s typed message, you miss the most common real-world attacks.
A practical scope is bidirectional and includes:
- Inbound: the user message.
- RAG context / documents: retrieved chunks can contain “hidden instructions” (indirect prompt injection). Azure calls these “document attacks.” (Microsoft Learn)
- Tool boundary: validate tool calls before execution and tool outputs before feeding them back into the model.
- Outbound: scan the model’s response for policy violations and sensitive data leakage.
That “RAG/doc + tools + outbound” part is what separates “content moderation” from “LLM security.”
Services that specialize in content + prompt-injection style security
1) Cloud-managed guardrail services (strong default if you’re on that cloud)
Google Cloud Model Armor
- Positioned as runtime protection against prompt injection, harmful content, sensitive data leaks, and malicious URLs. (Google Cloud)
- Also supports document screening (PDF, DOCX, PPTX, XLSX, CSV, TXT, etc.), which matters for RAG and email/doc ingestion. (Google Cloud Documentation)
Azure AI Content Safety Prompt Shields
- Designed to detect prompt injection and jailbreak-like behavior for user prompts and documents. It explicitly separates “user prompt attacks” from “document attacks.” (Microsoft Learn)
Amazon Bedrock Guardrails (prompt attack filter)
- Covers harmful content filtering and prompt attack detection. (AWS Document)
- Important operational detail: prompt-attack filtering requires input tags so the service can identify what is “user input.” Without tags, the prompt attack filter will not work for certain inference APIs. (AWS Document)
When these are a good fit: you want one vendor to cover a lot quickly, and you can accept their cloud integration model.
2) Edge / gateway “AI firewall” products (WAF-like choke point)
These sit between your app and model providers. They are good for central policy, logging, abuse throttling, and “turn it on for many teams.”
Cloudflare AI Gateway Guardrails
- AI Gateway is a proxy between your app and model providers and can apply guardrails for safety. (The Cloudflare Blog)
- Their Guardrails feature runs content evaluations using Llama Guard 3 (as documented in usage considerations). (Cloudflare Docs)
Akamai Firewall for AI
- Markets prevention of prompt injections, jailbreaks, and unauthorized data extraction, plus review/filtering of generated responses for harmful content. (Akamai)
When these are a good fit: you need an “AI security gateway” pattern across many apps and providers.
3) Specialist “LLM security API” vendors (focused on prompt attacks + leakage)
Lakera Guard
- Provides real-time detection of “prompt attacks” so your app can block, warn, or flag requests. (docs.lakera.ai)
- There are also common integration patterns through gateways (example: Kong plugin documentation). (Kong Docs)
When these are a good fit: you want a cloud-agnostic API focused on prompt injection defenses and fast integration.
Open-source options you can self-host (useful as an “inner ring”)
If you want a cheaper always-on layer, or you need full control.
Protect AI LLM Guard
- Open-source toolkit: sanitization, harmful language detection, data leakage prevention, and resistance against prompt injection. (GitHub)
(Protect AI also has a separate prompt injection detector project called Rebuff, but treat it as a detector, not a complete solution.) (GitHub)
NVIDIA NeMo Guardrails
- Open-source toolkit to add programmable “rails” that constrain model behavior and flow. (GitHub)
Meta Purple Llama and Llama Guard 3
- Llama Guard 3 is a family of input/output moderation models. Useful if you want to run classification locally. (GitHub)
OpenAI Moderation endpoint (content safety)
- A straightforward moderation API for text and images. This is mainly for content safety, not full prompt-injection defense. (OpenAI Platform)
A realistic pattern: open-source for baseline checks + one managed service for higher-precision screening.
“Comparison pages” and high-signal resources
There is no single perfect “LLM firewall comparison” page yet. The most useful “comparison” resources are:
Practical guidance (what to implement)
- OWASP LLM Prompt Injection Prevention Cheat Sheet (checklist-style mitigations, input + external content + output monitoring). (OWASP Cheat Sheet Series)
- OWASP GenAI Top 10: LLM01 Prompt Injection (risk definition and failure modes). (OWASP Gen AI Security Project)
- NCSC: “Prompt injection is not SQL injection” (good mental model for what mitigations can and cannot do). (NCSC)
Curated lists of defenses and tools
tldrsec/prompt-injection-defenses (catalog of defenses and references). (GitHub)
How to compare products in your own environment (recommended)
Instead of trusting marketing, run a small evaluation harness:
- promptfoo guides for guardrail testing and red teaming, including a specific guide for testing Google Model Armor templates. (Promptfoo)
- garak open-source LLM vulnerability scanner (probes for injection, leakage, toxicity, etc.). (GitHub)
- PyRIT (Microsoft) for automated risk identification and red teaming workflows. (GitHub)
- BIPIA benchmark for indirect prompt injection (RAG/document-style attacks). (arXiv)
- AgentDojo benchmark for tool-using agents under injection attacks (useful if you have tools and actions). (arXiv)
A simple, easy architecture that works in practice
If you want “easy-to-understand” and “works in production,” do this:
- Normalize + segment inputs
- Treat these channels separately:
USER_TEXT, RAG_CONTEXT, TOOL_OUTPUT, MODEL_OUTPUT.
- This matters because “document attacks” come from RAG and emails, not just user text. (Microsoft Learn)
- Run two classes of checks
- Content safety checks (toxicity, hate, sexual, violence, self-harm).
- Security checks (prompt injection/jailbreak patterns, data exfil attempts, malicious URLs, etc.). Model Armor explicitly bundles this mix. (Google Cloud)
- Enforce a deterministic policy for tools
- Even the best detector will miss attacks sometimes.
- The main security win is: tools are only callable if allowed, with validated arguments, scoped to the user/tenant.
- Validate outbound responses
- Scan for sensitive data leakage and policy violations before the user sees it.
- Edge products and Model Armor-style services explicitly market input-and-output protection. (Akamai)
Quick shortlist of “start here” options
If you want a short list to evaluate first:
- Google Model Armor if you want prompt + response + document scanning in one managed service. (Google Cloud Documentation)
- Azure Prompt Shields if “document attacks” and Azure alignment matter. (Microsoft Learn)
- Bedrock Guardrails if you are on AWS and will implement correct input tagging. (AWS Document)
- Cloudflare AI Gateway Guardrails or Akamai Firewall for AI if you want a central gateway layer across providers. (The Cloudflare Blog)
- Lakera Guard if you want a dedicated prompt-attack defense API. (docs.lakera.ai)
- LLM Guard + garak + promptfoo if you want a self-hostable baseline plus an evaluation harness. (GitHub)
Summary
- “Validate user queries” should include user text + RAG/docs + tool I/O + model output, not just the first prompt. (Microsoft Learn)
- Strong managed services: Model Armor, Azure Prompt Shields, Bedrock Guardrails. Bedrock prompt-attack filtering needs input tags. (Google Cloud)
- Gateway options: Cloudflare AI Gateway Guardrails, Akamai Firewall for AI. (The Cloudflare Blog)
- Best “comparison” approach: run an eval harness using promptfoo + garak + PyRIT, and include indirect injection with BIPIA and tool-agent tests with AgentDojo. (Promptfoo)