Guardrails

Defensive layers around an LLM that filter input and output to prevent harm or drift.

Definition

Guardrails are defensive layers around a language model that filter input and output to prevent unsafe, off-topic, or otherwise undesirable behavior. Common guardrails include topic classifiers, profanity filters, prompt injection detectors, output schema validators, and refusal policies. Production AI applications combine several guardrails layered before and after the model call.
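One of the guardrail types named above, an output schema validator, can be sketched in a few lines: it checks that the model's reply parses as JSON with the expected fields before the application trusts it. This is a minimal illustration; the field names and error handling are assumptions, not any specific library's API.

```python
import json

# Hypothetical schema the application expects from the model's reply.
REQUIRED_FIELDS = {"answer", "confidence"}

def validate_output(raw_reply: str) -> dict:
    """Raise ValueError if the model output violates the expected schema."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"model output missing fields: {sorted(missing)}")
    return data

good = validate_output('{"answer": "Reset your password", "confidence": 0.9}')
print(good["answer"])  # → Reset your password
```

Replies that fail validation are typically retried or replaced with a fallback message rather than shown to the user.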

Example

Customer support chatbot: input passes through a topic classifier (refuses non-support questions); output passes through a PII detector (redacts emails) before display.

When to use

Any user-facing AI application. Mandatory in regulated industries.

Also known as

LLM guardrails
