Definition
Guardrails are defensive layers around a language model that filter inputs and outputs to prevent unsafe, off-topic, or otherwise undesirable behavior. Common guardrails include topic classifiers, profanity filters, prompt injection detectors, output schema validators, and refusal policies. Production AI applications typically combine several guardrails, layered both before and after the model call.
Example
Customer support chatbot: input passes through a topic classifier that refuses non-support questions; output passes through a PII detector that redacts email addresses before display.
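A minimal sketch of the chatbot pipeline above, assuming a keyword allowlist as a stand-in for a real topic classifier and a regex as a stand-in for a real PII detector (all names here are illustrative, not from any specific library):

```python
import re

# Hypothetical keyword allowlist standing in for a trained topic classifier.
SUPPORT_KEYWORDS = {"order", "refund", "shipping", "account", "password"}

# Simplified email pattern standing in for a real PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def input_guardrail(user_message: str) -> bool:
    """Return True if the message looks like a support question."""
    words = set(re.findall(r"[a-z]+", user_message.lower()))
    return bool(words & SUPPORT_KEYWORDS)

def output_guardrail(model_reply: str) -> str:
    """Redact email addresses before the reply reaches the user."""
    return EMAIL_RE.sub("[REDACTED EMAIL]", model_reply)

def handle(user_message: str, call_model) -> str:
    """Layer guardrails before and after the model call."""
    if not input_guardrail(user_message):
        return "Sorry, I can only help with support questions."
    return output_guardrail(call_model(user_message))
```

In production the input check would be a classifier rather than keywords, and redaction would use a dedicated PII detection service, but the layering shape stays the same.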
When to use
In any user-facing AI application; effectively mandatory in regulated industries.
Also known as
LLM guardrails