Definition
Guardrails are defensive layers around a language model that filter inputs and outputs to prevent unsafe, off-topic, or otherwise undesirable behavior. Common guardrails include topic classifiers, profanity filters, prompt injection detectors, output schema validators, and refusal policies. Production AI applications typically combine several guardrails, layered both before and after the model call.
Example
Customer support chatbot: input passes through a topic classifier that refuses non-support questions; output passes through a PII detector that redacts email addresses before display.
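A minimal sketch of the chatbot pipeline above, assuming a keyword allowlist as a stand-in for a real topic classifier and a regex as a stand-in for a real PII detector (all names here are illustrative, not from any specific library):

```python
import re

# Hypothetical keyword allowlist standing in for a trained topic classifier.
SUPPORT_KEYWORDS = {"order", "refund", "shipping", "account", "password"}

# Simplified email pattern standing in for a real PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def input_guardrail(user_message: str) -> bool:
    """Return True if the message looks like a support question."""
    words = set(re.findall(r"[a-z]+", user_message.lower()))
    return bool(words & SUPPORT_KEYWORDS)

def output_guardrail(model_reply: str) -> str:
    """Redact email addresses before the reply reaches the user."""
    return EMAIL_RE.sub("[REDACTED EMAIL]", model_reply)

def handle(user_message: str, call_model) -> str:
    """Layer guardrails before and after the model call."""
    if not input_guardrail(user_message):
        return "Sorry, I can only help with support questions."
    return output_guardrail(call_model(user_message))
```

In production the input check would be a classifier rather than keywords, and redaction would use a dedicated PII detection service, but the layering shape stays the same.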
When to use
In any user-facing AI application; effectively mandatory in regulated industries.
Also known as
LLM guardrails