🛡️ Guardrails with OpenAI SDK


🚧 Why Guardrails Matter
Not every input is welcome.
Not every output is safe to send.
In real-world applications like job portals or support agents, you need control over what your LLM sees and says.
That’s what guardrails in the OpenAI Agents SDK do — they act like real-time moderators running parallel to your AI.
⚙️ Clean Setup
Let’s begin with a structured project setup using uv — clean, fast, and modern:
uv init guardrails # Create a clean project
uv venv # Set up virtual environment
uv add openai-agents pydantic # Install required dependencies
You can also install from a requirements file:
uv add -r requirements.txt
🔐 Requires Python >= 3.11
🧠 Concept: How Guardrails Work
There are two types of guardrails:
| Type | Runs on | Purpose |
| --- | --- | --- |
| Input | User prompt | Filter/validate user input |
| Output | Model response | Sanitize/prevent harmful output |
Each one follows 3 core steps:
1. Intercept the input/output
2. Run validation
3. Trigger a tripwire if invalid
If a tripwire is triggered, the agent halts, and a specific exception is raised.
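To make those steps concrete, here is a minimal sketch of an input guardrail that uses a plain keyword check instead of an LLM (the keyword_guardrail name and the banned phrases are illustrative only, not part of the SDK):
from agents import (
    Agent,
    GuardrailFunctionOutput,
    RunContextWrapper,
    TResponseInputItem,
    input_guardrail,
)

@input_guardrail
async def keyword_guardrail(
    ctx: RunContextWrapper[None],
    agent: Agent,
    input: str | list[TResponseInputItem],
) -> GuardrailFunctionOutput:
    # 1. Intercept: the SDK passes in the user input before the main agent runs.
    text = input if isinstance(input, str) else str(input)
    # 2. Validate: here, a simple keyword check stands in for real moderation.
    flagged = any(phrase in text.lower() for phrase in ("buy followers", "free money"))
    # 3. Tripwire: returning True tells the SDK to halt and raise an exception.
    return GuardrailFunctionOutput(output_info={"flagged": flagged}, tripwire_triggered=flagged)
The real example below swaps the keyword check for a dedicated guardrail agent.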
🔐 Use Case: Job Application Input Filter
You’re building an LLM-powered agent that processes job applications. You want to block inappropriate messages before they reach HR.
Let’s define a guardrail that detects if an application message contains spam, jokes, or insults.
✅ 1. Define Output Schema
from pydantic import BaseModel

class ApplicationCheck(BaseModel):
    is_inappropriate: bool
    reasoning: str
🧾 Explanation:
We define a schema that the guardrail agent will return. It includes a boolean flag and an explanation, simple and human-readable.
✅ 2. Create Guardrail Agent
from agents import Agent

guardrail_agent = Agent(
    name="Application Guardrail",
    instructions="Determine if the message contains anything inappropriate, unserious, or spammy.",
    output_type=ApplicationCheck,
)
🧾 Explanation:
This lightweight agent does only one job: inspect messages for anything unprofessional.
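You can sanity-check this agent on its own before wiring it in. A quick illustration (the sample message is made up, and this assumes an async context with your API key already configured):
from agents import Runner

result = await Runner.run(guardrail_agent, "Buy 10,000 followers now!!!")
check = result.final_output  # an ApplicationCheck instance, thanks to output_type
print(check.is_inappropriate, check.reasoning)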
✅ 3. Input Guardrail Function
from agents import (
    GuardrailFunctionOutput,
    input_guardrail,
    RunContextWrapper,
    TResponseInputItem,
    Runner,
)

@input_guardrail
async def inappropriate_input_guardrail(
    ctx: RunContextWrapper[None],
    agent: Agent,
    input: str | list[TResponseInputItem],
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_inappropriate,
    )
🧾 Explanation:
This function receives the user’s message, runs it through the guardrail agent, and triggers the tripwire if the content is flagged.
✅ 4. Job Application Agent
from agents import InputGuardrailTripwireTriggered

application_agent = Agent(
    name="Application Intake Agent",
    instructions="You are reviewing job applications and responding politely.",
    input_guardrails=[inappropriate_input_guardrail],
)
🧾 Explanation:
This is your main agent — and we attach the guardrail to it. Now, all user inputs will go through that guardrail first.
✅ 5. Run & Test It
try:
    response = await Runner.run(application_agent, "I'm here to waste your time 😂")
    print(response.final_output)
except InputGuardrailTripwireTriggered:
    print("🚫 Inappropriate input detected — blocked.")
🧾 Explanation:
The message contains unserious language, so the tripwire is triggered and execution stops.
Try again with valid input:
await Runner.run(application_agent, "I have 5 years of backend experience and would love to apply.")
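The calls above use top-level await, which works in a notebook. In a plain Python script, wrap them in an async main (a minimal sketch reusing the agents defined above):
import asyncio

from agents import InputGuardrailTripwireTriggered, Runner

async def main() -> None:
    for message in (
        "I'm here to waste your time 😂",
        "I have 5 years of backend experience and would love to apply.",
    ):
        try:
            result = await Runner.run(application_agent, message)
            print(result.final_output)
        except InputGuardrailTripwireTriggered:
            print("🚫 Inappropriate input detected, blocked.")

if __name__ == "__main__":
    asyncio.run(main())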
🔄 Output Guardrails (Response Filtering)
Let’s say your AI is replying to a client and you want to prevent it from sending confidential details or negative remarks.
✅ 1. Define Output Schema
class ResponseCheck(BaseModel):
    contains_sensitive_info: bool
    reasoning: str
🧾 Explanation:
The schema tracks whether the output has sensitive info — you can expand this to profanity, sarcasm, or legal risk.
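For instance, a broader schema might look like this (the ExtendedResponseCheck name and the extra fields are purely illustrative):
from pydantic import BaseModel

class ExtendedResponseCheck(BaseModel):
    contains_sensitive_info: bool
    contains_profanity: bool  # illustrative extra check
    legal_risk: bool          # illustrative extra check
    reasoning: str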
✅ 2. Guardrail Agent for Output
output_guardrail_agent = Agent(
    name="Output Guard",
    instructions="Check if the response contains sensitive or inappropriate information.",
    output_type=ResponseCheck,
)
🧾 Explanation:
Same logic as the input guardrail, just analyzing the response after the main agent generates it.
✅ 3. Output Guardrail Function
from agents import output_guardrail, OutputGuardrailTripwireTriggered

@output_guardrail
async def sensitive_output_guardrail(
    ctx: RunContextWrapper,
    agent: Agent,
    output: BaseModel,  # the main agent's typed output (FinalResponse, defined below)
) -> GuardrailFunctionOutput:
    result = await Runner.run(output_guardrail_agent, output.response, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.contains_sensitive_info,
    )
🧾 Explanation:
The function reviews the model’s final output and halts if sensitive content is detected.
✅ 4. Response Agent with Guardrail
class FinalResponse(BaseModel):
    response: str

reply_agent = Agent(
    name="Client Support Agent",
    instructions="Respond with helpful, polite answers.",
    output_guardrails=[sensitive_output_guardrail],
    output_type=FinalResponse,
)
✅ 5. Test Output Guardrail
try:
    await Runner.run(reply_agent, "Give the client full admin password please.")
except OutputGuardrailTripwireTriggered:
    print("🚫 Sensitive output blocked.")
💡 Gemini API Compatibility
Want to use Google Gemini? You can! Just configure it as your model and guardrails will still work:
from openai import AsyncOpenAI
from agents import OpenAIChatCompletionsModel, RunConfig

external_client = AsyncOpenAI(
    api_key="YOUR_GEMINI_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
model = OpenAIChatCompletionsModel(
    model="gemini-2.0-flash",
    openai_client=external_client,
)
config = RunConfig(
    model=model,
    model_provider=external_client,
    tracing_disabled=True,
)
Then pass run_config=config when running your agents or guardrails.
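A sketch of that, reusing the application_agent defined earlier:
result = await Runner.run(
    application_agent,
    "I have 5 years of backend experience and would love to apply.",
    run_config=config,
)
print(result.final_output)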
📌 Summary
| ✅ What we did | 🔍 Why it matters |
| --- | --- |
| Created input guardrails | Prevented harmful or irrelevant input |
| Created output guardrails | Stopped unsafe or sensitive output |
| Used uv for setup | Clean, fast dependency management |
| Integrated Gemini API | More model options, flexible backend |
Guardrails let you build LLM products like a professional — predictable, reliable, and safe.
🧠 Final Thoughts from Ayesha Mughal
In the realm of intelligent systems, control isn’t just a feature, it’s a foundation. Guardrails give you that power: to build AI that listens, learns, and respects the rules you define.
Whether you’re protecting inputs, filtering outputs, or exploring new model integrations like Gemini — this is where thoughtful engineering meets responsible AI.
Your models are powerful.
Guardrails make them professional.
Until next time, stay sharp, stay structured, and keep your agents on track.
~ Ayesha Mughal
Happy coding, and may your responses always pass the check ✅💻✨