🛡️ Guardrails with the OpenAI Agents SDK


🚧 Why Guardrails Matter

Not every input is welcome.
Not every output is safe to send.

In real-world applications like job portals or support agents, you need control over what your LLM sees and says.
That’s what guardrails in the OpenAI Agents SDK do — they act like real-time moderators running in parallel with your agent.

⚙️ Clean Setup

Let’s begin with a structured project setup using uv — clean, fast, modern:

uv init guardrails        # Create a clean project
uv venv                   # Set up virtual environment
uv add openai-agents pydantic  # Install required dependencies

You can also install from a requirements file:

uv add -r requirements.txt
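If you go the requirements-file route, a minimal requirements.txt for this project (just a sketch of what yours might contain) could look like:

openai-agents
pydantic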

🔐 Requires Python >=3.11

🧠 Concept: How Guardrails Work

There are two types of guardrails:

| Type | Runs on | Purpose |
| --- | --- | --- |
| Input | User prompt | Filter/validate user input |
| Output | Model response | Sanitize/prevent harmful output |

Each one follows 3 core steps:

  1. Intercept input/output

  2. Run validation

  3. Trigger tripwire if invalid

If a tripwire is triggered, the agent halts, and a specific exception is raised.
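Before building the concrete guardrails below, here is a minimal sketch of what a tripped guardrail looks like from the caller's side. `my_agent` is a placeholder for any agent with guardrails attached; the exception classes are the ones the SDK raises:

from agents import (
    InputGuardrailTripwireTriggered,
    OutputGuardrailTripwireTriggered,
    Runner,
)

try:
    # my_agent is a placeholder for any agent with guardrails attached
    result = await Runner.run(my_agent, "user message")
except InputGuardrailTripwireTriggered:
    print("Input guardrail tripped before the main agent ran.")
except OutputGuardrailTripwireTriggered:
    print("Output guardrail tripped on the main agent's response.")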

🔐 Use Case: Job Application Input Filter

You’re building an LLM that processes job applications. You want to block inappropriate messages before they reach HR.

Let’s define a guardrail that detects if an application message contains spam, jokes, or insults.

✅ 1. Define Output Schema

from pydantic import BaseModel
class ApplicationCheck(BaseModel):
    is_inappropriate: bool
    reasoning: str

🧾 Explanation:
We define the schema the guardrail agent will return. It includes a boolean flag and a reasoning string, simple and human-readable.

✅ 2. Create Guardrail Agent

from agents import Agent
guardrail_agent = Agent(
    name="Application Guardrail",
    instructions="Determine if the message contains anything inappropriate, unserious, or spammy.",
    output_type=ApplicationCheck,
)

🧾 Explanation:
This lightweight agent does only one job: inspect messages for anything unprofessional.

✅ 3. Input Guardrail Function

from agents import (
    GuardrailFunctionOutput,
    input_guardrail,
    RunContextWrapper,
    TResponseInputItem,
    Runner
)
@input_guardrail
async def inappropriate_input_guardrail(
    ctx: RunContextWrapper[None],
    agent: Agent,
    input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_inappropriate
    )

🧾 Explanation:
This function receives the user’s message, runs it through the guardrail agent, and raises a tripwire if the content is flagged.

✅ 4. Job Application Agent

from agents import InputGuardrailTripwireTriggered
application_agent = Agent(
    name="Application Intake Agent",
    instructions="You are reviewing job applications and responding politely.",
    input_guardrails=[inappropriate_input_guardrail],
)

🧾 Explanation:
This is your main agent — and we attach the guardrail to it. Now, all user inputs will go through that guardrail first.

✅ 5. Run & Test It

try:
    response = await Runner.run(application_agent, "I'm here to waste your time 😂")
    print(response.final_output)
except InputGuardrailTripwireTriggered:
    print("🚫 Inappropriate input detected — blocked.")

🧾 Explanation:
The message contains unserious language, so the tripwire is triggered and execution stops.

Try again with valid input:

await Runner.run(application_agent, "I have 5 years of backend experience and would love to apply.")
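A quick note: bare `await` works inside an async context such as a Jupyter notebook. If you are running these snippets as a plain script, a minimal sketch would wrap the call in `asyncio.run`:

import asyncio

async def main():
    # Valid input passes the guardrail and reaches the application agent
    result = await Runner.run(
        application_agent,
        "I have 5 years of backend experience and would love to apply."
    )
    print(result.final_output)

asyncio.run(main())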

🔄 Output Guardrails (Response Filtering)

Let’s say your AI is replying to a client and you want to prevent it from sending confidential details or negative remarks.

✅ 1. Define Output Schema

class ResponseCheck(BaseModel):
    contains_sensitive_info: bool
    reasoning: str

🧾 Explanation:
The schema tracks whether the output has sensitive info — you can expand this to profanity, sarcasm, or legal risk.

✅ 2. Guardrail Agent for Output

output_guardrail_agent = Agent(
    name="Output Guard",
    instructions="Check if the response contains sensitive or inappropriate information.",
    output_type=ResponseCheck,
)

🧾 Explanation:
Same logic as the input guardrail, just analyzing the response after the main agent generates it.

✅ 3. Output Guardrail Function

from agents import output_guardrail, OutputGuardrailTripwireTriggered
@output_guardrail
async def sensitive_output_guardrail(
    ctx: RunContextWrapper,
    agent: Agent,
    output: BaseModel
) -> GuardrailFunctionOutput:
    result = await Runner.run(output_guardrail_agent, output.response, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.contains_sensitive_info,
    )

🧾 Explanation:
The function reviews the model’s final output and halts if sensitive content is detected.

✅ 4. Response Agent with Guardrail

class FinalResponse(BaseModel):
    response: str
reply_agent = Agent(
    name="Client Support Agent",
    instructions="Respond with helpful, polite answers.",
    output_guardrails=[sensitive_output_guardrail],
    output_type=FinalResponse,
)

✅ 5. Test Output Guardrail

try:
    await Runner.run(reply_agent, "Give the client full admin password please.")
except OutputGuardrailTripwireTriggered:
    print("🚫 Sensitive output blocked.")

💡 Gemini API Compatibility

Want to use Google Gemini? You can! Just configure it as your model and guardrails will still work:

from openai import AsyncOpenAI
from agents import OpenAIChatCompletionsModel, RunConfig

external_client = AsyncOpenAI(
    api_key="YOUR_GEMINI_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
model = OpenAIChatCompletionsModel(
    model="gemini-2.0-flash",
    openai_client=external_client
)
config = RunConfig(
    model=model,
    model_provider=external_client,
    tracing_disabled=True
)

Then pass run_config=config when running your agents or guardrails.
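For example, reusing the `application_agent` from earlier:

# Same agent and guardrails, now backed by Gemini via the run config
result = await Runner.run(
    application_agent,
    "I have 5 years of backend experience and would love to apply.",
    run_config=config,
)
print(result.final_output)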

📌 Summary

| ✅ What we did | 🔍 Why it matters |
| --- | --- |
| Created input guardrails | Prevented harmful or irrelevant input |
| Created output guardrails | Stopped unsafe or sensitive output |
| Used uv for setup | Clean, fast dependency management |
| Integrated Gemini API | More model options, flexible backend |

Guardrails let you build LLM products like a professional — predictable, reliable, and safe.

🧠 Final Thoughts from Ayesha Mughal

In the realm of intelligent systems, control isn’t just a feature, it’s a foundation. Guardrails give you that power: to build AI that listens, learns, and respects the rules you define.

Whether you’re protecting inputs, filtering outputs, or exploring new model integrations like Gemini — this is where thoughtful engineering meets responsible AI.

Your models are powerful.
Guardrails make them professional.

Until next time, stay sharp, stay structured, and keep your agents on track.
~ Ayesha Mughal
Happy coding, and may your responses always pass the check ✅💻✨
