How to Use PydanticAI for Multimodal LLMs

Large multimodal models like Google Gemma 3 and Claude Opus 4 can now reason over text and images. But if you've looked at the docs, it's easy to get lost in agents, tools, and structured outputs before you even get to "Hello, World."
This post is the short version—how to pass an image (or URL) into a PydanticAI agent in just a few lines.
Updated version: this is the newer, simplified guide based on PydanticAI's updated API. For the original comprehensive version covering structured outputs, agents, and testing, see my complete PydanticAI guide.
All the code is available on GitHub.
Setup
If you don't already have uv, install it first (it's a fast Python package manager):
curl -LsSf https://astral.sh/uv/install.sh | sh
Then install PydanticAI and dependencies:
uv add pydantic-ai python-dotenv
You'll also need an OpenRouter API key. Sign up at OpenRouter and get your API key.
Create a .env file:
OPENROUTER_API_KEY=YOUR_OPENROUTER_API_KEY
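A missing or misnamed key is the most common reason the examples fail, so it helps to fail fast with a clear error. Here's a minimal sketch of such a check; `require_api_key` is a hypothetical helper, not part of PydanticAI:

```python
import os


def require_api_key(name: str = "OPENROUTER_API_KEY") -> str:
    """Return the API key from the environment, raising a clear error if absent.

    Hypothetical helper: assumes load_dotenv() (or your shell) has already
    populated the environment from .env.
    """
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return key
```

Call it once at startup and pass the result to your provider instead of `os.getenv(...)` directly.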
Now you're ready to run the examples.
Comprehensive Image Analysis Example
Here's a complete example that demonstrates both remote and local image handling with PydanticAI:
"""
Comprehensive Image Analysis Example
Demonstrates both remote and local image handling with pydantic-ai
"""
import os
from pydantic_ai import Agent, ImageUrl, BinaryContent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openrouter import OpenRouterProvider
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Initialize the model
model = OpenAIModel(
model_name="google/gemma-3-4b-it",
provider=OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))
)
# Create the agent
agent = Agent(model=model)
def analyze_remote_image():
"""Example: Analyze a remote image using ImageUrl"""
print("=== Remote Image Analysis ===")
result = agent.run_sync([
"What company is this logo from?",
ImageUrl(url="https://iili.io/3Hs4FMg.png"),
])
print(f"Remote image analysis result: {result.output}")
print()
def analyze_local_image():
"""Example: Analyze a local image using BinaryContent"""
print("=== Local Image Analysis ===")
# Read local image file
with open("images/invoice_sample.png", "rb") as f:
image_data = f.read()
result = agent.run_sync([
"What company is this logo from?",
BinaryContent(data=image_data, media_type="image/png"),
])
print(f"Local image analysis result: {result.output}")
print()
def main():
"""Run both examples"""
print("Pydantic AI Image Analysis Examples")
print("=" * 40)
print()
# Analyze remote image
analyze_remote_image()
# Analyze local image
analyze_local_image()
print("Analysis complete!")
if __name__ == "__main__":
main()
This example shows the two main ways to handle images:
- Remote images using ImageUrl - perfect for web-hosted images
- Local images using BinaryContent - ideal for files on your system
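The example above hard-codes media_type="image/png" because it knows the file is a PNG. If you want to load arbitrary local image files, you can guess the media type from the filename with the standard library before wrapping the bytes in BinaryContent. A minimal sketch (`guess_media_type` is a hypothetical helper, not a PydanticAI API):

```python
import mimetypes


def guess_media_type(filename: str) -> str:
    """Guess an image media type (e.g. 'image/png') from a file's extension.

    Raises ValueError for files that don't look like images, so a bad path
    fails before any API call is made.
    """
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {filename}")
    return media_type
```

You would then build the content as `BinaryContent(data=image_data, media_type=guess_media_type(path))` instead of hard-coding the type.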
Key Takeaways
- ImageUrl → fastest way to use image URLs
- BinaryContent → for local image files or when you want to control upload
- Works across OpenAI, Anthropic, Google Vertex, OpenRouter, and more
- OpenRouter provides access to multiple models through a single API
- Google Gemma 3 offers excellent image analysis capabilities at competitive pricing
That's it - no complex agents, no long schemas. Just image input in a few lines of code.
Written by Stephen Collins, senior software engineer currently working with a climate-tech startup.