How to Use PydanticAI for Multimodal LLMs

Stephen Collins

Large multimodal models like Google Gemma 3 and Claude Opus 4 can now reason over text and images. But if you've looked at the docs, it's easy to get lost in agents, tools, and structured outputs before you even get to "Hello, World."

This post is the short version—how to pass an image (or URL) into a PydanticAI agent in just a few lines.

Updated Version: This is the newer, simplified guide based on PydanticAI's updated API. For the original, comprehensive version covering structured outputs, agents, and testing, see my complete PydanticAI guide.

All the code is available on GitHub.


Setup

If you don't already have uv, install it first (it's a fast Python package manager):

curl -LsSf https://astral.sh/uv/install.sh | sh

Then install PydanticAI and dependencies:

uv add pydantic-ai python-dotenv

You'll also need an OpenRouter API key. Sign up at OpenRouter and get your API key.

Create a .env file:

OPENROUTER_API_KEY=YOUR_OPENROUTER_API_KEY

Now you're ready to run the examples.
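Before making any API calls, it can help to confirm the key actually loaded. A stdlib-only sanity check (the helper name is my own, not part of PydanticAI):

```python
import os

def env_ready(env: dict) -> bool:
    """Return True when a non-empty OPENROUTER_API_KEY is present."""
    return bool(env.get("OPENROUTER_API_KEY"))

print(env_ready(dict(os.environ)))
```

If this prints False, double-check your .env file and that load_dotenv() runs before the key is read.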


Comprehensive Image Analysis Example

Here's a complete example that demonstrates both remote and local image handling with PydanticAI:

"""
Comprehensive Image Analysis Example
Demonstrates both remote and local image handling with pydantic-ai
"""

import os
from pydantic_ai import Agent, ImageUrl, BinaryContent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openrouter import OpenRouterProvider
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize the model
model = OpenAIModel(
    model_name="google/gemma-3-4b-it",
    provider=OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))
)

# Create the agent
agent = Agent(model=model)


def analyze_remote_image():
    """Example: Analyze a remote image using ImageUrl"""
    print("=== Remote Image Analysis ===")

    result = agent.run_sync([
        "What company is this logo from?",
        ImageUrl(url="https://iili.io/3Hs4FMg.png"),
    ])

    print(f"Remote image analysis result: {result.output}")
    print()


def analyze_local_image():
    """Example: Analyze a local image using BinaryContent"""
    print("=== Local Image Analysis ===")

    # Read local image file
    with open("images/invoice_sample.png", "rb") as f:
        image_data = f.read()

    result = agent.run_sync([
        "What company is this logo from?",
        BinaryContent(data=image_data, media_type="image/png"),
    ])

    print(f"Local image analysis result: {result.output}")
    print()


def main():
    """Run both examples"""
    print("Pydantic AI Image Analysis Examples")
    print("=" * 40)
    print()

    # Analyze remote image
    analyze_remote_image()

    # Analyze local image
    analyze_local_image()

    print("Analysis complete!")


if __name__ == "__main__":
    main()

This example shows the two main ways to handle images:

  1. Remote images using ImageUrl - perfect for web-hosted images

  2. Local images using BinaryContent - ideal for files on your system
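If you load arbitrary local files rather than a known PNG, you can guess the media_type from the file extension with the standard library's mimetypes module and pass the result to BinaryContent. A minimal sketch (the helper name and the PNG fallback are my own choices):

```python
import mimetypes

def guess_media_type(path: str) -> str:
    """Guess an image's media type from its file extension, defaulting to image/png."""
    media_type, _ = mimetypes.guess_type(path)
    return media_type or "image/png"

print(guess_media_type("images/invoice_sample.png"))  # image/png
print(guess_media_type("photo.jpg"))                  # image/jpeg
```

This keeps the BinaryContent call from earlier honest when the input file isn't guaranteed to be a PNG.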


Key Takeaways

  • ImageUrl → fastest way to use image URLs

  • BinaryContent → for local image files, or when you want to control how the bytes are uploaded

  • Works across OpenAI, Anthropic, Google Vertex, OpenRouter, and more

  • OpenRouter provides access to multiple models through a single API

  • Google Gemma 3 offers excellent image analysis capabilities at competitive pricing

That's it - no complex agents, no long schemas. Just image input in a few lines of code.
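Because OpenRouter routes many vendors through one API, switching models is just a change to the model_name string, which follows a vendor/model pattern. A small sketch (the Claude model ID below is an assumption on my part; check OpenRouter's catalog for exact names):

```python
# OpenRouter model IDs follow "vendor/model"; swapping models is a one-line change.
GEMMA = "google/gemma-3-4b-it"      # the model used in the example above
CLAUDE = "anthropic/claude-opus-4"  # assumption: verify the exact ID in OpenRouter's catalog

def openrouter_vendor(model_id: str) -> str:
    """Extract the vendor prefix from an OpenRouter model ID."""
    return model_id.split("/", 1)[0]

print(openrouter_vendor(GEMMA))   # google
print(openrouter_vendor(CLAUDE))  # anthropic
```

Everything else in the example stays the same: the agent, the prompts, and the image inputs.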


Written by

Stephen Collins

Senior software engineer currently working with a climate-tech startup