How to Use PydanticAI for Multimodal LLMs

Stephen Collins

Large multimodal models like Google Gemma 3 and Claude Opus 4 can now reason over text and images. But if you've looked at the docs, it's easy to get lost in agents, tools, and structured outputs before you even get to "Hello, World."

This post is the short version—how to pass an image (or URL) into a PydanticAI agent in just a few lines.

Updated Version: This is the newer, simplified guide based on PydanticAI's updated API. For the original, comprehensive version covering structured outputs, agents, and testing, see my complete PydanticAI guide.

All the code is available on GitHub.


Setup

If you don't already have uv, install it first (it's a fast Python package manager):

curl -LsSf https://astral.sh/uv/install.sh | sh

Then install PydanticAI and dependencies:

uv add pydantic-ai python-dotenv

You'll also need an OpenRouter API key. Sign up at OpenRouter and get your API key.

Create a .env file:

OPENROUTER_API_KEY=YOUR_OPENROUTER_API_KEY

Now you're ready to run the examples.
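Before making any API calls, it can help to confirm the key actually loaded. A stdlib-only sanity check (the helper name is my own, not part of PydanticAI):

```python
import os

def env_ready(env: dict) -> bool:
    """Return True when a non-empty OPENROUTER_API_KEY is present."""
    return bool(env.get("OPENROUTER_API_KEY"))

print(env_ready(dict(os.environ)))
```

If this prints False, double-check your .env file and that load_dotenv() runs before the key is read.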


Comprehensive Image Analysis Example

Here's a complete example that demonstrates both remote and local image handling with PydanticAI:

"""
Comprehensive Image Analysis Example
Demonstrates both remote and local image handling with pydantic-ai
"""

import os
from pydantic_ai import Agent, ImageUrl, BinaryContent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openrouter import OpenRouterProvider
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize the model
model = OpenAIModel(
    model_name="google/gemma-3-4b-it",
    provider=OpenRouterProvider(api_key=os.getenv("OPENROUTER_API_KEY"))
)

# Create the agent
agent = Agent(model=model)


def analyze_remote_image():
    """Example: Analyze a remote image using ImageUrl"""
    print("=== Remote Image Analysis ===")

    result = agent.run_sync([
        "What company is this logo from?",
        ImageUrl(url="https://iili.io/3Hs4FMg.png"),
    ])

    print(f"Remote image analysis result: {result.output}")
    print()


def analyze_local_image():
    """Example: Analyze a local image using BinaryContent"""
    print("=== Local Image Analysis ===")

    # Read local image file
    with open("images/invoice_sample.png", "rb") as f:
        image_data = f.read()

    result = agent.run_sync([
        "What company is this logo from?",
        BinaryContent(data=image_data, media_type="image/png"),
    ])

    print(f"Local image analysis result: {result.output}")
    print()


def main():
    """Run both examples"""
    print("Pydantic AI Image Analysis Examples")
    print("=" * 40)
    print()

    # Analyze remote image
    analyze_remote_image()

    # Analyze local image
    analyze_local_image()

    print("Analysis complete!")


if __name__ == "__main__":
    main()

This example shows the two main ways to handle images:

  1. Remote images using ImageUrl - perfect for web-hosted images

  2. Local images using BinaryContent - ideal for files on your system
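If you load arbitrary local files rather than a known PNG, you can guess the media_type from the file extension with the standard library's mimetypes module and pass the result to BinaryContent. A minimal sketch (the helper name and the PNG fallback are my own choices):

```python
import mimetypes

def guess_media_type(path: str) -> str:
    """Guess an image's media type from its file extension, defaulting to image/png."""
    media_type, _ = mimetypes.guess_type(path)
    return media_type or "image/png"

print(guess_media_type("images/invoice_sample.png"))  # image/png
print(guess_media_type("photo.jpg"))                  # image/jpeg
```

This keeps the BinaryContent call from earlier honest when the input file isn't guaranteed to be a PNG.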


Key Takeaways

  • ImageUrl → fastest way to use image URLs

  • BinaryContent → for local image files, or when you want to control how the bytes are uploaded

  • Works across OpenAI, Anthropic, Google Vertex, OpenRouter, and more

  • OpenRouter provides access to multiple models through a single API

  • Google Gemma 3 offers excellent image analysis capabilities at competitive pricing

That's it - no complex agents, no long schemas. Just image input in a few lines of code.
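Because OpenRouter routes many vendors through one API, switching models is just a change to the model_name string, which follows a vendor/model pattern. A small sketch (the Claude model ID below is an assumption on my part; check OpenRouter's catalog for exact names):

```python
# OpenRouter model IDs follow "vendor/model"; swapping models is a one-line change.
GEMMA = "google/gemma-3-4b-it"      # the model used in the example above
CLAUDE = "anthropic/claude-opus-4"  # assumption: verify the exact ID in OpenRouter's catalog

def openrouter_vendor(model_id: str) -> str:
    """Extract the vendor prefix from an OpenRouter model ID."""
    return model_id.split("/", 1)[0]

print(openrouter_vendor(GEMMA))   # google
print(openrouter_vendor(CLAUDE))  # anthropic
```

Everything else in the example stays the same: the agent, the prompts, and the image inputs.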


Written by

Stephen Collins

Senior software engineer currently working with a climate-tech startup