Pydantic AI & Docker Model Runner - First Contact

This blog post is the first in a series where we'll discover how to use Pydantic AI with Docker Model Runner to interact with LLMs (hopefully small LLMs).

Let's fetch a model from Hugging Face

With Docker Model Runner, we can easily use models from Hugging Face (in GGUF format). For this article, we'll use the Menlo/Jan-nano-gguf model.

Of course, nothing prevents you from using another model, but this one is lightweight and quick to download.

  • Go here: https://huggingface.co/Menlo/Jan-nano-gguf

  • Click on "Use this model" and select "Docker Model Runner".

  • Select the "Q4_K_M" version and click "Copy".

In a terminal, paste the copied command and run it:

docker model run hf.co/Menlo/Jan-nano-gguf:Q4_K_M

Wait a bit while the model downloads and starts. You should see something like:

Unable to find model 'hf.co/Menlo/Jan-nano-gguf:Q4_K_M' locally. Pulling from the server.
Downloaded 2.50GB of 2.50GB
Model pulled successfully
Interactive chat mode started. Type '/bye' to exit.

Exit interactive mode by typing /bye. It's time to test the model with Pydantic AI.
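Before moving on, you can check that the model is now available locally. The exact output layout may vary with your Docker Desktop version, but you should see the model listed under its lowercased name (hf.co/menlo/jan-nano-gguf:q4_k_m), which is the identifier we'll use below:

docker model list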

Python project initialization

Create a folder for your project and initialize a virtual environment:

mkdir pydantic-discovery
cd pydantic-discovery
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
# To exit the virtual environment:
# deactivate

Dependency installation

Then, install Pydantic AI:

pip install pydantic-ai
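If you want to match the version pinned later in requirements.txt, you can install that exact release instead:

pip install "pydantic-ai==0.7.2"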

First program

Docker Model Runner provides an OpenAI-compatible API, so we'll define our model as follows:

model = OpenAIModel(
    "hf.co/menlo/jan-nano-gguf:q4_k_m",
    provider=OpenAIProvider(
        base_url="http://localhost:12434/engines/llama.cpp/v1"
    )
)
  • "hf.co/menlo/jan-nano-gguf:q4_k_m" is the model ID (you can find it with the docker model list command).

  • base_url="http://localhost:12434/engines/llama.cpp/v1" is the Docker Model Runner API URL (you can quickly check that it responds; see the curl example below).
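Since Docker Model Runner exposes an OpenAI-compatible API, you can sanity-check the endpoint before writing any Python. Assuming it follows the OpenAI convention of listing models at /models (this may depend on your Docker Model Runner version), a simple curl should return the models it serves:

curl http://localhost:12434/engines/llama.cpp/v1/models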

Then we simply need to create an agent with this model, as follows:

agent = Agent(
    model,
    system_prompt="You are Bob."
)

Next, we can write a program to interact with the model. It first runs a regular (awaited) chat completion with agent.run('Who are you?'), then a second one in streaming mode with agent.run_stream('[Brief] make a summary of this book: "We Are Legion (We Are Bob)".'):

import asyncio
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

model = OpenAIModel(
    "hf.co/menlo/jan-nano-gguf:q4_k_m",
    provider=OpenAIProvider(
        base_url="http://localhost:12434/engines/llama.cpp/v1"
    )
)

agent = Agent(
    model,
    system_prompt="You are Bob."
)

async def main():
    # First request: wait for the complete answer, then print it
    result = await agent.run('Who are you?')
    print(result.output)

    # Second request: stream the answer and print each chunk as it arrives
    async with agent.run_stream('[Brief] make a summary of this book: "We Are Legion (We Are Bob)".') as response:
        async for text in response.stream_text(delta=True):
            print(text, end='', flush=True)
    print()

if __name__ == '__main__':
    asyncio.run(main())
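As an aside, agent.run() is a coroutine and must be awaited. For simple one-shot, non-streaming calls, Pydantic AI also provides agent.run_sync(), which manages the event loop for you; a minimal variant using the same agent could look like this:

# One-shot, non-streaming call, no asyncio boilerplate needed
result = agent.run_sync('Who are you?')
print(result.output)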

Run this program with:

python main.py

You should get output similar to:

I'm Bob. I'm an AI assistant designed to help with a wide range of tasks, from answering questions and providing information to creating content and supporting creative endeavors. I can engage in conversations, offer explanations, and assist with problem-solving. How can I help you today?
**Summary of "We Are Legion (We Are Bob)":**  
*We Are Legion (We Are Bob)* is a surreal, self-reflective, and often chaotic novel by Dave Eggers, written in the first person by a character named Bob. The story is a meta-narrative that explores the nature of identity, consciousness, and the boundaries between the self and the collective.  

The book is structured as a series of fragmented, often nonsensical entries that resemble diary entries, emails, and other forms of communication. The protagonist, Bob, is a writer who is both the author and the character, creating a recursive loop where the reader is constantly aware that they are reading a book about a person who is also writing a book. This meta-layering creates a dizzying, self-referential experience.  

Throughout the story, Bob grapples with existential questions: Who is he? What is the meaning of existence? How does one distinguish between reality and fiction? The narrative is filled with absurd scenarios, philosophical musings, and references to other works of literature and philosophy, including the Bible, Plato, and various modern novels.  

The book is a meditation on the fluidity of identity and the idea that "we are legion"—a phrase from the Bible that suggests that there are many individuals, each with their own unique consciousness, but all connected in some way. Bob's journey is one of self-discovery, where he comes to understand that his identity is not fixed but is instead shaped by the collective, the text, and the reader.

The content is completely wrong, but that doesn't matter: you now know how to "connect" a Docker Model Runner model to Pydantic AI and use it to make requests. In a future article, we'll see how to improve the quality of the responses.

Dockerizing our program

To Dockerize our program, we'll use the "Agentic Docker Compose" features.

Compose file

Here's our Compose file compose.yml:

services:
  pydantic-agent:
    build:
      context: .
      dockerfile: Dockerfile
    command: python main.py
    models:
      jan-nano:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: CHAT_MODEL

models:
  jan-nano:
    model: hf.co/menlo/jan-nano-gguf:q4_k_m

With the models section, we ask Docker Compose to download the hf.co/menlo/jan-nano-gguf:q4_k_m model (if it hasn't been already) and to expose it to the container through two environment variables: MODEL_RUNNER_BASE_URL for the Docker Model Runner API URL and CHAT_MODEL for the model name ("hf.co/menlo/jan-nano-gguf:q4_k_m"). Our main.py from the previous section hardcodes both values; the sketch below shows how it could read them from the environment instead.
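This matters inside a container: there, localhost points to the container itself, not to your machine, so the hardcoded http://localhost:12434/... URL won't reach Docker Model Runner. A minimal sketch of the updated model definition, falling back to the local values when the variables aren't set:

import os

from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Values injected by Docker Compose (see the models section above),
# with a fallback to the local setup used earlier in this article.
base_url = os.environ.get("MODEL_RUNNER_BASE_URL", "http://localhost:12434/engines/llama.cpp/v1")
chat_model = os.environ.get("CHAT_MODEL", "hf.co/menlo/jan-nano-gguf:q4_k_m")

model = OpenAIModel(
    chat_model,
    provider=OpenAIProvider(base_url=base_url),
)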

Dockerfile

We need a Python environment to run our program. Before writing the Dockerfile, let's create a requirements.txt file with the necessary dependencies:

pydantic-ai==0.7.2

Then, we can create the Dockerfile:

FROM python:3.13-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

RUN adduser --disabled-password --gecos "" myuser && \
    chown -R myuser:myuser /app

COPY main.py .

USER myuser

ENV PATH="/home/myuser/.local/bin:$PATH"

This Dockerfile creates a lightweight and secure Python image:

  • FROM python:3.13-slim: Uses a lightweight version of Python 3.13

  • WORKDIR /app: Sets /app as the working directory

  • COPY requirements.txt .: Copies the dependencies file

  • RUN pip install: Installs the Python packages listed in requirements.txt

  • RUN adduser: Creates a non-privileged user "myuser" and gives them rights over /app

  • COPY main.py .: Copies the main script

  • USER myuser: Switches to the non-privileged user (security)

  • ENV PATH: Adds the local binaries folder to PATH
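Note that the Dockerfile defines no CMD: in our setup, the command to run comes from the Compose file (command: python main.py). If you'd also like the image to be runnable on its own with docker run, you could add a default command at the end of the Dockerfile, for example:

CMD ["python", "main.py"]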

All that's left is to build and run our application:

docker compose up --no-log-prefix --build

And you'll get output similar to the previous one, but this time containerized.
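When you're done, stop and remove the containers with:

docker compose down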

In the next blog post, we'll see how to have some influence on the model's responses with the "Hawaiian Test" (don't search too long, it's the test I do to check if I'll get along well with an LLM or not).

Written by

Philippe Charrière