Taming LLM Responses: Dynamic Pydantic Models for Flexible Structured Output
As developers working with Large Language Models (LLMs), we often grapple with the challenge of constraining their outputs to meet our specific needs. In this post, I will share a technique I developed to enforce structured responses from LLMs, particularly for dynamic classification tasks.
The Challenge: Dynamic Classification
Recently, I was working on a project that required an LLM to categorize input strings into a set of predefined categories. The catch? The set of categories varied with each API call. Here's a simplified version of the prompt I was using:
You are a helpful categorizer. I will give you a string and you should
tell me which category the string falls into. Only respond with the
category.
<inputString>xxxx</inputString>
<categories>
category 1
category 2
...
</categories>
In my real-world project, the categories were often complex and numerous. I soon ran into a problem: the LLM would sometimes hallucinate fake category names that weren't in the provided list. I tried iterating on the prompt itself but had limited success in completely eliminating these hallucinations.
Enter Structured Output and Pydantic
To solve this issue, I turned to structured output techniques. Many libraries exist for this purpose, but with OpenAI's introduction of structured output in their API, the process has become even more straightforward. Like many solutions in this space, OpenAI's approach leverages Pydantic to define the desired output structure.
Initially, I used a simple Pydantic model to structure the output:
import pydantic


class CategorizationStructure(pydantic.BaseModel):
    reasoning: str = pydantic.Field(
        description="The reasoning process behind the categorization decision.")
    category: str = pydantic.Field(
        description="The final category selected for the product.")
This worked to enforce the structure, but it didn't prevent the LLM from inventing categories. I still needed a way to dynamically create a model that would constrain the output to the specified categories.
The Solution: Dynamic Pydantic Models
To solve this, I developed a method to generate Pydantic models dynamically for each LLM call. Here's the final solution:
import enum

import pydantic


def create_category_enum(categories):
    # Build an Enum whose members are exactly the provided categories.
    return enum.Enum("CategoryEnum",
                     {category: category for category in categories})


def create_categorization_structure(categories):
    # Build a response model whose `category` field only accepts the given categories.
    return pydantic.create_model(
        "CategorizationStructure",
        reasoning=(
            str,
            pydantic.Field(
                description="The reasoning process behind the categorization decision.",
            ),
        ),
        category=(
            create_category_enum(categories),
            pydantic.Field(
                description="The final category selected for the product."),
        ),
    )


def categorize_product(candidate_categories, product_text):
    # `client` and `categorization_prompt` are my project's LLM wrapper and prompt template.
    return client.run_prompt(
        categorization_prompt.format(
            categories="\n".join(candidate_categories),
            product=product_text
        ),
        response_structure=create_categorization_structure(candidate_categories)
    ).category
Let's break down how this solution works:
- create_category_enum() dynamically creates an Enum from the provided categories. This ensures that only the specified categories are valid options.
- create_categorization_structure() uses Pydantic's create_model() function to dynamically generate a model for each set of categories. This constrains the LLM's output to the specific categories we provide.
- categorize_product() ties it all together, creating the structure and passing it to the LLM API call.
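To see the constraint in action, here is a quick sanity check using hypothetical category names (and assuming Pydantic v2 for model_validate): the generated model accepts only the categories it was built with and raises a validation error for anything else.

Model = create_categorization_structure(["fruit", "vegetable"])  # hypothetical categories

# A response using a listed category validates cleanly.
ok = Model.model_validate({"reasoning": "It grows on trees.", "category": "fruit"})
print(ok.category)  # CategoryEnum.fruit

# A hallucinated category fails validation instead of slipping through.
try:
    Model.model_validate({"reasoning": "Sounds tasty.", "category": "dessert"})
except pydantic.ValidationError as err:
    print(err)

Because the category field is typed as the dynamically created Enum, both Pydantic's validation and the JSON schema sent to the API admit only the listed values.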
This solution evolved through several iterations. Initially, I tried creating a dynamic class definition, which was functional but triggered linting warnings ("Call expression not allowed in type expression"). The final version using pydantic.create_model() achieves the same result while satisfying the linter, resulting in cleaner, warning-free code.
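For reference, that earlier approach looked roughly like the sketch below (a reconstruction rather than my exact original code). Annotating a field with the result of a function call is what static type checkers object to, even though it works fine at runtime:

def create_categorization_structure(categories):
    # Defining the class inside the function works at runtime, but the
    # annotation below is a call expression, which static type checkers flag.
    class CategorizationStructure(pydantic.BaseModel):
        reasoning: str
        category: create_category_enum(categories)  # "Call expression not allowed in type expression"

    return CategorizationStructure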
Conclusion
By leveraging Python's dynamic capabilities and Pydantic's flexible model creation, we created a robust system that enforces structured output from LLMs while adapting to varying classification scenarios. This approach combines the power of Pydantic for data validation with the flexibility needed for real-world applications, all while maintaining code quality and linter satisfaction.
As LLMs continue to evolve, techniques like these will be crucial in bridging the gap between their vast capabilities and our specific requirements. Happy coding!