Emitting Fully Nested Python Schemas and Models with SQLModel

Jason Vertrees

Here's another technical post.

I ran into an interesting problem while developing some tools recently. I had a nested Pydantic structure that needed migration to SQLModel, but I couldn't sacrifice the ability to emit fully nested schemas and models—a capability that gets lost in migration.

With the rise of AI models and the increasing need for dynamic model introspection, I discovered this is a common challenge. In fact, a variant of the solution I'm about to share was accepted as a PR to the CrewAI open source repository a while back.

SQLModel, a Quick Refresher

SQLModel is an elegant library that connects SQLAlchemy and Pydantic, offering a clean interface for defining and interacting with SQL databases in Python. As with a lot of tech, it's not without controversy, but I've generally found it useful for getting things done. Surprisingly, though, handling nested structures doesn't work seamlessly out of the box.

Here's my solution to this gap.

Key Concepts When Integrating SQLModel and Pydantic

Before going into the code, let's understand some important distinctions:

class TeamBase(SQLModel):
    ...

versus

class Team(TeamBase, table=True):
    ...

The former is essentially a Pydantic model—used for data modeling and schema validation. This is where your pure data fields belong, without database IDs, primary keys, or relationships. (table = False --> Pydantic)

The latter, with table=True, is destined for database persistence. It's aware of both Pydantic and SQLAlchemy, so this is where you add database IDs, primary keys, and relationships. (table = True --> Pydantic + SQLAlchemy)

Other improvements in this integration include:

  • Consistent Primary Keys: SQLModel requires explicit primary key definitions, unlike Pydantic

  • Read Models Implementation: Each primary model gets a corresponding read model for proper serialization

  • Model Configuration: Setting model_config with from_attributes=True (formerly orm_mode) ensures proper attribute mapping

  • Field Descriptions: Adding these enhances clarity for GenAI model introspection. This matters enormously for automated RAG pipelines that need dynamic introspection—I can't stress it enough.

The Solution: Three Essential Functions

My implementation centers around three key functions that work together to create a seamless experience:

  1. create_instance(): Dynamically instantiates models from dictionaries, handling nested relationships

  2. dump_instance(): Serializes complex model instances into nested dictionaries without infinite recursion

  3. generate_json_schema(): Automatically produces JSON schemas for SQLModel classes

Let's look at the code that makes this possible:

from typing import Any, Dict, List, Optional, Set, Type

# Import necessary SQLAlchemy and SQLModel components
from sqlalchemy.dialects.postgresql import ARRAY  # For array type support
from sqlalchemy.inspection import inspect  # For introspecting model metadata
from sqlalchemy.sql.type_api import TypeEngine  # For type handling
from sqlmodel import JSON, Boolean, Field, Float, Integer, Relationship, SQLModel, String


def create_instance(model: Type[SQLModel], data: Dict[str, Any]) -> SQLModel:
    """
    Create an instance of a SQLModel using the provided data dictionary.

    This function dynamically maps nested dictionaries to corresponding ORM models.

    Args:
        model (Type[SQLModel]): The SQLModel class to instantiate.
        data (Dict[str, Any]): A dictionary containing the data to populate the model.

    Returns:
        SQLModel: An instance of the model populated with the provided data.

    Example:
        hero = create_instance(Hero, {"name": "Jason", "secret_name": "Tree"})
    """
    # Get the SQLAlchemy mapper for introspection
    mapper = inspect(model)

    # Extract column and relationship information
    columns = {column.key: column for column in mapper.columns}  # Direct model attributes
    relationships = {rel.key: rel for rel in mapper.relationships}  # Related models

    # Prepare data for instantiation
    instance_data = {}
    for key, value in (data or {}).items():
        if key in columns:
            # Handle direct attributes
            instance_data[key] = value
        elif key in relationships:
            # Handle relationships (nested models)
            related_model = relationships[key].mapper.class_

            if isinstance(value, list):
                # Handle one-to-many relationships (lists of related objects)
                instance_data[key] = [
                    create_instance(related_model, v) for v in value if v is not None
                ]
            elif value is not None:
                # Handle one-to-one relationships (single nested object)
                instance_data[key] = create_instance(related_model, value)
            else:
                # Handle null values based on relationship type
                instance_data[key] = [] if relationships[key].uselist else None

    # Create and return the model instance
    return model(**instance_data)


def dump_instance(instance: SQLModel, processed_ids: Optional[Set[int]] = None) -> Dict[str, Any]:
    """
    Recursively serialize a SQLModel instance into a dictionary, including nested relationships.

    Args:
        instance (SQLModel): The SQLModel instance to serialize.
        processed_ids (Optional[Set[int]]): Set of processed instance IDs to avoid infinite recursion.

    Returns:
        Dict[str, Any]: A dictionary representation of the instance.

    Example:
        data_dict = dump_instance(hero_instance)
    """
    # Initialize tracking set for circular references if not provided
    if processed_ids is None:
        processed_ids = set()

    # Get unique identifier for this instance
    instance_id = id(instance)

    # Check if we've already processed this instance (prevents circular reference issues)
    if instance_id in processed_ids:
        return {}  # Return empty dict to break circular references

    # Mark this instance as processed
    processed_ids.add(instance_id)

    # Get model metadata through SQLAlchemy inspection
    mapper = inspect(instance.__class__)
    columns = {column.key: column for column in mapper.columns}
    relationships = {rel.key: rel for rel in mapper.relationships}

    # Build the dictionary representation
    data = {}

    # Add all column values to the dictionary
    for key in columns:
        data[key] = getattr(instance, key)

    # Process relationships (the nested part of nested schemas)
    for key, relationship in relationships.items():
        related_data = getattr(instance, key)

        if relationship.uselist:
            # Handle one-to-many relationships (lists)
            data[key] = [
                dump_instance(rel_instance, processed_ids) for rel_instance in related_data
            ]
        else:
            # Handle one-to-one relationships (single object)
            data[key] = dump_instance(related_data, processed_ids) if related_data else None

    return data


def generate_json_schema(
    model: Type[SQLModel], processed_models: Optional[Set[str]] = None
) -> Dict[str, Any]:
    """
    Generate a JSON schema for a SQLModel class, including nested relationships.

    Args:
        model (Type[SQLModel]): The SQLModel class to generate the schema for.
        processed_models (Optional[Set[str]]): Set of already processed models to prevent recursion.

    Returns:
        Dict[str, Any]: The JSON schema as a dictionary.

    Example:
        schema = generate_json_schema(Hero)
    """
    # Initialize tracking set for circular references if not provided
    if processed_models is None:
        processed_models = set()

    # Get model metadata through SQLAlchemy inspection
    mapper = inspect(model)
    columns = {column.key: column for column in mapper.columns}
    relationships = {rel.key: rel for rel in mapper.relationships}

    # Initialize the schema structure according to JSON Schema standard
    schema = {"title": model.__name__, "type": "object", "properties": {}, "required": []}

    # Track this model to prevent circular references
    processed_models.add(model.__name__)

    # Process each column (direct attributes)
    for key, column in columns.items():
        # Skip ID columns - typically we don't include these in schemas
        if key == "id":
            continue

        # Get Pydantic field info for additional metadata (model_fields in Pydantic v2)
        field_info = model.model_fields.get(key)

        # Map SQLAlchemy type to JSON Schema type
        json_type = _get_json_type(column.type)

        # Create the property schema
        property_schema = {"type": json_type}

        # Add description from field if available (important for GenAI integration)
        if field_info and field_info.description:
            property_schema["description"] = field_info.description

        # Add the property to the schema
        schema["properties"][key] = property_schema

        # If field is required, add to required list
        if not column.nullable:
            schema["required"].append(key)

    # Process each relationship (nested models)
    for key, relationship in relationships.items():
        related_model = relationship.mapper.class_

        # Only process models we haven't seen yet to prevent circular references
        if related_model.__name__ not in processed_models:
            if relationship.uselist:
                # Handle one-to-many relationships (lists/arrays in JSON Schema)
                schema["properties"][key] = {
                    "type": "array",
                    "items": generate_json_schema(related_model, processed_models),
                }
            else:
                # Handle one-to-one relationships (nested objects)
                schema["properties"][key] = generate_json_schema(related_model, processed_models)

    return schema


def _get_json_type(sqlalchemy_type: TypeEngine) -> str:
    """
    Map SQLAlchemy types to JSON schema types.

    Args:
        sqlalchemy_type (TypeEngine): The SQLAlchemy type.

    Returns:
        str: The corresponding JSON schema type.
    """
    # This helper function maps SQLAlchemy column types to JSON Schema types
    # JSON Schema has limited primitive types: string, number, integer, boolean, array, object

    if isinstance(sqlalchemy_type, Integer):
        return "integer"  # Maps to JSON Schema integer
    elif isinstance(sqlalchemy_type, Float):
        return "number"   # Maps to JSON Schema number (floating point)
    elif isinstance(sqlalchemy_type, String):
        return "string"   # Maps to JSON Schema string
    elif isinstance(sqlalchemy_type, Boolean):
        return "boolean"  # Maps to JSON Schema boolean
    elif isinstance(sqlalchemy_type, JSON):
        return "object"   # Maps to JSON Schema object
    elif isinstance(sqlalchemy_type, ARRAY):
        return "array"    # Maps to JSON Schema array
    else:
        # Default to string for any other types we don't explicitly handle
        # You could extend this function to handle more types as needed
        return "string"

Model Definitions

The model definitions show how to structure your base classes and table models:

# Base Models (Pydantic-like, for data validation)
class TeamBase(SQLModel):
    """
    Base class for Team model, containing common fields.

    Attributes:
        name (str): The name of the team.
        headquarters (Optional[str]): The headquarters location of the team.
    """
    # Define data fields (no database-specific fields)
    name: str = Field(index=True)  # Create an index for faster lookups
    headquarters: Optional[str] = Field(default=None)  # Optional field with default value

    # Enable ORM mode (formerly orm_mode) to work with SQLAlchemy objects
    model_config = {"from_attributes": True}  # Allows model creation from ORM objects, i.e., lets you hydrate models from database rows.


class HeroBase(SQLModel):
    """
    Base class for Hero model, containing common fields.

    Attributes:
        name (str): The name of the hero.
        secret_name (Optional[str]): The secret name of the hero.
        age (Optional[int]): The age of the hero.
        team_id (Optional[int]): The ID of the team the hero belongs to.
    """
    # Define data fields with appropriate types and constraints
    name: str = Field(index=True)
    secret_name: Optional[str] = Field(default=None)
    age: Optional[int] = Field(default=None, index=True)  # Index for fast filtering by age

    # Foreign key reference to the team table
    team_id: Optional[int] = Field(default=None, foreign_key="team.id")

    # Enable ORM mode
    model_config = {"from_attributes": True}


# Database Models (SQLAlchemy-like, for persistence)
class Team(TeamBase, table=True):  # table=True makes this a database table
    """
    Team model representing a team in the database.

    Attributes:
        id (Optional[int]): The primary key of the team.
        heroes (List["Hero"]): List of heroes belonging to the team.
    """
    # Database-specific fields
    id: Optional[int] = Field(default=None, primary_key=True)  # Primary key for the table

    # Relationship definition for the ORM
    heroes: List["Hero"] = Relationship(
        sa_relationship_kwargs={"lazy": "selectin"},  # Eager loading for performance
        back_populates="team",  # Bidirectional relationship
    )


class Hero(HeroBase, table=True):  # table=True makes this a database table
    """
    Hero model representing a hero in the database.

    Attributes:
        id (Optional[int]): The primary key of the hero.
        team (Optional["Team"]): The team the hero belongs to.
    """
    # Database-specific fields
    id: Optional[int] = Field(default=None, primary_key=True)  # Primary key for the table

    # Relationship definition - the other side of the bidirectional relationship
    team: Optional["Team"] = Relationship(back_populates="heroes")

Putting It All Together

Here's how to use these functions in practice (note: the SQLModel docs use the "Hero" classes as their running example, so I continue that here for those familiar):

if __name__ == "__main__":
    from pprint import pprint

    # EXAMPLE 1: Creating nested model instances from dictionaries

    # Example data with nested team information
    hero_data = {
        "name": "Jason",
        "secret_name": "Tree",
        # Notice how we can nest a full team object directly in the dictionary
        "team": {"name": "A-Team", "headquarters": "Austin, TX"},
    }

    # Another example with age information
    another_hero_data = {
        "name": "Sophia",
        "secret_name": "Shadow",
        "age": 28,
        "team": {"name": "Shadow Squad", "headquarters": "New York, NY"},
    }

    # Create Hero instances using our custom function that handles nested data
    hero = create_instance(Hero, hero_data)
    another_hero = create_instance(Hero, another_hero_data)

    # EXAMPLE 2: Serializing model instances to nested dictionaries

    print("Dumped Hero Instance:")
    # Using our custom function to dump nested model instances
    pprint(dump_instance(hero), width=80)

    print("\nDumped Another Hero Instance:")
    pprint(dump_instance(another_hero), width=80)

    # EXAMPLE 3: Generating JSON schema for model validation

    print("\nGenerated JSON Schema for Hero Model:")
    # Generate a complete JSON schema that includes nested relationships
    pprint(generate_json_schema(Hero), width=80)

    # The resulting schema can be used for:
    # - API documentation (e.g., OpenAPI/Swagger)
    # - Client-side validation
    # - Dynamic form generation
    # - GenAI model introspection

So now you can emit fully nested JSON schemas for your models, then parse them or share them with LLMs so they can hydrate objects from unstructured data for you. It should also be clear why those description fields help! I found this technique incredibly useful back when LLMs struggled to hydrate models from data plus a schema, and the above helps significantly.

This approach is particularly valuable if you're:

  • Building RESTful APIs with nested resources

  • Creating data validation layers for complex structures

  • Working with AI models that need rich schema information

  • Managing relational data with complex hierarchies

The ability to emit fully nested schemas and models from SQLModel closes a gap in the Python ecosystem, giving you the best of both Pydantic and SQLAlchemy without compromise.

I hope this solution helps anyone wrestling with similar challenges. While it ended up being more code than I initially expected, I've kept it as concise as possible while maintaining correctness and providing clear examples.

What complex data structure challenges have you faced in your projects? I'd love to hear your experiences in the comments.
