Emitting Fully Nested Python Schemas and Models with SQLModel


Here's another technical post.
I ran into an interesting problem while developing some tools recently. I had a nested Pydantic structure that I needed to migrate to SQLModel, but I couldn't sacrifice the ability to emit fully nested schemas and models, a capability that the migration would otherwise lose.
With the rise of AI models and the increasing need for dynamic model introspection, I discovered this is a common challenge. In fact, a variant of the solution I'm about to share was accepted as a PR to the CrewAI open source repository a while back.
SQLModel: A Quick Refresher
SQLModel is an elegant library that connects SQLAlchemy and Pydantic, offering a clean interface for defining and interacting with SQL databases in Python. As with lots of tech, it's not without its controversy, but I've generally found it useful for getting things done. Surprisingly, though, handling nested structures doesn't work seamlessly out of the box.
Here's my solution to this gap.
Key Concepts When Integrating SQLModel and Pydantic
Before going into the code, let's understand some important distinctions:
class TeamBase(SQLModel):
    ...
versus
class Team(TeamBase, table=True):
    ...
The former is essentially a Pydantic model, used for data modeling and schema validation. This is where your pure data fields belong, without database IDs, primary keys, or relationships. (table=False --> Pydantic)
The latter, with table=True, is destined for database persistence. It's aware of both Pydantic and SQLAlchemy, so this is where you add database IDs, primary keys, and relationships. (table=True --> Pydantic + SQLAlchemy)
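To make the split concrete, here's a minimal, self-contained sketch of the pattern (the full model definitions appear in the Model Definitions section below):

from typing import Optional
from sqlmodel import Field, SQLModel

class TeamBase(SQLModel):
    # Pure data fields only: no IDs, primary keys, or relationships
    name: str
    headquarters: Optional[str] = None

class Team(TeamBase, table=True):
    # Database concerns live here: the primary key (and any relationships)
    id: Optional[int] = Field(default=None, primary_key=True)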
Other improvements in this integration include:
Consistent Primary Keys: SQLModel requires explicit primary key definitions, unlike Pydantic
Read Models Implementation: Each primary model gets a corresponding read model for proper serialization
Model Configuration: Setting model_config with from_attributes=True (formerly orm_mode) ensures proper attribute mapping
Field Descriptions: Adding these enhances clarity for GenAI model introspection. This is SO IMPORTANT for those doing automated RAG pipelines that need dynamic introspection; I can't say this enough (see the sketch below).
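As a quick sketch of those last two points, here's what field descriptions and the from_attributes config look like on a base model (the class and field names here are illustrative):

from typing import Optional
from sqlmodel import Field, SQLModel

class CharacterBase(SQLModel):
    # Descriptions flow into the generated JSON schema, where GenAI tooling
    # can use them to understand each field's intent
    name: str = Field(index=True, description="The character's public name")
    age: Optional[int] = Field(default=None, description="The character's age in years")
    # from_attributes (formerly orm_mode) lets Pydantic build this model
    # directly from SQLAlchemy ORM objects
    model_config = {"from_attributes": True}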
The Solution: Three Essential Functions
My implementation centers around three key functions that work together to create a seamless experience:
create_instance(): Dynamically instantiates models from dictionaries, handling nested relationships
dump_instance(): Serializes complex model instances into nested dictionaries without infinite recursion
generate_json_schema(): Automatically produces JSON schemas for SQLModel classes
Let's look at the code that makes this possible:
from typing import Any, Dict, List, Optional, Set, Type
# Import necessary SQLAlchemy and SQLModel components
from sqlalchemy.dialects.postgresql import ARRAY # For array type support
from sqlalchemy.inspection import inspect # For introspecting model metadata
from sqlalchemy.sql.type_api import TypeEngine # For type handling
from sqlmodel import JSON, Boolean, Field, Float, Integer, Relationship, SQLModel, String
def create_instance(model: Type[SQLModel], data: Dict[str, Any]) -> SQLModel:
    """
    Create an instance of a SQLModel using the provided data dictionary.
    This function dynamically maps nested dictionaries to corresponding ORM models.

    Args:
        model (Type[SQLModel]): The SQLModel class to instantiate.
        data (Dict[str, Any]): A dictionary containing the data to populate the model.

    Returns:
        SQLModel: An instance of the model populated with the provided data.

    Example:
        hero = create_instance(Hero, {"name": "Jason", "secret_name": "Tree"})
    """
    # Get the SQLAlchemy mapper for introspection
    mapper = inspect(model)
    # Extract column and relationship information
    columns = {column.key: column for column in mapper.columns}  # Direct model attributes
    relationships = {rel.key: rel for rel in mapper.relationships}  # Related models
    # Prepare data for instantiation
    instance_data = {}
    for key, value in (data or {}).items():
        if key in columns:
            # Handle direct attributes
            instance_data[key] = value
        elif key in relationships:
            # Handle relationships (nested models)
            related_model = relationships[key].mapper.class_
            if isinstance(value, list):
                # Handle one-to-many relationships (lists of related objects)
                instance_data[key] = [
                    create_instance(related_model, v) for v in value if v is not None
                ]
            elif value is not None:
                # Handle one-to-one relationships (single nested object)
                instance_data[key] = create_instance(related_model, value)
            else:
                # Handle null values based on relationship type
                instance_data[key] = [] if relationships[key].uselist else None
    # Create and return the model instance
    return model(**instance_data)
def dump_instance(instance: SQLModel, processed_ids: Optional[Set[int]] = None) -> Dict[str, Any]:
    """
    Recursively serialize a SQLModel instance into a dictionary, including nested relationships.

    Args:
        instance (SQLModel): The SQLModel instance to serialize.
        processed_ids (Optional[Set[int]]): Set of processed instance IDs to avoid infinite recursion.

    Returns:
        Dict[str, Any]: A dictionary representation of the instance.

    Example:
        data_dict = dump_instance(hero_instance)
    """
    # Initialize tracking set for circular references if not provided
    if processed_ids is None:
        processed_ids = set()
    # Get unique identifier for this instance
    instance_id = id(instance)
    # Check if we've already processed this instance (prevents circular reference issues)
    if instance_id in processed_ids:
        return {}  # Return empty dict to break circular references
    # Mark this instance as processed
    processed_ids.add(instance_id)
    # Get model metadata through SQLAlchemy inspection
    mapper = inspect(instance.__class__)
    columns = {column.key: column for column in mapper.columns}
    relationships = {rel.key: rel for rel in mapper.relationships}
    # Build the dictionary representation
    data = {}
    # Add all column values to the dictionary
    for key in columns:
        data[key] = getattr(instance, key)
    # Process relationships (the nested part of nested schemas)
    for key, relationship in relationships.items():
        related_data = getattr(instance, key)
        if relationship.uselist:
            # Handle one-to-many relationships (lists)
            data[key] = [
                dump_instance(rel_instance, processed_ids) for rel_instance in related_data
            ]
        else:
            # Handle one-to-one relationships (single object)
            data[key] = dump_instance(related_data, processed_ids) if related_data else None
    return data
def generate_json_schema(
    model: Type[SQLModel], processed_models: Optional[Set[str]] = None
) -> Dict[str, Any]:
    """
    Generate a JSON schema for a SQLModel class, including nested relationships.

    Args:
        model (Type[SQLModel]): The SQLModel class to generate the schema for.
        processed_models (Optional[Set[str]]): Set of already processed models to prevent recursion.

    Returns:
        Dict[str, Any]: The JSON schema as a dictionary.

    Example:
        schema = generate_json_schema(Hero)
    """
    # Initialize tracking set for circular references if not provided
    if processed_models is None:
        processed_models = set()
    # Get model metadata through SQLAlchemy inspection
    mapper = inspect(model)
    columns = {column.key: column for column in mapper.columns}
    relationships = {rel.key: rel for rel in mapper.relationships}
    # Initialize the schema structure according to the JSON Schema standard
    schema = {"title": model.__name__, "type": "object", "properties": {}, "required": []}
    # Track this model to prevent circular references
    processed_models.add(model.__name__)
    # Process each column (direct attributes)
    for key, column in columns.items():
        # Skip ID columns - typically we don't include these in schemas
        if key == "id":
            continue
        # Get Pydantic field info for additional metadata
        field_info = model.__fields__.get(key)
        # Map SQLAlchemy type to JSON Schema type
        json_type = _get_json_type(column.type)
        # Create the property schema
        property_schema = {"type": json_type}
        # Add description from field if available (important for GenAI integration)
        if field_info and field_info.description:
            property_schema["description"] = field_info.description
        # Add the property to the schema
        schema["properties"][key] = property_schema
        # If the field is required, add it to the required list
        if not column.nullable:
            schema["required"].append(key)
    # Process each relationship (nested models)
    for key, relationship in relationships.items():
        related_model = relationship.mapper.class_
        # Only process models we haven't seen yet to prevent circular references
        if related_model.__name__ not in processed_models:
            if relationship.uselist:
                # Handle one-to-many relationships (lists/arrays in JSON Schema)
                schema["properties"][key] = {
                    "type": "array",
                    "items": generate_json_schema(related_model, processed_models),
                }
            else:
                # Handle one-to-one relationships (nested objects)
                schema["properties"][key] = generate_json_schema(related_model, processed_models)
    return schema
def _get_json_type(sqlalchemy_type: TypeEngine) -> str:
    """
    Map SQLAlchemy types to JSON Schema types.

    Args:
        sqlalchemy_type (TypeEngine): The SQLAlchemy type.

    Returns:
        str: The corresponding JSON Schema type.
    """
    # This helper function maps SQLAlchemy column types to JSON Schema types.
    # JSON Schema has limited primitive types: string, number, integer, boolean, array, object
    if isinstance(sqlalchemy_type, Integer):
        return "integer"  # Maps to JSON Schema integer
    elif isinstance(sqlalchemy_type, Float):
        return "number"  # Maps to JSON Schema number (floating point)
    elif isinstance(sqlalchemy_type, String):
        return "string"  # Maps to JSON Schema string
    elif isinstance(sqlalchemy_type, Boolean):
        return "boolean"  # Maps to JSON Schema boolean
    elif isinstance(sqlalchemy_type, JSON):
        return "object"  # Maps to JSON Schema object
    elif isinstance(sqlalchemy_type, ARRAY):
        return "array"  # Maps to JSON Schema array
    else:
        # Default to string for any other types we don't explicitly handle.
        # You could extend this function to handle more types as needed.
        return "string"
Model Definitions
The model definitions show how to structure your base classes and table models:
# Base Models (Pydantic-like, for data validation)
class TeamBase(SQLModel):
    """
    Base class for Team model, containing common fields.

    Attributes:
        name (str): The name of the team.
        headquarters (Optional[str]): The headquarters location of the team.
    """
    # Define data fields (no database-specific fields)
    name: str = Field(index=True)  # Create an index for faster lookups
    headquarters: Optional[str] = Field(default=None)  # Optional field with default value
    # Enable ORM mode (formerly orm_mode) to work with SQLAlchemy objects
    model_config = {"from_attributes": True}  # Allows model creation from ORM objects, i.e., hydrating models from the database


class HeroBase(SQLModel):
    """
    Base class for Hero model, containing common fields.

    Attributes:
        name (str): The name of the hero.
        secret_name (Optional[str]): The secret name of the hero.
        age (Optional[int]): The age of the hero.
        team_id (Optional[int]): The ID of the team the hero belongs to.
    """
    # Define data fields with appropriate types and constraints
    name: str = Field(index=True)
    secret_name: Optional[str] = Field(default=None)
    age: Optional[int] = Field(default=None, index=True)  # Index for fast filtering by age
    # Foreign key reference to the team table
    team_id: Optional[int] = Field(default=None, foreign_key="team.id")
    # Enable ORM mode
    model_config = {"from_attributes": True}


# Database Models (SQLAlchemy-like, for persistence)
class Team(TeamBase, table=True):  # table=True makes this a database table
    """
    Team model representing a team in the database.

    Attributes:
        id (Optional[int]): The primary key of the team.
        heroes (List["Hero"]): List of heroes belonging to the team.
    """
    # Database-specific fields
    id: Optional[int] = Field(default=None, primary_key=True)  # Primary key for the table
    # Relationship definition for the ORM
    heroes: List["Hero"] = Relationship(
        sa_relationship_kwargs={"lazy": "selectin"},  # Eager loading for performance
        back_populates="team",  # Bidirectional relationship
    )


class Hero(HeroBase, table=True):  # table=True makes this a database table
    """
    Hero model representing a hero in the database.

    Attributes:
        id (Optional[int]): The primary key of the hero.
        team (Optional["Team"]): The team the hero belongs to.
    """
    # Database-specific fields
    id: Optional[int] = Field(default=None, primary_key=True)  # Primary key for the table
    # Relationship definition - the other side of the bidirectional relationship
    team: Optional["Team"] = Relationship(back_populates="heroes")
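None of the helper functions above require a live database, but when you do want to persist these models, the standard SQLModel setup applies. Here's a minimal sketch using an in-memory SQLite database for illustration:

from sqlmodel import Session, create_engine

# In-memory SQLite keeps the example self-contained; swap in your real database URL
engine = create_engine("sqlite://")
SQLModel.metadata.create_all(engine)  # Create the team and hero tables

with Session(engine) as session:
    hero = create_instance(Hero, {"name": "Jason", "team": {"name": "A-Team"}})
    session.add(hero)  # The default save-update cascade also adds the nested Team
    session.commit()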
Putting It All Together
Here's how to use these functions in practice (note: the SQLModel docs use the "Hero" classes throughout their examples, so I continue that convention here for those familiar with them):
if __name__ == "__main__":
    from pprint import pprint

    # EXAMPLE 1: Creating nested model instances from dictionaries
    # Example data with nested team information
    hero_data = {
        "name": "Jason",
        "secret_name": "Tree",
        # Notice how we can nest a full team object directly in the dictionary
        "team": {"name": "A-Team", "headquarters": "Austin, TX"},
    }
    # Another example with age information
    another_hero_data = {
        "name": "Sophia",
        "secret_name": "Shadow",
        "age": 28,
        "team": {"name": "Shadow Squad", "headquarters": "New York, NY"},
    }
    # Create Hero instances using our custom function that handles nested data
    hero = create_instance(Hero, hero_data)
    another_hero = create_instance(Hero, another_hero_data)

    # EXAMPLE 2: Serializing model instances to nested dictionaries
    print("Dumped Hero Instance:")
    # Using our custom function to dump nested model instances
    pprint(dump_instance(hero), width=80)
    print("\nDumped Another Hero Instance:")
    pprint(dump_instance(another_hero), width=80)

    # EXAMPLE 3: Generating JSON schema for model validation
    print("\nGenerated JSON Schema for Hero Model:")
    # Generate a complete JSON schema that includes nested relationships
    pprint(generate_json_schema(Hero), width=80)
    # The resulting schema can be used for:
    # - API documentation (e.g., OpenAPI/Swagger)
    # - Client-side validation
    # - Dynamic form generation
    # - GenAI model introspection
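Given the models above, the generated Hero schema should look roughly like this (written out by hand here; exact output may vary with your SQLModel and SQLAlchemy versions):

# Approximate output of generate_json_schema(Hero)
{
    "title": "Hero",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "secret_name": {"type": "string"},
        "age": {"type": "integer"},
        "team_id": {"type": "integer"},
        "team": {
            "title": "Team",
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "headquarters": {"type": "string"},
            },
            "required": ["name"],
        },
    },
    "required": ["name"],
}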
So now you can emit fully nested JSON schemas for your models. You can parse that schema yourself, or share it with LLMs so they can hydrate objects from unstructured data for you. It should also now be clear why those description fields help! I found this technique incredibly useful back when LLMs were not good at hydrating models from data plus a schema, and the above still helps significantly.
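For instance, here's a hedged sketch of that LLM-assisted hydration loop; call_llm is a hypothetical stand-in for whatever LLM client you use:

import json

def hydrate_from_text(text: str) -> Hero:
    """Ask an LLM to extract a Hero record matching our generated schema."""
    schema = generate_json_schema(Hero)
    prompt = (
        "Extract a JSON object matching this schema from the text below.\n"
        f"Schema: {json.dumps(schema)}\n"
        f"Text: {text}\n"
        "Respond with JSON only."
    )
    raw = call_llm(prompt)  # Hypothetical: returns the model's text response
    return create_instance(Hero, json.loads(raw))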
This approach is particularly valuable if you're:
Building RESTful APIs with nested resources
Creating data validation layers for complex structures
Working with AI models that need rich schema information
Managing relational data with complex hierarchies
The ability to emit fully nested schemas and models from SQLModel closes a gap in the Python ecosystem, giving you the best of both Pydantic and SQLAlchemy without compromise.
I hope this solution helps anyone wrestling with similar challenges. While it ended up being more code than I initially expected, I've kept it as concise as possible while maintaining correctness and providing clear examples.
What complex data structure challenges have you faced in your projects? I'd love to hear your experiences in the comments.