Disclaimer: This guide offers an alternative explanation to the official SQLAlchemy documentation on subqueries, providing more visual examples and simpler explanations. The information is based on SQLAlchemy==2.0.30, and newer versions or features may affect its relevance over time.

Solution Overview

If you're only interested in the solution, here it is:

# Build the main query
query: Query = session.query(Post)

# Build the subquery
subquery = (
    self.session.query(Reaction)
    .filter(Reaction.user_id == user_id)
    .subquery()
)

# Join with the subquery
query = query.outerjoin(subquery, Post.id == subquery.c.post_id)
alias_reaction = aliased(Reaction, subquery)
query = query.add_entity(alias_reaction)

Background

In many backend systems, we have different models for validation (Pydantic) and for models in database representation (SQLAlchemy). Although some libraries like SQLModel attempt to bridge this gap, they are still in early stages.

Consider a scenario where you have two SQLAlchemy models: Post (P) and Reaction (R). Here, P represents a social network post, and R represents a user's reaction to a post.

Suppose you need to query all P along with the R of a specific User (U). Directly joining all posts with all reactions and filtering by user_id may be inefficient due to the large volume of data. Instead, you can optimize this by filtering reactions first and then performing the join.

The SQL for this optimized approach looks like:

SELECT *
FROM 
  posts p
LEFT JOIN 
  (SELECT * 
   FROM reactions 
   WHERE user_id = {{user_id}}
  ) r
ON 
  r.post_id = p.id

The query itself it's not hard, but what I have found difficult was to found a way of keeping both models when using SQLAlchemy.

First we do a subquery, this would be equivalent to the subquery we mentioned before to speed up the join operation.

# Build the main query
query: Query = session.query(Post)

# Build the subquery
subquery = (
    self.session.query(Reaction)
    .filter(Reaction.user_id == user_id)
    .subquery()
)

This part it's not something hard and we can easily find documentation about it. The tricky part comes now, I had a hard momment finding a way to return both models in one single return. I could have returned the model P and the all the fields from R, but remember, we want a return with only both models, so it's clean and concise.

After some investigation I discovered that you need to alias the subquery, what I mean by this is that you need to specify the model that will be returned and then add the entity to the original query. Below you can see how it's done.

# Join with the subquery
query = query.outerjoin(subquery, Post.id == subquery.c.post_id)
alias_reaction = aliased(Reaction, subquery)
query = query.add_entity(alias_reaction)
return query.all()

Then we just do:

post_models: list[PostModel] = [PostModel.model_validate(post) for post, _ in posts]
reactions_models: list[ReactionModel] = [ReactionModel.model_validate(reaction) for _, reaction in reactions]

It's interesting to understand why this happens, when you do SELECT * behind it deconstructs the model and returns you all the fields. But now in SQLAlchemy you don't want to lose these models as you may want to return them, and that's basically the reason.

It was an interesting investigation that was not easy to find and I wanted to share with all of you.

How to Recover the Model from a Subquery with SQLAlchemy

Solution Overview

Background

Subscribe to my newsletter

Cosmin Marian Paduraru

Cosmin Marian Paduraru