Mastering SQL Recursion: Advanced Techniques for Handling Hierarchies, Cycles, and Optimization (Part 2)

Table of contents

Introduction
If you've ever worked with hierarchical or tree-structured data, you’ve probably hit the limitations of standard SQL. That’s where recursive SQL queries come in!
In the first article of this three-part series, we explored recursion—what it is, why it’s crucial for processing hierarchical data, and how it can be implemented in SQL. However, knowing the basics of SQL recursion may not be enough when dealing with complex or "dirty" data. Queries may take too long to execute or run indefinitely. In this article, we’ll tackle advanced recursion techniques to optimize and improve the performance of recursive SQL queries.
By the end of this article, you’ll learn:
- How to limit recursion depth to prevent infinite loops
- How to detect and handle cycles in hierarchical data
- How to optimize recursive queries
- Alternative methods to recursive CTEs for hierarchical data processing
PREREQUISITES:
- Basic SQL knowledge
- Basic SQL recursion knowledge, check out Part 1 of the series!
DISCLAIMERS:
- All examples in this article were run in a Snowflake environment (but concepts apply to PostgreSQL, SQL Server, etc.). To follow along, you can sign up for a free Snowflake trial account here.
- All code and data used in this article are available in the accompanying GitHub repository.
Understanding our dataset
We will work with a dataset containing vehicle parts manufacturing data. The dataset includes the following fields:
Column Name | Description |
Part_ID | A unique identifier for each part in the dataset |
Part_Name | A human-readable name describing the part (e.g., "Engine Block #1") |
Part_Category | The category of the part (e.g., Engine System, Transmission, etc.) |
Part_Cost | The manufacturing cost of the part in USD |
Part_Manufacturing_Duration | The time required to manufacture the part (in hours) |
Depends_On | The Part_ID of another part that this part depends on (i.e., its parent in the hierarchy). NULL means it's a root component |
Here is the available data:
Part ID | Part Name | Part Category | Depends On | Part Cost | Part Manufacturing Duration |
1 | Brake Pad #1 | Braking System | 3 | 45.00 | 2 |
2 | Brake Caliper #2 | Braking System | 1 | 80.00 | 4 |
3 | Brake Rotor #3 | Braking System | 2 | 120.00 | 6 |
4 | Engine Block #4 | Engine System | null | 500.00 | 10 |
5 | Piston #5 | Engine System | 4 | 80.00 | 5 |
6 | Spark Plug #6 | Engine System | 5 | 10.00 | 1 |
7 | Oil Filter #7 | Engine System | 6 | 15.00 | 2 |
8 | Spark Plug #8 | Engine System | 5 | 12.00 | 1.5 |
flowchart TD
4("Engine Block #4") --> 5("Piston #5")
5 --> 6("Spark Plug #6") & 8("Spark Plug #8")
6 --> 7("Oil Filter #7")
3("Brake Rotor #3") --> 1("Brake Pad #1")
1 --> 2("Brake Caliper #2")
2 --> 3
Diagram: Visual representation of the dataset
We will use this dataset to build a Bill of Materials (BoM) for each part.
A Bill of Materials is a list of all components, raw materials or assemblies required to manufacture a product.
If you'd like to test these concepts further, check out the accompanying repository for additional datasets.
Based on what we’ve learned so far, the query to build the BoM should look something like this:
WITH
RECURSIVE PARTS_BOM AS (
-- Anchor Member: Start with all parts (even those that have dependencies)
SELECT
VP.PART_ID,
VP.PART_NAME,
VP.PART_CATEGORY,
VP.PART_ID::VARCHAR AS BOM
FROM
RECURSIVE_CTE_DATA.VEHICLE_PARTS_W_CYCLES_ARTICLE VP
UNION ALL
-- Recursive Member: Traverse the parts that depend on others
SELECT
VP.PART_ID,
VP.PART_NAME,
VP.PART_CATEGORY,
PB.BOM || '->' || VP.PART_ID::VARCHAR AS BOM
FROM
RECURSIVE_CTE_DATA.VEHICLE_PARTS_W_CYCLES_ARTICLE VP
INNER JOIN PARTS_BOM PB
ON VP.DEPENDS_ON = PB.PART_ID
)
-- Final Select: Retrieve the BOM for all parts
SELECT
PART_ID,
PART_NAME,
PART_CATEGORY,
BOM
FROM PARTS_BOM
ORDER BY PART_ID;
However, if you try to execute this query, you’ll notice that it never ends.
Infinite loops in recursive queries
Infinite loops occur if the recursive query keeps generating new rows indefinitely. It typically happens due to one of the following reasons:
- Lack of a termination condition: If the recursive query does not include a stopping condition or the hierarchy does not come to a natural end, the execution will go on indefinitely.
- Presence of cycles in hierarchical data: Cycles occur when a node directly or indirectly refers back to itself, creating a loop instead of a proper tree or directed acyclic graph (DAG). This often happens in parent-child relationships where a descendant mistakenly becomes an ancestor, causing infinite recursion in queries.
Preventing inifinite loops in recursive queries
Several methods exist to avoid infinite loops. These can be broadly categorized into:
- Limiting Recursion Depth
- Detecting and handling cycles in hierarchical data
Recursion Depth
Recursion depth refers to the number of times a recursive query calls itself before stopping. Limiting it ensures that recursion does not continue indefinitely. If recursion depth is too high without proper constraints, it can lead to infinite loops or excessive memory usage.
How to limit recursion depth:
- Add a numeric field,
RECURSION_DEPTH
, to the Anchor Member and initialize it to 0 or 1. - In the Recursive Member, increment
RECURSION_DEPTH
by 1. Each time the recursive part of the query executes, it adds 1 to the parent'sRECURSION_DEPTH
, effectively tracking the recursion depth. - Use
RECURSION_DEPTH
in theWHERE
clause of the Recursive Member to define a stopping condition.
Some database systems impose a default recursion depth limit to prevent infinite loops.
Applying this logic to our use case, we can define a new query:
WITH
RECURSIVE PARTS_BOM AS (
-- Anchor Member: Start with all parts (even those that have dependencies)
SELECT
VP.PART_ID,
VP.PART_NAME,
VP.PART_CATEGORY,
VP.PART_ID::VARCHAR AS BOM,
1 AS RECURSION_DEPTH
FROM
RECURSIVE_CTE_DATA.VEHICLE_PARTS_W_CYCLES_ARTICLE VP
UNION ALL
-- Recursive Member: Traverse the parts that depend on others
SELECT
VP.PART_ID,
VP.PART_NAME,
VP.PART_CATEGORY,
PB.BOM || '->' || VP.PART_ID::VARCHAR AS BOM,
PB.RECURSION_DEPTH + 1 AS RECURSION_DEPTH
FROM
RECURSIVE_CTE_DATA.VEHICLE_PARTS_W_CYCLES_ARTICLE VP
INNER JOIN PARTS_BOM PB
ON VP.DEPENDS_ON = PB.PART_ID
WHERE
-- Loop at most 10 times
PB.RECURSION_DEPTH < 10
)
-- Final Select: Retrieve the BOM for all parts
SELECT
PART_ID,
PART_NAME,
PART_CATEGORY,
BOM,
RECURSION_DEPTH
FROM PARTS_BOM
ORDER BY PART_ID;
Executing this query now terminates successfully and produces 43 rows in the output.
Sample output:
PART_ID | PART_NAME | PART_CATEGORY | BOM | RECURSION_DEPTH |
1 | Brake Pad #1 | Braking System | 1 | 1 |
1 | Brake Pad #1 | Braking System | 3->1 | 2 |
1 | Brake Pad #1 | Braking System | 2->3->1 | 3 |
1 | Brake Pad #1 | Braking System | 1->2->3->1 | 4 |
… | … | … | … | … |
1 | Brake Pad #1 | Braking System | 1->2->3->1->2->3->1->2->3->1 | 10 |
4 | Engine Block #4 | Engine System | 4 | 1 |
5 | Piston #5 | Engine System | 5 | 1 |
5 | Piston #5 | Engine System | 4->5 | 2 |
6 | Spark Plug #6 | Engine System | 6 | 1 |
6 | Spark Plug #6 | Engine System | 5->6 | 2 |
6 | Spark Plug #6 | Engine System | 4->5->6 | 3 |
7 | Oil Filter #7 | Engine System | 7 | 1 |
7 | Oil Filter #7 | Engine System | 6->7 | 2 |
7 | Oil Filter #7 | Engine System | 5->6->7 | 3 |
7 | Oil Filter #7 | Engine System | 4->5->6->7 | 4 |
We notice that the query still does not output the expected result. We have several rows for every part depending on the various recusion levels and paths (eg. part 7), we also have repeating patterns in our BoM due to cycles in our data (eg. part 1). Let's improve our solution.
Detecting and handling cycles in hierarchical data
Detecting cycles in hierarchical data ensures that recursion does not create infinite loops. Once detected, cycles can be handled to break the loop and prevent revisiting nodes in the same path.
This can be achieved by tracking all the visited nodes in a certain recursive path:
- Add a field,
VISITED_NODES
, to the Anchor Member and initialize it with the ID of the node itself. This field could be a string, an array or any datatype that can store a series of values. In our example, we will use an array datatype. - In the Recursive Member, add the current node's ID to the
VISITED_NODES
field of the previous iteration, enabling us to efficiently track all nodes that were already visited in a path. - The
VISITED_NODES
can then be used in theWHERE
clause of the Recursive Member to stop the recursion if a node was already visited.
Often, recursion depth limitation and cycle detection are implemented together to ensure the query finishes executing.
Applying this logic to our use case, we can define a new query:
WITH
RECURSIVE PARTS_BOM AS (
-- Anchor Member: Start with all parts (even those that have dependencies)
SELECT
VP.PART_ID,
VP.PART_NAME,
VP.PART_CATEGORY,
VP.PART_ID::VARCHAR AS BOM,
1 AS RECURSION_DEPTH,
ARRAY_CONSTRUCT(VP.PART_ID) AS VISITED_NODES
FROM
RECURSIVE_CTE_DATA.VEHICLE_PARTS_W_CYCLES_ARTICLE VP
UNION ALL
-- Recursive Member: Traverse the parts that depend on others
SELECT
VP.PART_ID,
VP.PART_NAME,
VP.PART_CATEGORY,
PB.BOM || '->' || VP.PART_ID::VARCHAR AS BOM,
PB.RECURSION_DEPTH + 1 AS RECURSION_DEPTH,
-- Adding the current ID to the list of visited_nodes
ARRAY_APPEND(PB.VISITED_NODES, VP.PART_ID) AS VISITED_NODES
FROM
RECURSIVE_CTE_DATA.VEHICLE_PARTS_W_CYCLES_ARTICLE VP
INNER JOIN PARTS_BOM PB
ON VP.DEPENDS_ON = PB.PART_ID
WHERE
PB.RECURSION_DEPTH < 10
-- Skip visited nodes
AND NOT ARRAY_CONTAINS(VP.PART_ID, PB.VISITED_NODES)
)
-- Final Select: Retrieve the BOM for all parts
SELECT
PART_ID,
PART_NAME,
PART_CATEGORY,
BOM,
RECURSION_DEPTH,
VISITED_NODES
FROM PARTS_BOM
ORDER BY PART_ID, RECURSION_DEPTH;
Executing this query produces 22 rows, a significant improvement.
Sample output:
PART_ID | PART_NAME | PART_CATEGORY | BOM | RECURSION_DEPTH | VISITED_NODES |
1 | Brake Pad #1 | Braking System | 1 | 1 | [1] |
1 | Brake Pad #1 | Braking System | 3->1 | 2 | [3, 1] |
1 | Brake Pad #1 | Braking System | 2->3->1 | 3 | [2, 3, 1] |
4 | Engine Block #4 | Engine System | 4 | 1 | [4] |
5 | Piston #5 | Engine System | 5 | 1 | [5] |
5 | Piston #5 | Engine System | 4->5 | 2 | [4, 5] |
6 | Spark Plug #6 | Engine System | 6 | 1 | [6] |
6 | Spark Plug #6 | Engine System | 5->6 | 2 | [5, 6] |
6 | Spark Plug #6 | Engine System | 4->5->6 | 3 | [4, 5, 6] |
7 | Oil Filter #7 | Engine System | 7 | 1 | [7] |
7 | Oil Filter #7 | Engine System | 6->7 | 2 | [6, 7] |
7 | Oil Filter #7 | Engine System | 5->6->7 | 3 | [5, 6, 7] |
7 | Oil Filter #7 | Engine System | 4->5->6->7 | 4 | [4, 5, 6, 7] |
Although the query successfully handles cyclic data, it does not yet produce the expected final result. Currently, it generates all possible dependency paths for each part. To obtain a single row for each part, listing all its dependencies, we should retain only the row with the maximum recursion depth for each PART_ID, as it contains the full dependency chain.
The updated query becomes:
WITH
RECURSIVE PARTS_BOM AS (
-- Anchor Member: Start with all parts (even those that have dependencies)
SELECT
VP.PART_ID,
VP.PART_NAME,
VP.PART_CATEGORY,
VP.PART_ID::VARCHAR AS BOM,
1 AS RECURSION_DEPTH,
ARRAY_CONSTRUCT(VP.PART_ID) AS VISITED_NODES
FROM
RECURSIVE_CTE_DATA.VEHICLE_PARTS_W_CYCLES_ARTICLE VP
UNION ALL
-- Recursive Member: Traverse the parts that depend on others
SELECT
VP.PART_ID,
VP.PART_NAME,
VP.PART_CATEGORY,
PB.BOM || '->' || VP.PART_ID::VARCHAR AS BOM,
PB.RECURSION_DEPTH + 1 AS RECURSION_DEPTH,
-- Adding the current ID to the list of visited_nodes
ARRAY_APPEND(PB.VISITED_NODES, VP.PART_ID) AS VISITED_NODES
FROM
RECURSIVE_CTE_DATA.VEHICLE_PARTS_W_CYCLES_ARTICLE VP
INNER JOIN PARTS_BOM PB
ON VP.DEPENDS_ON = PB.PART_ID
WHERE
-- Loop at most 10 times
PB.RECURSION_DEPTH < 10
-- Skip visited nodes
AND NOT ARRAY_CONTAINS(VP.PART_ID, PB.VISITED_NODES)
)
-- Final Select: Retrieve the BOM for all parts
SELECT
PART_ID,
PART_NAME,
PART_CATEGORY,
BOM,
RECURSION_DEPTH,
VISITED_NODES
FROM PARTS_BOM
-- take the row with maximum recursion depth for each part_ID
QUALIFY ROW_NUMBER() OVER (PARTITION BY PART_ID ORDER BY RECURSION_DEPTH DESC) = 1
ORDER BY PART_ID, RECURSION_DEPTH;
Final output:
PART_ID | PART_NAME | PART_CATEGORY | BOM | RECURSION_DEPTH | VISITED_NODES |
1 | Brake Pad #1 | Braking System | 2->3->1 | 3 | [2, 3, 1] |
2 | Brake Caliper #2 | Braking System | 3->1->2 | 3 | [3, 1, 2] |
3 | Brake Rotor #3 | Braking System | 1->2->3 | 3 | [1, 2, 3] |
4 | Engine Block #4 | Engine System | 4 | 1 | [4] |
5 | Piston #5 | Engine System | 4->5 | 2 | [4, 5] |
6 | Spark Plug #6 | Engine System | 4->5->6 | 3 | [4, 5, 6] |
7 | Oil Filter #7 | Engine System | 4->5->6->7 | 4 | [4, 5, 6, 7] |
8 | Spark Plug #8 | Engine System | 4->5->8 | 3 | [4, 5, 8] |
How to optimize recursive queries
Optimizing recursive queries is essential in defining an efficient and reliable solution; especially when dealing with large datasets, as rows can exponentially multiply. Here are a few tips to help you optimize your queries:
- Filter data early: If you need to process only a subset of your data, exclude the unnecessary data before processing. Apply
WHERE
conditions as soon as possible to limit the number of rows processed during recursion. For example, if you want to generate the BOM only for parts that belong to thePART_CATEGORY
'Engine System', add a filter in your Anchor Member before processing. Avoid, as much as possible, excluding rows as a post-processing operation. - Clean loops and cycles (if possible): Loops in hierarchical data often represent anomalies. If this applies to your dataset, consider fixing the underlying data before executing the recursive query to avoid unwanted infinite loops.
- Use proper indexing: Ensure indexes are created on key columns involved in the recursive joins to speed up lookups.
- Avoid unnecessary columns: Select only the columns you require to reduce memory consumption and improve performance.
- Use
UNION ALL
instead ofUNION
:UNION
performs a distinct check, which can be expensive. UsingUNION ALL
can improve performance by skipping this check. - Monitor and tune query execution plans: Use execution plans and profiling tools (e.g.,
EXPLAIN
in Snowflake) to analyze performance bottlenecks and optimize queries accordingly.
Alternative methods to recursive CTEs for hierarchical data processing
Some systems can provide additional methods to process hierarchical data. These methods can simplify the query syntax (e.g., providing functions to calculate recursion depth and visited nodes) but may impose some limitations (e.g., constraints to use a single table).
Let's consider, for example, Snowflake's CONNECT BY
statement. According to the documentation it has the following syntax:
SELECT <column_list> [ , <level_expression> ]
FROM <data_source>
START WITH <predicate>
CONNECT BY [ PRIOR ] <col1_identifier> = [ PRIOR ] <col2_identifier>
[ , [ PRIOR ] <col3_identifier> = [ PRIOR ] <col4_identifier> ]
...
...
Where:
START WITH
defines the Anchor Member conditionsCONNECT BY
contains the join conditions, stop conditions and filters present in the Recursive Member<level_expression>
are expressions provided by Snowflake used to construct the recusion path, determine the recursion depth etc.
To generate the BoM using Snowflake's CONNECT BY
, the query would thus look like this:
WITH
PARTS_BOM AS (
SELECT
VP.PART_ID,
VP.PART_NAME,
VP.PART_CATEGORY,
SYS_CONNECT_BY_PATH(VP.PART_ID, ' -> ') BOM,
LEVEL AS RECURSION_DEPTH
FROM
RECURSIVE_CTE_DATA.VEHICLE_PARTS_W_CYCLES_ARTICLE VP
START WITH TRUE -- Anchor Member: Start with all parts (even those that have dependencies)
CONNECT BY
-- Recursive Member: Traverse the parts that depend on others
VP.DEPENDS_ON = PRIOR VP.PART_ID
-- Loop at most 10 times
AND PRIOR RECURSION_DEPTH < 10
-- Skip visited nodes
-- You could also define an array field just like in the recursive CTE example
AND PRIOR BOM NOT LIKE '%' || VP.PART_ID || '%'
)
-- Final Select: Retrieve the BOM for all parts
SELECT
PART_ID,
PART_NAME,
PART_CATEGORY,
BOM,
RECURSION_DEPTH
FROM PARTS_BOM
-- take the row with maximum recursion depth for each part_ID
QUALIFY ROW_NUMBER() OVER (PARTITION BY PART_ID ORDER BY RECURSION_DEPTH DESC) = 1
ORDER BY PART_ID, RECURSION_DEPTH
Conclusion
Recursive CTEs are a powerful SQL tool for dealing with hierarchical data structures. However, it is important to optimize these queries to avoid excessive resource consumption, infinite loops, and other performance issues, ensuring the creation of robust solutions.
Key takeaways:
- Using recursion depth to limit iterations
- Detecting and handling cycles in data
- Tips to optimize recursive queries
Stay tuned for Part 3, where we’ll cover essential use cases for SQL recursion.
Want to learn more about this topic? Check out the full series here: datainbites.hashnode.dev/series/mastering-sql-recursion.
Subscribe to my newsletter
Read articles from Elie Fayad directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Elie Fayad
Elie Fayad
I'm a data professional specializing in SQL and Snowflake, with a strong background in cloud migrations, data platform configuration, ETL/ELT pipeline development, data modeling, and workflow orchestration. I'm proactive, eager to learn, and passionate about tackling new challenges. I enjoy exploring emerging technologies and sharing knowledge with the community!