Mastering SQL Recursion: A Beginner’s Guide to Recursive CTEs and Hierarchical Data in SQL (Part 1)

Elie FayadElie Fayad
11 min read

Introduction

If you've ever worked with hierarchical or tree-structured data, you’ve probably hit the limitations of standard SQL. Trying to define and implement these relationships with joins quickly becomes messy, unreadable, and inefficient. Oftentimes, it is even impossible to determine how many joins are needed to obtain the desired results.

That’s where recursive SQL queries come in!

Recursive Common Table Expressions (CTEs) allow you to easily traverse hierarchical data in tables, enabling you to walk through parent-child relationships dynamically regardless of hierarchical depth. Whether you’re finding all employees under a manager, building a bill of materials or generating a sequence of numbers, recursive queries are an essential tool to add to your arsenal as a data engineer.

In this three part series, you will learn and master recursion in SQL. The topics are divided as follows:

  • Part 1: Introduction to recursion in SQL
  • Part 2: Advanced recursion techniques
  • Part 3: Use-cases for SQL recursion

By the end of this article, you’ll discover:

  • The basic structure of a recursive query
  • How recursive queries work
  • An example where SQL recursion is essential

PREREQUISITES:

  • Basic SQL knowledge

DISCLAIMERS:

  • All examples in this article were run in a Snowflake environment (but concepts apply to PostgreSQL, SQL Server, etc.). To follow along, you can sign up for a free Snowflake trial account here.
  • All code and data used in this article are available in the accompanying repository.

What Are Common Table Expressions?

A Common Table Expression (CTE) is a temporary result set defined within a SQL query using the WITH keyword. It allows you to break complex queries into modular, readable pieces. Think of it as a temporary table that you can reference exclusively within the query.

Basic CTE Syntax:

WITH
    CTE_1 [(cte_col_1, cte_col_2)] AS (
        SELECT col_1, col_2
        FROM ...
    )
SELECT ... 
FROM CTE_1;

How to Use CTEs to Create SQL Recursive Queries

What is recursion?

Recursion is when a process calls itself to solve a smaller version of the same problem, repeating this until it reaches a stop condition that terminates the repetition.

A stop condition in recursion is a defined rule that determines when the recursive process should end, essential in preventing infinite loops or excessive recursion.

Example 1 - Real world scenario

Imagine a set of nested folders. To find a file, you might open a folder, and if it contains more folders, you open those too — repeating the process until you find the file or reach the deepest level. Each step is the same task applied to a smaller scope.

Example 2 - Programming

In programming, recursion is like breaking down a task into simpler subtasks. For instance, calculating the factorial of a number:

  • factorial(5) = 5 * factorial(4)
  • factorial(4) = 4 * factorial(3)
  • ... until the stop condition factorial(1) = 1

The calls keep stacking up until the base case is hit, and then the results are gradually combined as the recursion unwinds.

How can recursion be achieved in SQL?

Recursion in SQL is accomplished using Recursive CTEs. You guessed it! It’s a CTE that calls itself to process hierarchical or nested data. This is especially useful for keeping the code clean and building a robust solution without knowing the hierarchy depth in advance.

Recursive CTEs generally consist of two parts:

  • Anchor Member: The starting point of the recursion (e.g. top-level employees)
  • Recursive Member: A query that references the CTE itself to iterate through lower levels of the hierarchy.

In SQL, a stop condition is typically implemented in the Recursive Member through conditions that limit the recursion depth or detect circular references.

General Syntax of a Recursive CTE:

WITH 
    [RECURSIVE] <cte_name> AS (

        -- Anchor Member
        SELECT ...
        FROM <table>
        WHERE ... -- usually a condition to identify the root node

        UNION ALL

        -- Recursive Member
        SELECT ...
        FROM 
            <table> t
            INNER JOIN <cte_name> c
                ON t.parent_id = c.id
        WHERE ... -- optional filer or stopping condition
)
SELECT * 
FROM <cte_name>;

Keep in mind that some systems provide additional functionalities and syntax to build recursive queries, such as Snowflake's CONNECT BY statement.


Example: Employee Reporting Hierarchy

Problem

Suppose you have a table EXAMPLE_EMPLOYEE_01, within schema RECURSIVE_CTE_DATA, representing employees and their reporting structure in an organization:

Employee IDRoleManager ID
1Director of EngineeringNULL
2Engineering Manager1
3Product Manager1
4Software Engineer2
5Data Analyst2
6UX Designer3
7Intern 14
8Intern 24
9HR DirectorNULL
10HR Employee9
11HR Intern9
graph TD
    %% Engineering Department
    A[Director of Engineering] --> B[Engineering Manager]
    B --> C[Software Engineer]
    C --> D[Intern 1]
    C --> E[Intern 2]
    B --> F[Data Analyst]
    A --> G[Product Manager]
    G --> H[UX Designer]

    %% HR Department
    I[HR Director] --> J[HR Employee]
    I --> K[HR Intern]

    %% Styling for clarity
    classDef engineering fill:#f9f,stroke:#333;
    classDef hr fill:#bbf,stroke:#333;
    class A,B,C,D,E,F,G,H engineering;
    class I,J,K hr;

Diagram: Visual representation of the dataset

Objective: Determine the reporting hierarchy for each employee:

EMPLOYEE IDEMPLOYEE ROLEMANAGER IDTOP LEVEL MANAGERREPORTING HIERARCHY
1Director of EngineeringNULL11
2Engineering Manager112 -> 1
3Product Manager113 -> 1
4Software Engineer214 -> 2 -> 1
5Data Analyst215 -> 2 -> 1
6UX Designer316 -> 3 -> 1
7Intern 1417 -> 4 -> 2 -> 1
8Intern 2418 -> 4 -> 2 -> 1
9HR DirectorNULL99
10HR Employee9910 -> 9
11HR Intern9911 -> 9

Solution

Let's start defining the two components of our recursive CTE EMPLOYEE_HIERARCHY.

  • Anchor Member: Start with all the top-level employees (MANAGER_ID = NULL).
SELECT
    EMPLOYEE_ID,
    EMPLOYEE_ROLE,
    MANAGER_ID,
    -- Begin the hierarchy for each top-level manager. Set the initial reporting hierarchy for each manager to just their own ID
    EMPLOYEE_ID::VARCHAR AS REPORTING_HIERARCHY 
FROM RECURSIVE_CTE_DATA.EXAMPLE_EMPLOYEE_01 
WHERE
    MANAGER_ID IS NULL
  • Recursive Member: Link each employee with their manager
SELECT 
    E.EMPLOYEE_ID,
    E.EMPLOYEE_ROLE,
    E.MANAGER_ID,
    -- Add the current employee's ID to the existing REPORTING_HIERARCHY of their manager, effectively building the reporting chain
    E.EMPLOYEE_ID::VARCHAR || ' -> ' || EH.REPORTING_HIERARCHY AS REPORTING_HIERARCHY
FROM 
    -- Join the employee with their manager
    -- The join is satisfied if the employee's manager is present in the previous recursion step
    RECURSIVE_CTE_DATA.EXAMPLE_EMPLOYEE_01 E
    INNER JOIN EMPLOYEE_HIERARCHY EH 
        ON E.MANAGER_ID = EH.EMPLOYEE_ID

As a result, the final query will look something like this:

WITH 
    RECURSIVE EMPLOYEE_HIERARCHY AS (
        -- Anchor: Start with all employees who have no managers (top-level)
        SELECT
            EMPLOYEE_ID AS TOP_LEVEL_MANAGER, -- Top-level manager's ID
            EMPLOYEE_ID,
            EMPLOYEE_ROLE,
            MANAGER_ID,
            EMPLOYEE_ID::VARCHAR AS REPORTING_HIERARCHY,
            -- Additional technical column that indicates the recursion step. It keeps track of how deep the recursion has gone
            0 AS RECURSION_LEVEL, 
            -- Additional technical column that shows the path followed in the recusion step
            EMPLOYEE_ID::VARCHAR AS RECURSION_PATH 
        FROM RECURSIVE_CTE_DATA.EXAMPLE_EMPLOYEE_01 
        WHERE MANAGER_ID IS NULL

        UNION ALL

        -- Recursive: Link each employee with their manager
        SELECT 
            EH.TOP_LEVEL_MANAGER,
            E.EMPLOYEE_ID,
            E.EMPLOYEE_ROLE,
            E.MANAGER_ID,
            E.EMPLOYEE_ID::VARCHAR || ' -> ' || EH.REPORTING_HIERARCHY AS REPORTING_HIERARCHY,
            EH.RECURSION_LEVEL + 1 AS RECURSION_LEVEL,
            EH.RECURSION_PATH || ' -> ' || E.EMPLOYEE_ID::VARCHAR AS RECURSION_PATH
        FROM 
            RECURSIVE_CTE_DATA.EXAMPLE_EMPLOYEE_01  E
            INNER JOIN EMPLOYEE_HIERARCHY EH 
                ON E.MANAGER_ID = EH.EMPLOYEE_ID
)
SELECT *
FROM EMPLOYEE_HIERARCHY
ORDER BY TOP_LEVEL_MANAGER, EMPLOYEE_ID

Let's understand better what is happening in each recursion step:


Step 0: Add top-level managers

  • Final content:
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
1Director of Engineeringnull11
9HR Directornull99

Step 1: Add employees who report to top-level managers

  • EMPLOYEE_HIERARCHY context for step 1:
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
1Director of Engineeringnull11
9HR Directornull99
  • Recursion output - all employees that report to EMPLOYEE_ID 1 or 9:
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
2Engineering Manager112 -> 1
3Product Manager113 -> 1
10HR Employee9910 -> 9
11HR Intern9911 -> 9
  • Final content:
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
1Director of Engineeringnull11
9HR Directornull99
2Engineering Manager112 -> 1
3Product Manager113 -> 1
10HR Employee9910 -> 9
11HR Intern9911 -> 9

Step 2: Add employees who report to managers of step 1

  • EMPLOYEE_HIERARCHY context for step 2: All employees of the previous recusion step become managers
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
2Engineering Manager112 -> 1
3Product Manager113 -> 1
10HR Employee9910 -> 9
11HR Intern9911 -> 9
  • Recursion output - all employees that report to EMPLOYEE_ID 2, 3, 10 or 11:
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
4Software Engineer214 -> 2 -> 1
5Data Analyst215 -> 2 -> 1
6UX Designer316 -> 3 -> 1
  • Final content:
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
1Director of Engineeringnull11
9HR Directornull99
2Engineering Manager112 -> 1
3Product Manager113 -> 1
10HR Employee9910 -> 9
11HR Intern9911 -> 9
4Software Engineer214 -> 2 -> 1
5Data Analyst215 -> 2 -> 1
6UX Designer316 -> 3 -> 1

Step 3: Add employees who report to managers of step 2

  • EMPLOYEE_HIERARCHY context for step 3:
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
4Software Engineer214 -> 2 -> 1
5Data Analyst215 -> 2 -> 1
6UX Designer316 -> 3 -> 1
  • Recursion output - all employees that report to EMPLOYEE_ID 4, 5 or 6:
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
7Intern 1417 -> 4 -> 2 -> 1
8Intern 2418 -> 4 -> 2 -> 1
  • Final content:
EMPLOYEE_IDEMPLOYEE_ROLEMANAGER_IDTOP_LEVEL_MANAGERREPORTING_HIERARCHY
1Director of Engineeringnull11
9HR Directornull99
2Engineering Manager112 -> 1
3Product Manager113 -> 1
10HR Employee9910 -> 9
11HR Intern9911 -> 9
4Software Engineer214 -> 2 -> 1
5Data Analyst215 -> 2 -> 1
6UX Designer316 -> 3 -> 1
7Intern 1417 -> 4 -> 2 -> 1
8Intern 2418 -> 4 -> 2 -> 1

Conclusion

Recursive CTEs are a powerful SQL tool for dealing with hierarchical data structures — whether it’s organizational charts, folder structures, or dependency graphs. By separating queries into anchor and recursive members, and iterating over rows until a stopping condition is met, we’re able to traverse even the deepest trees with organized and efficient code.

Key takeaways:

  • Use recursive CTEs when you are dealing with hierarchical data.
  • Define a stopping condition, when hierarchy is not finite, or recursion will run indefinitely (or until the system’s max recursion depth).
  • Recursive CTEs are supported in most modern SQL engines, but syntax and limits may vary.

Stay tuned for Part 2, where we’ll cover advanced recursion techniques including cycle detection, max depth limiting, and optimization tips for large datasets.


Want to learn more about this topic? Check out the full series here: datainbites.hashnode.dev/series/mastering-sql-recursion.

0
Subscribe to my newsletter

Read articles from Elie Fayad directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Elie Fayad
Elie Fayad

I'm a data professional specializing in SQL and Snowflake, with a strong background in cloud migrations, data platform configuration, ETL/ELT pipeline development, data modeling, and workflow orchestration. I'm proactive, eager to learn, and passionate about tackling new challenges. I enjoy exploring emerging technologies and sharing knowledge with the community!