Introduction

In the first article of this three-part series, we explored recursion—what it is, why it’s crucial for processing hierarchical data, and how it can be implemented in SQL. In the second article, we learned advanced recursion techniques to prevent infinite loops, handle cycles in data, and optimize recursive queries.

This article is the third and final part of the SQL Recursion series, where we'll go over some practical use cases of SQL recursion.

In this article, we will present three different scenarios:

Calculating the total cost and duration of a product assembly
Comparing budget allocation vs. actual spending across different cost centers
Finding the shortest path for packet routing in a network system

Each of these scenarios illustrates how recursion can simplify complex data relationships and unlock powerful query logic.

PREREQUISITES:

SQL knowledge
SQL recursion knowledge, check out the SQL Recursion series!

DISCLAIMERS:

All examples in this article were run in a Snowflake environment (but concepts apply to PostgreSQL, SQL Server, etc.). To follow along, you can sign up for a free Snowflake trial account here.
All code and data used in this article are available in the accompanying GitHub repository.

📋 Use Case 1: Product Assembly - Total Cost and Duration

🎯 Objective

Calculate the total cost and manufacturing duration required to produce a product. Understanding total costs and time significantly helps in optimizing supply chain planning and in pricing decisions.

📊 Dataset

We will use three tables:

PRODUCTS: Contains the list of products being sold.

COLUMN_NAME	DESCRIPTION
`PRODUCT_ID`	Product ID.
`PRODUCT_NAME`	Product name.
`ASSEMBLY_TIME`	Duration (in minutes) to assemble the product.
`ASSEMBLY_COST`	Final assembly cost (in USD).

PRODUCT_PARTS: Represents the Bill of Materials (BoM) for each product.

COLUMN_NAME	DESCRIPTION
`PART_ID`	Part ID.
`PART_NAME`	Part name.
`COMPONENT_ID`	Component required to manufacture the part.
`QUANTITY`	Number of required components.

COMPONENT_DETAILS: Contains cost and duration details for each component.

COLUMN_NAME	DESCRIPTION
`COMPONENT_ID`	Component ID.
`COMPONENT_NAME`	Component name.
`COST`	Manufacturing cost (in USD).
`DURATION`	Manufacturing duration (in minutes).

A Bill of Materials (BoM) is a list of all components, raw materials or assemblies required to manufacture a product.

🔢 Sample Data

PRODUCTS:

PRODUCT_ID	PRODUCT_NAME	ASSEMBLY_TIME	ASSEMBLY_COST
1000	Smartphone	30	60

PRODUCT_PARTS:

PART_ID	PART_NAME	COMPONENT_ID	QUANTITY
1000	Smartphone	2000	1
1000	Smartphone	2001	1
2000	CPU	3000	8
2000	CPU	3001	1
2001	Screen	3002	1
2001	Screen	3003	1

COMPONENT_DETAILS:

COMPONENT_ID	COMPONENT_NAME	COST	DURATION
2000	CPU	100	30
2001	Screen	80	20
3000	CPU Core	10	5
3001	Cache	15	7
3002	Glass Panel	20	10
3003	Touch Sensor	10	8

✔️ Solution

💻 Recursive Query

To calculate the total cost and duration, we use the following recursive query:

WITH
    ALL_COMPONENTS_PRODUCT AS (

        -- Anchor Member: Start with all the final products -> all products present in the UC1_PRODUCTS table

        SELECT 
            -- Final product ID
            PRODS.PRODUCT_ID AS TARGET_PRODUCT
            -- Component that the final product depends on
            , PRODS.PRODUCT_ID AS CURRENT_COMPONENT
            -- Cost of the current step
            , PRODS.ASSEMBLY_COST AS STEP_COST
            -- Duration of the current step
            , PRODS.ASSEMBLY_TIME AS STEP_DURATION
            -- Required quantities of the component
            , 1 AS REQUIRED_QUANTITY
            -- technical recursive columns
            , ARRAY_CONSTRUCT(PRODS.PRODUCT_ID) AS COMPONENTS_IN_PATH
            , 1 AS ITERATION_LEVEL
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , PRODS.PRODUCT_NAME AS TARGET_PRODUCT_NAME
            , PRODS.PRODUCT_NAME AS CURRENT_COMPONENT_NAME
            , ARRAY_CONSTRUCT(PRODS.PRODUCT_NAME) AS COMPONENTS_NAME_IN_PATH
        FROM 
            RECURSIVE_CTE_DATA.UC1_PRODUCTS PRODS

        UNION ALL

        -- Recursive Member: Traverse the hierarchy from the top to the bottom to find all components the final product depends on

        SELECT 
            PREV_STEP.TARGET_PRODUCT
            , PROD_PART.COMPONENT_ID AS CURRENT_COMPONENT
            , COMP_DET.COST AS STEP_COST
            , COMP_DET.DURATION AS STEP_DURATION
            , PROD_PART.QUANTITY AS REQUIRED_QUANTITY
            , ARRAY_APPEND(PREV_STEP.COMPONENTS_IN_PATH, PROD_PART.COMPONENT_ID) AS COMPONENTS_IN_PATH
            , PREV_STEP.ITERATION_LEVEL + 1 AS ITERATION_LEVEL
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , PREV_STEP.TARGET_PRODUCT_NAME
            , COMP_DET.COMPONENT_NAME AS CURRENT_COMPONENT_NAME
            , ARRAY_APPEND(PREV_STEP.COMPONENTS_NAME_IN_PATH, COMP_DET.COMPONENT_NAME) AS COMPONENTS_NAME_IN_PATH
        FROM 
            ALL_COMPONENTS_PRODUCT PREV_STEP
            INNER JOIN RECURSIVE_CTE_DATA.UC1_PRODUCT_PARTS PROD_PART -- get the component on which the parent component is dependent on
                ON PREV_STEP.CURRENT_COMPONENT = PROD_PART.PART_ID
            INNER JOIN RECURSIVE_CTE_DATA.UC1_COMPONENT_DETAILS COMP_DET -- get the details of the components
                ON COMP_DET.COMPONENT_ID = PROD_PART.COMPONENT_ID
        WHERE
            PREV_STEP.ITERATION_LEVEL <= 1000
            AND NOT ARRAY_CONTAINS(PROD_PART.COMPONENT_ID, PREV_STEP.COMPONENTS_IN_PATH)
    ), TOTALS_PER_COMPONENT as (
        -- Find the total quantity, cost, and duration for each component of the product
        SELECT 
            TARGET_PRODUCT
            , TARGET_PRODUCT_NAME
            , CURRENT_COMPONENT
            , CURRENT_COMPONENT_NAME
            , SUM(REQUIRED_QUANTITY * STEP_COST) AS REQUIRED_QUANTITY_COST
            , SUM(REQUIRED_QUANTITY * STEP_DURATION) AS REQUIRED_QUANTITY_DURATION
            , SUM(REQUIRED_QUANTITY) AS TOTAL_REQUIRED_QUANTITY
        FROM ALL_COMPONENTS_PRODUCT
        GROUP BY ALL
    )
-- Sum the costs and duration to get the final data
SELECT 
    TARGET_PRODUCT
    , TARGET_PRODUCT_NAME
    , SUM(REQUIRED_QUANTITY_COST) AS TOTAL_COST
    , SUM(REQUIRED_QUANTITY_DURATION) AS TOTAL_DURATION
    -- List all the components in the final product
    , OBJECT_DELETE(
        OBJECT_AGG(
            CURRENT_COMPONENT_NAME || '_' || CURRENT_COMPONENT, 
            TOTAL_REQUIRED_QUANTITY
        ), 
        TARGET_PRODUCT_NAME || '_' || TARGET_PRODUCT
    ) AS ALL_COMPONENTS
FROM TOTALS_PER_COMPONENT
GROUP BY ALL
ORDER BY TARGET_PRODUCT

📊 Results

The final result:

TARGET_PRODUCT	TARGET_PRODUCT_NAME	TOTAL_COST	TOTAL_DURATION	ALL_COMPONENTS
1000	Smartphone	365	145	See below

// ALL_COMPONENTS
{
  "CPU Core_3000": 8,
  "CPU_2000": 1,
  "Cache_3001": 1,
  "Glass Panel_3002": 1,
  "Screen_2001": 1,
  "Touch Sensor_3003": 1
}

📋 Use Case 2: Budget Allocated vs. Actual Spend Across Cost Centers

🎯 Objective

Determine whether the cost centers (CCs) under the Corporate HQ overspent or underspent in a specific year.

📊 Dataset

We will use two tables:

COST_CENTERS: Stores all cost centers and their hierarchical relationships.

COLUMN_NAME	DESCRIPTION
`COST_CENTER_ID`	Cost center ID.
`PARENT_CC_ID`	Parent cost center to which the cost center reports to.
`COST_CENTER_NAME`	Cost center name.

COST_CENTERS_BUDGETS: Contains budget and actual spending data for each cost center per year.

COLUMN_NAME	DESCRIPTION
`COST_CENTER_ID`	Cost center ID.
`REFERENCE_YEAR`	Year of reference.
`ALLOCATED_BUDGET`	Allocated budget for the cost center in the reference year (in USD).
`TOTAL_SPENT`	Actual amount spent by the cost center in the reference year (in USD).

🔢 Sample Data

COST_CENTERS:

COST_CENTER_ID	PARENT_CC_ID	COST_CENTER_NAME
1	NULL	Corporate HQ
2	1	Engineering
3	1	Marketing
4	2	Software
5	2	Hardware
6	3	Content
7	3	SEO
8	6	Social Media

COST_CENTERS_BUDGETS:

COST_CENTER_ID	REFERENCE_YEAR	ALLOCATED_BUDGET	TOTAL_SPENT
1	2025	200000	180000
1	2024	180000	175000
2	2025	100000	110000
2	2024	95000	90000
3	2025	80000	85000
3	2024	75000	70000
4	2025	150000	155000
4	2024	140000	135000
5	2025	130000	120000
5	2024	120000	115000
6	2025	70000	65000
6	2024	65000	60000
7	2025	60000	62000
7	2024	55000	57000
8	2025	40000	38000
8	2024	35000	34000

✔️ Solution

💻 Recursive Query

We use a recursive query to roll up budget and spending data according to the cost center hierarchy:

WITH
    COST_CENTER_ROLL_UP AS (

        -- Anchor Member: Get all cost centers directly under Corporate HQ

        SELECT
            -- Parent cost center
            CC.COST_CENTER_ID AS ROOT_COST_CENTER
            , CC.COST_CENTER_NAME AS ROOT_COST_CENTER_NAME
            -- The reference year
            , CC_B.REFERENCE_YEAR
            -- The child cost center
            , CC.COST_CENTER_ID AS CURRENT_CC
            -- Cost center budget
            , CC_B.ALLOCATED_BUDGET
            -- Cost center total yearly spent
            , CC_B.TOTAL_SPENT
            -- technical recursive columns
            , 1 AS ITERATION_LEVEL
            , ARRAY_CONSTRUCT(CC.COST_CENTER_ID) AS CCS_IN_PATH
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , CC.COST_CENTER_NAME AS CURRENT_CC_NAME
            , ARRAY_CONSTRUCT(CC.COST_CENTER_NAME) AS CCS_IN_PATH_NAME
        FROM
            RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS CC
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS_BUDGETS CC_B
                ON CC_B.COST_CENTER_ID = CC.COST_CENTER_ID
        WHERE
            CC.PARENT_CC_ID = 1

        UNION ALL

        -- Recursive Member: Traverse the hierarchy to find all descendant cost centers

        SELECT
            PARENT_CC.ROOT_COST_CENTER
            , PARENT_CC.ROOT_COST_CENTER_NAME
            , PARENT_CC.REFERENCE_YEAR
            , CC.COST_CENTER_ID AS CURRENT_CC
            , CC_B.ALLOCATED_BUDGET
            , CC_B.TOTAL_SPENT
            , PARENT_CC.ITERATION_LEVEL + 1 AS ITERATION_LEVEL
            , ARRAY_APPEND(PARENT_CC.CCS_IN_PATH, CC.COST_CENTER_ID) AS CCS_IN_PATH
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , CC.COST_CENTER_NAME AS CURRENT_CC_NAME
            , ARRAY_APPEND(PARENT_CC.CCS_IN_PATH_NAME, CC.COST_CENTER_NAME) AS CCS_IN_PATH_NAME
        FROM
            COST_CENTER_ROLL_UP PARENT_CC
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS CC -- get the children of the parent cost centers of the previous step
                ON CC.PARENT_CC_ID = PARENT_CC.CURRENT_CC
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS_BUDGETS CC_B -- get the related data of the child cost center
                ON CC_B.COST_CENTER_ID = CC.COST_CENTER_ID
                    AND CC_B.REFERENCE_YEAR = PARENT_CC.REFERENCE_YEAR
        WHERE
            PARENT_CC.ITERATION_LEVEL < 1000
            AND NOT ARRAY_CONTAINS(CC.COST_CENTER_ID, PARENT_CC.CCS_IN_PATH)
    )
SELECT 
    ROOT_COST_CENTER AS COST_CENTER
    , ROOT_COST_CENTER_NAME AS COST_CENTER_NAME
    , REFERENCE_YEAR
    -- get the total allocated budget
    , SUM(ALLOCATED_BUDGET) AS TOTAL_BUDGET_CC
    -- get the total effective spent
    , SUM(TOTAL_SPENT) AS TOTAL_SPENT_CC
    -- calculate the difference between budget and actual spent
    , TOTAL_BUDGET_CC - TOTAL_SPENT_CC AS BUDGET_VARIANCE
    , ROUND((BUDGET_VARIANCE / TOTAL_BUDGET_CC) * 100, 2) AS BUDGET_VARIANCE_PCT
    -- classify the cost center
    , CASE
        WHEN BUDGET_VARIANCE < 0 THEN 'Over Budget'
        WHEN BUDGET_VARIANCE = 0 THEN 'Within Budget'
        WHEN BUDGET_VARIANCE > 0 THEN 'Under Budget'
      END BUDGET_STATUS
FROM COST_CENTER_ROLL_UP
GROUP BY ALL
ORDER BY COST_CENTER, REFERENCE_YEAR

📊 Results

The final result:

COST_CENTER	COST_CENTER_NAME	REFERENCE_YEAR	TOTAL_BUDGET_CC	TOTAL_SPENT_CC	BUDGET_VARIANCE	BUDGET_VARIANCE_PCT	BUDGET_STATUS
2	Engineering	2024	355000	340000	15000	4.23	Under Budget
2	Engineering	2025	380000	385000	-5000	-1.32	Over Budget
3	Marketing	2024	230000	221000	9000	3.91	Under Budget
3	Marketing	2025	250000	250000	0	0.00	Within Budget

The chart below visualizes the budget variance across cost centers for the years 2024 and 2025, clearly highlighting which centers stayed within budget and which exceeded their allocated funds:

Budget Variance Visualization

📋 Use Case 3: Network Routing - Minimum Travel Distance

🎯 Objective

Calculate the minimum distance (in KMs) a packet has to travel from a source hub to arrive to all possible destination hubs.

This use case highlights how recursive queries can be leveraged to implement pseudo-graph algorithms.

📊 Dataset

We will use two tables:

ROUTING_HUBS: Stores all available hubs.

COLUMN_NAME	DESCRIPTION
`HUB_ID`	Hub ID.
`HUB_LOCATION`	Country where the hub is located.

HUB_LINKS: Contains the connections between hubs. Links are bidirectional.

COLUMN_NAME	DESCRIPTION
`SOURCE_HUB_ID`	Source Hub ID.
`TARGET_HUB_ID`	Destination Hub ID.
`PATH_LENGTH_KM`	Distance (in KMs) from source to destination.

🔢 Sample Data

ROUTING_HUBS:

HUB_ID	HUB_LOCATION
1	Lebanon
2	Italy
3	France
4	Germany
5	Portugal
6	USA

HUB_LINKS:

SOURCE_HUB_ID	TARGET_HUB_ID	PATH_LENGTH_KM
1	2	2200
2	1	2200
2	3	1100
3	2	1100
3	4	900
4	3	900
4	5	2100
5	4	2100
1	3	3100
3	1	3100
5	2	1800
2	5	1800
2	4	1200
4	2	1200
3	5	1700
5	3	1700
4	6	7500
6	4	7500
6	2	8800
2	6	8800
3	6	8700
6	3	8700

✔️ Solution

💻 Recursive Query

We use a recursive query to explore all paths from the source hub (Lebanon) to all destinations, accumulating distances and hops, while avoiding cycles:

WITH
    ALL_PATHS_TO_DESTINATION AS (

        -- Anchor Member: Initialize the source hub (Lebanon)

        SELECT
            -- SOURCE HUB
            HUB.HUB_ID AS SOURCE_HUB_ID
            , HUB.HUB_LOCATION AS SOURCE_HUB_LOCATION
            -- DESTINATION HUB
            , HUB.HUB_ID AS DESTINATION_HUB_ID
            , HUB.HUB_LOCATION AS DESTINATION_HUB_LOCATION
            -- Path length in KMs
            , 0 AS PATH_LENGTH_KM
            -- Number of hops to get to current destination
            , 0 AS NUMBER_OF_HOPS
            -- Path to get to current destination
            , HUB.HUB_ID::VARCHAR AS TRAVERSED_PATH
            -- technical recursive columns
            , 1 AS ITERATION_LEVEL
            , ARRAY_CONSTRUCT(HUB.HUB_ID) AS HUBS_IN_PATH
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , HUB.HUB_LOCATION AS TRAVERSED_PATH_LOCATIONS
            , ARRAY_CONSTRUCT(HUB.HUB_LOCATION) AS HUB_LOCATIONS_IN_PATH
        FROM
            RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC3_ROUTING_HUBS HUB
        WHERE
            HUB.HUB_LOCATION = 'Lebanon'

        UNION ALL

        -- Recursive Member: Get next reachable hubs not yet visited

        SELECT
            SOURCE_HUB.SOURCE_HUB_ID
            , SOURCE_HUB.SOURCE_HUB_LOCATION
            , HUB_LINK.TARGET_HUB_ID AS DESTINATION_HUB_ID
            , TARGET_HUBS.HUB_LOCATION AS DESTINATION_HUB_LOCATION
            , SOURCE_HUB.PATH_LENGTH_KM + HUB_LINK.PATH_LENGTH_KM AS PATH_LENGTH_KM
            , SOURCE_HUB.NUMBER_OF_HOPS + 1 AS NUMBER_OF_HOPS
            , SOURCE_HUB.TRAVERSED_PATH || ' -> ' || HUB_LINK.TARGET_HUB_ID AS TRAVERSED_PATH
            , SOURCE_HUB.ITERATION_LEVEL + 1 AS ITERATION_LEVEL
            , ARRAY_APPEND(SOURCE_HUB.HUBS_IN_PATH, HUB_LINK.TARGET_HUB_ID) AS HUBS_IN_PATH
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , SOURCE_HUB.TRAVERSED_PATH_LOCATIONS || ' -> ' || TARGET_HUBS.HUB_LOCATION AS TRAVERSED_PATH_LOCATIONS
            , ARRAY_APPEND(SOURCE_HUB.HUB_LOCATIONS_IN_PATH, TARGET_HUBS.HUB_LOCATION) AS HUB_LOCATIONS_IN_PATH
        FROM
            ALL_PATHS_TO_DESTINATION AS SOURCE_HUB
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC3_HUB_LINKS HUB_LINK -- get the reachable hubs from current hub
                ON HUB_LINK.SOURCE_HUB_ID = SOURCE_HUB.DESTINATION_HUB_ID
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC3_ROUTING_HUBS TARGET_HUBS -- get destination hub details
                ON TARGET_HUBS.HUB_ID = HUB_LINK.TARGET_HUB_ID
        WHERE
            /*
                In situations like this use case it is very important to define robust stop conditions since rows can be generated exponentially 

                Assuming 40 hubs, with approx. 6 outbound connections per hub, we can arrive to millions of records just on iteration 10
            */
            -- Reasonable hop limit
            SOURCE_HUB.NUMBER_OF_HOPS < 30 
            -- Prevent cycles in path
            AND NOT ARRAY_CONTAINS(HUB_LINK.TARGET_HUB_ID, SOURCE_HUB.HUBS_IN_PATH)
            -- Reasonable Distance threshold
            AND SOURCE_HUB.PATH_LENGTH_KM + HUB_LINK.PATH_LENGTH_KM < 40000  
    )
SELECT 
    SOURCE_HUB_LOCATION
    , DESTINATION_HUB_LOCATION
    , PATH_LENGTH_KM
    , NUMBER_OF_HOPS
    , TRAVERSED_PATH_LOCATIONS
FROM ALL_PATHS_TO_DESTINATION
QUALIFY ROW_NUMBER() OVER (
                -- For every calculated source-destination path
                PARTITION BY SOURCE_HUB_ID, DESTINATION_HUB_ID
                --Take the shortest path in terms of KMs
                ORDER BY PATH_LENGTH_KM ASC
            ) = 1
ORDER BY SOURCE_HUB_ID, DESTINATION_HUB_ID, NUMBER_OF_HOPS, PATH_LENGTH_KM

📊 Results

The final result:

SOURCE_HUB_LOCATION	DESTINATION_HUB_LOCATION	PATH_LENGTH_KM	NUMBER_OF_HOPS	TRAVERSED_PATH_LOCATIONS
Lebanon	Lebanon	0	0	Lebanon
Lebanon	Italy	2200	1	Lebanon -> Italy
Lebanon	France	3100	1	Lebanon -> France
Lebanon	Germany	3400	2	Lebanon -> Italy -> Germany
Lebanon	Portugal	4000	2	Lebanon -> Italy -> Portugal
Lebanon	USA	10900	3	Lebanon -> Italy -> Germany -> USA

Conclusion

With this article we conclude the SQL Recursion series. We explored three use cases that cover SQL recursion concepts—from foundational principles to advanced techniques.

Now it’s your turn to practice and master this powerful skill, elevating your data manipulation and analysis capabilities. Feel free to check out the repository for additional datasets and get creative with the insights you can uncover!

Want to learn more about this topic? Check out the full series here: datainbites.hashnode.dev/series/mastering-sql-recursion.

Mastering SQL Recursion: From Theory to Practice with Key Use Cases (Part 3)

Table of contents

Introduction

📋 Use Case 1: Product Assembly - Total Cost and Duration

🎯 Objective

📊 Dataset

🔢 Sample Data

✔️ Solution

💻 Recursive Query

📊 Results

📋 Use Case 2: Budget Allocated vs. Actual Spend Across Cost Centers

🎯 Objective

📊 Dataset

🔢 Sample Data

✔️ Solution

💻 Recursive Query

📊 Results

📋 Use Case 3: Network Routing - Minimum Travel Distance

🎯 Objective

📊 Dataset

🔢 Sample Data

✔️ Solution

💻 Recursive Query

📊 Results

Conclusion

Subscribe to my newsletter

Elie Fayad

Elie Fayad