Mastering SQL Recursion: From Theory to Practice with Key Use Cases (Part 3)


Introduction
In the first article of this three-part series, we explored recursionβwhat it is, why itβs crucial for processing hierarchical data, and how it can be implemented in SQL. In the second article, we learned advanced recursion techniques to prevent infinite loops, handle cycles in data, and optimize recursive queries.
This article is the third and final part of the SQL Recursion series, where we'll go over some practical use cases of SQL recursion.
In this article, we will present three different scenarios:
Calculating the total cost and duration of a product assembly
Comparing budget allocation vs. actual spending across different cost centers
Finding the shortest path for packet routing in a network system
Each of these scenarios illustrates how recursion can simplify complex data relationships and unlock powerful query logic.
PREREQUISITES:
SQL knowledge
SQL recursion knowledge, check out the SQL Recursion series!
DISCLAIMERS:
All examples in this article were run in a Snowflake environment (but concepts apply to PostgreSQL, SQL Server, etc.). To follow along, you can sign up for a free Snowflake trial account here.
All code and data used in this article are available in the accompanying GitHub repository.
π Use Case 1: Product Assembly - Total Cost and Duration
π― Objective
Calculate the total cost and manufacturing duration required to produce a product. Understanding total costs and time significantly helps in optimizing supply chain planning and in pricing decisions.
π Dataset
We will use three tables:
PRODUCTS
: Contains the list of products being sold.
COLUMN_NAME | DESCRIPTION |
PRODUCT_ID | Product ID. |
PRODUCT_NAME | Product name. |
ASSEMBLY_TIME | Duration (in minutes) to assemble the product. |
ASSEMBLY_COST | Final assembly cost (in USD). |
PRODUCT_PARTS
: Represents the Bill of Materials (BoM) for each product.
COLUMN_NAME | DESCRIPTION |
PART_ID | Part ID. |
PART_NAME | Part name. |
COMPONENT_ID | Component required to manufacture the part. |
QUANTITY | Number of required components. |
COMPONENT_DETAILS
: Contains cost and duration details for each component.
COLUMN_NAME | DESCRIPTION |
COMPONENT_ID | Component ID. |
COMPONENT_NAME | Component name. |
COST | Manufacturing cost (in USD). |
DURATION | Manufacturing duration (in minutes). |
A Bill of Materials (BoM) is a list of all components, raw materials or assemblies required to manufacture a product.
π’ Sample Data
PRODUCTS
:
PRODUCT_ID | PRODUCT_NAME | ASSEMBLY_TIME | ASSEMBLY_COST |
1000 | Smartphone | 30 | 60 |
PRODUCT_PARTS
:
PART_ID | PART_NAME | COMPONENT_ID | QUANTITY |
1000 | Smartphone | 2000 | 1 |
1000 | Smartphone | 2001 | 1 |
2000 | CPU | 3000 | 8 |
2000 | CPU | 3001 | 1 |
2001 | Screen | 3002 | 1 |
2001 | Screen | 3003 | 1 |
COMPONENT_DETAILS
:
COMPONENT_ID | COMPONENT_NAME | COST | DURATION |
2000 | CPU | 100 | 30 |
2001 | Screen | 80 | 20 |
3000 | CPU Core | 10 | 5 |
3001 | Cache | 15 | 7 |
3002 | Glass Panel | 20 | 10 |
3003 | Touch Sensor | 10 | 8 |
βοΈ Solution
π» Recursive Query
To calculate the total cost and duration, we use the following recursive query:
WITH
ALL_COMPONENTS_PRODUCT AS (
-- Anchor Member: Start with all the final products -> all products present in the UC1_PRODUCTS table
SELECT
-- Final product ID
PRODS.PRODUCT_ID AS TARGET_PRODUCT
-- Component that the final product depends on
, PRODS.PRODUCT_ID AS CURRENT_COMPONENT
-- Cost of the current step
, PRODS.ASSEMBLY_COST AS STEP_COST
-- Duration of the current step
, PRODS.ASSEMBLY_TIME AS STEP_DURATION
-- Required quantities of the component
, 1 AS REQUIRED_QUANTITY
-- technical recursive columns
, ARRAY_CONSTRUCT(PRODS.PRODUCT_ID) AS COMPONENTS_IN_PATH
, 1 AS ITERATION_LEVEL
/*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
, PRODS.PRODUCT_NAME AS TARGET_PRODUCT_NAME
, PRODS.PRODUCT_NAME AS CURRENT_COMPONENT_NAME
, ARRAY_CONSTRUCT(PRODS.PRODUCT_NAME) AS COMPONENTS_NAME_IN_PATH
FROM
RECURSIVE_CTE_DATA.UC1_PRODUCTS PRODS
UNION ALL
-- Recursive Member: Traverse the hierarchy from the top to the bottom to find all components the final product depends on
SELECT
PREV_STEP.TARGET_PRODUCT
, PROD_PART.COMPONENT_ID AS CURRENT_COMPONENT
, COMP_DET.COST AS STEP_COST
, COMP_DET.DURATION AS STEP_DURATION
, PROD_PART.QUANTITY AS REQUIRED_QUANTITY
, ARRAY_APPEND(PREV_STEP.COMPONENTS_IN_PATH, PROD_PART.COMPONENT_ID) AS COMPONENTS_IN_PATH
, PREV_STEP.ITERATION_LEVEL + 1 AS ITERATION_LEVEL
/*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
, PREV_STEP.TARGET_PRODUCT_NAME
, COMP_DET.COMPONENT_NAME AS CURRENT_COMPONENT_NAME
, ARRAY_APPEND(PREV_STEP.COMPONENTS_NAME_IN_PATH, COMP_DET.COMPONENT_NAME) AS COMPONENTS_NAME_IN_PATH
FROM
ALL_COMPONENTS_PRODUCT PREV_STEP
INNER JOIN RECURSIVE_CTE_DATA.UC1_PRODUCT_PARTS PROD_PART -- get the component on which the parent component is dependent on
ON PREV_STEP.CURRENT_COMPONENT = PROD_PART.PART_ID
INNER JOIN RECURSIVE_CTE_DATA.UC1_COMPONENT_DETAILS COMP_DET -- get the details of the components
ON COMP_DET.COMPONENT_ID = PROD_PART.COMPONENT_ID
WHERE
PREV_STEP.ITERATION_LEVEL <= 1000
AND NOT ARRAY_CONTAINS(PROD_PART.COMPONENT_ID, PREV_STEP.COMPONENTS_IN_PATH)
), TOTALS_PER_COMPONENT as (
-- Find the total quantity, cost, and duration for each component of the product
SELECT
TARGET_PRODUCT
, TARGET_PRODUCT_NAME
, CURRENT_COMPONENT
, CURRENT_COMPONENT_NAME
, SUM(REQUIRED_QUANTITY * STEP_COST) AS REQUIRED_QUANTITY_COST
, SUM(REQUIRED_QUANTITY * STEP_DURATION) AS REQUIRED_QUANTITY_DURATION
, SUM(REQUIRED_QUANTITY) AS TOTAL_REQUIRED_QUANTITY
FROM ALL_COMPONENTS_PRODUCT
GROUP BY ALL
)
-- Sum the costs and duration to get the final data
SELECT
TARGET_PRODUCT
, TARGET_PRODUCT_NAME
, SUM(REQUIRED_QUANTITY_COST) AS TOTAL_COST
, SUM(REQUIRED_QUANTITY_DURATION) AS TOTAL_DURATION
-- List all the components in the final product
, OBJECT_DELETE(
OBJECT_AGG(
CURRENT_COMPONENT_NAME || '_' || CURRENT_COMPONENT,
TOTAL_REQUIRED_QUANTITY
),
TARGET_PRODUCT_NAME || '_' || TARGET_PRODUCT
) AS ALL_COMPONENTS
FROM TOTALS_PER_COMPONENT
GROUP BY ALL
ORDER BY TARGET_PRODUCT
π Results
The final result:
TARGET_PRODUCT | TARGET_PRODUCT_NAME | TOTAL_COST | TOTAL_DURATION | ALL_COMPONENTS |
1000 | Smartphone | 365 | 145 | See below |
// ALL_COMPONENTS
{
"CPU Core_3000": 8,
"CPU_2000": 1,
"Cache_3001": 1,
"Glass Panel_3002": 1,
"Screen_2001": 1,
"Touch Sensor_3003": 1
}
π Use Case 2: Budget Allocated vs. Actual Spend Across Cost Centers
π― Objective
Determine whether the cost centers (CCs) under the Corporate HQ overspent or underspent in a specific year.
π Dataset
We will use two tables:
COST_CENTERS
: Stores all cost centers and their hierarchical relationships.
COLUMN_NAME | DESCRIPTION |
COST_CENTER_ID | Cost center ID. |
PARENT_CC_ID | Parent cost center to which the cost center reports to. |
COST_CENTER_NAME | Cost center name. |
COST_CENTERS_BUDGETS
: Contains budget and actual spending data for each cost center per year.
COLUMN_NAME | DESCRIPTION |
COST_CENTER_ID | Cost center ID. |
REFERENCE_YEAR | Year of reference. |
ALLOCATED_BUDGET | Allocated budget for the cost center in the reference year (in USD). |
TOTAL_SPENT | Actual amount spent by the cost center in the reference year (in USD). |
π’ Sample Data
COST_CENTERS
:
COST_CENTER_ID | PARENT_CC_ID | COST_CENTER_NAME |
1 | NULL | Corporate HQ |
2 | 1 | Engineering |
3 | 1 | Marketing |
4 | 2 | Software |
5 | 2 | Hardware |
6 | 3 | Content |
7 | 3 | SEO |
8 | 6 | Social Media |
COST_CENTERS_BUDGETS
:
COST_CENTER_ID | REFERENCE_YEAR | ALLOCATED_BUDGET | TOTAL_SPENT |
1 | 2025 | 200000 | 180000 |
1 | 2024 | 180000 | 175000 |
2 | 2025 | 100000 | 110000 |
2 | 2024 | 95000 | 90000 |
3 | 2025 | 80000 | 85000 |
3 | 2024 | 75000 | 70000 |
4 | 2025 | 150000 | 155000 |
4 | 2024 | 140000 | 135000 |
5 | 2025 | 130000 | 120000 |
5 | 2024 | 120000 | 115000 |
6 | 2025 | 70000 | 65000 |
6 | 2024 | 65000 | 60000 |
7 | 2025 | 60000 | 62000 |
7 | 2024 | 55000 | 57000 |
8 | 2025 | 40000 | 38000 |
8 | 2024 | 35000 | 34000 |
βοΈ Solution
π» Recursive Query
We use a recursive query to roll up budget and spending data according to the cost center hierarchy:
WITH
COST_CENTER_ROLL_UP AS (
-- Anchor Member: Get all cost centers directly under Corporate HQ
SELECT
-- Parent cost center
CC.COST_CENTER_ID AS ROOT_COST_CENTER
, CC.COST_CENTER_NAME AS ROOT_COST_CENTER_NAME
-- The reference year
, CC_B.REFERENCE_YEAR
-- The child cost center
, CC.COST_CENTER_ID AS CURRENT_CC
-- Cost center budget
, CC_B.ALLOCATED_BUDGET
-- Cost center total yearly spent
, CC_B.TOTAL_SPENT
-- technical recursive columns
, 1 AS ITERATION_LEVEL
, ARRAY_CONSTRUCT(CC.COST_CENTER_ID) AS CCS_IN_PATH
/*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
, CC.COST_CENTER_NAME AS CURRENT_CC_NAME
, ARRAY_CONSTRUCT(CC.COST_CENTER_NAME) AS CCS_IN_PATH_NAME
FROM
RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS CC
INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS_BUDGETS CC_B
ON CC_B.COST_CENTER_ID = CC.COST_CENTER_ID
WHERE
CC.PARENT_CC_ID = 1
UNION ALL
-- Recursive Member: Traverse the hierarchy to find all descendant cost centers
SELECT
PARENT_CC.ROOT_COST_CENTER
, PARENT_CC.ROOT_COST_CENTER_NAME
, PARENT_CC.REFERENCE_YEAR
, CC.COST_CENTER_ID AS CURRENT_CC
, CC_B.ALLOCATED_BUDGET
, CC_B.TOTAL_SPENT
, PARENT_CC.ITERATION_LEVEL + 1 AS ITERATION_LEVEL
, ARRAY_APPEND(PARENT_CC.CCS_IN_PATH, CC.COST_CENTER_ID) AS CCS_IN_PATH
/*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
, CC.COST_CENTER_NAME AS CURRENT_CC_NAME
, ARRAY_APPEND(PARENT_CC.CCS_IN_PATH_NAME, CC.COST_CENTER_NAME) AS CCS_IN_PATH_NAME
FROM
COST_CENTER_ROLL_UP PARENT_CC
INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS CC -- get the children of the parent cost centers of the previous step
ON CC.PARENT_CC_ID = PARENT_CC.CURRENT_CC
INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS_BUDGETS CC_B -- get the related data of the child cost center
ON CC_B.COST_CENTER_ID = CC.COST_CENTER_ID
AND CC_B.REFERENCE_YEAR = PARENT_CC.REFERENCE_YEAR
WHERE
PARENT_CC.ITERATION_LEVEL < 1000
AND NOT ARRAY_CONTAINS(CC.COST_CENTER_ID, PARENT_CC.CCS_IN_PATH)
)
SELECT
ROOT_COST_CENTER AS COST_CENTER
, ROOT_COST_CENTER_NAME AS COST_CENTER_NAME
, REFERENCE_YEAR
-- get the total allocated budget
, SUM(ALLOCATED_BUDGET) AS TOTAL_BUDGET_CC
-- get the total effective spent
, SUM(TOTAL_SPENT) AS TOTAL_SPENT_CC
-- calculate the difference between budget and actual spent
, TOTAL_BUDGET_CC - TOTAL_SPENT_CC AS BUDGET_VARIANCE
, ROUND((BUDGET_VARIANCE / TOTAL_BUDGET_CC) * 100, 2) AS BUDGET_VARIANCE_PCT
-- classify the cost center
, CASE
WHEN BUDGET_VARIANCE < 0 THEN 'Over Budget'
WHEN BUDGET_VARIANCE = 0 THEN 'Within Budget'
WHEN BUDGET_VARIANCE > 0 THEN 'Under Budget'
END BUDGET_STATUS
FROM COST_CENTER_ROLL_UP
GROUP BY ALL
ORDER BY COST_CENTER, REFERENCE_YEAR
π Results
The final result:
COST_CENTER | COST_CENTER_NAME | REFERENCE_YEAR | TOTAL_BUDGET_CC | TOTAL_SPENT_CC | BUDGET_VARIANCE | BUDGET_VARIANCE_PCT | BUDGET_STATUS |
2 | Engineering | 2024 | 355000 | 340000 | 15000 | 4.23 | Under Budget |
2 | Engineering | 2025 | 380000 | 385000 | -5000 | -1.32 | Over Budget |
3 | Marketing | 2024 | 230000 | 221000 | 9000 | 3.91 | Under Budget |
3 | Marketing | 2025 | 250000 | 250000 | 0 | 0.00 | Within Budget |
The chart below visualizes the budget variance across cost centers for the years 2024 and 2025, clearly highlighting which centers stayed within budget and which exceeded their allocated funds:
π Use Case 3: Network Routing - Minimum Travel Distance
π― Objective
Calculate the minimum distance (in KMs) a packet has to travel from a source hub to arrive to all possible destination hubs.
This use case highlights how recursive queries can be leveraged to implement pseudo-graph algorithms.
π Dataset
We will use two tables:
ROUTING_HUBS
: Stores all available hubs.
COLUMN_NAME | DESCRIPTION |
HUB_ID | Hub ID. |
HUB_LOCATION | Country where the hub is located. |
HUB_LINKS
: Contains the connections between hubs. Links are bidirectional.
COLUMN_NAME | DESCRIPTION |
SOURCE_HUB_ID | Source Hub ID. |
TARGET_HUB_ID | Destination Hub ID. |
PATH_LENGTH_KM | Distance (in KMs) from source to destination. |
π’ Sample Data
ROUTING_HUBS
:
HUB_ID | HUB_LOCATION |
1 | Lebanon |
2 | Italy |
3 | France |
4 | Germany |
5 | Portugal |
6 | USA |
HUB_LINKS
:
SOURCE_HUB_ID | TARGET_HUB_ID | PATH_LENGTH_KM |
1 | 2 | 2200 |
2 | 1 | 2200 |
2 | 3 | 1100 |
3 | 2 | 1100 |
3 | 4 | 900 |
4 | 3 | 900 |
4 | 5 | 2100 |
5 | 4 | 2100 |
1 | 3 | 3100 |
3 | 1 | 3100 |
5 | 2 | 1800 |
2 | 5 | 1800 |
2 | 4 | 1200 |
4 | 2 | 1200 |
3 | 5 | 1700 |
5 | 3 | 1700 |
4 | 6 | 7500 |
6 | 4 | 7500 |
6 | 2 | 8800 |
2 | 6 | 8800 |
3 | 6 | 8700 |
6 | 3 | 8700 |
βοΈ Solution
π» Recursive Query
We use a recursive query to explore all paths from the source hub (Lebanon
) to all destinations, accumulating distances and hops, while avoiding cycles:
WITH
ALL_PATHS_TO_DESTINATION AS (
-- Anchor Member: Initialize the source hub (Lebanon)
SELECT
-- SOURCE HUB
HUB.HUB_ID AS SOURCE_HUB_ID
, HUB.HUB_LOCATION AS SOURCE_HUB_LOCATION
-- DESTINATION HUB
, HUB.HUB_ID AS DESTINATION_HUB_ID
, HUB.HUB_LOCATION AS DESTINATION_HUB_LOCATION
-- Path length in KMs
, 0 AS PATH_LENGTH_KM
-- Number of hops to get to current destination
, 0 AS NUMBER_OF_HOPS
-- Path to get to current destination
, HUB.HUB_ID::VARCHAR AS TRAVERSED_PATH
-- technical recursive columns
, 1 AS ITERATION_LEVEL
, ARRAY_CONSTRUCT(HUB.HUB_ID) AS HUBS_IN_PATH
/*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
, HUB.HUB_LOCATION AS TRAVERSED_PATH_LOCATIONS
, ARRAY_CONSTRUCT(HUB.HUB_LOCATION) AS HUB_LOCATIONS_IN_PATH
FROM
RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC3_ROUTING_HUBS HUB
WHERE
HUB.HUB_LOCATION = 'Lebanon'
UNION ALL
-- Recursive Member: Get next reachable hubs not yet visited
SELECT
SOURCE_HUB.SOURCE_HUB_ID
, SOURCE_HUB.SOURCE_HUB_LOCATION
, HUB_LINK.TARGET_HUB_ID AS DESTINATION_HUB_ID
, TARGET_HUBS.HUB_LOCATION AS DESTINATION_HUB_LOCATION
, SOURCE_HUB.PATH_LENGTH_KM + HUB_LINK.PATH_LENGTH_KM AS PATH_LENGTH_KM
, SOURCE_HUB.NUMBER_OF_HOPS + 1 AS NUMBER_OF_HOPS
, SOURCE_HUB.TRAVERSED_PATH || ' -> ' || HUB_LINK.TARGET_HUB_ID AS TRAVERSED_PATH
, SOURCE_HUB.ITERATION_LEVEL + 1 AS ITERATION_LEVEL
, ARRAY_APPEND(SOURCE_HUB.HUBS_IN_PATH, HUB_LINK.TARGET_HUB_ID) AS HUBS_IN_PATH
/*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
, SOURCE_HUB.TRAVERSED_PATH_LOCATIONS || ' -> ' || TARGET_HUBS.HUB_LOCATION AS TRAVERSED_PATH_LOCATIONS
, ARRAY_APPEND(SOURCE_HUB.HUB_LOCATIONS_IN_PATH, TARGET_HUBS.HUB_LOCATION) AS HUB_LOCATIONS_IN_PATH
FROM
ALL_PATHS_TO_DESTINATION AS SOURCE_HUB
INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC3_HUB_LINKS HUB_LINK -- get the reachable hubs from current hub
ON HUB_LINK.SOURCE_HUB_ID = SOURCE_HUB.DESTINATION_HUB_ID
INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC3_ROUTING_HUBS TARGET_HUBS -- get destination hub details
ON TARGET_HUBS.HUB_ID = HUB_LINK.TARGET_HUB_ID
WHERE
/*
In situations like this use case it is very important to define robust stop conditions since rows can be generated exponentially
Assuming 40 hubs, with approx. 6 outbound connections per hub, we can arrive to millions of records just on iteration 10
*/
-- Reasonable hop limit
SOURCE_HUB.NUMBER_OF_HOPS < 30
-- Prevent cycles in path
AND NOT ARRAY_CONTAINS(HUB_LINK.TARGET_HUB_ID, SOURCE_HUB.HUBS_IN_PATH)
-- Reasonable Distance threshold
AND SOURCE_HUB.PATH_LENGTH_KM + HUB_LINK.PATH_LENGTH_KM < 40000
)
SELECT
SOURCE_HUB_LOCATION
, DESTINATION_HUB_LOCATION
, PATH_LENGTH_KM
, NUMBER_OF_HOPS
, TRAVERSED_PATH_LOCATIONS
FROM ALL_PATHS_TO_DESTINATION
QUALIFY ROW_NUMBER() OVER (
-- For every calculated source-destination path
PARTITION BY SOURCE_HUB_ID, DESTINATION_HUB_ID
--Take the shortest path in terms of KMs
ORDER BY PATH_LENGTH_KM ASC
) = 1
ORDER BY SOURCE_HUB_ID, DESTINATION_HUB_ID, NUMBER_OF_HOPS, PATH_LENGTH_KM
π Results
The final result:
SOURCE_HUB_LOCATION | DESTINATION_HUB_LOCATION | PATH_LENGTH_KM | NUMBER_OF_HOPS | TRAVERSED_PATH_LOCATIONS |
Lebanon | Lebanon | 0 | 0 | Lebanon |
Lebanon | Italy | 2200 | 1 | Lebanon -> Italy |
Lebanon | France | 3100 | 1 | Lebanon -> France |
Lebanon | Germany | 3400 | 2 | Lebanon -> Italy -> Germany |
Lebanon | Portugal | 4000 | 2 | Lebanon -> Italy -> Portugal |
Lebanon | USA | 10900 | 3 | Lebanon -> Italy -> Germany -> USA |
Conclusion
With this article we conclude the SQL Recursion series. We explored three use cases that cover SQL recursion conceptsβfrom foundational principles to advanced techniques.
Now itβs your turn to practice and master this powerful skill, elevating your data manipulation and analysis capabilities. Feel free to check out the repository for additional datasets and get creative with the insights you can uncover!
Want to learn more about this topic? Check out the full series here: datainbites.hashnode.dev/series/mastering-sql-recursion.
Subscribe to my newsletter
Read articles from Elie Fayad directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Elie Fayad
Elie Fayad
I'm a data professional specializing in SQL and Snowflake, with a strong background in cloud migrations, data platform configuration, ETL/ELT pipeline development, data modeling, and workflow orchestration. I'm proactive, eager to learn, and passionate about tackling new challenges. I enjoy exploring emerging technologies and sharing knowledge with the community!