Mastering SQL Recursion: From Theory to Practice with Key Use Cases (Part 3)

Elie FayadElie Fayad
13 min read

Introduction

In the first article of this three-part series, we explored recursionβ€”what it is, why it’s crucial for processing hierarchical data, and how it can be implemented in SQL. In the second article, we learned advanced recursion techniques to prevent infinite loops, handle cycles in data, and optimize recursive queries.

This article is the third and final part of the SQL Recursion series, where we'll go over some practical use cases of SQL recursion.

In this article, we will present three different scenarios:

  • Calculating the total cost and duration of a product assembly

  • Comparing budget allocation vs. actual spending across different cost centers

  • Finding the shortest path for packet routing in a network system

Each of these scenarios illustrates how recursion can simplify complex data relationships and unlock powerful query logic.


PREREQUISITES:

DISCLAIMERS:

  • All examples in this article were run in a Snowflake environment (but concepts apply to PostgreSQL, SQL Server, etc.). To follow along, you can sign up for a free Snowflake trial account here.

  • All code and data used in this article are available in the accompanying GitHub repository.


πŸ“‹ Use Case 1: Product Assembly - Total Cost and Duration

🎯 Objective

Calculate the total cost and manufacturing duration required to produce a product. Understanding total costs and time significantly helps in optimizing supply chain planning and in pricing decisions.

πŸ“Š Dataset

We will use three tables:

  • PRODUCTS: Contains the list of products being sold.
COLUMN_NAMEDESCRIPTION
PRODUCT_IDProduct ID.
PRODUCT_NAMEProduct name.
ASSEMBLY_TIMEDuration (in minutes) to assemble the product.
ASSEMBLY_COSTFinal assembly cost (in USD).
  • PRODUCT_PARTS: Represents the Bill of Materials (BoM) for each product.
COLUMN_NAMEDESCRIPTION
PART_IDPart ID.
PART_NAMEPart name.
COMPONENT_IDComponent required to manufacture the part.
QUANTITYNumber of required components.
  • COMPONENT_DETAILS: Contains cost and duration details for each component.
COLUMN_NAMEDESCRIPTION
COMPONENT_IDComponent ID.
COMPONENT_NAMEComponent name.
COSTManufacturing cost (in USD).
DURATIONManufacturing duration (in minutes).

A Bill of Materials (BoM) is a list of all components, raw materials or assemblies required to manufacture a product.

πŸ”’ Sample Data

  • PRODUCTS:
PRODUCT_IDPRODUCT_NAMEASSEMBLY_TIMEASSEMBLY_COST
1000Smartphone3060
  • PRODUCT_PARTS:
PART_IDPART_NAMECOMPONENT_IDQUANTITY
1000Smartphone20001
1000Smartphone20011
2000CPU30008
2000CPU30011
2001Screen30021
2001Screen30031
  • COMPONENT_DETAILS:
COMPONENT_IDCOMPONENT_NAMECOSTDURATION
2000CPU10030
2001Screen8020
3000CPU Core105
3001Cache157
3002Glass Panel2010
3003Touch Sensor108

βœ”οΈ Solution

πŸ’» Recursive Query

To calculate the total cost and duration, we use the following recursive query:

WITH
    ALL_COMPONENTS_PRODUCT AS (

        -- Anchor Member: Start with all the final products -> all products present in the UC1_PRODUCTS table

        SELECT 
            -- Final product ID
            PRODS.PRODUCT_ID AS TARGET_PRODUCT
            -- Component that the final product depends on
            , PRODS.PRODUCT_ID AS CURRENT_COMPONENT
            -- Cost of the current step
            , PRODS.ASSEMBLY_COST AS STEP_COST
            -- Duration of the current step
            , PRODS.ASSEMBLY_TIME AS STEP_DURATION
            -- Required quantities of the component
            , 1 AS REQUIRED_QUANTITY
            -- technical recursive columns
            , ARRAY_CONSTRUCT(PRODS.PRODUCT_ID) AS COMPONENTS_IN_PATH
            , 1 AS ITERATION_LEVEL
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , PRODS.PRODUCT_NAME AS TARGET_PRODUCT_NAME
            , PRODS.PRODUCT_NAME AS CURRENT_COMPONENT_NAME
            , ARRAY_CONSTRUCT(PRODS.PRODUCT_NAME) AS COMPONENTS_NAME_IN_PATH
        FROM 
            RECURSIVE_CTE_DATA.UC1_PRODUCTS PRODS

        UNION ALL

        -- Recursive Member: Traverse the hierarchy from the top to the bottom to find all components the final product depends on

        SELECT 
            PREV_STEP.TARGET_PRODUCT
            , PROD_PART.COMPONENT_ID AS CURRENT_COMPONENT
            , COMP_DET.COST AS STEP_COST
            , COMP_DET.DURATION AS STEP_DURATION
            , PROD_PART.QUANTITY AS REQUIRED_QUANTITY
            , ARRAY_APPEND(PREV_STEP.COMPONENTS_IN_PATH, PROD_PART.COMPONENT_ID) AS COMPONENTS_IN_PATH
            , PREV_STEP.ITERATION_LEVEL + 1 AS ITERATION_LEVEL
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , PREV_STEP.TARGET_PRODUCT_NAME
            , COMP_DET.COMPONENT_NAME AS CURRENT_COMPONENT_NAME
            , ARRAY_APPEND(PREV_STEP.COMPONENTS_NAME_IN_PATH, COMP_DET.COMPONENT_NAME) AS COMPONENTS_NAME_IN_PATH
        FROM 
            ALL_COMPONENTS_PRODUCT PREV_STEP
            INNER JOIN RECURSIVE_CTE_DATA.UC1_PRODUCT_PARTS PROD_PART -- get the component on which the parent component is dependent on
                ON PREV_STEP.CURRENT_COMPONENT = PROD_PART.PART_ID
            INNER JOIN RECURSIVE_CTE_DATA.UC1_COMPONENT_DETAILS COMP_DET -- get the details of the components
                ON COMP_DET.COMPONENT_ID = PROD_PART.COMPONENT_ID
        WHERE
            PREV_STEP.ITERATION_LEVEL <= 1000
            AND NOT ARRAY_CONTAINS(PROD_PART.COMPONENT_ID, PREV_STEP.COMPONENTS_IN_PATH)
    ), TOTALS_PER_COMPONENT as (
        -- Find the total quantity, cost, and duration for each component of the product
        SELECT 
            TARGET_PRODUCT
            , TARGET_PRODUCT_NAME
            , CURRENT_COMPONENT
            , CURRENT_COMPONENT_NAME
            , SUM(REQUIRED_QUANTITY * STEP_COST) AS REQUIRED_QUANTITY_COST
            , SUM(REQUIRED_QUANTITY * STEP_DURATION) AS REQUIRED_QUANTITY_DURATION
            , SUM(REQUIRED_QUANTITY) AS TOTAL_REQUIRED_QUANTITY
        FROM ALL_COMPONENTS_PRODUCT
        GROUP BY ALL
    )
-- Sum the costs and duration to get the final data
SELECT 
    TARGET_PRODUCT
    , TARGET_PRODUCT_NAME
    , SUM(REQUIRED_QUANTITY_COST) AS TOTAL_COST
    , SUM(REQUIRED_QUANTITY_DURATION) AS TOTAL_DURATION
    -- List all the components in the final product
    , OBJECT_DELETE(
        OBJECT_AGG(
            CURRENT_COMPONENT_NAME || '_' || CURRENT_COMPONENT, 
            TOTAL_REQUIRED_QUANTITY
        ), 
        TARGET_PRODUCT_NAME || '_' || TARGET_PRODUCT
    ) AS ALL_COMPONENTS
FROM TOTALS_PER_COMPONENT
GROUP BY ALL
ORDER BY TARGET_PRODUCT

πŸ“Š Results

The final result:

TARGET_PRODUCTTARGET_PRODUCT_NAMETOTAL_COSTTOTAL_DURATIONALL_COMPONENTS
1000Smartphone365145See below
// ALL_COMPONENTS
{
  "CPU Core_3000": 8,
  "CPU_2000": 1,
  "Cache_3001": 1,
  "Glass Panel_3002": 1,
  "Screen_2001": 1,
  "Touch Sensor_3003": 1
}

πŸ“‹ Use Case 2: Budget Allocated vs. Actual Spend Across Cost Centers

🎯 Objective

Determine whether the cost centers (CCs) under the Corporate HQ overspent or underspent in a specific year.

πŸ“Š Dataset

We will use two tables:

  • COST_CENTERS: Stores all cost centers and their hierarchical relationships.
COLUMN_NAMEDESCRIPTION
COST_CENTER_IDCost center ID.
PARENT_CC_IDParent cost center to which the cost center reports to.
COST_CENTER_NAMECost center name.
  • COST_CENTERS_BUDGETS: Contains budget and actual spending data for each cost center per year.
COLUMN_NAMEDESCRIPTION
COST_CENTER_IDCost center ID.
REFERENCE_YEARYear of reference.
ALLOCATED_BUDGETAllocated budget for the cost center in the reference year (in USD).
TOTAL_SPENTActual amount spent by the cost center in the reference year (in USD).

πŸ”’ Sample Data

  • COST_CENTERS:
COST_CENTER_IDPARENT_CC_IDCOST_CENTER_NAME
1NULLCorporate HQ
21Engineering
31Marketing
42Software
52Hardware
63Content
73SEO
86Social Media
  • COST_CENTERS_BUDGETS:
COST_CENTER_IDREFERENCE_YEARALLOCATED_BUDGETTOTAL_SPENT
12025200000180000
12024180000175000
22025100000110000
220249500090000
320258000085000
320247500070000
42025150000155000
42024140000135000
52025130000120000
52024120000115000
620257000065000
620246500060000
720256000062000
720245500057000
820254000038000
820243500034000

βœ”οΈ Solution

πŸ’» Recursive Query

We use a recursive query to roll up budget and spending data according to the cost center hierarchy:

WITH
    COST_CENTER_ROLL_UP AS (

        -- Anchor Member: Get all cost centers directly under Corporate HQ

        SELECT
            -- Parent cost center
            CC.COST_CENTER_ID AS ROOT_COST_CENTER
            , CC.COST_CENTER_NAME AS ROOT_COST_CENTER_NAME
            -- The reference year
            , CC_B.REFERENCE_YEAR
            -- The child cost center
            , CC.COST_CENTER_ID AS CURRENT_CC
            -- Cost center budget
            , CC_B.ALLOCATED_BUDGET
            -- Cost center total yearly spent
            , CC_B.TOTAL_SPENT
            -- technical recursive columns
            , 1 AS ITERATION_LEVEL
            , ARRAY_CONSTRUCT(CC.COST_CENTER_ID) AS CCS_IN_PATH
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , CC.COST_CENTER_NAME AS CURRENT_CC_NAME
            , ARRAY_CONSTRUCT(CC.COST_CENTER_NAME) AS CCS_IN_PATH_NAME
        FROM
            RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS CC
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS_BUDGETS CC_B
                ON CC_B.COST_CENTER_ID = CC.COST_CENTER_ID
        WHERE
            CC.PARENT_CC_ID = 1

        UNION ALL

        -- Recursive Member: Traverse the hierarchy to find all descendant cost centers

        SELECT
            PARENT_CC.ROOT_COST_CENTER
            , PARENT_CC.ROOT_COST_CENTER_NAME
            , PARENT_CC.REFERENCE_YEAR
            , CC.COST_CENTER_ID AS CURRENT_CC
            , CC_B.ALLOCATED_BUDGET
            , CC_B.TOTAL_SPENT
            , PARENT_CC.ITERATION_LEVEL + 1 AS ITERATION_LEVEL
            , ARRAY_APPEND(PARENT_CC.CCS_IN_PATH, CC.COST_CENTER_ID) AS CCS_IN_PATH
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , CC.COST_CENTER_NAME AS CURRENT_CC_NAME
            , ARRAY_APPEND(PARENT_CC.CCS_IN_PATH_NAME, CC.COST_CENTER_NAME) AS CCS_IN_PATH_NAME
        FROM
            COST_CENTER_ROLL_UP PARENT_CC
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS CC -- get the children of the parent cost centers of the previous step
                ON CC.PARENT_CC_ID = PARENT_CC.CURRENT_CC
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC2_COST_CENTERS_BUDGETS CC_B -- get the related data of the child cost center
                ON CC_B.COST_CENTER_ID = CC.COST_CENTER_ID
                    AND CC_B.REFERENCE_YEAR = PARENT_CC.REFERENCE_YEAR
        WHERE
            PARENT_CC.ITERATION_LEVEL < 1000
            AND NOT ARRAY_CONTAINS(CC.COST_CENTER_ID, PARENT_CC.CCS_IN_PATH)
    )
SELECT 
    ROOT_COST_CENTER AS COST_CENTER
    , ROOT_COST_CENTER_NAME AS COST_CENTER_NAME
    , REFERENCE_YEAR
    -- get the total allocated budget
    , SUM(ALLOCATED_BUDGET) AS TOTAL_BUDGET_CC
    -- get the total effective spent
    , SUM(TOTAL_SPENT) AS TOTAL_SPENT_CC
    -- calculate the difference between budget and actual spent
    , TOTAL_BUDGET_CC - TOTAL_SPENT_CC AS BUDGET_VARIANCE
    , ROUND((BUDGET_VARIANCE / TOTAL_BUDGET_CC) * 100, 2) AS BUDGET_VARIANCE_PCT
    -- classify the cost center
    , CASE
        WHEN BUDGET_VARIANCE < 0 THEN 'Over Budget'
        WHEN BUDGET_VARIANCE = 0 THEN 'Within Budget'
        WHEN BUDGET_VARIANCE > 0 THEN 'Under Budget'
      END BUDGET_STATUS
FROM COST_CENTER_ROLL_UP
GROUP BY ALL
ORDER BY COST_CENTER, REFERENCE_YEAR

πŸ“Š Results

The final result:

COST_CENTERCOST_CENTER_NAMEREFERENCE_YEARTOTAL_BUDGET_CCTOTAL_SPENT_CCBUDGET_VARIANCEBUDGET_VARIANCE_PCTBUDGET_STATUS
2Engineering2024355000340000150004.23Under Budget
2Engineering2025380000385000-5000-1.32Over Budget
3Marketing202423000022100090003.91Under Budget
3Marketing202525000025000000.00Within Budget

The chart below visualizes the budget variance across cost centers for the years 2024 and 2025, clearly highlighting which centers stayed within budget and which exceeded their allocated funds:

Budget Variance Visualization


πŸ“‹ Use Case 3: Network Routing - Minimum Travel Distance

🎯 Objective

Calculate the minimum distance (in KMs) a packet has to travel from a source hub to arrive to all possible destination hubs.

This use case highlights how recursive queries can be leveraged to implement pseudo-graph algorithms.

πŸ“Š Dataset

We will use two tables:

  • ROUTING_HUBS: Stores all available hubs.
COLUMN_NAMEDESCRIPTION
HUB_IDHub ID.
HUB_LOCATIONCountry where the hub is located.
  • HUB_LINKS: Contains the connections between hubs. Links are bidirectional.
COLUMN_NAMEDESCRIPTION
SOURCE_HUB_IDSource Hub ID.
TARGET_HUB_IDDestination Hub ID.
PATH_LENGTH_KMDistance (in KMs) from source to destination.

πŸ”’ Sample Data

  • ROUTING_HUBS:
HUB_IDHUB_LOCATION
1Lebanon
2Italy
3France
4Germany
5Portugal
6USA
  • HUB_LINKS:
SOURCE_HUB_IDTARGET_HUB_IDPATH_LENGTH_KM
122200
212200
231100
321100
34900
43900
452100
542100
133100
313100
521800
251800
241200
421200
351700
531700
467500
647500
628800
268800
368700
638700

βœ”οΈ Solution

πŸ’» Recursive Query

We use a recursive query to explore all paths from the source hub (Lebanon) to all destinations, accumulating distances and hops, while avoiding cycles:

WITH
    ALL_PATHS_TO_DESTINATION AS (

        -- Anchor Member: Initialize the source hub (Lebanon)

        SELECT
            -- SOURCE HUB
            HUB.HUB_ID AS SOURCE_HUB_ID
            , HUB.HUB_LOCATION AS SOURCE_HUB_LOCATION
            -- DESTINATION HUB
            , HUB.HUB_ID AS DESTINATION_HUB_ID
            , HUB.HUB_LOCATION AS DESTINATION_HUB_LOCATION
            -- Path length in KMs
            , 0 AS PATH_LENGTH_KM
            -- Number of hops to get to current destination
            , 0 AS NUMBER_OF_HOPS
            -- Path to get to current destination
            , HUB.HUB_ID::VARCHAR AS TRAVERSED_PATH
            -- technical recursive columns
            , 1 AS ITERATION_LEVEL
            , ARRAY_CONSTRUCT(HUB.HUB_ID) AS HUBS_IN_PATH
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , HUB.HUB_LOCATION AS TRAVERSED_PATH_LOCATIONS
            , ARRAY_CONSTRUCT(HUB.HUB_LOCATION) AS HUB_LOCATIONS_IN_PATH
        FROM
            RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC3_ROUTING_HUBS HUB
        WHERE
            HUB.HUB_LOCATION = 'Lebanon'

        UNION ALL

        -- Recursive Member: Get next reachable hubs not yet visited

        SELECT
            SOURCE_HUB.SOURCE_HUB_ID
            , SOURCE_HUB.SOURCE_HUB_LOCATION
            , HUB_LINK.TARGET_HUB_ID AS DESTINATION_HUB_ID
            , TARGET_HUBS.HUB_LOCATION AS DESTINATION_HUB_LOCATION
            , SOURCE_HUB.PATH_LENGTH_KM + HUB_LINK.PATH_LENGTH_KM AS PATH_LENGTH_KM
            , SOURCE_HUB.NUMBER_OF_HOPS + 1 AS NUMBER_OF_HOPS
            , SOURCE_HUB.TRAVERSED_PATH || ' -> ' || HUB_LINK.TARGET_HUB_ID AS TRAVERSED_PATH
            , SOURCE_HUB.ITERATION_LEVEL + 1 AS ITERATION_LEVEL
            , ARRAY_APPEND(SOURCE_HUB.HUBS_IN_PATH, HUB_LINK.TARGET_HUB_ID) AS HUBS_IN_PATH
            /*THESE COLUMNS ARE JUST TO HELP VISUALIZE THE OUTPUT*/
            , SOURCE_HUB.TRAVERSED_PATH_LOCATIONS || ' -> ' || TARGET_HUBS.HUB_LOCATION AS TRAVERSED_PATH_LOCATIONS
            , ARRAY_APPEND(SOURCE_HUB.HUB_LOCATIONS_IN_PATH, TARGET_HUBS.HUB_LOCATION) AS HUB_LOCATIONS_IN_PATH
        FROM
            ALL_PATHS_TO_DESTINATION AS SOURCE_HUB
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC3_HUB_LINKS HUB_LINK -- get the reachable hubs from current hub
                ON HUB_LINK.SOURCE_HUB_ID = SOURCE_HUB.DESTINATION_HUB_ID
            INNER JOIN RECURSIVE_CTE.RECURSIVE_CTE_DATA.UC3_ROUTING_HUBS TARGET_HUBS -- get destination hub details
                ON TARGET_HUBS.HUB_ID = HUB_LINK.TARGET_HUB_ID
        WHERE
            /*
                In situations like this use case it is very important to define robust stop conditions since rows can be generated exponentially 

                Assuming 40 hubs, with approx. 6 outbound connections per hub, we can arrive to millions of records just on iteration 10
            */
            -- Reasonable hop limit
            SOURCE_HUB.NUMBER_OF_HOPS < 30 
            -- Prevent cycles in path
            AND NOT ARRAY_CONTAINS(HUB_LINK.TARGET_HUB_ID, SOURCE_HUB.HUBS_IN_PATH)
            -- Reasonable Distance threshold
            AND SOURCE_HUB.PATH_LENGTH_KM + HUB_LINK.PATH_LENGTH_KM < 40000  
    )
SELECT 
    SOURCE_HUB_LOCATION
    , DESTINATION_HUB_LOCATION
    , PATH_LENGTH_KM
    , NUMBER_OF_HOPS
    , TRAVERSED_PATH_LOCATIONS
FROM ALL_PATHS_TO_DESTINATION
QUALIFY ROW_NUMBER() OVER (
                -- For every calculated source-destination path
                PARTITION BY SOURCE_HUB_ID, DESTINATION_HUB_ID
                --Take the shortest path in terms of KMs
                ORDER BY PATH_LENGTH_KM ASC
            ) = 1
ORDER BY SOURCE_HUB_ID, DESTINATION_HUB_ID, NUMBER_OF_HOPS, PATH_LENGTH_KM

πŸ“Š Results

The final result:

SOURCE_HUB_LOCATIONDESTINATION_HUB_LOCATIONPATH_LENGTH_KMNUMBER_OF_HOPSTRAVERSED_PATH_LOCATIONS
LebanonLebanon00Lebanon
LebanonItaly22001Lebanon -> Italy
LebanonFrance31001Lebanon -> France
LebanonGermany34002Lebanon -> Italy -> Germany
LebanonPortugal40002Lebanon -> Italy -> Portugal
LebanonUSA109003Lebanon -> Italy -> Germany -> USA

Conclusion

With this article we conclude the SQL Recursion series. We explored three use cases that cover SQL recursion conceptsβ€”from foundational principles to advanced techniques.

Now it’s your turn to practice and master this powerful skill, elevating your data manipulation and analysis capabilities. Feel free to check out the repository for additional datasets and get creative with the insights you can uncover!


Want to learn more about this topic? Check out the full series here: datainbites.hashnode.dev/series/mastering-sql-recursion.

0
Subscribe to my newsletter

Read articles from Elie Fayad directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Elie Fayad
Elie Fayad

I'm a data professional specializing in SQL and Snowflake, with a strong background in cloud migrations, data platform configuration, ETL/ELT pipeline development, data modeling, and workflow orchestration. I'm proactive, eager to learn, and passionate about tackling new challenges. I enjoy exploring emerging technologies and sharing knowledge with the community!