Getting the Available Ranges from Two Tables — From SQL to SPL #34

esProcesProc
3 min read

Problem Description & Analysis:

There are two tables in a certain database. The original inventory table data_add stores multiple batches of inventory for multiple items. Each batch of inventory has a starting number START_NUM and an ending number END_NUM, representing the range of consecutive numbers.

The consumption table data_cons stores multiple batches of consumption for multiple items, each batch of consumption also has a continuous number range.

Task: Now we need to calculate the current inventory of each item, which is the original inventory minus consumption. Note that each batch is calculated separately, and the two consecutive batches are also calculated separately, such as the second and third records; The continuous numbering range of the original inventory may be consumed into discontinuous multi segment numbering, in which case multiple records need to be generated, with each record corresponding to a continuous numbering segment.

Code Comparisons:

SQL

with merged_cons(item_id, start_num, end_num) AS (
    SELECT * FROM data_cons
    MATCH_RECOGNIZE (
        PARTITION BY item_id
        ORDER BY start_num, end_num
        MEASURES FIRST(start_num) AS start_num, LAST(end_num) AS end_num
        PATTERN( merged* strt )
        DEFINE
            merged AS MAX(end_num) + 1 >= NEXT(start_num)
    )
)
, intersections(id, item_id, start_before, end_before, start_after, end_after) AS (
    SELECT a.id, a.item_id, /*a.start_num AS add_start, a.end_num AS add_end, 
        b.start_num AS cons_start, b.end_num AS cons_end, */
        CASE WHEN a.start_num < b.start_num - 1 THEN a.start_num END AS start_before,
        CASE WHEN a.start_num < b.start_num - 1 THEN b.start_num - 1 END AS end_before,
        CASE WHEN b.end_num + 1 < a.end_num THEN b.end_num + 1 END AS start_after,
        CASE WHEN b.end_num + 1 < a.end_num THEN a.end_num END AS end_after
    FROM data_add a
    JOIN merged_cons b
        ON a.item_id = b.item_id AND LEAST(a.end_num, b.end_num) >= GREATEST(a.start_num, b.start_num)
)
SELECT item_id, start_before as start_num, end_before as end_num
FROM intersections WHERE start_before IS NOT NULL
UNION ALL
SELECT item_id, start_after as start_num, end_after as end_num
FROM intersections WHERE start_after IS NOT NULL
UNION ALL
SELECT item_id, start_num, end_num
FROM data_add d
WHERE NOT EXISTS(SELECT 1 FROM intersections i WHERE i.id = d.id);

SQL requires the use of MATCH_RECOGNIZE statements and multiple subqueries to indirectly implement set calculation, which is complex in code and difficult to understand.

SPL: SPL can use variables to represent sets and directly perform set related calculations:

🧩 Try.DEMO

A1-B1: Load data.

A2: Group the original inventory by item and convert each batch within the group into a continuous sequence small set, without aggregation. B2: Perform similar calculation to the consumption table and merge multiple small sets within the group into one large set. ~ indicates the current group or current member.

A3: Left join.

A4: Add calculated column diff: Calculate the difference set between each small set of the original inventory and the large set of the consumption table, group each difference set by condition, and assign the consecutive sequence to the same group. The function group is used for grouping, and by default is equivalence grouping, @i represents grouping by condition, and ~[-1] represents the previous member. Function merge@d merges ordered sets and calculate the difference set.

A5: Generate a record for each group in the diff field of each A4 record. The function news can expand a record into multiple records. ~.m(-1) represents the last member in the sequence ~.

✅👉🏻 Get Started with esProc SPL — esProc SPL FREE Download.

10
Subscribe to my newsletter

Read articles from esProc directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

esProc
esProc

esProc SPL is a JVM-based programming language designed for structured data computation, serving as both a data analysis tool and an embedded computing engine. FREE download👉🏻: https://www.esproc.com/download-esproc