Calculate the Number of Intersections between Adjacent Subsets after Grouping — From SQL to SPL #14


Problem description & analysis:
A database table records the execution status of a project. Several people take part in the project every day, and one person can work on multiple tasks of the project in a single day.
Task: For each day, count how many of that day's participants also took part in the project on the previous day. In other words, after grouping by date, compute the size of the intersection between each day's personnel and the previous day's personnel. The first day is a special case: assume that everyone who took part on it also took part the day before, so its count is simply the number of participants that day.
Code comparisons:
The natural approach is to group by date and then intersect the grouped subsets, which requires keeping the subsets after grouping. SQL, however, must aggregate immediately after grouping, so the subsets cannot be retained and the intersection cannot be computed directly. A workaround is needed: group by person first, determine for each person whether they appear on both a given date and the previous one, and then group and aggregate by date. This takes several layers of nesting plus window functions, which is cumbersome.
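To make the person-first workaround concrete, here is a minimal Python sketch of the same reasoning. It is not the article's SQL. The function name, the (date, emp_id) row layout, and the sample data are all hypothetical. It also assumes that "previous day" means the previous date present in the table rather than the previous calendar day.

```python
from collections import defaultdict

def count_by_person_first(rows):
    """rows: list of (date, emp_id) pairs. Returns a sorted list of (date, count)."""
    # Global ordering of the dates, and a lookup from each date to the one before it.
    all_dates = sorted({d for d, _ in rows})
    prev_date = {d: p for p, d in zip(all_dates, all_dates[1:])}

    # Group by person first: the set of dates on which each person appears.
    dates_by_emp = defaultdict(set)
    for d, emp in rows:
        dates_by_emp[emp].add(d)

    # Then regroup by date, counting a person only if they also appear
    # on the previous date. The first date has no predecessor, so everyone counts.
    counts = dict.fromkeys(all_dates, 0)
    for emp, dates in dates_by_emp.items():
        for d in dates:
            if d not in prev_date or prev_date[d] in dates:
                counts[d] += 1
    return sorted(counts.items())

# Hypothetical sample data; E1 appears twice on the first day (two tasks).
sample_rows = [
    ("2024-01-01", "E1"), ("2024-01-01", "E2"), ("2024-01-01", "E1"),
    ("2024-01-02", "E2"), ("2024-01-02", "E3"),
    ("2024-01-03", "E2"), ("2024-01-03", "E3"), ("2024-01-03", "E4"),
]
print(count_by_person_first(sample_rows))
```

The two passes (person, then date) mirror the extra nesting that the SQL version needs, which is why this route feels indirect compared with simply intersecting adjacent groups.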
SPL retains the subsets after grouping and lets each subset reference its neighbors, so the code can follow the natural idea directly.
A1: Load data from the database and deduplicate EMP-ID.
A2: Group by date, but do not aggregate.
A3: Create a new two-dimensional table from the grouping results. For the first group, return the number of members directly; for every other group, compute the intersection of the EMP-IDs of the current group and the previous group, then count its members. The ^ operator computes the intersection, and [-1] refers to the previous group.
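For readers who do not use SPL, the three steps above can be approximated in Python. This is only a sketch of the same idea, not the article's SPL code. The function name, the (date, emp_id) row layout, and the sample data are hypothetical, and "previous group" is taken to mean the previous date present in the data.

```python
from collections import defaultdict

def count_repeat_participants(rows):
    """rows: iterable of (date, emp_id) pairs. Returns a list of (date, count)."""
    # A1 analogue: collecting IDs into a set per date removes duplicates
    # caused by one person working on several tasks in a day.
    groups = defaultdict(set)
    for date, emp_id in rows:
        groups[date].add(emp_id)

    # A2 analogue: keep the grouped subsets, ordered by date, without aggregating.
    ordered = sorted(groups.items())

    # A3 analogue: the first group returns its size; every later group
    # is intersected with the previous group (SPL's ^ and [-1]).
    result = []
    prev = None
    for date, emps in ordered:
        count = len(emps) if prev is None else len(emps & prev)
        result.append((date, count))
        prev = emps
    return result

# Hypothetical sample data; E1 appears twice on the first day (two tasks).
sample = [
    ("2024-01-01", "E1"), ("2024-01-01", "E2"), ("2024-01-01", "E1"),
    ("2024-01-02", "E2"), ("2024-01-02", "E3"),
    ("2024-01-03", "E2"), ("2024-01-03", "E3"), ("2024-01-03", "E4"),
]
print(count_repeat_participants(sample))
# → [('2024-01-01', 2), ('2024-01-02', 1), ('2024-01-03', 2)]
```

Keeping each day's members as a set is what makes the adjacent-group intersection a one-line operation, which is the point the SPL version makes with retained subsets.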
esProc SPL is FREE to download: esProc SPL FREE Download.
Written by

esProc
esProc SPL is a JVM-based programming language designed for structured data computation, serving as both a data analysis tool and an embedded computing engine. FREE download👉🏻: https://www.esproc.com/download-esproc