Compacting date intervals in BigQuery


Here's a practical BigQuery SQL exercise that highlights some important concepts as well is an interesting algorithm imho. I've pair programmed this with LLMs, if that's a thing 😎
Problem statement: compacting a SCD-2 table, essentially finding intervals that can be safely merged, turning two adjacent intervals with the same data into a single, bigger interval.
This particular input data guarantees these intervals cannot overlap (at the same grain), but there can be gaps. We're also talking about [left-inclusive, right-exclusive) intervals.
Here's a breakdown of how it all works:
1️⃣ we're starting by computing a hash of all the column of interest, excluding the grain (in my example: flag_a, flag_b)
2️⃣ then we use LAG() over grain window to detect whether the current row starts right after the previous one and if the hashes (so the 'payload' of the two rows) match
3️⃣ we mark the start of a new "segment" when either:
- attributes have changed (flags differ), so hash being different
- intervals are not adjacent (there are gaps)
4️⃣ use a cumulative SUM() over grain window to group rows into segment IDs
5️⃣ collapse each segment using MIN(valid_from) and MAX(valid_to)
We can now see that in our example that several intervals were merged into bigger ones.
Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.
Subscribe to my newsletter
Read articles from Constantin Lungu directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Constantin Lungu
Constantin Lungu
Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified