Aggregating Multiple SCD-2 Attribute Timelines in BigQuery

Here’s another practical BigQuery SQL exercise 💡

Say you have an input SCD2-style table with [valid_from, valid_to) and key-value attributes. Now you want to determine which attributes were valid at the same time for a given grain (e.g. id).

To do this, we:

1️⃣ Build an anchor table of all change points (start and end dates), grouped by id.

2️⃣ Generate date ranges using LEAD() over the change points, so we know the next boundary.

3️⃣ Join back to the original table to find which rows were active within each [valid_from, valid_to) segment.

4️⃣ Aggregate the key-value pairs as ARRAY<STRUCT<key, value>> to preserve temporal context.

We can now see that, for example, for the period between [2023-01-05, 2023-01-08), for id = 2, B was true and A was false.

table

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

0
Subscribe to my newsletter

Read articles from Constantin Lungu directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Constantin Lungu
Constantin Lungu

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified