Computing a hash aggregation in BigQuery

So I've seen Snowflake has an HASH_AGG function. When would we need it?

Every time we'd like to work out if ANY value in a group (or the entire table) has changed in any way, even a single extra blank space.

While BigQuery does not have it yet, we can still simulate it using the tools at hand.

Here's how we can do it:
- TO_JSON_STRING to create a STRUCT from each entire row (or create a STRUCT containing only the columns you care about)
- STRING_AGG to aggregate all the json strings into a single value per group (or the entire table)
- FARM_FINGERPRINT, a hashing function that will product the same output given only the exact same input, check my comment for more info

No alt text provided for this image

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

0
Subscribe to my newsletter

Read articles from Constantin Lungu directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Constantin Lungu
Constantin Lungu

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified