Computing a hash aggregation in BigQuery
So I've seen Snowflake has an HASH_AGG function. When would we need it?
Every time we'd like to work out if ANY value in a group (or the entire table) has changed in any way, even a single extra blank space.
While BigQuery does not have it yet, we can still simulate it using the tools at hand.
Here's how we can do it:
- TO_JSON_STRING to create a STRUCT from each entire row (or create a STRUCT containing only the columns you care about)
- STRING_AGG to aggregate all the json strings into a single value per group (or the entire table)
- FARM_FINGERPRINT, a hashing function that will product the same output given only the exact same input, check my comment for more info
Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.
Subscribe to my newsletter
Read articles from Constantin Lungu directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Constantin Lungu
Constantin Lungu
Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified