In a previous post, I mentioned Remote Functions—a powerful way to send data from BigQuery to an external service for processing, including a Cloud Run function. This is especially useful when SQL lacks built-in support for your specific needs, and w...
Ever needed to track what changed in a table and when? In data engineering, this is known as Change Data Capture (CDC)—a fundamental challenge when dealing with evolving datasets. Now, the Change History features in BigQuery sound pretty interesting....
When we talk about functions in BigQuery, we're referring to several distinct capabilities. Beyond the standard built-in functions like CURRENT_TIMESTAMP() or LENGTH(), BigQuery lets users define custom functions that extend SQL capabilities. The...
In yesterday’s post, we looked at retrieving information from a table by joining it multiple times—each with different join criteria. This raises a natural question: are there better alternatives to this approach? I initially experimented with a CASE...
Ever had a random piece of knowledge from school suddenly click in a real-world scenario? That is how it felt when I remembered ROLLUP a few days ago. I wrote about GROUP BY ROLLUP roughly 1.5 years ago, in one of my first posts here. At the time, i...
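For readers who have not met ROLLUP yet, here is a minimal sketch of what GROUP BY ROLLUP(region, product) with SUM(amount) produces, emulated in plain Python rather than run in BigQuery. The rows, the column names and the `rollup` helper are hypothetical and only there for illustration; `None` stands in for the NULL that SQL puts in subtotal rows.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
rows = [
    ("EU", "apples", 10),
    ("EU", "pears", 5),
    ("US", "apples", 7),
]


def rollup(rows):
    """Emulate GROUP BY ROLLUP(region, product) with SUM(amount)."""
    totals = defaultdict(int)
    for region, product, amount in rows:
        totals[(region, product)] += amount  # detail level
        totals[(region, None)] += amount     # subtotal per region
        totals[(None, None)] += amount       # grand total
    return dict(totals)


result = rollup(rows)
for key, total in result.items():
    print(key, total)
```

Each input row feeds three grouping levels, which is why a single ROLLUP query can replace several separate GROUP BY queries glued together with UNION ALL.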
A few days ago I thought that the following SQL query would not work: I expected the window function result to be summed multiple times. 🚨 Turns out, I was wrong. This was a great reminder of why understanding SQL’s order of execution is crucial!...
Here’s a BigQuery trick I use all the time—seriously, not saying this to make my post catchier 😁. It’s not flashy or very complicated, but it’s one of my favorites: Job History. Under the Query Editor, you’ll find Job History, which stores all p...
I’ve come across this SQL transformation multiple times, and it’s an interesting two-way problem. 1️⃣ From columns to rows (ARRAY as UNPIVOT): We start with separate timestamps for different lifecycle events. To analyze events dynamically, we reshape ...
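The columns-to-rows direction can be sketched in plain Python as a rough analogy. This is not the BigQuery query from the post: the `order` row, the column names and the `unpivot` helper are all hypothetical, and skipping events that have not happened yet (null timestamps) is my own assumption.

```python
# Hypothetical order row with one timestamp column per lifecycle event.
order = {
    "order_id": 42,
    "created_at": "2024-01-01",
    "paid_at": "2024-01-02",
    "shipped_at": None,  # not shipped yet
}

# Columns to turn into rows (assumed names).
EVENT_COLUMNS = ["created_at", "paid_at", "shipped_at"]


def unpivot(row):
    """Return one (order_id, event, event_ts) record per non-null event column."""
    return [
        {"order_id": row["order_id"], "event": col, "event_ts": row[col]}
        for col in EVENT_COLUMNS
        if row[col] is not None  # assumption: drop events without a timestamp
    ]


events = unpivot(order)
for event in events:
    print(event)
```

Once the events are rows, counting, filtering or ordering them no longer requires naming each timestamp column by hand.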
As I wrap up Module 3 of the Data Engineering Zoomcamp, I want to share my experience working with Google BigQuery and the valuable insights I gained while analyzing the NYC Yellow Taxi dataset. This module has fundamentally changed how I think about...
If you’re like me, you probably use QUALIFY + ROW_NUMBER() almost daily for deduplication or finding the first/last occurrence of something. It’s a powerful combo in modern SQL! But here’s the catch: there are subtle nuances and edge cases that can ...
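As a quick refresher on the basic pattern, here is a small sketch that keeps the latest row per user. It runs against an in-memory SQLite database from Python, not BigQuery. SQLite has no QUALIFY, so the filter moves into an outer WHERE on a subquery, which should be equivalent for this simple case. The `events` table and its columns are hypothetical, and the sketch assumes a SQLite build with window function support (3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_ts TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "created"),
        (1, "2024-01-03", "shipped"),
        (2, "2024-01-02", "created"),
    ],
)

# BigQuery form (not executed here):
#   SELECT * FROM events
#   QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC) = 1
latest = conn.execute(
    """
    SELECT user_id, event_ts, status
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY event_ts DESC
        ) AS rn
        FROM events
    )
    WHERE rn = 1
    ORDER BY user_id
    """
).fetchall()

print(latest)
```

Note that when two rows tie on the ORDER BY column, ROW_NUMBER() picks one of them arbitrarily, so adding a tie-breaker column to the ordering is usually worth it.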