Partition and cluster an existing BigQuery table
Sometimes it so happens that we create or are using a table with data that is non-partitioned but we need to convert this into a partitioned table. A typical use-case is old tables that start accumulate data over time. Quite often, we need the same data with extra partitions (and/or clusters), and the rest we can use DDL commands (such as updating metadata).
I’ve created a simple helper script that comes in handy for just that: https://github.com/raghuveer-s/example-code/blob/main/partition-bq-table/partition-bq-table.sql
The script does try to preserve some table and column metadata before partitioning, cluster, and copying the data into a new table. For the most part, you should be able to just change the variable names and use it.
Some final points to consider:
There are some assumptions baked in the script, such as the existence of
common
dataset where it creates the stored procedures needed.You probably want to change the region (I have used
region-eu
in the code). The script makes use of theINFORMATION_SCHEMA
and this needs a qualifier (https://cloud.google.com/bigquery/docs/information-schema-intro#syntax). Note for my future self: This really should be parameterized.If you want to preserve the same table name for the partitioned table, add a couple of commands for changing table name with
ALTER TABLE RENAME
followed byDROP TABLE
of the old table. Note that, changing a table name does come with some limitations in BigQuery (https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_rename_to_statement).
Subscribe to my newsletter
Read articles from Raghuveer Sriraman directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Raghuveer Sriraman
Raghuveer Sriraman
Working mostly on Data Engineering, Machine Learning and Bayesian analytics.