Fivetran Schema Change Handling

Schema changes can be a significant challenge for many ETL pipelines. In the past, they were more manageable because systems operated independently. However, with the rise of SaaS products, out-of-the-box solutions, and frequent release cycles, change has become a constant, and schema changes are something every pipeline eventually has to handle.
Setting the Scene
In our scenario, our data source is a Snowflake database managed by a SaaS vendor. We decided to use Fivetran to sync data from Snowflake into our target database. The SaaS vendor follows a weekly release cycle for minor updates and a monthly cadence for major releases, with changes occurring in different environments on varying dates.
We set up Fivetran replication from this Snowflake source. Our connector is configured to sync a substantial number of tables, and we have been syncing daily for several days without issues.
Cue the Schema Change
One day, our sync did not complete successfully. We received a status of “Rescheduled” with the reason “Unsupported schema change requires table resync: ADD_COLUMN” for one of the tables.
As we were still new to analyzing the Fivetran logs (this detail is not available in the Fivetran Console), we were unsure what had happened. We asked some questions within the team; it was a learning experience, but a challenging one at the time. Here are the questions we faced:
Question 1: What column was added?
This information was not readily available. However, by querying the Fivetran logs for alter_table events, we could identify the table and the column that was added.
-- Look up recent alter_table events recorded by the Fivetran Platform Connector
SELECT *
FROM <destination_db>.<destination_schema>.log
WHERE connector_id = :connector_id
  AND message_event = 'alter_table'
ORDER BY time_stamp DESC;
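To pin down which column was added, we found it easiest to pull the raw message_data payload of those alter_table events and inspect it directly; the internal JSON structure of that payload isn't something we want to hard-code, so this sketch just surfaces it (the one-day filter is an arbitrary choice for illustration):
-- Pull recent alter_table payloads for manual inspection of the added column
SELECT time_stamp,
       message_data   -- raw payload; the added column shows up in here
FROM <destination_db>.<destination_schema>.log
WHERE connector_id = :connector_id
  AND message_event = 'alter_table'
  AND time_stamp >= DATEADD('day', -1, CURRENT_TIMESTAMP())
ORDER BY time_stamp DESC;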
Question 2: What happened to the sync of the other tables?
We were initially unaware of the status of the other tables. The Fivetran Console did not provide this information. However, we later learned that the other tables had indeed been synced. This information can also be queried from the logs.
-- List records_modified events to see which tables were actually synced
SELECT message_data:schema AS "schema",
       message_data:table  AS "table",
       message_data:count  AS "count"
FROM <destination_db>.<destination_schema>.log
WHERE connector_id = :connector_id
  AND message_event = 'records_modified'
ORDER BY time_stamp DESC;
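For a per-table picture of what landed during the latest run, the same records_modified events can be aggregated. A minimal sketch, assuming message_data is queryable as a variant (as in the query above) and using an illustrative one-day window:
-- Roughly confirm which tables synced by totalling modified records per table
SELECT message_data:schema::string  AS "schema",
       message_data:table::string   AS "table",
       SUM(message_data:count::int) AS total_records_modified
FROM <destination_db>.<destination_schema>.log
WHERE connector_id = :connector_id
  AND message_event = 'records_modified'
  AND time_stamp >= DATEADD('day', -1, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY total_records_modified DESC;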
Question 3: Should we just rerun the sync?
Had we simply waited for the Fivetran scheduler, it would have retried the sync automatically without any intervention from us. Instead, we decided to rerun the sync ourselves.
What we now understand is that Fivetran, on that first sync that ended in a reschedule, had already altered the table with the schema change. At this point, it simply required an acknowledgment to proceed with a full sync for that table.
This realization was surprising. I initially thought that Fivetran would detect the schema change and, upon rerunning the sync, would then alter the table and perform the sync. However, that was not the case.
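A quick way to verify that the destination table had already picked up the new column is to check the destination's information schema. A sketch, assuming a Snowflake destination and placeholder schema/table names:
-- List the destination table's columns to confirm the new column is already present
SELECT column_name, data_type
FROM <destination_db>.INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = '<DESTINATION_SCHEMA>'
  AND table_name = '<AFFECTED_TABLE>'
ORDER BY ordinal_position;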
This issue is somewhat unique to us because we do not own the source. But imagine this happening to a multi-terabyte table: attempting to resync could take days or even weeks, and there was no way of getting around it.
RTFM
The documentation clarified the issue and explained the behavior we saw.
https://fivetran.com/docs/connectors/databases/snowflake#automatictableresyncs
Automatic table re-syncs
For tables with a primary key, we support ADD COLUMN DDL operations with null or default static values. All other table or column alterations trigger an automatic table re-sync. For tables without a primary key, any DDL operation triggers an automatic table re-sync.
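To make the distinction concrete, here is an illustrative pair of source-side DDL statements (the table and column names are hypothetical). Per the documentation above, the first kind of change can be handled in place for a table with a primary key, while other alterations, such as dropping a column, would trigger an automatic table resync:
-- Handled in place on a table with a primary key: add a column with a static default
ALTER TABLE source_schema.orders ADD COLUMN order_channel VARCHAR DEFAULT 'unknown';

-- Triggers an automatic table resync: any other column alteration, e.g. dropping a column
ALTER TABLE source_schema.orders DROP COLUMN legacy_flag;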
Apparently the fault was ours, though we can't really do anything about it since the Snowflake source is vendor-owned.
What to Do Now
Sad to say, not much could be done.
We educated our support team about this behavior, and we now watch out for it and prepare communications whenever it occurs.
We raised the issue with Fivetran and filed a feature request.
Now we wait.