Upset with Fivetran Scheduler Offsets

A few months ago, I finally had the opportunity to use Fivetran in a real-world project. For some background, this was a greenfield project involving a financial system, and the tooling and architecture had been selected before my arrival. The tech stack included Databricks for the Data Lake (or Delta Lake, to be precise), with Fivetran chosen as the data acquisition tool. The company was using Control-M as its scheduler, but no one really wanted to use it. For this particular task, the decision was to use the Fivetran Scheduler to trigger the data acquisition processes.
My Mission (or so I thought)
I was brought into this project to work on the Databricks components. Even so, I was eager to see Fivetran in action. Architect colleagues I had collaborated with on DMS work often touted how much easier Fivetran would make these acquisition and migration tasks.
I diligently explored how to orchestrate the data acquisition with Databricks workflows. As expected, Databricks didn't have direct connectivity to Fivetran. This is a common security practice: it keeps data engineers from being tempted to write code that hits the Fivetran APIs directly, which can lead to messy implementations.
I raised the concern that orchestration should go through the organization's preferred orchestration tool, even if it was considered outdated technology. Unfortunately, my concerns were ignored. With my well-defined JIRA card languishing without progress, I decided to put my head down and proceed with the work regardless.
With no API connectivity, I began examining the Fivetran logs, which Fivetran was already syncing to Databricks. By the way, Fivetran doesn't provide logs on its web console; the way a customer gets at log data is by syncing it to a destination using a Fivetran connector. (That topic might be a future blog post.)
The Fivetran log connector had a 5-15 minute delay, which was acceptable for the time being; SLAs would be dealt with much later on.
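For the curious, this is roughly how we inspected sync start times from Databricks. It's a sketch rather than our exact notebook: the table and column names (fivetran_log.log, message_event, time_stamp) follow the typical Fivetran log connector schema, so verify them against your own destination before trusting the query.

```python
# Sketch: list recent sync starts from the Fivetran log connector data.
# Assumes a Databricks notebook where `spark` is already defined, and that
# the log connector landed its data in a schema named `fivetran_log`.
from pyspark.sql import functions as F

sync_starts = (
    spark.table("fivetran_log.log")
    .where(F.col("message_event") == "sync_start")   # sync start events
    .select("connector_id", "time_stamp")
    .orderBy(F.col("time_stamp").desc())
)

sync_starts.show(20, truncate=False)
```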
Initial Signs
We had a job scheduled at 4 p.m. From the logs, I noticed syncs were only starting around 4:11 p.m. This seemed odd, but I didn't think much of it at the time, as it was still early in the project and nothing was stable yet.
After a few more weeks, we were preparing to promote the code to the next higher environment. We deployed the same code (Fivetran was deployed using Terraform, by the way) with a 4 p.m. sync. We waited, but by 4:15 the job hadn't started. By 4:30, still nothing. 4:45, nothing. The job finally started at 4:54 p.m.
RTFM
Digging through the Fivetran documentation, I stumbled upon this warning:
https://fivetran.com/docs/core-concepts/syncoverview
When you add a new destination, Fivetran assigns it a fixed time offset. The offset can be any random value in minutes ranging from 0 to 60. It is derived from the destination ID hash. This offset is shared by every connector in the destination. The offset value remains the same regardless of the set sync frequency.
Let that sink in: a random 0-60 minute offset. And it can’t be modified, even through a support request (trust me, I tried).
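To make the impact concrete, here's a toy Python illustration of what a hash-derived offset does to a schedule. This is not Fivetran's actual algorithm (as far as I know, that isn't published); it just shows how an offset that is fixed per destination, but effectively random from your point of view, shifts a 4 p.m. schedule.

```python
import hashlib

def toy_offset_minutes(destination_id: str) -> int:
    # Illustrative only: derive a stable 0-59 minute offset from an ID.
    # Fivetran's real derivation from the destination ID hash is not public.
    digest = hashlib.sha256(destination_id.encode()).hexdigest()
    return int(digest, 16) % 60

# A connector "scheduled" for 16:00 actually starts at 16:00 plus the offset.
offset = toy_offset_minutes("my_destination_id")  # hypothetical destination ID
print(f"Scheduled 16:00, effective start 16:{offset:02d}")
```

Swap in a different destination ID and you get a different, equally unchangeable start time.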
Denouement
This discovery threw a wrench in our plans to avoid using Control-M. After sinking a moderate amount of effort into the log-based solution, we pivoted to a simple script triggered by Control-M. In terms of code, it was the simpler solution. In terms of infrastructure, though, it was a minor nightmare: we had to sort out an agent for Control-M, open up network connectivity, create a Fivetran service user to make the API calls, and manage secrets, among other tasks.
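For reference, the script was along these lines. This is a minimal sketch, assuming the standard Fivetran REST API connector sync endpoint with basic auth (API key and secret); the connector ID and the environment-variable credential handling are placeholders for whatever your own setup uses.

```python
# Sketch of a sync-trigger script for Control-M to call.
# Assumes FIVETRAN_API_KEY / FIVETRAN_API_SECRET are injected by your
# secret management, and that the connector ID is known ahead of time.
import os
import requests

FIVETRAN_API = "https://api.fivetran.com/v1"

def trigger_sync(connector_id: str, force: bool = False) -> None:
    # POST /v1/connectors/{id}/sync kicks off a sync for one connector.
    resp = requests.post(
        f"{FIVETRAN_API}/connectors/{connector_id}/sync",
        auth=(os.environ["FIVETRAN_API_KEY"], os.environ["FIVETRAN_API_SECRET"]),
        json={"force": force},
        timeout=30,
    )
    resp.raise_for_status()
    print(f"Sync requested for {connector_id}: {resp.json()}")

if __name__ == "__main__":
    trigger_sync("my_connector_id")  # placeholder connector ID
```

Wrapping the call in a script like this also gives the scheduler a clean exit code to act on, which is exactly what Control-M wants.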
Things I learned
Avoid the Fivetran Scheduler like the plague; the API is the way to go for triggering Fivetran syncs. You won't be hit by an unreasonably large offset, and you'll have more options for handling reschedules or sync errors. I'll rant about the Fivetran Scheduler's retry logic another time.
Embrace the orchestration tool as your ally. I still don't like Control-M, but I recognize the robust infrastructure this organization has built around it: change management processes, 24/7 support teams, alerting, and more. It does make me wonder what hurdles would need to be cleared to get something like Airflow productionized.
Written by Kurdapyo Data Engineer
I’m the kuya at Kurdapyo Labs — a recovering Oracle developer who saw the light and helped migrate legacy systems out of Oracle (and saved a lot of money doing it). I used to write PL/SQL, Perl, ksh, Bash, and all kinds of hand-crafted ETL. These days, I wrestle with PySpark, Airflow, Terraform, and YAML that refuses to cooperate. I’ve been around long enough to know when things were harder… and when they were actually better. This blog is where I write (and occasionally rant) about modern data tools — especially the ones marketed as “no-code” that promise simplicity, but still break in production anyway. Disclaimer: These are my thoughts—100% my own, not my employer’s, my client’s, or that one loud guy on tech Twitter. I’m just sharing what I’ve learned (and unlearned) along the way. No promises, no warranties—just real talk, some opinions, and the occasional coffee/beer-fueled rant. If something here helps you out, awesome! If you think I’ve missed something or want to share your own take, I’d love to hear from you. Let’s learn from each other.