The Great Migration - Snowflake to Fabric


Making Sense of the Migration
Snowflake has been around for over a decade and is a proven powerhouse when it comes to warehousing at scale. To say it is widely adopted would be a gross understatement. This raises the question: why are organizations considering switching to a platform that only became generally available in late 2023? What capabilities does Fabric bring to the table that make it worth the hassle of migration?
Cost
For all its tenure as a best-in-class warehousing platform, Snowflake has a persistent issue: it’s expensive, and its capabilities are no longer unique enough to justify the cost. Fabric simply brings more raw compute power for significantly less. Drawing direct parallels between the platforms’ respective cost models is not simple, as both abstract the underlying hardware from the user. We can approximate via load testing and compute usage monitoring, but that deserves its own article altogether. To give a sense of scale, consider the following costs for general compute (capacities in Fabric and virtual warehouses in Snowflake):
Fabric capacity cost by SKU (per hour) | Pay-as-you-go | Reservation |
F2 | $ 0.36 | $ 0.215 |
F64 | $ 11.52 | $ 6.583 |
F2048 | $ 368.64 | $ 219.295 |
Snowflake compute cost (per hour) | Standard | Enterprise | Business Critical |
X-Small VW | $ 2.00 | $ 3.00 | $ 4.00 |
X-Large VW | $ 32.00 | $ 48.00 | $ 64.00 |
6X-Large VW | $ 1,024.00 | $ 1,536.00 | $ 2,048.00 |
See the Fabric Pricing Page and Snowflake Consumption Table for up-to-date pricing.
A few points to keep in mind as you look at these numbers: first, the compute tiers displayed do not necessarily have a 1:1 mapping in terms of processing power. Rather, they represent the smallest, largest, and default/standard size of each platform’s respective general compute instances. Second, some of Snowflake’s security features are only available in more expensive editions. For example, row- and column-level security require the Enterprise edition and above, while private endpoints and PHI regulation compliance are locked behind the Business Critical edition. Since Fabric always provides these features, the Business Critical edition is the fairest point of price comparison. Lastly, the Snowflake prices shown are based on advertised standard pricing; the actual cost per credit may vary by region and cloud provider.
Without getting into extensive calculations comparing a Fabric capacity unit to a Snowflake credit, the general trend is that Snowflake compute is significantly more expensive, especially for long-running workloads. Snowflake has the advantage of more granular control over compute provisioning and easily leveraged autoscaling, but these features are not enough to offset the noticeably higher cost per hour of active compute. For always-on compute, Snowflake can be up to 10x more expensive than Fabric. I was in disbelief when I first worked this out, so I expect plenty of skepticism from those seeing it for the first time. The best way to validate the 10x figure for your use case is to spin up a trial Fabric capacity and see how many capacity units it takes to run equivalents of your Snowflake workloads.
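As a rough sanity check using only the list prices above (and with the caveat, again, that F64 and an X-Large warehouse are each platform’s default tier rather than a verified 1:1 match in processing power): an always-on F64 at the reserved rate runs about $6.583 × 730 hours ≈ $4,800 per month, while an always-on X-Large warehouse on Business Critical runs about $64 × 730 hours ≈ $46,700 per month. That is just under a 10x difference before any workload-specific tuning.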
Unified Ecosystem
Another line of reasoning I often hear when decision-makers talk about migrating to Fabric is better integration with the Microsoft ecosystem. Cloud agnosticism is great and all, but so is first-class integration with M365, Azure, Entra, and Purview. Not to mention Power BI, which is as much a part of Fabric as Real-Time Intelligence and Data Science. Leveraging a large library of external tools grants Snowflake enhanced flexibility, but it also quickly increases design and maintenance complexity. Fabric aims to simplify the dev experience to a single pane of glass, where business and IT can finally look at the same picture in harmony, instead of arguing whether their data is black and blue or white and gold like a dress from 2015.
The Great Feature Race
Snowflake and Fabric (and Databricks, while we’re at it) are each distinct and have fundamental differences, but their purposes as platforms are largely the same. Both are evolving rapidly and constantly adding new features. It is highly likely that some of the features I mention in this blog that are currently unique to one platform will be available or replicated in the other a couple of months down the road. Because of this dynamic, there is no timeless answer to the question “what can Fabric do that Snowflake can’t,” or vice versa. That said, at the time of writing, Fabric does have the advantage of first-class access to the entire Microsoft ecosystem. Integrations with Azure AI Foundry, and innovations like Direct Lake that rely on separate engines working intimately together, are areas where Fabric will always have a head start.
Migrating Snowflake Data Workloads
Let’s say that you’re convinced; Fabric’s shiny new features and promises of glorious cost savings have you hooked. How does one go from a SQL-centric, code-first platform with potentially dozens of external tool integrations to ‘just Fabric’?
ETL
The good news about Fabric’s ETL tools is that they offer a wide variety of connectors beyond the Microsoft ecosystem. In terms of connectivity and transformation capabilities, there shouldn’t be any gaps to fill. The elephant in the room is the mountain of Snowflake SQL code that defines your pipelines, transformations, views, tables, and so on, all of which needs to be converted over to Fabric items. Luckily, most of your SQL engineers won’t have to learn PySpark to make this happen. Basic transformations and data movement can be accomplished with no-code workflows in Fabric data pipelines and Dataflows Gen2. You may be able to salvage some of your SQL code by converting it to T-SQL, but keep in mind that Snowflake SQL has capabilities that don’t have an easy translation into T-SQL, as the example below illustrates.
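One common gap is Snowflake’s QUALIFY clause, which filters on window functions inline; T-SQL has no equivalent, so the same logic has to be restructured into a CTE or subquery. A minimal sketch, using a hypothetical orders table:

```sql
-- Snowflake SQL: latest order per customer, filtered inline with QUALIFY
SELECT customer_id, order_date, order_total
FROM orders
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY customer_id ORDER BY order_date DESC
) = 1;

-- T-SQL: no QUALIFY, so the window function moves into a CTE
-- and the filter becomes an ordinary WHERE clause
WITH ranked AS (
    SELECT customer_id, order_date, order_total,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id ORDER BY order_date DESC
           ) AS rn
    FROM orders
)
SELECT customer_id, order_date, order_total
FROM ranked
WHERE rn = 1;
```

A one-off rewrite like this is trivial; repeated across hundreds of views and stored procedures, it becomes the bulk of the migration effort.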
Another thing to keep in mind is that not all of the external tools you’re using with Snowflake will plug neatly into Fabric. Fabric is rapidly evolving and actively improving support for tools like Fivetran, dbt, and Airflow, but many of these integrations are still in preview as of this writing. Carefully evaluate which tools you need to keep and which can be replaced with a Fabric item.
Lastly, if you are not currently using Power BI, I strongly recommend adopting it. You can connect Fabric datasets to other BI tools such as Tableau, but the platform itself is built and optimized around Power BI.
Storage
Snowflake’s and Fabric’s basic storage costs are very similar in practice, and Fabric’s shortcut and mirroring capabilities make integration relatively painless. That is to say, moving all your data from Snowflake to Fabric is not a necessity. The question then becomes “which data should I move to Fabric, which should I mirror, and which should I create a shortcut to?” First, let’s establish a basic understanding of what each option entails.
If you want to completely shift your data estate from Snowflake to Fabric, then use Fabric’s copy activity to move existing data. Snowflake is supported as both a source and destination, so beyond establishing a connection between the platforms, there is minimal overhead and cost.
Mirroring in Fabric lets you keep an up-to-date copy of data from Snowflake (or other external sources). If you plan on frequently querying the data, this will be more cost-effective than shortcuts, because mirroring requires only one full read on instantiation and afterwards listens for incremental changes. Fabric also provides free mirrored storage equal to one terabyte per capacity unit provisioned; for example, an F64 capacity provides 64 TB of free mirrored data. Note that Snowflake will still incur compute costs on its side for both the initial read and the incremental updates.
In contrast, shortcuts do not create a copy of your Snowflake data; rather, they serve as a pointer to its location. Shortcuts are the way to go for infrequent or ad-hoc queries because they read source data only when needed. As soon as you start using shortcuts in regularly running pipelines, however, they become significantly more expensive than mirroring due to the increase in read requests against Snowflake.
The takeaway is that you can leverage Fabric’s data processing and analytics capabilities without dumping all your existing data into a new location.
High-Level Considerations
Differences in Compute Management and Billing
An important distinction between the two platforms is how they handle compute billing. You may have noticed that Fabric offers the choice of pay-as-you-go or reservation for its capacities. It is important to understand that Fabric’s idea of pay-as-you-go is fundamentally different from Snowflake’s pay-as-you-use billing. My previous blog post (linked here) goes into more detail, but essentially you will always pay the full price listed regardless of how much Fabric compute you actually use. Concretely, a Snowflake warehouse that auto-suspends after a four-hour nightly load bills for four hours, while a pay-as-you-go Fabric capacity left running bills for all twenty-four. You can turn capacities off manually, but there is no concept of a compute instance that only bills while it is running something (there is currently a preview feature for this, but it is limited to Spark compute). Furthermore, Fabric’s compute allocation is significantly less granular than Snowflake’s. While you won’t have to individually provision compute nodes for specific jobs and workflows, you will also be sacrificing some control over compute resources if you make the migration.
Governance
Depending on your industry, this may be the most daunting part of the migration process. Switching to Fabric essentially means adopting Purview as your primary tool for data governance, which brings yet another fundamental shift in how you develop your high-level strategy. Snowflake’s governance (assuming you are using Snowflake-native tools instead of external governance software) is largely declarative, expressed through data access policies and masking defined in SQL code. Similar to how you handle compute nodes, you get significantly more granular control over how data is actually masked and protected. Fabric’s (or more precisely, Purview’s) approach is heavily label-driven: Purview can scan your data for PHI and the like and automatically apply sensitivity labels. Of course, you also get the standard security features such as RBAC, CLS, RLS, etc., but the important thing to understand is that your Snowflake policies will NOT translate over to Purview; a policy like the sketch below has to be redesigned, not ported. Lastly, if you are not using Entra (formerly Azure AD), you will likely need to adopt it before you migrate to Fabric, since Purview’s RBAC capabilities are based on Entra groups and roles.
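For a sense of what “declarative” means here, a Snowflake masking policy is just SQL that you create and attach to a column (the policy, table, and role names below are hypothetical):

```sql
-- Snowflake SQL: a declarative masking policy, defined and attached in code
CREATE MASKING POLICY email_mask AS (val STRING)
  RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST_ROLE') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to a specific column
ALTER TABLE customers MODIFY COLUMN email
  SET MASKING POLICY email_mask;
```

Purview has no mechanism for importing definitions like this. The comparable outcome in Fabric comes from sensitivity labels and label-driven protection, so plan for a redesign of your governance layer rather than a translation.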
Conclusion
Moving from Snowflake to Fabric is by no means a small undertaking. Fabric boasts some clear advantages: lower costs, tighter Microsoft integration, and a more unified experience for IT and business teams. That said, it is still growing as a platform and doesn’t have the tried-and-true decade-long reputation of Snowflake. Migrating is not just a technical transition but an architectural and strategic shift as well. Trying to create a Fabric solution by mapping each of your Snowflake resources 1:1 against their closest Fabric equivalents will almost certainly end in headaches for everyone involved. Think it through, plan deliberately, and make your move. Just know that, personally, I think the Fabric bandwagon is well justified.