A Gentle Introduction to Snowflake

Sriram Krishnan

Until now, we’ve been primarily focused on dbt—writing models, macros, seeds, snapshots, and tests. But behind every dbt run, there needs to be an actual compute engine that executes the SQL we've written. In our case, that engine is Snowflake.

Snowflake is a modern, cloud-native data warehouse that’s built for scale, speed, and simplicity. It’s designed from the ground up to support large volumes of analytical queries while abstracting away the operational burden typically associated with traditional warehouses. Unlike conventional databases, Snowflake separates compute from storage, allowing both to scale independently and making it ideal for the kind of modular, on-demand workloads dbt produces.

One of Snowflake’s core strengths is its compute engine, known as a warehouse. A warehouse in Snowflake is not where your data lives; it’s the engine that runs your SQL. Every time you run a dbt model, Snowflake resumes the virtual warehouse if it’s suspended (assuming auto-resume is enabled), executes the SQL, and suspends the warehouse again once it sits idle. This makes it very cost-effective, since you’re billed per second only while the warehouse is running, not around the clock. It also means multiple teams can share the same data without interfering with each other’s performance, because each team can use its own warehouse.
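To make that concrete, here is roughly what creating such a warehouse looks like. This is a sketch: the name, size, and timeout are illustrative assumptions that happen to match the transform_wh used later in this post, not something dbt creates for you.

-- Create a small warehouse that suspends after 60 seconds of inactivity
-- and resumes automatically when a query arrives. Name, size, and timeout
-- are illustrative assumptions, not values dbt requires.
create warehouse if not exists transform_wh
  warehouse_size = 'XSMALL'
  auto_suspend = 60
  auto_resume = true
  initially_suspended = true;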

Storage in Snowflake, on the other hand, is fully managed. You don’t need to worry about tuning, vacuuming, or indexing. All data is stored in a compressed, columnar format behind the scenes, and Snowflake handles the rest. Whether you have gigabytes or terabytes of historical data, you're only charged for storage and the compute required to access it. This makes Snowflake an ideal backend for dbt: you get enterprise-grade scalability without the overhead of managing infrastructure.
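If you want to see what that managed storage adds up to, Snowflake exposes metering views you can query directly. A minimal sketch, assuming your role has access to the SNOWFLAKE.ACCOUNT_USAGE share:

-- Daily storage footprint for the account, in gigabytes.
-- Requires access to the SNOWFLAKE.ACCOUNT_USAGE share.
select
  usage_date,
  storage_bytes / power(1024, 3)  as table_storage_gb,
  stage_bytes / power(1024, 3)    as stage_storage_gb,
  failsafe_bytes / power(1024, 3) as failsafe_storage_gb
from snowflake.account_usage.storage_usage
order by usage_date desc
limit 30;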

Understanding how dbt uses Snowflake is key to building more efficient data models. It gives you insight into how queries are executed, how resources are consumed, and where you might optimize for performance and cost as your project grows.


What Happens When You Run dbt run

Let’s say you’ve written a model:

-- models/facts/fct_revenue.sql
select
  customer_id,
  sum(amount_usd) as mrr_usd
from {{ ref('stg_stripe_payments') }}
group by 1

When you run:

dbt run --select fct_revenue

Here’s what actually happens:


1. dbt compiles your SQL

It replaces all macros and Jinja references like {{ ref('...') }} with fully qualified table names.

Compiled SQL Example:

-- target/compiled/fct_revenue.sql

select
  customer_id,
  sum(amount_usd) as mrr_usd
from analytics.dbt_sriram.stg_stripe_payments
group by 1

This is the exact SQL that gets sent to Snowflake.


2. dbt connects to Snowflake

It uses your profiles.yml file to authenticate and find the right warehouse, database, schema, and role.

outputs:
  dev:
    type: snowflake
    account: xyz-123
    user: dbt_user
    password: "{{ env_var('DBT_PASSWORD') }}"
    role: dbt_role
    database: analytics
    warehouse: transform_wh
    schema: dbt_sriram

In this setup:

  • dbt builds the model in analytics.dbt_sriram

  • It runs the query using the transform_wh warehouse


3. Snowflake executes the SQL on the warehouse

Snowflake spins up the compute engine (warehouse), runs the compiled SQL, and writes the result as a table or view depending on your materialization.
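If you're curious what actually hit the warehouse, you can inspect recent queries from Snowflake itself using the INFORMATION_SCHEMA.QUERY_HISTORY table function. A sketch, where the warehouse filter is an assumption matching the transform_wh from profiles.yml:

-- Recent queries run on the transform_wh warehouse, newest first.
-- The warehouse filter is an assumption matching profiles.yml above.
select
  query_text,
  warehouse_name,
  total_elapsed_time / 1000 as elapsed_seconds
from table(information_schema.query_history())
where warehouse_name = 'TRANSFORM_WH'
order by start_time desc
limit 20;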


4. dbt creates or replaces the model

If your model is materialized='table', dbt runs a CREATE OR REPLACE TABLE under the hood.

The final object appears in Snowflake under the schema you specified.
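For the fct_revenue model above, the statement Snowflake receives looks roughly like this. It's a simplified sketch, since the adapter may add options (such as transient tables) depending on your configuration:

-- Roughly what dbt issues for materialized='table' (simplified sketch).
create or replace table analytics.dbt_sriram.fct_revenue as
select
  customer_id,
  sum(amount_usd) as mrr_usd
from analytics.dbt_sriram.stg_stripe_payments
group by 1;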


5. dbt saves the logs and metadata locally

In your local /target/ folder, dbt writes metadata, logs, and compiled files.

A simplified run_results.json:

{
  "metadata": {
    "dbt_version": "1.6.0",
    "generated_at": "2025-07-03T10:00:00Z"
  },
  "results": [
    {
      "status": "success",
      "model": "fct_revenue",
      "execution_time": 2.1
    }
  ]
}

This helps you track which models ran, how long they took, and whether they succeeded; when you run dbt test, the same file records your test results as well.


Final Thoughts

This is your first official look at Snowflake in this series—not as a black box dbt connects to, but as the foundation of your analytics stack.
