Creating Seeds & Snapshots in dbt – Use Cases and Tips

Sriram KrishnanSriram Krishnan
6 min read

In the previous blog, we modeled facts and dimensions to turn raw data into analytics-ready marts. But analytics isn’t only about what’s true now—it’s also about what was true then.

That’s where two underused but powerful dbt features come into play: Seeds and Snapshots.

In this chapter, we’ll explore:

  • What Seeds and Snapshots do

  • Real SaaS use cases using our Stripe + Segment + HubSpot data

  • How to track changing attributes like customer lifecycle

  • When to use dbt Snapshots vs. custom CDC logic

  • Updated folder structure for your dbt project


📁 Updated Project Structure

Here's how your project should now look:

/models
  /sources
    stripe.yml
    segment.yml
    hubspot.yml

  /staging
    stg_stripe_payments.sql
    stg_segment_events.sql
    stg_hubspot_contacts.sql

  /dimensions
    dim_customer.sql
    dim_lifecycle_stage.sql

  /facts
    fct_revenue.sql
    fct_user_engagement.sql

  /vault
    hub_customer.sql
    sat_customer_profile.sql
    link_customer_payment.sql

/seeds
  currency_exchange_rates.csv

/snapshots
  scd_customer_status.sql

Let’s look at what Seeds and Snapshots contribute to your transformation layer.


🌱 Seeds – Lightweight Reference Tables

What Are Seeds?

Seeds are CSV files stored in your repo and loaded into your warehouse via dbt. They’re perfect for:

  • Static mappings (like currency codes)

  • Business rules (like tier thresholds)

  • Manual reference tables

They are version-controlled alongside your models.


Use Case: Currency Normalization

Let’s say your SaaS app processes payments in multiple currencies. To analyze revenue in USD, you can store exchange rates in a CSV seed:

-- /seeds/currency_exchange_rates.csv
currency,exchange_rate_to_usd
USD,1.0
EUR,1.12
INR,0.012

You reference this in your revenue fact:

-- fct_revenue.sql
with exchange_rates as (
  select * from {{ ref('currency_exchange_rates') }}
)

select
  p.customer_id,
  date_trunc('month', p.payment_time) as revenue_month,
  sum(p.amount_usd * e.exchange_rate_to_usd) as mrr_usd,
  c.lifecycle_stage
from {{ ref('stg_stripe_payments') }} p
left join {{ ref('dim_customer') }} c
  on p.customer_id = c.customer_id
left join exchange_rates e
  on p.currency = e.currency
group by 1, 2, 4

Run it with:

dbt seed

What Happens:

  • dbt reads each .csv file in your /seeds folder

  • It creates or overwrites the corresponding table in your target schema

  • The table name is the same as the CSV file name (minus .csv)

Becomes:

<target_database>.<target_schema>.currency_exchange_rates

And yes — this table is fully queryable, joinable, and can be used just like any other model via {{ ref('currency_exchange_rates') }}.


🔁 How Do You Update Seeds?

Step 1: Edit the CSV

Update the contents of the file directly in:

/seeds/currency_exchange_rates.csv

For example, to change the INR conversion rate:

currency,exchange_rate_to_usd
USD,1.0
EUR,1.12
INR,0.013

Step 2: Re-run dbt seed

After editing, rerun:

dbt seed

dbt will:

  • Replace the existing table

  • Load the new contents from the CSV file

  • Log the number of rows loaded


Important Notes

  • Overwrites entire table — Seeds are not incremental. Every time you run dbt seed, it fully reloads the table from the CSV file.

  • No history is retained — If you need to track changes over time in a seed (e.g., exchange rate history), you'll need to model that separately using snapshots or versioned seed files.


🕰️ Snapshots – Tracking Historical Changes

What Are Snapshots?

While Seeds bring in fixed reference data, Snapshots bring in time.

dbt Snapshots let you track how a record changes over time by capturing point-in-time versions of rows from a source model. It’s a built-in way to implement Slowly Changing Dimensions (Type 2).


🔍 Do Snapshots Create a Table?

Yes—snapshots create a permanent table in your data warehouse.

  • They live in the schema you configure (usually snapshots)

  • They persist across dbt runs

  • dbt automatically adds tracking fields:
    dbt_valid_from, dbt_valid_to, and dbt_scd_id

-- Inside your snapshot config:
config(
  target_schema='snapshots'
)

This results in a table like:

your_database.snapshots.scd_customer_status

📦 How Do Snapshots Handle Updates?

dbt snapshots don’t update existing rows. Instead, they follow this flow:

Snapshot EventWhat Happens
First runAll source rows are inserted with valid_from
Field changes (e.g. lifecycle_stage)Old row is closed (valid_to is set), new row is inserted
No changesNothing happens
Record is deleted upstreamRow is closed, if invalidate_hard_deletes=True

🧠 Example: Tracking Customer Lifecycle Transitions

Imagine the stg_hubspot_contacts model returns this on Day 1:

| contact_id | email            | lifecycle_stage | signup_date |
|------------|------------------|------------------|-------------|
| C001       | user1@acme.com   | trial            | 2024-12-01  |

Day 10 – Customer becomes active:

| C001 | user1@acme.com | active | 2024-12-01 |

Day 30 – Customer churns:

| C001 | user1@acme.com | churned | 2024-12-01 |

✍️ Define the Snapshot File

-- /snapshots/scd_customer_status.sql

{% snapshot scd_customer_status %}

{{
  config(
    target_schema='snapshots',
    unique_key='contact_id',
    strategy='check',
    check_cols=['lifecycle_stage'],
    invalidate_hard_deletes=True
  )
}}

select
  contact_id,
  email,
  lifecycle_stage,
  signup_date,
  current_timestamp as snapshot_loaded_at
from {{ ref('stg_hubspot_contacts') }}

{% endsnapshot %}

Run this with:

dbt snapshot

📈 What Does the Snapshot Table Look Like?

| contact_id | lifecycle_stage | dbt_valid_from      | dbt_valid_to         |
|------------|------------------|----------------------|------------------------|
| C001       | trial            | 2025-07-01 08:00:00  | 2025-07-10 09:15:22    |
| C001       | active           | 2025-07-10 09:15:22  | 2025-07-30 07:03:45    |
| C001       | churned          | 2025-07-30 07:03:45  | NULL                   |

Each row represents the state of a record during a specific time window. You get full visibility into how a record evolved, without manually writing change-tracking logic.


🛑 What About Deletes?

If a record disappears from the source and you set:

invalidate_hard_deletes=True

dbt will close the current row by setting dbt_valid_to to the current time, without deleting history.

This ensures your analytics are still auditable and explainable—even if a system upstream performs hard deletes.


🔎 Querying Snapshot History

To get the full lifecycle for each customer:

select
  contact_id,
  lifecycle_stage,
  dbt_valid_from,
  dbt_valid_to
from {{ ref('scd_customer_status') }}
order by contact_id, dbt_valid_from

To get just the latest state:

select *
from {{ ref('scd_customer_status') }}
where dbt_valid_to is null

These queries are gold for churn analysis, cohort modeling, or lifecycle transition dashboards.


⏱️ Pro Tip: Check vs Timestamp Strategy

  • strategy='check' → Tracks changes in specific columns (e.g. lifecycle_stage)

  • strategy='timestamp' → Tracks changes using a last_updated_at timestamp

Use check when change is based on business state. Use timestamp when you have reliable update timestamps from source systems.


🔁 Snapshot vs. Custom CDC Logic

You might wonder: why not build your own CDC logic?

Here’s how they compare:

Featuredbt SnapshotsCustom CDC Logic
SetupSimple config + SQLRequires tools (Fivetran, Debezium, Kafka, etc.)
GranularityTracks row-level business state changesTracks all DML ops (insert/update/delete)
Use CaseSCD (e.g., lifecycle_stage changes)Full event replication, real-time syncs
Warehouse-Only?✅ Yes, entirely in dbt❌ Requires external ingestion pipeline
Good ForAnalytics, churn modeling, Type 2 SCDsOperational systems, auditing, replaying change streams
PerformanceLightweight for smaller dimensionsScales for millions of rows—but with more infra overhead

You can even use both together:

  • Use CDC tools to land raw changes

  • Use dbt Snapshots to track curated, business-level state changes in your transformation layer


Final Thought

Analytics doesn’t end at clean dimensions and facts. Seeds bring clarity. Snapshots bring memory.

Together, they make your dbt project smarter, auditable, and more trustworthy—without introducing heavy infrastructure.

0
Subscribe to my newsletter

Read articles from Sriram Krishnan directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Sriram Krishnan
Sriram Krishnan