Creating Seeds & Snapshots in dbt – Use Cases and Tips

In the previous blog, we modeled facts and dimensions to turn raw data into analytics-ready marts. But analytics isn’t only about what’s true now—it’s also about what was true then.
That’s where two underused but powerful dbt features come into play: Seeds and Snapshots.
In this chapter, we’ll explore:
What Seeds and Snapshots do
Real SaaS use cases using our Stripe + Segment + HubSpot data
How to track changing attributes like
customer lifecycle
When to use dbt Snapshots vs. custom CDC logic
Updated folder structure for your dbt project
📁 Updated Project Structure
Here's how your project should now look:
/models
/sources
stripe.yml
segment.yml
hubspot.yml
/staging
stg_stripe_payments.sql
stg_segment_events.sql
stg_hubspot_contacts.sql
/dimensions
dim_customer.sql
dim_lifecycle_stage.sql
/facts
fct_revenue.sql
fct_user_engagement.sql
/vault
hub_customer.sql
sat_customer_profile.sql
link_customer_payment.sql
/seeds
currency_exchange_rates.csv
/snapshots
scd_customer_status.sql
Let’s look at what Seeds and Snapshots contribute to your transformation layer.
🌱 Seeds – Lightweight Reference Tables
What Are Seeds?
Seeds are CSV files stored in your repo and loaded into your warehouse via dbt. They’re perfect for:
Static mappings (like currency codes)
Business rules (like tier thresholds)
Manual reference tables
They are version-controlled alongside your models.
Use Case: Currency Normalization
Let’s say your SaaS app processes payments in multiple currencies. To analyze revenue in USD, you can store exchange rates in a CSV seed:
-- /seeds/currency_exchange_rates.csv
currency,exchange_rate_to_usd
USD,1.0
EUR,1.12
INR,0.012
You reference this in your revenue fact:
-- fct_revenue.sql
with exchange_rates as (
select * from {{ ref('currency_exchange_rates') }}
)
select
p.customer_id,
date_trunc('month', p.payment_time) as revenue_month,
sum(p.amount_usd * e.exchange_rate_to_usd) as mrr_usd,
c.lifecycle_stage
from {{ ref('stg_stripe_payments') }} p
left join {{ ref('dim_customer') }} c
on p.customer_id = c.customer_id
left join exchange_rates e
on p.currency = e.currency
group by 1, 2, 4
Run it with:
dbt seed
What Happens:
dbt reads each
.csv
file in your/seeds
folderIt creates or overwrites the corresponding table in your target schema
The table name is the same as the CSV file name (minus
.csv
)
Becomes:
<target_database>.<target_schema>.currency_exchange_rates
And yes — this table is fully queryable, joinable, and can be used just like any other model via {{ ref('currency_exchange_rates') }}
.
🔁 How Do You Update Seeds?
Step 1: Edit the CSV
Update the contents of the file directly in:
/seeds/currency_exchange_rates.csv
For example, to change the INR conversion rate:
currency,exchange_rate_to_usd
USD,1.0
EUR,1.12
INR,0.013
Step 2: Re-run dbt seed
After editing, rerun:
dbt seed
dbt will:
Replace the existing table
Load the new contents from the CSV file
Log the number of rows loaded
Important Notes
Overwrites entire table — Seeds are not incremental. Every time you run
dbt seed
, it fully reloads the table from the CSV file.No history is retained — If you need to track changes over time in a seed (e.g., exchange rate history), you'll need to model that separately using snapshots or versioned seed files.
🕰️ Snapshots – Tracking Historical Changes
What Are Snapshots?
While Seeds bring in fixed reference data, Snapshots bring in time.
dbt Snapshots let you track how a record changes over time by capturing point-in-time versions of rows from a source model. It’s a built-in way to implement Slowly Changing Dimensions (Type 2).
🔍 Do Snapshots Create a Table?
Yes—snapshots create a permanent table in your data warehouse.
They live in the schema you configure (usually
snapshots
)They persist across dbt runs
dbt automatically adds tracking fields:
dbt_valid_from
,dbt_valid_to
, anddbt_scd_id
-- Inside your snapshot config:
config(
target_schema='snapshots'
)
This results in a table like:
your_database.snapshots.scd_customer_status
📦 How Do Snapshots Handle Updates?
dbt snapshots don’t update existing rows. Instead, they follow this flow:
Snapshot Event | What Happens |
First run | All source rows are inserted with valid_from |
Field changes (e.g. lifecycle_stage ) | Old row is closed (valid_to is set), new row is inserted |
No changes | Nothing happens |
Record is deleted upstream | Row is closed, if invalidate_hard_deletes=True |
🧠 Example: Tracking Customer Lifecycle Transitions
Imagine the stg_hubspot_contacts
model returns this on Day 1:
| contact_id | email | lifecycle_stage | signup_date |
|------------|------------------|------------------|-------------|
| C001 | user1@acme.com | trial | 2024-12-01 |
Day 10 – Customer becomes active:
| C001 | user1@acme.com | active | 2024-12-01 |
Day 30 – Customer churns:
| C001 | user1@acme.com | churned | 2024-12-01 |
✍️ Define the Snapshot File
-- /snapshots/scd_customer_status.sql
{% snapshot scd_customer_status %}
{{
config(
target_schema='snapshots',
unique_key='contact_id',
strategy='check',
check_cols=['lifecycle_stage'],
invalidate_hard_deletes=True
)
}}
select
contact_id,
email,
lifecycle_stage,
signup_date,
current_timestamp as snapshot_loaded_at
from {{ ref('stg_hubspot_contacts') }}
{% endsnapshot %}
Run this with:
dbt snapshot
📈 What Does the Snapshot Table Look Like?
| contact_id | lifecycle_stage | dbt_valid_from | dbt_valid_to |
|------------|------------------|----------------------|------------------------|
| C001 | trial | 2025-07-01 08:00:00 | 2025-07-10 09:15:22 |
| C001 | active | 2025-07-10 09:15:22 | 2025-07-30 07:03:45 |
| C001 | churned | 2025-07-30 07:03:45 | NULL |
Each row represents the state of a record during a specific time window. You get full visibility into how a record evolved, without manually writing change-tracking logic.
🛑 What About Deletes?
If a record disappears from the source and you set:
invalidate_hard_deletes=True
dbt will close the current row by setting dbt_valid_to
to the current time, without deleting history.
This ensures your analytics are still auditable and explainable—even if a system upstream performs hard deletes.
🔎 Querying Snapshot History
To get the full lifecycle for each customer:
select
contact_id,
lifecycle_stage,
dbt_valid_from,
dbt_valid_to
from {{ ref('scd_customer_status') }}
order by contact_id, dbt_valid_from
To get just the latest state:
select *
from {{ ref('scd_customer_status') }}
where dbt_valid_to is null
These queries are gold for churn analysis, cohort modeling, or lifecycle transition dashboards.
⏱️ Pro Tip: Check vs Timestamp Strategy
strategy='check'
→ Tracks changes in specific columns (e.g.lifecycle_stage
)strategy='timestamp'
→ Tracks changes using alast_updated_at
timestamp
Use check
when change is based on business state. Use timestamp
when you have reliable update timestamps from source systems.
🔁 Snapshot vs. Custom CDC Logic
You might wonder: why not build your own CDC logic?
Here’s how they compare:
Feature | dbt Snapshots | Custom CDC Logic |
Setup | Simple config + SQL | Requires tools (Fivetran, Debezium, Kafka, etc.) |
Granularity | Tracks row-level business state changes | Tracks all DML ops (insert/update/delete) |
Use Case | SCD (e.g., lifecycle_stage changes) | Full event replication, real-time syncs |
Warehouse-Only? | ✅ Yes, entirely in dbt | ❌ Requires external ingestion pipeline |
Good For | Analytics, churn modeling, Type 2 SCDs | Operational systems, auditing, replaying change streams |
Performance | Lightweight for smaller dimensions | Scales for millions of rows—but with more infra overhead |
You can even use both together:
Use CDC tools to land raw changes
Use dbt Snapshots to track curated, business-level state changes in your transformation layer
Final Thought
Analytics doesn’t end at clean dimensions and facts. Seeds bring clarity. Snapshots bring memory.
Together, they make your dbt project smarter, auditable, and more trustworthy—without introducing heavy infrastructure.
Subscribe to my newsletter
Read articles from Sriram Krishnan directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
