Dimensional Modeling Demystified: Star vs Snowflake vs Vault in dbt

Once you've ingested and staged raw data, the next step is transforming that into usable business insights. This blog dives into the art of dimensional modeling—structuring your dbt models to support reporting, dashboarding, and analytics at scale.
We’ll explore three main modeling styles:
Star Schema – Simple and effective for fast analytics
Snowflake Schema – More normalized, useful for reusability
Data Vault – Highly auditable and adaptable for raw change tracking
All concepts are demonstrated using the same SaaS use case introduced earlier—pulling data from Stripe, Segment, and HubSpot to track revenue, user activity, and customer lifecycle.
🧱 Before You Begin: Suggested Folder Structure
Here’s the folder structure we’ll use to guide the transformations:
/models
/sources
stripe.yml
segment.yml
hubspot.yml
/staging
stg_stripe_payments.sql
stg_segment_events.sql
stg_hubspot_contacts.sql
/dimensions
dim_customer.sql
dim_lifecycle_stage.sql
/facts
fct_revenue.sql
fct_user_engagement.sql
/vault (optional, for Data Vault pattern)
hub_customer.sql
sat_customer_profile.sql
link_customer_payment.sql
We'll start by staging raw data, then building dimensions, and finally facts.
⭐ Star Schema: Fast & Flat
Concept
The Star Schema centers around fact tables containing numeric metrics (like MRR), surrounded by denormalized dimension tables (like Customer).
When to Use
Simple reporting needs
Performance is critical
Your analysts prefer minimal joins
Step 1: Stage the Raw Data
-- stg_stripe_payments.sql
select
id as payment_id,
customer_id,
amount / 100.0 as amount_usd,
created_at::timestamp as payment_time,
status
from {{ source('stripe', 'stripe_payments') }}
where status = 'paid'
-- stg_hubspot_contacts.sql
select
contact_id,
email,
lifecycle_stage,
created_at::date as signup_date
from {{ source('hubspot', 'hubspot_contacts') }}
Step 2: Build the Dimension Table
-- dim_customer.sql
select
contact_id as customer_id,
email,
signup_date,
lifecycle_stage
from {{ ref('stg_hubspot_contacts') }}
Step 3: Build the Fact Table
-- fct_revenue.sql
select
p.customer_id,
date_trunc('month', p.payment_time) as revenue_month,
sum(p.amount_usd) as mrr,
c.email,
c.lifecycle_stage
from {{ ref('stg_stripe_payments') }} p
left join {{ ref('dim_customer') }} c
on p.customer_id = c.customer_id
group by 1, 2, 4, 5
This model is now ready for BI tools—clean, flat, and fast.
❄️ Snowflake Schema: Modular & Normalized
Concept
Instead of repeating everything in dim_customer
, you normalize into separate sub-dimensions. For example, lifecycle_stage
becomes its own table.
When to Use
You want reusability (e.g.,
lifecycle_stage
across domains)You want to reduce redundancy
You have more complex dimensions
Step 1: Create the Sub-Dimension
-- dim_lifecycle_stage.sql
select distinct
lifecycle_stage,
dense_rank() over (order by lifecycle_stage) as lifecycle_stage_id
from {{ ref('stg_hubspot_contacts') }}
Step 2: Update Customer Dimension
-- dim_customer.sql
select
contact_id as customer_id,
email,
signup_date,
l.lifecycle_stage_id
from {{ ref('stg_hubspot_contacts') }} c
left join {{ ref('dim_lifecycle_stage') }} l
on c.lifecycle_stage = l.lifecycle_stage
Step 3: Use the Normalized Dimension in the Fact
-- fct_revenue.sql
select
p.customer_id,
date_trunc('month', p.payment_time) as revenue_month,
sum(p.amount_usd) as mrr,
d.lifecycle_stage_id
from {{ ref('stg_stripe_payments') }} p
left join {{ ref('dim_customer') }} d
on p.customer_id = d.customer_id
group by 1, 2, 4
This keeps the fact table slim and clean.
🔐 Data Vault: Auditable & Flexible
Concept
Data Vault focuses on traceability and raw fidelity. You split data into:
Hubs – Business keys (e.g.,
customer_id
)Satellites – Attributes (e.g.,
email
,signup_date
)Links – Relationships (e.g.,
customer ↔ payment
)
Step 1: Create a Customer Hub
-- hub_customer.sql
select distinct
contact_id as customer_key,
md5(contact_id) as hub_customer_id
from {{ ref('stg_hubspot_contacts') }}
Step 2: Create Satellite for Attributes
-- sat_customer_profile.sql
select
md5(contact_id) as hub_customer_id,
email,
lifecycle_stage,
signup_date,
current_timestamp as record_loaded_at
from {{ ref('stg_hubspot_contacts') }}
Step 3: Create a Link Table to Payments
-- link_customer_payment.sql
select
md5(h.contact_id) as hub_customer_id,
md5(p.customer_id) as hub_payment_id,
current_timestamp as link_loaded_at
from {{ ref('stg_hubspot_contacts') }} h
inner join {{ ref('stg_stripe_payments') }} p
on h.contact_id = p.customer_id
This structure is verbose but provides maximum historical traceability.
🧭 Which Model Should You Choose?
Use Case | Recommended Approach |
Fast dashboarding and ease of use | ⭐ Star Schema |
Shared dimensions across teams | ❄️ Snowflake Schema |
Full auditability and data traceability | 🔐 Data Vault |
You don’t have to stick to just one—many teams use Star Schema in the marts, but Snowflake or Vault patterns in the staging layer depending on the business domain and data quality.
Final Thoughts
Dimensional modeling is where your dbt project truly becomes analytics-ready. Whether you're serving dashboards, analysts, or executive reporting, the structure you choose impacts everything—speed, scale, and trust.
Subscribe to my newsletter
Read articles from Sriram Krishnan directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
