Dimensional Modeling Demystified: Star vs Snowflake vs Vault in dbt

Sriram KrishnanSriram Krishnan
4 min read

Once you've ingested and staged raw data, the next step is transforming that into usable business insights. This blog dives into the art of dimensional modeling—structuring your dbt models to support reporting, dashboarding, and analytics at scale.

We’ll explore three main modeling styles:

  • Star Schema – Simple and effective for fast analytics

  • Snowflake Schema – More normalized, useful for reusability

  • Data Vault – Highly auditable and adaptable for raw change tracking

All concepts are demonstrated using the same SaaS use case introduced earlier—pulling data from Stripe, Segment, and HubSpot to track revenue, user activity, and customer lifecycle.


🧱 Before You Begin: Suggested Folder Structure

Here’s the folder structure we’ll use to guide the transformations:

/models
  /sources
    stripe.yml
    segment.yml
    hubspot.yml

  /staging
    stg_stripe_payments.sql
    stg_segment_events.sql
    stg_hubspot_contacts.sql

  /dimensions
    dim_customer.sql
    dim_lifecycle_stage.sql

  /facts
    fct_revenue.sql
    fct_user_engagement.sql

  /vault  (optional, for Data Vault pattern)
    hub_customer.sql
    sat_customer_profile.sql
    link_customer_payment.sql

We'll start by staging raw data, then building dimensions, and finally facts.


⭐ Star Schema: Fast & Flat

Concept

The Star Schema centers around fact tables containing numeric metrics (like MRR), surrounded by denormalized dimension tables (like Customer).

When to Use

  • Simple reporting needs

  • Performance is critical

  • Your analysts prefer minimal joins


Step 1: Stage the Raw Data

-- stg_stripe_payments.sql
select
  id as payment_id,
  customer_id,
  amount / 100.0 as amount_usd,
  created_at::timestamp as payment_time,
  status
from {{ source('stripe', 'stripe_payments') }}
where status = 'paid'
-- stg_hubspot_contacts.sql
select
  contact_id,
  email,
  lifecycle_stage,
  created_at::date as signup_date
from {{ source('hubspot', 'hubspot_contacts') }}

Step 2: Build the Dimension Table

-- dim_customer.sql
select
  contact_id as customer_id,
  email,
  signup_date,
  lifecycle_stage
from {{ ref('stg_hubspot_contacts') }}

Step 3: Build the Fact Table

-- fct_revenue.sql
select
  p.customer_id,
  date_trunc('month', p.payment_time) as revenue_month,
  sum(p.amount_usd) as mrr,
  c.email,
  c.lifecycle_stage
from {{ ref('stg_stripe_payments') }} p
left join {{ ref('dim_customer') }} c
  on p.customer_id = c.customer_id
group by 1, 2, 4, 5

This model is now ready for BI tools—clean, flat, and fast.


❄️ Snowflake Schema: Modular & Normalized

Concept

Instead of repeating everything in dim_customer, you normalize into separate sub-dimensions. For example, lifecycle_stage becomes its own table.

When to Use

  • You want reusability (e.g., lifecycle_stage across domains)

  • You want to reduce redundancy

  • You have more complex dimensions


Step 1: Create the Sub-Dimension

-- dim_lifecycle_stage.sql
select distinct
  lifecycle_stage,
  dense_rank() over (order by lifecycle_stage) as lifecycle_stage_id
from {{ ref('stg_hubspot_contacts') }}

Step 2: Update Customer Dimension

-- dim_customer.sql
select
  contact_id as customer_id,
  email,
  signup_date,
  l.lifecycle_stage_id
from {{ ref('stg_hubspot_contacts') }} c
left join {{ ref('dim_lifecycle_stage') }} l
  on c.lifecycle_stage = l.lifecycle_stage

Step 3: Use the Normalized Dimension in the Fact

-- fct_revenue.sql
select
  p.customer_id,
  date_trunc('month', p.payment_time) as revenue_month,
  sum(p.amount_usd) as mrr,
  d.lifecycle_stage_id
from {{ ref('stg_stripe_payments') }} p
left join {{ ref('dim_customer') }} d
  on p.customer_id = d.customer_id
group by 1, 2, 4

This keeps the fact table slim and clean.


🔐 Data Vault: Auditable & Flexible

Concept

Data Vault focuses on traceability and raw fidelity. You split data into:

  • Hubs – Business keys (e.g., customer_id)

  • Satellites – Attributes (e.g., email, signup_date)

  • Links – Relationships (e.g., customer ↔ payment)


Step 1: Create a Customer Hub

-- hub_customer.sql
select distinct
  contact_id as customer_key,
  md5(contact_id) as hub_customer_id
from {{ ref('stg_hubspot_contacts') }}

Step 2: Create Satellite for Attributes

-- sat_customer_profile.sql
select
  md5(contact_id) as hub_customer_id,
  email,
  lifecycle_stage,
  signup_date,
  current_timestamp as record_loaded_at
from {{ ref('stg_hubspot_contacts') }}

-- link_customer_payment.sql
select
  md5(h.contact_id) as hub_customer_id,
  md5(p.customer_id) as hub_payment_id,
  current_timestamp as link_loaded_at
from {{ ref('stg_hubspot_contacts') }} h
inner join {{ ref('stg_stripe_payments') }} p
  on h.contact_id = p.customer_id

This structure is verbose but provides maximum historical traceability.


🧭 Which Model Should You Choose?

Use CaseRecommended Approach
Fast dashboarding and ease of use⭐ Star Schema
Shared dimensions across teams❄️ Snowflake Schema
Full auditability and data traceability🔐 Data Vault

You don’t have to stick to just one—many teams use Star Schema in the marts, but Snowflake or Vault patterns in the staging layer depending on the business domain and data quality.


Final Thoughts

Dimensional modeling is where your dbt project truly becomes analytics-ready. Whether you're serving dashboards, analysts, or executive reporting, the structure you choose impacts everything—speed, scale, and trust.

0
Subscribe to my newsletter

Read articles from Sriram Krishnan directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Sriram Krishnan
Sriram Krishnan