Once you've ingested and staged raw data, the next step is transforming that into usable business insights. This blog dives into the art of dimensional modeling—structuring your dbt models to support reporting, dashboarding, and analytics at scale.

We’ll explore three main modeling styles:

Star Schema – Simple and effective for fast analytics
Snowflake Schema – More normalized, useful for reusability
Data Vault – Highly auditable and adaptable for raw change tracking

All concepts are demonstrated using the same SaaS use case introduced earlier—pulling data from Stripe, Segment, and HubSpot to track revenue, user activity, and customer lifecycle.

🧱 Before You Begin: Suggested Folder Structure

Here’s the folder structure we’ll use to guide the transformations:

/models
  /sources
    stripe.yml
    segment.yml
    hubspot.yml

  /staging
    stg_stripe_payments.sql
    stg_segment_events.sql
    stg_hubspot_contacts.sql

  /dimensions
    dim_customer.sql
    dim_lifecycle_stage.sql

  /facts
    fct_revenue.sql
    fct_user_engagement.sql

  /vault  (optional, for Data Vault pattern)
    hub_customer.sql
    sat_customer_profile.sql
    link_customer_payment.sql

We'll start by staging raw data, then building dimensions, and finally facts.

⭐ Star Schema: Fast & Flat

Concept

The Star Schema centers around fact tables containing numeric metrics (like MRR), surrounded by denormalized dimension tables (like Customer).

When to Use

Simple reporting needs
Performance is critical
Your analysts prefer minimal joins

Step 1: Stage the Raw Data

-- stg_stripe_payments.sql
select
  id as payment_id,
  customer_id,
  amount / 100.0 as amount_usd,
  created_at::timestamp as payment_time,
  status
from {{ source('stripe', 'stripe_payments') }}
where status = 'paid'

-- stg_hubspot_contacts.sql
select
  contact_id,
  email,
  lifecycle_stage,
  created_at::date as signup_date
from {{ source('hubspot', 'hubspot_contacts') }}

Step 2: Build the Dimension Table

-- dim_customer.sql
select
  contact_id as customer_id,
  email,
  signup_date,
  lifecycle_stage
from {{ ref('stg_hubspot_contacts') }}

Step 3: Build the Fact Table

-- fct_revenue.sql
select
  p.customer_id,
  date_trunc('month', p.payment_time) as revenue_month,
  sum(p.amount_usd) as mrr,
  c.email,
  c.lifecycle_stage
from {{ ref('stg_stripe_payments') }} p
left join {{ ref('dim_customer') }} c
  on p.customer_id = c.customer_id
group by 1, 2, 4, 5

This model is now ready for BI tools—clean, flat, and fast.

❄️ Snowflake Schema: Modular & Normalized

Concept

Instead of repeating everything in dim_customer, you normalize into separate sub-dimensions. For example, lifecycle_stage becomes its own table.

When to Use

You want reusability (e.g., lifecycle_stage across domains)
You want to reduce redundancy
You have more complex dimensions

Step 1: Create the Sub-Dimension

-- dim_lifecycle_stage.sql
select distinct
  lifecycle_stage,
  dense_rank() over (order by lifecycle_stage) as lifecycle_stage_id
from {{ ref('stg_hubspot_contacts') }}

Step 2: Update Customer Dimension

-- dim_customer.sql
select
  contact_id as customer_id,
  email,
  signup_date,
  l.lifecycle_stage_id
from {{ ref('stg_hubspot_contacts') }} c
left join {{ ref('dim_lifecycle_stage') }} l
  on c.lifecycle_stage = l.lifecycle_stage

Step 3: Use the Normalized Dimension in the Fact

-- fct_revenue.sql
select
  p.customer_id,
  date_trunc('month', p.payment_time) as revenue_month,
  sum(p.amount_usd) as mrr,
  d.lifecycle_stage_id
from {{ ref('stg_stripe_payments') }} p
left join {{ ref('dim_customer') }} d
  on p.customer_id = d.customer_id
group by 1, 2, 4

This keeps the fact table slim and clean.

🔐 Data Vault: Auditable & Flexible

Concept

Data Vault focuses on traceability and raw fidelity. You split data into:

Hubs – Business keys (e.g., customer_id)
Satellites – Attributes (e.g., email, signup_date)
Links – Relationships (e.g., customer ↔ payment)

Step 1: Create a Customer Hub

-- hub_customer.sql
select distinct
  contact_id as customer_key,
  md5(contact_id) as hub_customer_id
from {{ ref('stg_hubspot_contacts') }}

Step 2: Create Satellite for Attributes

-- sat_customer_profile.sql
select
  md5(contact_id) as hub_customer_id,
  email,
  lifecycle_stage,
  signup_date,
  current_timestamp as record_loaded_at
from {{ ref('stg_hubspot_contacts') }}

Step 3: Create a Link Table to Payments

-- link_customer_payment.sql
select
  md5(h.contact_id) as hub_customer_id,
  md5(p.customer_id) as hub_payment_id,
  current_timestamp as link_loaded_at
from {{ ref('stg_hubspot_contacts') }} h
inner join {{ ref('stg_stripe_payments') }} p
  on h.contact_id = p.customer_id

This structure is verbose but provides maximum historical traceability.

🧭 Which Model Should You Choose?

Use Case	Recommended Approach
Fast dashboarding and ease of use	⭐ Star Schema
Shared dimensions across teams	❄️ Snowflake Schema
Full auditability and data traceability	🔐 Data Vault

You don’t have to stick to just one—many teams use Star Schema in the marts, but Snowflake or Vault patterns in the staging layer depending on the business domain and data quality.

Final Thoughts

Dimensional modeling is where your dbt project truly becomes analytics-ready. Whether you're serving dashboards, analysts, or executive reporting, the structure you choose impacts everything—speed, scale, and trust.

Dimensional Modeling Demystified: Star vs Snowflake vs Vault in dbt

🧱 Before You Begin: Suggested Folder Structure

⭐ Star Schema: Fast & Flat

Concept

When to Use

Step 1: Stage the Raw Data

Step 2: Build the Dimension Table

Step 3: Build the Fact Table

❄️ Snowflake Schema: Modular & Normalized

Concept

When to Use

Step 1: Create the Sub-Dimension

Step 2: Update Customer Dimension

Step 3: Use the Normalized Dimension in the Fact

🔐 Data Vault: Auditable & Flexible

Concept

Step 1: Create a Customer Hub

Step 2: Create Satellite for Attributes

Step 3: Create a Link Table to Payments

🧭 Which Model Should You Choose?

Final Thoughts

Subscribe to my newsletter

Sriram Krishnan

Sriram Krishnan