Testing and Documenting dbt Projects the Right Way

Now that we’ve built out models, snapshots, and even added macros to keep things DRY, it’s time to answer an important question:
“How do we know our data is correct and understandable?”
In this chapter, we’ll walk through how to:
Add tests to your dbt models and seeds
Write clean, useful documentation
Use everything we’ve already built in previous chapters (revenue models, customer dimensions, snapshots, macros)
Keep your dbt project safe, trusted, and readable—for yourself and your team
Why Testing Matters in dbt
Testing helps catch:
Broken joins
Unexpected nulls
Invalid reference values
Schema drift from source systems
In dbt, most tests are declared in .yml files next to the models they cover, and are run with:
dbt test
Tests fail loudly and early—before bad data hits your dashboards.
1. Add Basic Column-Level Tests
Let’s take our dim_customer model:
-- models/dimensions/dim_customer.sql
select
    contact_id as customer_id,
    email,
    signup_date,
    lifecycle_stage
from {{ ref('stg_hubspot_contacts') }}
We can test that:
customer_id is not null
email is unique
lifecycle_stage is one of a known set of values
Add this in dim_customer.yml:
version: 2
models:
  - name: dim_customer
    description: "Customer dimension enriched from HubSpot"
    columns:
      - name: customer_id
        tests:
          - not_null
      - name: email
        tests:
          - unique
      - name: lifecycle_stage
        tests:
          - accepted_values:
              values: ['trial', 'active', 'churned']
Now run:
dbt test --select dim_customer
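For intuition, each generic test compiles to a select statement that returns the rows violating the rule, and the test passes when that query returns no rows. The following is a rough sketch of what the unique test on email checks; it is not the exact SQL dbt generates, which can differ by version and adapter.

```sql
-- Approximately what `unique` on dim_customer.email checks.
-- Any row returned is a failure; zero rows means the test passes.
select
    email,
    count(*) as n_records
from {{ ref('dim_customer') }}
where email is not null      -- nulls are skipped here; a not_null test covers them
group by email
having count(*) > 1
```

After a test run, the SQL dbt actually executed should be available under target/compiled, which is handy when a failure needs debugging.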
2. Add Tests for Seeds
We loaded exchange rates from a seed file. It’s easy to miss an invalid value or typo.
currency_exchange_rates.yml
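version: 2
seeds:
  - name: currency_exchange_rates
    columns:
      - name: currency
        tests:
          - not_null
          - unique
      - name: exchange_rate_to_usd
        tests:
          - not_null
          - greater_than:
              value: 0
These tests will fail if a row is missing a currency code, a currency appears more than once, or the rate is null, zero, or negative. Note that greater_than is not one of dbt’s four built-in generic tests (not_null, unique, accepted_values, relationships): it has to exist in your project as a custom generic test (assumed here to take a value argument), or be swapped for a package test such as dbt_utils.accepted_range.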
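Because greater_than does not ship with dbt, the seed YAML above only works if the project defines that test itself. Here is a minimal sketch of such a custom generic test, called with value: 0 nested under the test name. The file path and the value argument name are assumptions made for illustration; they are not taken from the original project.

```sql
-- tests/generic/greater_than.sql  (hypothetical file path)
-- Custom generic test: fails for every row where the column is NOT
-- strictly greater than `value`. `value` is an assumed argument name.
{% test greater_than(model, column_name, value=0) %}

select *
from {{ model }}
where {{ column_name }} <= {{ value }}   -- null rows are not returned; pair with not_null

{% endtest %}
```

Like the built-in tests, this returns the offending rows, so a rate of 0 or -1.2 would show up in the failure output while a rate of 0.92 would not.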
3. Add Relationship Tests
You can test joins without writing SQL.
In our fct_revenue model, we join on customer_id. Let’s test whether every customer_id in the fact table exists in the dimension.
fct_revenue.yml
version: 2
models:
  - name: fct_revenue
    description: "Monthly recurring revenue from Stripe"
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customer')
              field: customer_id
This test ensures you don’t have facts with orphaned customers.
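The same check can be written by hand as a singular test, which is useful to see what the relationships test is doing and as a starting point when the join logic gets more involved. This is a sketch only; the file name is hypothetical.

```sql
-- tests/assert_revenue_customers_exist.sql  (hypothetical file name)
-- Singular test: returns fct_revenue rows whose customer_id has no match
-- in dim_customer. dbt marks the test as failed if any rows come back.
select
    f.customer_id
from {{ ref('fct_revenue') }} as f
left join {{ ref('dim_customer') }} as d
    on f.customer_id = d.customer_id
where f.customer_id is not null   -- null keys are left to a not_null test
  and d.customer_id is null
```

For a plain foreign-key check the YAML version is shorter and easier to maintain; a hand-written test earns its place once extra filters or multi-column keys are needed.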
4. Add Descriptions That Actually Help
Documentation helps new team members and yourself 3 months from now.
Every model, column, source, and test can (and should) have a description.
A good description:
Says what the model/table contains
Says what it’s used for
Mentions sources or business logic if helpful
Example: dim_customer.yml
models:
  - name: dim_customer
    description: "One row per customer. Enriched with signup date, email, and latest lifecycle stage from HubSpot."
    columns:
      - name: customer_id
        description: "Primary key from HubSpot contact ID"
      - name: lifecycle_stage
        description: "Current customer lifecycle (trial, active, churned)"
Now you can generate a browsable site:
dbt docs generate
dbt docs serve
This gives you a local documentation site with model lineage, column metadata, and the tests defined on each model, which is useful for onboarding or audits.
Document Your Macros Too
Even macros can (and should) be documented.
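-- macros/convert_currency.sql
{#
  Converts an amount to USD using the exchange rate seed.
  amount_column:       column (or expression) holding the amount
  currency_column:     column holding the currency code; qualify it with a table
                       alias so it is not confused with the seed's currency column
  exchange_rate_table: seed or model with currency and exchange_rate_to_usd columns
#}
{% macro convert_currency(amount_column, currency_column, exchange_rate_table='currency_exchange_rates') %}
    {{ amount_column }} * (
        select rates.exchange_rate_to_usd
        from {{ ref(exchange_rate_table) }} as rates
        where rates.currency = {{ currency_column }}
        limit 1
    )
{% endmacro %}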
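At the very least, leave inline comments explaining parameters and usage. dbt also lets you give a macro a description and documented arguments under a macros: key in a .yml file, so it shows up in the generated docs site.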
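A usage example is often the most helpful piece of macro documentation. The snippet below is a hypothetical model fragment: payment_id, amount, and currency are assumed column names, not ones confirmed by earlier chapters. The currency column is passed with a table alias because an unqualified currency inside the macro's subquery would resolve to the exchange rate table's own column and match every row.

```sql
-- Hypothetical usage of convert_currency in a model.
-- payment_id, amount, and currency are assumed column names.
select
    p.payment_id,
    {{ convert_currency('p.amount', 'p.currency') }} as amount_usd
from {{ ref('stg_stripe_payments') }} as p
```

Both arguments are passed as strings, since the macro pastes them into the SQL it renders rather than evaluating them.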
Testing Strategy Recap
Test Type | Purpose |
not_null | Catch missing values |
unique | Enforce primary key logic |
accepted_values | Catch unexpected enums or typos |
relationships | Ensure join integrity |
greater_than and similar range checks (custom or package tests) | Validate numeric or date ranges |
Add tests where:
It would be expensive to fix issues later
You depend on stable joins or mappings
Data comes from sources you don’t fully trust (which is… most of them)
Final Folder Structure (After Tests & Docs)
/dbt_project/
├── dbt_project.yml
├── profiles.yml # Local-only config for warehouse credentials (NOT in Git)
├── .gitignore # Includes target/, dbt_packages/ (dbt_modules/ before dbt 1.0), etc.
├── /models
│ ├── /sources
│ │ ├── stripe.yml
│ │ ├── segment.yml
│ │ └── hubspot.yml
│ ├── /staging
│ │ ├── stg_stripe_payments.sql
│ │ ├── stg_segment_events.sql
│ │ └── stg_hubspot_contacts.sql
│ ├── /dimensions
│ │ ├── dim_customer.sql
│ │ ├── dim_lifecycle_stage.sql
│ │ └── dim_customer.yml # includes tests + docs
│ ├── /facts
│ │ ├── fct_revenue.sql
│ │ ├── fct_user_engagement.sql
│ │ └── fct_revenue.yml # includes tests + docs
│ └── /vault
│ ├── hub_customer.sql
│ ├── sat_customer_profile.sql
│ └── link_customer_payment.sql
├── /macros
│ ├── convert_currency.sql # macro to convert amounts to USD
│ └── safe_cast_timestamp.sql # macro to cast timestamps safely
├── /snapshots
│ └── scd_customer_status.sql # tracks customer lifecycle changes over time
├── /seeds
│ ├── currency_exchange_rates.csv
│ └── currency_exchange_rates.yml # includes seed tests
├── /tests
│ └── (optional custom SQL tests or shared test macros)
├── /target/ # ⚠️ Auto-generated by dbt, DO NOT commit
│ ├── compiled/ # Compiled SQL files (after macros/refs resolved)
│ ├── run/ # Executed SQL grouped by model type
│ ├── manifest.json # Internal dbt dependency graph
│ ├── catalog.json # Model + column metadata used for docs
│ ├── run_results.json # Results of your last dbt run/test
│ └── logs/ # Execution logs (if enabled)
Written by Sriram Krishnan
