Testing and Documenting dbt Projects the Right Way

Sriram Krishnan

Now that we’ve built out models, snapshots, and even added macros to keep things DRY, it’s time to answer an important question:

“How do we know our data is correct and understandable?”

In this chapter, we’ll walk through how to:

  • Add tests to your dbt models and seeds

  • Write clean, useful documentation

  • Use everything we’ve already built in previous chapters (revenue models, customer dimensions, snapshots, macros)

  • Keep your dbt project safe, trusted, and readable—for yourself and your team


Why Testing Matters in dbt

Testing helps catch:

  • Broken joins

  • Unexpected nulls

  • Invalid reference values

  • Schema drift from source systems

In dbt, generic tests are declared in .yml files (custom SQL tests live in the tests/ folder), and all of them run with:

dbt test

Tests fail loudly and early—before bad data hits your dashboards.


1. Add Basic Column-Level Tests

Let’s take our dim_customer model:

-- models/dimensions/dim_customer.sql
select
  contact_id as customer_id,
  email,
  signup_date,
  lifecycle_stage
from {{ ref('stg_hubspot_contacts') }}

We can test that:

  • customer_id is not null

  • email is unique

  • lifecycle_stage is one of a known set of values

Add this in dim_customer.yml:

version: 2

models:
  - name: dim_customer
    description: "Customer dimension enriched from HubSpot"
    columns:
      - name: customer_id
        tests:
          - not_null

      - name: email
        tests:
          - unique

      - name: lifecycle_stage
        tests:
          - accepted_values:
              values: ['trial', 'active', 'churned']

Now run:

dbt test --select dim_customer
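
Since customer_id acts as the key of this dimension, it may be worth pairing not_null with unique on it. Here is a minimal sketch of that variation. The extra tests and the warn severity on email are suggestions for illustration, not part of the project as built so far:

```yaml
version: 2

models:
  - name: dim_customer
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique            # key column: one row per customer

      - name: email
        tests:
          - unique
          - not_null:
              config:
                severity: warn   # report missing emails without failing the run
```

A test with severity set to warn still shows up in the dbt test output, but it does not make the command exit with an error.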

2. Add Tests for Seeds

We loaded exchange rates from a seed file. It’s easy to miss an invalid value or typo.

currency_exchange_rates.yml

version: 2

seeds:
  - name: currency_exchange_rates
    columns:
      - name: currency
        tests:
          - not_null
          - unique

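      - name: exchange_rate_to_usd
        tests:
          - not_null
          # Range check from the dbt_utils package (not built into dbt itself)
          - dbt_utils.accepted_range:
              min_value: 0
              inclusive: false

These tests will fail if a row is missing a currency code or has a zero or negative rate.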
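
A range check such as the one on exchange_rate_to_usd is not among dbt's built-in generic tests (unique, not_null, accepted_values, relationships), so it has to come from a package or a custom test. Here is a minimal sketch, assuming the dbt_utils package; the version range is an assumption, so pin whatever matches your dbt version:

```yaml
# packages.yml (in the project root, next to dbt_project.yml)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]   # assumed range, adjust as needed
```

After adding the file, run dbt deps to install the package. The range test can then be referenced in a .yml file as dbt_utils.accepted_range, with min_value, max_value, and inclusive arguments.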


3. Add Relationship Tests

You can test joins without writing SQL.

In our fct_revenue model, we join on customer_id. Let’s test whether every customer_id in the fact table exists in the dimension.

fct_revenue.yml

version: 2

models:
  - name: fct_revenue
    description: "Monthly recurring revenue from Stripe"
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customer')
              field: customer_id

This test ensures you don’t have facts with orphaned customers.
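
One thing to watch: the built-in relationships test skips rows where the child column is null, so a fact row with no customer_id at all would not be reported as an orphan. Here is a minimal sketch that pairs it with not_null to close that gap (the added not_null test is a suggestion, not part of the original file):

```yaml
version: 2

models:
  - name: fct_revenue
    columns:
      - name: customer_id
        tests:
          - not_null                     # catches facts with no customer at all
          - relationships:               # catches facts pointing at unknown customers
              to: ref('dim_customer')
              field: customer_id
```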


4. Add Descriptions That Actually Help

Documentation helps new team members and yourself 3 months from now.

Every model, column, source, and test can (and should) have a description.

A good description:

  • Says what the model/table contains

  • Says what it’s used for

  • Mentions sources or business logic if helpful

Example: dim_customer.yml

models:
  - name: dim_customer
    description: "One row per customer. Enriched with signup date, email, and latest lifecycle stage from HubSpot."
    columns:
      - name: customer_id
        description: "Primary key from HubSpot contact ID"

      - name: lifecycle_stage
        description: "Current customer lifecycle (trial, active, churned)"

Now you can generate a browsable site:

dbt docs generate
dbt docs serve

This gives you a local documentation site with model lineage, column metadata, and the tests defined on each model, which is handy for onboarding or audits.


Document Your Macros Too

Even macros can (and should) be documented.

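-- macros/convert_currency.sql

{#
  Converts an amount to USD using the exchange rate seed.
    amount_column:       column holding the amount in its original currency
    currency_column:     column holding the currency code for that amount
    exchange_rate_table: seed or model with currency and exchange_rate_to_usd columns
#}
{% macro convert_currency(amount_column, currency_column, exchange_rate_table='currency_exchange_rates') %}
  {{ amount_column }} * (
    select exchange_rate_to_usd
    from {{ ref(exchange_rate_table) }}
    where currency = {{ currency_column }}
    limit 1
  )
{% endmacro %}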

At the very least, leave inline comments explaining parameters and usage.
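
Beyond inline comments, dbt also lets you describe macros in a properties file, so they appear in the generated docs site alongside models. Here is a minimal sketch for convert_currency; the file name and the wording of the descriptions are assumptions:

```yaml
# macros/convert_currency.yml (hypothetical file name)
version: 2

macros:
  - name: convert_currency
    description: "Converts an amount to USD using the exchange rate seed."
    arguments:
      - name: amount_column
        type: string
        description: "Column holding the amount in its original currency"
      - name: currency_column
        type: string
        description: "Column holding the currency code for that amount"
      - name: exchange_rate_table
        type: string
        description: "Seed or model to look rates up in; defaults to currency_exchange_rates"
```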


Testing Strategy Recap

Test Type                                          | Purpose
not_null                                           | Catch missing values
unique                                             | Enforce primary key logic
accepted_values                                    | Catch unexpected enums or typos
relationships                                      | Ensure join integrity
Range checks (package or custom generic test)      | Validate numeric or date ranges

Add tests where:

  • It would be expensive to fix issues later

  • You depend on stable joins or mappings

  • Data comes from sources you don’t fully trust (which is… most of them)
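
For sources you don’t fully trust, the same generic tests can be attached to the raw tables themselves, so problems surface before staging models run. Here is a minimal sketch for the HubSpot source. The source name hubspot and the table name contacts are assumptions; only contact_id comes from the models above:

```yaml
# models/sources/hubspot.yml
version: 2

sources:
  - name: hubspot                # assumed source name
    description: "Raw HubSpot data, upstream of stg_hubspot_contacts"
    tables:
      - name: contacts           # assumed raw table name
        description: "One row per HubSpot contact"
        columns:
          - name: contact_id
            description: "HubSpot contact ID, later exposed as customer_id"
            tests:
              - unique
              - not_null
```

These run with dbt test like any other test and can be selected on their own with dbt test --select source:hubspot.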


Final Folder Structure (After Tests & Docs)

/dbt_project/
├── dbt_project.yml
├── profiles.yml           # Local-only config for warehouse credentials (NOT in Git)
├── .gitignore             # Includes target/, dbt_packages/ (dbt_modules/ before dbt 1.0), etc.

├── /models
│   ├── /sources
│   │   ├── stripe.yml
│   │   ├── segment.yml
│   │   └── hubspot.yml
│   ├── /staging
│   │   ├── stg_stripe_payments.sql
│   │   ├── stg_segment_events.sql
│   │   └── stg_hubspot_contacts.sql
│   ├── /dimensions
│   │   ├── dim_customer.sql
│   │   ├── dim_lifecycle_stage.sql
│   │   └── dim_customer.yml         # includes tests + docs
│   ├── /facts
│   │   ├── fct_revenue.sql
│   │   ├── fct_user_engagement.sql
│   │   └── fct_revenue.yml          # includes tests + docs
│   └── /vault
│       ├── hub_customer.sql
│       ├── sat_customer_profile.sql
│       └── link_customer_payment.sql

├── /macros
│   ├── convert_currency.sql         # macro to convert amounts to USD
│   └── safe_cast_timestamp.sql      # macro to cast timestamps safely

├── /snapshots
│   └── scd_customer_status.sql      # tracks customer lifecycle changes over time

├── /seeds
│   ├── currency_exchange_rates.csv
│   └── currency_exchange_rates.yml  # includes seed tests

├── /tests
│   └── (optional custom SQL tests or shared test macros)

├── /target/                         # ⚠️ Auto-generated by dbt, DO NOT commit
│   ├── compiled/                    # Compiled SQL files (after macros/refs resolved)
│   ├── run/                         # Executed SQL grouped by model type
│   ├── manifest.json                # Internal dbt dependency graph
│   ├── catalog.json                 # Model + column metadata used for docs
│   ├── run_results.json             # Results of your last dbt run/test
│   └── logs/                        # Execution logs (if enabled)