The Trust Layer – Proactive Data Quality with Snowflake's Native Tools

In our previous blogs, we mastered the art of building sophisticated data pipelines with dbt inside Snowflake. We can now transform raw data into well-structured, production-grade tables. We see the green "PASS" in our dbt logs, and we deploy our models on a schedule. Our job is done, right?
Not quite. A successful dbt build guarantees that our code ran correctly, but it doesn't guarantee that the data itself is trustworthy. What if an upstream API change starts sending NULL values for order_total? What if a bug in a source system generates duplicate order_ids? These are not code failures; they are data failures, and they silently erode the most valuable asset we have as a data team: trust.
Until now, data quality was often a periodic check run within a dbt job or handled by a separate, third-party tool. With the introduction of its native Data Quality Monitoring features, Snowflake is aiming to make data quality a continuous, declarative, and integral part of the data platform itself.
In this blog, we will explore this new paradigm, learning how to use Data Metric Functions (DMFs) and Expectations to build a robust, automated trust layer directly on top of our data.
What are Data Metric Functions (DMFs)?
A Data Metric Function is a pre-built or user-defined SQL function that measures a specific aspect of data quality. Think of them as specialized tools for asking questions about your data's health. Snowflake provides a powerful set of system DMFs out of the box, including:
SNOWFLAKE.CORE.NULL_COUNT: Counts the number of null values in a column.
SNOWFLAKE.CORE.UNIQUE_COUNT: Counts the number of distinct values.
SNOWFLAKE.CORE.ROW_COUNT: Measures the total number of rows.
SNOWFLAKE.CORE.FRESHNESS: Measures how stale the data is, based on the time elapsed since the most recent value in a timestamp column.
The true power of this framework is its extensibility. If a system DMF doesn't fit your needs, you can write your own Custom DMF to encapsulate any business-specific quality check you can imagine.
Part 1: The "How": A Practical Walkthrough
Let's put this into practice. We'll monitor a raw_orders table to ensure its integrity.
Step 1: The Setup
First, let's create a sample table to work with.
USE ROLE DBT_ROLE;
USE WAREHOUSE DBT_WH;
USE DATABASE RAW;
USE SCHEMA PUBLIC;
CREATE OR REPLACE TABLE raw_orders (
order_id INT,
customer_email STRING,
order_total NUMBER(10, 2),
order_date DATE
);
INSERT INTO raw_orders VALUES
(1, 'michael.scott@dundermifflin.com', 150.25, '2026-01-10'),
(2, 'dwight.schrute@dundermifflin.com', 25.00, '2026-01-11'),
(NULL, 'sriram.krishnan@aol', 99.99, '2026-01-11'); -- A row with a NULL order_id and an invalid customer_email
Step 2: Applying a System DMF
Let's measure the number of nulls in the order_id column. We can call the DMF directly to get an immediate measurement.
SELECT SNOWFLAKE.CORE.NULL_COUNT(
SELECT order_id FROM raw_orders
);
-- Result: 1 (the row with the NULL order_id)
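The same ad-hoc pattern works for any system DMF. As a quick illustration against our sample data (expected results shown as comments), we can count duplicate and distinct order_ids with two more system DMFs:
SELECT SNOWFLAKE.CORE.DUPLICATE_COUNT(SELECT order_id FROM raw_orders); -- 0: no repeated order_ids yet
SELECT SNOWFLAKE.CORE.UNIQUE_COUNT(SELECT order_id FROM raw_orders);    -- 2: two distinct non-null order_ids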
Step 3: Setting an Expectation
A metric is just a number. An Expectation is what makes it actionable. It defines the acceptable range for a metric's value. Let's set an expectation that the order_id column should have zero nulls.
-- Attach the DMF to the table so Snowflake tracks it continuously,
-- and declare the acceptable value for the measurement.
-- Note: expectations are a newer capability; the exact clause may vary
-- by release, so verify it against the Snowflake docs for your account.
ALTER TABLE raw_orders
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT
  ON (order_id)
  EXPECTATION expect_no_null_order_ids (VALUE = 0);
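To sanity-check what is now attached to the table, the INFORMATION_SCHEMA table function DATA_METRIC_FUNCTION_REFERENCES should list every DMF association for raw_orders:
-- List the data metric functions currently associated with raw_orders
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
    REF_ENTITY_NAME   => 'raw.public.raw_orders',
    REF_ENTITY_DOMAIN => 'TABLE'
));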
Step 4: Scheduling and Running the Checks
We don't need a separate task or orchestration tool for this. Every table with attached DMFs carries a single DATA_METRIC_SCHEDULE parameter that tells Snowflake when to re-run all of its metrics on serverless compute.
-- Re-measure whenever the table's data changes (inserts, updates, deletes)
ALTER TABLE raw_orders SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
-- Or run all attached DMFs on a fixed schedule, e.g. daily at 08:00 UTC
ALTER TABLE raw_orders SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';
-- Or at a simple interval
ALTER TABLE raw_orders SET DATA_METRIC_SCHEDULE = '60 MINUTE';
Note that a table has exactly one schedule, shared by every DMF attached to it; the most recent SET wins. For a one-off check, simply call the DMF directly, as we did in Step 2.
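Because the schedule is just a table parameter, you can inspect its current value at any time:
SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE raw_orders;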
Step 5: Reviewing the Results
Snowflake logs the results of every check in a centralized, queryable history.
SELECT
    measurement_time,
    table_name,
    metric_name,
    value               -- the measurement returned by the DMF
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE table_name = 'RAW_ORDERS'
ORDER BY measurement_time DESC
LIMIT 1;
-- Result:
-- | MEASUREMENT_TIME | TABLE_NAME | METRIC_NAME | VALUE |
-- |------------------|------------|-------------|-------|
-- | ...              | RAW_ORDERS | NULL_COUNT  | 1     |
The latest NULL_COUNT measurement is 1, which violates our expectation of 0: the check caught the single null value, proving our monitoring is working.
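Because every measurement is retained, the same view also lets us trend a metric over time, for example to spot NULLs creeping up day by day. A minimal sketch, using the same columns as above:
-- Daily worst-case NULL_COUNT for raw_orders over the last week
SELECT
    DATE_TRUNC('day', measurement_time) AS day,
    MAX(value)                          AS max_null_count
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE table_name  = 'RAW_ORDERS'
  AND metric_name = 'NULL_COUNT'
  AND measurement_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;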
Part 2: Advanced Techniques with Custom DMFs
What if you need a business-specific check that doesn't exist out of the box? This is where Custom Data Metric Functions shine. They allow you to encapsulate any business logic you can express in SQL into a reusable, schedulable quality check.
Step 2.1: Creating a Custom DMF for Email Validation
System DMFs are great for technical checks, but they don't understand our business rules. For example, they can't tell if a string is a valid email address. Let's build a custom DMF for that.
-- This function counts the rows in which the email format is invalid.
-- A custom DMF takes a table argument declaring the column(s) it measures.
CREATE OR REPLACE DATA METRIC FUNCTION raw.public.invalid_email_count(
    arg_t TABLE(arg_c1 STRING)
)
RETURNS NUMBER
AS
$$
    SELECT COUNT(*)
    FROM arg_t
    WHERE NOT REGEXP_LIKE(arg_c1, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$')
$$;
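Before attaching it, we can call the new DMF ad hoc, exactly like the system DMF in Step 2, to confirm it behaves as expected on our sample data:
SELECT raw.public.invalid_email_count(SELECT customer_email FROM raw_orders);
-- Expected: 1 (the 'sriram.krishnan@aol' row has no valid top-level domain)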
Step 2.2: Applying and Testing the Custom DMF
Now that our custom DMF is created, we can attach it to our raw_orders table with an expectation, just like a system DMF.
-- Attach the custom DMF with an expectation that it should return zero.
-- As in Step 3, verify the exact expectation clause against your release.
ALTER TABLE raw_orders
  ADD DATA METRIC FUNCTION raw.public.invalid_email_count
  ON (customer_email)
  EXPECTATION expect_valid_customer_emails (VALUE = 0);
From this point on, Snowflake treats our custom function as a first-class citizen in the quality framework. It can be scheduled, monitored, and alerted on just like any other metric.
Step 2.3: Finding the "Bad Rows"
Knowing that a check failed is good; knowing which rows caused the failure is better. A DMF returns a single measurement, not the offending records, so the simplest way to surface them is to reuse the same predicate in an ordinary query:
-- Return the actual rows with an invalid email format
SELECT *
FROM raw_orders
WHERE NOT REGEXP_LIKE(customer_email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$');
-- Result:
-- | ORDER_ID | CUSTOMER_EMAIL      | ORDER_TOTAL | ORDER_DATE |
-- |----------|---------------------|-------------|------------|
-- | NULL     | sriram.krishnan@aol | 99.99       | 2026-01-11 |
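If the team needs to triage these records later, a common follow-up (plain SQL, nothing DMF-specific) is to stage them in a rejects table:
-- Stage the failing rows so they can be fixed or excluded downstream
CREATE OR REPLACE TABLE raw.public.raw_orders_email_rejects AS
SELECT *
FROM raw_orders
WHERE NOT REGEXP_LIKE(customer_email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$');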
Part 3: Monitoring and Alerting on Failures
Automation is useless without alerting. You can build a simple alerting system by creating a Snowflake Alert that queries the results table and sends an email on failure.
-- Create a Snowflake Alert that checks for new failures every hour
CREATE OR REPLACE ALERT quality_check_failure_alert
  WAREHOUSE = DBT_WH
  SCHEDULE = '60 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
    WHERE table_name = 'RAW_ORDERS'
      AND metric_name IN ('NULL_COUNT', 'INVALID_EMAIL_COUNT')
      AND value > 0   -- both of our expectations are 'equals 0'
      AND measurement_time > DATEADD(hour, -1, CURRENT_TIMESTAMP())
  ))
  THEN
    CALL SYSTEM$SEND_EMAIL(
      'my_email_integration',
      'data.team@example.com',
      'Data Quality Alert: A check has failed!',
      'A data quality check has failed in the last hour. Please check SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS for details.'
    );
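Two setup details worth noting: SYSTEM$SEND_EMAIL needs an email notification integration (the 'my_email_integration' name above is a placeholder), and alerts are created in a suspended state, so the alert must be resumed before it starts firing:
-- One-time setup: an email notification integration for the alert to use
CREATE OR REPLACE NOTIFICATION INTEGRATION my_email_integration
  TYPE = EMAIL
  ENABLED = TRUE
  ALLOWED_RECIPIENTS = ('data.team@example.com');
-- Alerts start suspended; resume this one to activate the hourly schedule
ALTER ALERT quality_check_failure_alert RESUME;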
Part 4: Understanding the Costs
Snowflake's data quality features are billed using serverless compute credits, separate from your virtual warehouse usage. This means you are only charged for the resources you consume when a measurement is taken.
The Cost Components
Measurement Cost: This is the primary cost. It's the serverless compute used to execute the SQL inside the Data Metric Function. A simple NULL_COUNT is very cheap, while a complex custom DMF that scans terabytes of data will cost more.
Scheduling Cost: A small overhead charge for the serverless resources used to schedule and trigger the runs. This is typically a very minor component of the total cost.
How to Monitor Your Quality-Related Spend
Snowflake provides a dedicated view to track these costs, allowing you to have full visibility and control.
-- Query to see daily credit consumption for data quality monitoring
SELECT
    TO_DATE(start_time) AS usage_date,
    table_name,                          -- attribute spend to specific tables (available columns may vary by release)
    SUM(credits_used)   AS total_credits_used
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC;
Cost Optimization Tips:
* Be selective with TRIGGER_ON_CHANGES. Use it for critical tables, not for every table in your warehouse.
* For less critical tables, use a DATA_METRIC_SCHEDULE that aligns with your business needs (e.g., daily instead of hourly).
* When writing custom DMFs, keep the SQL as optimized as possible to reduce its compute footprint. And if a table no longer needs monitoring at all, you can unset its schedule, as shown below.
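For example, if raw_orders were demoted to a low-priority source, unsetting the schedule stops all serverless measurements while leaving the DMFs attached for later:
-- Stop all scheduled DMF runs on the table without detaching the DMFs
ALTER TABLE raw_orders UNSET DATA_METRIC_SCHEDULE;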
Final Thoughts
Data quality is the foundation of data trust. By integrating Data Metric Functions and Expectations directly into the database, Snowflake elevates data quality from a periodic task to a continuous, platform-level responsibility.
This shift allows Analytics Engineers to build a robust "trust layer" that is declarative, automated, and serverless. By understanding the automation patterns and the transparent, usage-based cost model, you can now operationalize data quality at scale, ensuring the business can always rely on the insights you deliver.