The Trust Layer – Proactive Data Quality with Snowflake's Native Tools

In our previous blogs, we mastered the art of building sophisticated data pipelines with dbt inside Snowflake. We can now transform raw data into well-structured, production-grade tables. We see the green "PASS" in our dbt logs, and we deploy our models on a schedule. Our job is done, right?
Not quite. A successful dbt build guarantees that our code ran correctly, but it doesn't guarantee that the data itself is trustworthy. What if an upstream API change starts sending NULL values for order_total? What if a bug in a source system generates duplicate order_ids? These are not code failures; they are data failures, and they silently erode the most valuable asset we have as a data team: trust.
Until now, data quality was often a periodic check run within a dbt job or handled by a separate, third-party tool. With the introduction of its native Data Quality Monitoring features, Snowflake is aiming to make data quality a continuous, declarative, and integral part of the data platform itself.
In this blog, we will explore this new paradigm, learning how to use Data Metric Functions (DMFs) and Expectations to build a robust, automated trust layer directly on top of our data.
What are Data Metric Functions (DMFs)?
A Data Metric Function is a pre-built or user-defined SQL function that measures a specific aspect of data quality. Think of them as specialized tools for asking questions about your data's health. Snowflake provides a powerful set of system DMFs out of the box, including:
SNOWFLAKE.CORE.NULL_COUNT: Counts the number of null values in a column.
SNOWFLAKE.CORE.UNIQUE_COUNT: Counts the number of distinct values.
SNOWFLAKE.CORE.ROW_COUNT: Measures the total number of rows.
SNOWFLAKE.CORE.FRESHNESS: Measures how stale the data is, based on the time elapsed since the most recent value in a timestamp column.
The true power of this framework is its extensibility. If a system DMF doesn't fit your needs, you can write your own Custom DMF to encapsulate any business-specific quality check you can imagine.
Part 1: The "How": A Practical Walkthrough
Let's put this into practice. We'll monitor a raw_orders table to ensure its integrity.
Step 1: The Setup
First, let's create a sample table to work with.
USE ROLE DBT_ROLE;
USE WAREHOUSE DBT_WH;
USE DATABASE RAW;
USE SCHEMA PUBLIC;
CREATE OR REPLACE TABLE raw_orders (
order_id INT,
customer_email STRING,
order_total NUMBER(10, 2),
order_date DATE
);
INSERT INTO raw_orders VALUES
(1, 'michael.scott@dundermifflin.com', 150.25, '2026-01-10'),
(2, 'dwight.schrute@dundermifflin.com', 25.00, '2026-01-11'),
(NULL, 'sriram.krishnan@aol', 99.99, '2026-01-11'); -- A row with a NULL order_id and an invalid customer_email
Step 2: Applying a System DMF
Let's measure the number of nulls in the order_id column. We can call the DMF directly to get an immediate measurement.
SELECT SNOWFLAKE.CORE.NULL_COUNT(
SELECT order_id FROM raw_orders
);
-- Result: 1 (the row with the NULL order_id)
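The same ad-hoc pattern works for any system DMF. As a quick illustration against our sample data (expected results shown as comments), we can count duplicate and distinct order_ids with two more system DMFs:
SELECT SNOWFLAKE.CORE.DUPLICATE_COUNT(SELECT order_id FROM raw_orders); -- 0: no repeated order_ids yet
SELECT SNOWFLAKE.CORE.UNIQUE_COUNT(SELECT order_id FROM raw_orders);    -- 2: two distinct non-null order_ids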
Step 3: Setting an Expectation
A metric is just a number. An Expectation is what makes it actionable. It defines the acceptable range for a metric's value. Let's set an expectation that the order_id column should have zero nulls.
-- Attach the DMF to the table so Snowflake tracks it continuously,
-- and declare the acceptable value for the measurement.
-- Note: expectations are a newer capability; the exact clause may vary
-- by release, so verify it against the Snowflake docs for your account.
ALTER TABLE raw_orders
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT
  ON (order_id)
  EXPECTATION expect_no_null_order_ids (VALUE = 0);
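To sanity-check what is now attached to the table, the INFORMATION_SCHEMA table function DATA_METRIC_FUNCTION_REFERENCES should list every DMF association for raw_orders:
-- List the data metric functions currently associated with raw_orders
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
    REF_ENTITY_NAME   => 'raw.public.raw_orders',
    REF_ENTITY_DOMAIN => 'TABLE'
));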
Step 4: Scheduling and Running the Checks
We don't need a separate task or orchestration tool for this. Every table with attached DMFs carries a single DATA_METRIC_SCHEDULE parameter that tells Snowflake when to re-run all of its metrics on serverless compute.
-- Re-measure whenever the table's data changes (inserts, updates, deletes)
ALTER TABLE raw_orders SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
-- Or run all attached DMFs on a fixed schedule, e.g. daily at 08:00 UTC
ALTER TABLE raw_orders SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';
-- Or at a simple interval
ALTER TABLE raw_orders SET DATA_METRIC_SCHEDULE = '60 MINUTE';
Note that a table has exactly one schedule, shared by every DMF attached to it; the most recent SET wins. For a one-off check, simply call the DMF directly, as we did in Step 2.
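Because the schedule is just a table parameter, you can inspect its current value at any time:
SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE raw_orders;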
Step 5: Reviewing the Results
Snowflake logs the results of every check in a centralized, queryable history.
SELECT
    measurement_time,
    table_name,
    metric_name,
    value               -- the measurement returned by the DMF
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE table_name = 'RAW_ORDERS'
ORDER BY measurement_time DESC
LIMIT 1;
-- Result:
-- | MEASUREMENT_TIME | TABLE_NAME | METRIC_NAME | VALUE |
-- |------------------|------------|-------------|-------|
-- | ...              | RAW_ORDERS | NULL_COUNT  | 1     |
The latest NULL_COUNT measurement is 1, which violates our expectation of 0: the check caught the single null value, proving our monitoring is working.
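Because every measurement is retained, the same view also lets us trend a metric over time, for example to spot NULLs creeping up day by day. A minimal sketch, using the same columns as above:
-- Daily worst-case NULL_COUNT for raw_orders over the last week
SELECT
    DATE_TRUNC('day', measurement_time) AS day,
    MAX(value)                          AS max_null_count
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE table_name  = 'RAW_ORDERS'
  AND metric_name = 'NULL_COUNT'
  AND measurement_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;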
Part 2: Advanced Techniques with Custom DMFs
What if you need a business-specific check that doesn't exist out of the box? This is where Custom Data Metric Functions shine. They allow you to encapsulate any business logic you can express in SQL into a reusable, schedulable quality check.
Step 2.1: Creating a Custom DMF for Email Validation
System DMFs are great for technical checks, but they don't understand our business rules. For example, they can't tell if a string is a valid email address. Let's build a custom DMF for that.
-- This function counts the rows in which the email format is invalid.
-- A custom DMF takes a table argument declaring the column(s) it measures.
CREATE OR REPLACE DATA METRIC FUNCTION raw.public.invalid_email_count(
    arg_t TABLE(arg_c1 STRING)
)
RETURNS NUMBER
AS
$$
    SELECT COUNT(*)
    FROM arg_t
    WHERE NOT REGEXP_LIKE(arg_c1, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$')
$$;
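Before attaching it, we can call the new DMF ad hoc, exactly like the system DMF in Step 2, to confirm it behaves as expected on our sample data:
SELECT raw.public.invalid_email_count(SELECT customer_email FROM raw_orders);
-- Expected: 1 (the 'sriram.krishnan@aol' row has no valid top-level domain)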
Step 2.2: Applying and Testing the Custom DMF
Now that our custom DMF is created, we can attach it to our raw_orders table with an expectation, just like a system DMF.
-- Attach the custom DMF with an expectation that it should return zero.
-- As in Step 3, verify the exact expectation clause against your release.
ALTER TABLE raw_orders
  ADD DATA METRIC FUNCTION raw.public.invalid_email_count
  ON (customer_email)
  EXPECTATION expect_valid_customer_emails (VALUE = 0);
From this point on, Snowflake treats our custom function as a first-class citizen in the quality framework. It can be scheduled, monitored, and alerted on just like any other metric.
Step 2.3: Finding the "Bad Rows"
Knowing that a check failed is good; knowing which rows caused the failure is better. A DMF returns a single measurement, not the offending records, so the simplest way to surface them is to reuse the same predicate in an ordinary query:
-- Return the actual rows with an invalid email format
SELECT *
FROM raw_orders
WHERE NOT REGEXP_LIKE(customer_email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$');
-- Result:
-- | ORDER_ID | CUSTOMER_EMAIL      | ORDER_TOTAL | ORDER_DATE |
-- |----------|---------------------|-------------|------------|
-- | NULL     | sriram.krishnan@aol | 99.99       | 2026-01-11 |
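If the team needs to triage these records later, a common follow-up (plain SQL, nothing DMF-specific) is to stage them in a rejects table:
-- Stage the failing rows so they can be fixed or excluded downstream
CREATE OR REPLACE TABLE raw.public.raw_orders_email_rejects AS
SELECT *
FROM raw_orders
WHERE NOT REGEXP_LIKE(customer_email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$');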
Part 3: Monitoring and Alerting on Failures
Automation is useless without alerting. You can build a simple alerting system by creating a Snowflake Alert that queries the results table and sends an email on failure.
-- Create a Snowflake Alert that checks for new failures every hour
CREATE OR REPLACE ALERT quality_check_failure_alert
  WAREHOUSE = DBT_WH
  SCHEDULE = '60 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
    WHERE table_name = 'RAW_ORDERS'
      AND metric_name IN ('NULL_COUNT', 'INVALID_EMAIL_COUNT')
      AND value > 0   -- both of our expectations are 'equals 0'
      AND measurement_time > DATEADD(hour, -1, CURRENT_TIMESTAMP())
  ))
  THEN
    CALL SYSTEM$SEND_EMAIL(
      'my_email_integration',
      'data.team@example.com',
      'Data Quality Alert: A check has failed!',
      'A data quality check has failed in the last hour. Please check SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS for details.'
    );
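Two setup details worth noting: SYSTEM$SEND_EMAIL needs an email notification integration (the 'my_email_integration' name above is a placeholder), and alerts are created in a suspended state, so the alert must be resumed before it starts firing:
-- One-time setup: an email notification integration for the alert to use
CREATE OR REPLACE NOTIFICATION INTEGRATION my_email_integration
  TYPE = EMAIL
  ENABLED = TRUE
  ALLOWED_RECIPIENTS = ('data.team@example.com');
-- Alerts start suspended; resume this one to activate the hourly schedule
ALTER ALERT quality_check_failure_alert RESUME;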
Part 4: Understanding the Costs
Snowflake's data quality features are billed using serverless compute credits, separate from your virtual warehouse usage. This means you are only charged for the resources you consume when a measurement is taken.
The Cost Components
Measurement Cost: This is the primary cost. It's the serverless compute used to execute the SQL inside the Data Metric Function. A simple NULL_COUNT is very cheap, while a complex custom DMF that scans terabytes of data will cost more.
Scheduling Cost: A small overhead charge for the serverless resources used to schedule and trigger the runs. This is typically a very minor component of the total cost.
How to Monitor Your Quality-Related Spend
Snowflake provides a dedicated view to track these costs, allowing you to have full visibility and control.
-- Query to see daily credit consumption for data quality monitoring
SELECT
    TO_DATE(start_time) AS usage_date,
    table_name,                          -- attribute spend to specific tables (available columns may vary by release)
    SUM(credits_used)   AS total_credits_used
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC;
Cost Optimization Tips:
* Be selective with TRIGGER_ON_CHANGES. Use it for critical tables, not for every table in your warehouse.
* For less critical tables, use a DATA_METRIC_SCHEDULE that aligns with your business needs (e.g., daily instead of hourly).
* When writing custom DMFs, keep the SQL as optimized as possible to reduce its compute footprint. And if a table no longer needs monitoring at all, you can unset its schedule, as shown below.
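For example, if raw_orders were demoted to a low-priority source, unsetting the schedule stops all serverless measurements while leaving the DMFs attached for later:
-- Stop all scheduled DMF runs on the table without detaching the DMFs
ALTER TABLE raw_orders UNSET DATA_METRIC_SCHEDULE;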
Final Thoughts
Data quality is the foundation of data trust. By integrating Data Metric Functions and Expectations directly into the database, Snowflake elevates data quality from a periodic task to a continuous, platform-level responsibility.
This shift allows Analytics Engineers to build a robust "trust layer" that is declarative, automated, and serverless. By understanding the automation patterns and the transparent, usage-based cost model, you can now operationalize data quality at scale, ensuring the business can always rely on the insights you deliver.