Cost Optimization in Snowflake: Best Practices for Analytics Engineers

Sriram Krishnan
5 min read

As an Analytics Engineer, your primary goal is to transform raw data into reliable, performant, and trusted data products. But in the cloud, there's a crucial fourth dimension: cost-efficiency. An elegant dbt model that runs on an oversized warehouse or an inefficient query that burns credits unnecessarily is only doing half the job.

Understanding Snowflake's cost model isn't just for managers—it's a core competency for elite Analytics Engineers. It allows you to build data products that are not only fast but also financially sustainable. This guide breaks down Snowflake's cost structure into six actionable best practices you can apply every day.

Snowflake’s costs are broken down into three main pillars:

  1. Compute Cost: Billed for virtual warehouses and serverless features. This is your biggest area of opportunity.

  2. Storage Cost: Billed for the data you store (including Time Travel and Fail-safe).

  3. Cloud Services Cost: Billed for the coordination layer (query compilation, metadata operations, etc.), but only when daily cloud services usage exceeds 10% of your daily warehouse compute credits.

Let's focus on what you can control.
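
Before diving into the practices, it helps to know where your credits currently go. The sketch below (a query against the SNOWFLAKE.ACCOUNT_USAGE share, assuming you have been granted access to it) breaks down compute spend by warehouse over the last 30 days.

-- Sketch: compute credits consumed per warehouse over the last 30 days
SELECT
  warehouse_name,
  SUM(credits_used) AS total_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;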


Best Practice 1: Isolate Your Workloads with Dedicated Warehouses

This is the most critical and foundational best practice. Do not use a single, large warehouse for all company operations. Isolating workloads into dedicated warehouses provides three key benefits:

  1. No Resource Contention: A long-running data load job won't slow down a time-sensitive executive dashboard.

  2. Tailored Sizing and Configuration: You can match the warehouse configuration to the specific needs of the job.

  3. Clear Cost Attribution: You can easily see how much each business process (loading, transformation, BI) is costing.

Your Action Plan: Create dedicated warehouses for each distinct workload, with settings tailored to that process.

Example Setup:

  • For Data Loading: A warehouse for COPY jobs. These are often short, intense workloads.

      CREATE OR REPLACE WAREHOUSE LOADING_WH
        WAREHOUSE_SIZE = 'MEDIUM'
        AUTO_SUSPEND = 60 -- Suspend after 1 minute of inactivity
        AUTO_RESUME = TRUE
        COMMENT = 'For loading raw data from stages';
    
  • For Transformations: A warehouse for dbt runs or other scheduled ETL/ELT jobs.

      CREATE OR REPLACE WAREHOUSE TRANSFORM_WH
        WAREHOUSE_SIZE = 'LARGE'
        AUTO_SUSPEND = 300 -- Suspend after 5 minutes
        AUTO_RESUME = TRUE
        STATEMENT_TIMEOUT_IN_SECONDS = 7200 -- Kill queries that run longer than 2 hours
        COMMENT = 'For running dbt models and other transformations';
    
  • For Business Intelligence: A warehouse for BI tools like Tableau or Looker. This workload is often characterized by many small, concurrent queries.

      CREATE OR REPLACE WAREHOUSE BI_WH
        WAREHOUSE_SIZE = 'XSMALL'
        MIN_CLUSTER_COUNT = 1
        MAX_CLUSTER_COUNT = 4 -- Enable multi-clustering to handle high user concurrency
        AUTO_SUSPEND = 600 -- Suspend after 10 minutes to keep the local disk cache warm between dashboard queries
        AUTO_RESUME = TRUE
        COMMENT = 'For BI tools and user-facing dashboards';
    

Best Practice 2: Right-Size Your Compute (Don't Just Scale Up)

Once your workloads are isolated, you can fine-tune the size of each warehouse. The key is to match the warehouse to the workload's specific needs.

  • Scaling Up (M → L → XL): Provides more memory and CPU for a single, complex query. Use this when you see queries spilling to disk.

  • Scaling Out (Multi-cluster): Adds more clusters of the same size to handle more concurrent users or queries. This is ideal for the BI_WH.

Your Action Plan: Start small and only scale up a warehouse when you have diagnosed a specific performance issue like disk spilling.
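
A practical way to confirm a warehouse is undersized is to look for disk spilling in query history before reaching for a bigger size. The sketch below (again assuming access to the ACCOUNT_USAGE share; the warehouse name matches the earlier example) surfaces recent spilling queries and shows how a resize might look if spilling is persistent.

-- Sketch: recent queries on TRANSFORM_WH that spilled to disk
SELECT
  query_id,
  warehouse_name,
  bytes_spilled_to_local_storage,
  bytes_spilled_to_remote_storage
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE warehouse_name = 'TRANSFORM_WH'
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND bytes_spilled_to_local_storage > 0
ORDER BY bytes_spilled_to_local_storage DESC
LIMIT 20;

-- If spilling shows up consistently, scale up one size and re-measure
ALTER WAREHOUSE TRANSFORM_WH SET WAREHOUSE_SIZE = 'XLARGE';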


Best Practice 3: Write Cost-Efficient Queries

Every query you write has a direct cost implication. An inefficient query scans more data than necessary, running for longer and burning more credits.

Common Anti-Patterns to Avoid:

  • SELECT * on Large Tables: This prevents column pruning and increases network traffic.

  • "Exploding" Joins: Review your join logic to ensure you aren't creating unintentional Cartesian products.

  • Filtering Too Late: Always filter your largest tables as early as possible in your query.

Scenario: The "Bad" vs. "Good" Transformation

The "Bad" Query (Filters after joining):

-- Inefficient: Joins the entire 10TB events table before filtering
SELECT e.event_id, u.user_name
FROM raw.events e
JOIN raw.users u ON e.user_id = u.user_id
WHERE DATE(e.event_timestamp) = '2025-09-01';

The "Good" Query (Filters early with a CTE):

-- Efficient: Filters the 10TB table down to a single day's data first
WITH events_today AS (
  SELECT event_id, user_id
  FROM raw.events
  -- A range predicate on the raw column (no function wrapped around it) keeps partition pruning intact
  WHERE event_timestamp >= '2025-09-01'
    AND event_timestamp < '2025-09-02'
)
SELECT e.event_id, u.user_name
FROM events_today e
JOIN raw.users u ON e.user_id = u.user_id;

Best Practice 4: Optimize Data Structures

The way your data is modeled directly impacts query cost. A well-designed table allows Snowflake to scan less data.

  • Clustering: For very large tables (1TB+), adding a clustering key on a frequently filtered, low-cardinality column (like event_date) can dramatically improve pruning (see the sketch after this list).

  • Use Appropriate Data Types: Storing numbers as VARCHAR or dates as STRING forces Snowflake to perform implicit casting, which can degrade performance.
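
As a sketch of the clustering point above (table and column names are illustrative), adding and inspecting a clustering key looks like this. Keep in mind that automatic clustering maintenance consumes serverless credits, so reserve it for genuinely large tables.

-- Sketch: cluster a very large table on a frequently filtered column
ALTER TABLE raw.events CLUSTER BY (event_date);

-- Inspect how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('raw.events', '(event_date)');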


Best Practice 5: Leverage Caching and Materialization Strategically

Not every query needs to run from scratch. Snowflake provides powerful caching layers that can eliminate compute costs.

  • The Result Cache: If you run the exact same query twice within 24 hours and the underlying data hasn't changed, Snowflake serves the result instantly from the result cache, using zero compute credits.

  • Materialized Views: For predictable, expensive dashboard queries, a Materialized View is a strategic trade-off. You pay a small cost for storage and refresh compute in exchange for making a slow query nearly instantaneous.
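
As a sketch (materialized views require Enterprise edition or higher; the table and view names here are illustrative), an expensive, predictable dashboard aggregation could be pre-computed like this:

-- Sketch: pre-aggregate an expensive, predictable dashboard query
CREATE MATERIALIZED VIEW daily_event_counts AS
SELECT
  DATE(event_timestamp) AS event_date,
  COUNT(*) AS event_count
FROM raw.events
GROUP BY DATE(event_timestamp);
-- Note: the background refresh consumes serverless credits, so weigh that
-- cost against the compute saved on every dashboard load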


Best Practice 6: Embrace a Culture of Cost-Awareness

You cannot optimize what you cannot see. As an Analytics Engineer, you should be empowered to monitor the cost impact of your own work.

Your Action Plan: Run this query to find your most expensive queries from the last week. This is your personal optimization list.

-- Find your most expensive queries (elapsed time is a proxy for warehouse credit burn)
SELECT
  query_text,
  warehouse_name,
  total_elapsed_time / 1000 AS duration_sec,
  bytes_scanned,
  credits_used_cloud_services
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE user_name = CURRENT_USER()
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY duration_sec DESC
LIMIT 10;

Final Thoughts

Cost optimization is not about saying "no" to new projects; it's about engineering excellence. By isolating workloads, right-sizing warehouses, writing efficient queries, and monitoring your own work, you transition from simply building data pipelines to delivering true business value. A cost-aware Analytics Engineer is an invaluable asset to any data team.


Written by Sriram Krishnan

Sharing lessons, tools & patterns to build scalable, modern data platforms, one post at a time.