How to Set Up dbt Locally and Integrate It with Snowflake

VipinVipin
3 min read

Introduction

In today’s data-driven world, the ability to efficiently manage and transform data is crucial. This is where dbt (Data Build Tool) comes in—a powerful tool that enables data teams to transform raw data into clean, documented datasets. In this guide, we'll walk through how to set up dbt on your local machine and integrate it with Snowflake, a leading cloud-based data warehouse.

By the end of this tutorial, you'll have a local dbt setup that connects seamlessly with Snowflake, allowing you to manage your analytics workflows efficiently.

Prerequisites

Before we dive in, make sure you have the following tools installed:

Setting Up the Environment

  • Confirm pip3 and python3 are correctly installed.

    python3 --version
    pip3 --version

  • Install Python virtual environment

    pip3 install virtualenv

  • Create a folder where you what to setup dbt project.

  • Create virtual environment and activate it

    virtualenv my_dbt_venv

    source my_dbt_venv/bin/activate

  • Install dbt: Use pip to install dbt along with the Snowflake adapter:

      pip install dbt-core dbt-snowflake
    
  • Create .dbt folder in the home directory.

  • Create a New dbt Project: Run the following command to initialize a new dbt project:

  • Open the newly created project in VSCODE.

  • Ensure .dbt/profiles.yml updated with configuration value.

  • Activate the virtual environment and test the connection using dbt debug

    command.

  • Running Your First dbt Model

    Creating a Simple Model

    • Create a SQL Model: Inside the models/ directory, create a new file named example_model.sql:

        WITH order_distribution
             AS (SELECT pin_code,
                        Count(*) AS NUMBER_OF_ORDERS
                 FROM   snowflake_dw_demo.online_food_order.online_food_order
                 GROUP  BY pin_code)
        SELECT pin_code,
               number_of_orders,
               Row_number()
                 OVER (
                   ORDER BY number_of_orders DESC) AS RANK
        FROM   order_distribution
        ORDER  BY number_of_orders DESC
      

      This SQL query calculates Distribution of Orders by Pin Code.

Running dbt Commands

  • Run the Model: Execute the following command to run your dbt model:

      dbt run
    

  • This command compiles the SQL file and runs it against your Snowflake database. Observe that a new view is created with the same name of the model.

  • Verify the object in snowflake.

  • Note: The default materialization for dbt model is view. This can be configured for the table as well with {{ config(materialized='table') }} option.

  • Troubleshooting Common Issues

    Connection Errors

    1. Invalid Credentials: Ensure that your Snowflake username and password are correct and that the account field is correctly formatted.

    2. Permission Denied: Make sure the Snowflake user has the necessary permissions to access the warehouse and database.

Model Run Errors

  1. SQL Errors: If you encounter SQL errors, check your model’s SQL syntax and the structure of the source tables.

  2. Data Quality Issues: Ensure that the source data adheres to the expected schema and data types.

Conclusion

In this tutorial, we walked through setting up dbt locally and integrating it with Snowflake. We covered the initial setup, configuring the profiles.yml file, creating and running dbt models, and troubleshooting common issues. With this setup, you can now efficiently manage and transform your data in Snowflake using dbt.

Next Steps

  • Explore more advanced dbt features, such as macros, snapshots, and incremental models.

  • Consider integrating dbt with other data platforms or using it in a CI/CD pipeline.

Resources and Further Reading

0
Subscribe to my newsletter

Read articles from Vipin directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Vipin
Vipin