Ways of Loading Data in Snowflake

Abhishek Kumar
8 min read

Snowflake offers multiple ways to load data depending on your use case, data size, and the tools you’re comfortable with.

1️⃣ Loading Data from Snowflake Marketplace

The Snowflake Marketplace provides free and paid datasets from various providers. You can directly access and query these datasets without needing to download them.

Steps:

  1. Log in to your Snowflake Web UI.

  2. Click Marketplace in the navigation bar.

  3. Browse or search for the dataset.

  4. Click Get Data, then choose a database name to create the database from the listing.

  5. Query the data directly.
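Once the listing is mounted, it behaves like any other read-only database. A minimal sketch, assuming a hypothetical listing mounted as a database named WEATHER_DATA with a DAILY_OBSERVATIONS table (your listing's database, schema, and table names will differ):

-- Hypothetical names: substitute the database/schema/table from your listing
SELECT *
FROM weather_data.public.daily_observations
WHERE observation_date >= '2024-01-01'
LIMIT 100;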

2️⃣ Loading Data Using Snowflake’s Web UI

The Web UI provides a point-and-click interface to upload local files into tables.

Steps:

  1. In the Snowflake Web UI, go to the Database → choose the target table.

  2. Click Load Data.

  3. Choose File (CSV, JSON, Parquet, etc.).

  4. Select File Format or create a new one.

  5. Map columns → Load.

Example (via SQL equivalent):

-- Create a file format
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';

-- Create stage (internal)
CREATE OR REPLACE STAGE my_stage;

-- Upload via UI (equivalent to PUT command in CLI)

-- Load into table
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (FORMAT_NAME = my_csv_format)
ON_ERROR = 'CONTINUE';

3️⃣ Loading Data Using the COPY INTO Command

The COPY INTO command is Snowflake’s most powerful way to bulk-load data from staged files (in internal or external stages) into tables. Here’s a brief cheat sheet focusing on JSON, CSV, and TXT files, with concise SQL examples:

1. Creating a file format

Define how Snowflake interprets your file’s structure. You can either create a named file format once or embed the format options directly in your COPY INTO statement.

-- Creating a named file format (recommended for reuse)

--for csv
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV  -- Type of the file (CSV, JSON, etc.)
  FIELD_DELIMITER = ',' -- Delimiter between fields (e.g., comma, tab, pipe)
  SKIP_HEADER = 1 -- Number of header rows to skip
  NULL_IF = ('NULL', '\\N') -- Strings to interpret as NULL
  EMPTY_FIELD_AS_NULL = TRUE; -- Treat empty fields as NULL

--for json
CREATE OR REPLACE FILE FORMAT my_json_format
  TYPE = JSON
  STRIP_OUTER_ARRAY = TRUE; -- Remove the outermost square brackets [] of a JSON array during loading

  • Without STRIP_OUTER_ARRAY = TRUE: the entire file’s contents are loaded as a single row into a VARIANT column in your table.

  • With STRIP_OUTER_ARRAY = TRUE: each JSON object (e.g., {"id": 1, "name": "Alice"}) is loaded as a separate row in the table.

--for txt file formats
CREATE OR REPLACE FILE FORMAT my_txt_format
  TYPE = CSV -- TXT files are often treated as CSV with a specific delimiter
  FIELD_DELIMITER = '|' -- Delimiter for fields in your TXT file
  SKIP_HEADER = 0; -- Assuming no header row in this TXT file

2. Staging your files

Before loading, place your data files in either an internal or external stage.

Snowflake supports two primary types of stages:

  • Internal Stages: These store data files within Snowflake’s managed storage infrastructure. Internal stages are fully managed by Snowflake, ensuring high performance, security, and simplified management.

  • External Stages: These store data files in an external cloud storage location (e.g., AWS S3, Google Cloud Storage, or Azure Blob Storage) that is referenced by Snowflake. External stages offer flexibility, cost-effectiveness, and seamless integration with other cloud services.
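For reference, creating an external stage looks like this. A sketch only: the S3 URL and the storage integration name are placeholders you would set up separately in your cloud account.

-- Hypothetical external stage over an S3 bucket; the storage integration
-- (my_s3_integration) must already exist and grant Snowflake access to the bucket
CREATE OR REPLACE STAGE my_external_stage
  URL = 's3://my-bucket/data/'
  STORAGE_INTEGRATION = my_s3_integration;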

-> CREATING AN INTERNAL NAMED STAGE

-- Creating an internal named stage
CREATE OR REPLACE STAGE my_internal_stage;

  • CREATE OR REPLACE STAGE: This statement creates a new stage or replaces an existing one with the same name.

  • my_internal_stage: This is the name given to the stage, making it easy to refer to it in other commands.

-> UPLOADING FILES TO STAGE USING PUT COMMAND

Once a stage is created, you need to upload your data files to it. The PUT command is used for this purpose.

PUT file:///local/path/to/mydata.csv @my_internal_stage/csv_files/;
PUT file:///local/path/to/mydata.json @my_internal_stage/json_files/;
PUT file:///local/path/to/mydata.txt @my_internal_stage/txt_files/;

  • PUT: Uploads one or more data files from a local file system onto an internal stage.

  • file:///local/path/to/mydata.csv: This specifies the absolute path to the local data file being uploaded. The path format varies depending on your operating system (e.g., Windows vs. Linux/macOS).

  • @my_internal_stage/csv_files/: This indicates the target location within the internal stage. The @ symbol precedes the stage name. You can also specify a subfolder (e.g., /csv_files/) within the stage to organize your files.

Important Notes:

  • The PUT command is primarily used for uploading files to internal stages from your local file system.

  • When working with external stages, use the utilities provided by your cloud service provider (e.g., AWS S3 console, Azure Storage Explorer, or Google Cloud Console) to upload files to the external location.
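After a PUT completes, you can confirm the upload with the LIST command (PUT gzip-compresses files by default, so staged file names typically carry a .gz suffix):

-- List the files staged under the csv_files/ path
LIST @my_internal_stage/csv_files/;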

3. Loading data using COPY INTO

[a] CSV FILES

COPY INTO my_table_csv  -- Target table for the loaded data
FROM @my_internal_stage/csv_files/ -- Source stage and path of the data file(s)
FILE_FORMAT = (FORMAT_NAME = my_csv_format) -- Using a named file format
ON_ERROR = 'CONTINUE'           -- How to handle errors encountered during the load operation
PURGE = TRUE;                   -- Deletes the files from the stage after a successful load
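Before committing a large load, the same statement can be dry-run with VALIDATION_MODE, which reports parse errors without loading any rows (note it is not supported for loads that transform data with a SELECT):

COPY INTO my_table_csv
FROM @my_internal_stage/csv_files/
FILE_FORMAT = (FORMAT_NAME = my_csv_format)
VALIDATION_MODE = 'RETURN_ERRORS'; -- validate only; nothing is loaded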

[b] JSON FILES

-- Loading JSON into a single VARIANT column (for semi-structured data)
COPY INTO my_table_json
FROM @my_internal_stage/json_files/
FILE_FORMAT = (FORMAT_NAME = my_json_format)
ON_ERROR = 'SKIP_FILE'; -- Skips the entire file if an error is encountered

-- Loading and flattening JSON data into separate columns (using a SELECT statement for transformation)
COPY INTO my_table_json_flattened (id, name, city)
FROM (
    SELECT
        $1:id::INT,            -- Accessing JSON fields on the staged file ($1) using colon notation
        $1:name::STRING,
        $1:address.city::STRING
    FROM @my_internal_stage/json_files/json_data.json (FILE_FORMAT => my_json_format) -- Inline file format reference
);

[c] TXT FILES

COPY INTO my_table_txt
FROM @my_internal_stage/txt_files/
FILE_FORMAT = (FORMAT_NAME = my_txt_format)
TRUNCATECOLUMNS = TRUE; -- Truncate strings that exceed target column length

Note: For unstructured TXT files, where fields aren’t consistently delimited, the entire file might be loaded into a single VARCHAR column. Snowflake's string manipulation and regular expression functions can then be used to extract and transform the relevant data.
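A minimal sketch of that approach, with illustrative table and format names, using FIELD_DELIMITER = NONE so each line lands in a single column:

-- Treat every line of the file as one field
CREATE OR REPLACE FILE FORMAT my_raw_txt_format
  TYPE = CSV
  FIELD_DELIMITER = NONE;

CREATE OR REPLACE TABLE raw_lines (line VARCHAR);

COPY INTO raw_lines
FROM @my_internal_stage/txt_files/
FILE_FORMAT = (FORMAT_NAME = my_raw_txt_format);

-- Example transformation: extract the first number on each line
SELECT REGEXP_SUBSTR(line, '[0-9]+') AS first_number
FROM raw_lines;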

4️⃣ Loading Data Using Snowflake CLI (SnowSQL)

The Snowflake Command Line Interface (CLI), known as SnowSQL, provides a powerful and programmatic way to interact with your Snowflake environment, including bulk loading data.

1. Prerequisites

  • SnowSQL Installation: Download and install the SnowSQL client on your local machine.

  • SnowSQL Configuration: Configure SnowSQL to connect to your Snowflake account. This typically involves specifying the account identifier, username, and password in the configuration file (~/.snowsql/config) or as command-line flags (a sample config file is sketched after this list).

  • Data File(s) in a Compatible Format: Ensure your data is in a supported format like CSV, JSON, TXT, Avro, ORC, or Parquet.

  • Warehouse and Database/Schema: A running virtual warehouse and a target database/schema must exist in Snowflake.

  • Permissions: You need the necessary privileges for the target table and stage.

  • Stage Creation (if using internal stages): An internal or external stage needs to be set up to act as an intermediate storage location for your data files.
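As noted in the configuration prerequisite above, connection defaults live in ~/.snowsql/config. A minimal sketch with placeholder values (the connection name my_conn is arbitrary):

[connections.my_conn]
accountname = <account_identifier>
username = <username>
password = <password>
warehousename = <warehouse>
dbname = <database>

With this in place, you can connect with snowsql -c my_conn instead of passing flags each time.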

Steps:

1. Install SnowSQL.

2. Connect to Snowflake:

snowsql -a <account_name> -u <username>

3. Create a stage (if needed):

CREATE OR REPLACE STAGE my_stage;

4. Upload file from local to stage:

PUT file://C:/data/myfile.csv @my_stage;

5. Create file format:

CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  SKIP_HEADER = 1;

6. Load data:

COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (FORMAT_NAME = my_csv_format)
ON_ERROR = 'CONTINUE';
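Steps 3–6 can also be collected into a single SQL script and executed non-interactively with SnowSQL's -f flag, which is convenient for scheduled loads (load_data.sql is a hypothetical file name):

snowsql -a <account_name> -u <username> -f load_data.sql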

Important considerations and best practices

  • Error handling: The ON_ERROR option in the COPY INTO statement is important for managing how Snowflake responds to errors during data loading. Possible values include CONTINUE, SKIP_FILE, SKIP_FILE_num, 'SKIP_FILE_num%', and ABORT_STATEMENT.

  • Data transformation: Using a SELECT statement within the FROM clause of the COPY INTO statement allows basic data transformations, such as column reordering, omission, and type casting, during the load operation.

  • Semi-structured data: Snowflake’s VARIANT data type provides native support for handling JSON and other semi-structured data. JSON can be directly loaded into a VARIANT column, and then functions like FLATTEN and colon notation (json_data:field_name) can be used to extract specific values.

  • Performance optimization: For large data volumes, leverage file compression, partition your data, and consider using external stages. Ensure proper virtual warehouse sizing for efficient parallel loading of multiple files.

  • Monitoring and logging: Use Snowflake’s query history and logs to monitor the progress and status of your COPY INTO commands and identify potential issues or bottlenecks.
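For example, the COPY_HISTORY table function summarizes recent loads into a table (MY_TABLE is a placeholder):

-- Per-file status for loads into MY_TABLE over the last 24 hours
SELECT file_name, status, row_count, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'MY_TABLE',
  START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
));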

By adhering to these best practices and tailoring COPY INTO statements and file formats to your specific data and requirements, you can optimize the efficiency, performance, and accuracy of your data loading processes in Snowflake.

Example — Loading JSON data using the COPY INTO command (refer to Section 3)


//Create an Ingestion Table for JSON Data
create table library_card_catalog.public.author_ingest_json
(
  raw_author variant
);

//Create File Format for JSON Data 
create file format library_card_catalog.public.json_file_format
type = 'JSON' 
compression = 'AUTO' 
enable_octal = FALSE
allow_duplicate = FALSE 
strip_outer_array = TRUE
strip_null_values = FALSE 
ignore_utf8_errors = FALSE;  

--stage your file in util_db.public.my_internal_stage through either UI or SQL Script

//Load data
copy into library_card_catalog.public.author_ingest_json
from @util_db.public.my_internal_stage
files = ( 'author_with_header.json')
file_format = ( format_name=library_card_catalog.public.json_file_format);

//returns AUTHOR_UID value from the top-level object's attribute
select raw_author:AUTHOR_UID
from author_ingest_json;

//returns the data in a way that makes it look like a normalized table
SELECT 
 raw_author:AUTHOR_UID
,raw_author:FIRST_NAME::STRING as FIRST_NAME
,raw_author:MIDDLE_NAME::STRING as MIDDLE_NAME
,raw_author:LAST_NAME::STRING as LAST_NAME
FROM AUTHOR_INGEST_JSON;
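If a record also carried a nested array, LATERAL FLATTEN could expand it into one row per element. A sketch, assuming a hypothetical BOOKS array field on each author object:

//hypothetical: expand a nested BOOKS array into one row per book
SELECT
 raw_author:AUTHOR_UID
,b.value::STRING as BOOK_TITLE
FROM AUTHOR_INGEST_JSON
,LATERAL FLATTEN(input => raw_author:BOOKS) b;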

Example — Loading a pipe-delimited flat file (.csv or .txt) using the COPY INTO command

create or replace table vegetable_details_soil_type
( plant_name varchar(25)
 ,soil_type number(1,0)
);

create file format garden_plants.veggies.PIPECOLSEP_ONEHEADROW 
    type = 'CSV'--csv is used for any flat file (tsv, pipe-separated, etc)
    field_delimiter = '|' --pipes as column separators
    skip_header = 1 --one header row to skip
    ;

--stage your file in util_db.public.my_internal_stage through either UI or SQL Script

//Load data
copy into vegetable_details_soil_type
from @util_db.public.my_internal_stage
files = ( 'VEG_NAME_TO_SOIL_TYPE_PIPE.txt')
file_format = ( format_name=GARDEN_PLANTS.VEGGIES.PIPECOLSEP_ONEHEADROW );
