
Setting Up Snowflake Iceberg Tables with External S3 Storage: A Complete Guide

Learn how to configure Snowflake Iceberg tables with your own S3 storage using a proven step-by-step approach that handles AWS IAM complexities correctly.

Overview

This guide walks you through setting up externally managed Iceberg tables in Snowflake, where your data is stored in your own S3 bucket while Snowflake manages the table catalog. This approach gives you data ownership, cost control, and backup flexibility.

Prerequisites

  • Snowflake account with ACCOUNTADMIN privileges

  • AWS account with S3 and IAM access

  • AWS CLI configured (optional but helpful)

Step 1: Create S3 Bucket

# Create S3 bucket (replace with your desired name)
aws s3 mb s3://{s3-bucket-name}

Step 2: Create AWS IAM Policy

Create an IAM policy that grants Snowflake the necessary S3 permissions:

AWS Console: IAM > Policies > Create Policy > JSON

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetObjectVersion",
                "s3:GetBucketLocation",
                "s3:ListBucketVersions",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::{s3-bucket-name}",
                "arn:aws:s3:::{s3-bucket-name}/*"
            ]
        }
    ]
}

Save this policy as: SnowflakeIcebergS3Policy
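If you prefer scripting over the console, the policy document above can be generated programmatically. A minimal sketch in Python (the bucket name below is a hypothetical example, not a value from this guide):

```python
import json

# The S3 actions Snowflake needs, as listed in the policy above.
S3_ACTIONS = [
    "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket",
    "s3:GetObjectVersion", "s3:GetBucketLocation", "s3:ListBucketVersions",
    "s3:ListBucketMultipartUploads", "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
]

def build_s3_policy(bucket_name: str) -> dict:
    """Return an IAM policy document granting Snowflake access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": S3_ACTIONS,
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",    # bucket-level actions (e.g. ListBucket)
                f"arn:aws:s3:::{bucket_name}/*",  # object-level actions
            ],
        }],
    }

policy = build_s3_policy("my-iceberg-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=4))
```

Writing this JSON to a file lets you create the policy from the CLI with `aws iam create-policy --policy-name SnowflakeIcebergS3Policy --policy-document file://policy.json` instead of pasting it into the console.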

Step 3: Create AWS IAM Role with Placeholder Trust Policy

Create an IAM role with a temporary trust policy that we'll update later:

AWS Console: IAM > Roles > Create Role > Another AWS Account

  1. Account ID: Enter {aws-account-id} (your AWS account ID)

  2. External ID: Enter 000000 (placeholder)

  3. Attach Policy: Select SnowflakeIcebergS3Policy

  4. Role Name: SnowflakeIcebergRole

Initial Trust Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{aws-account-id}:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "000000"
                }
            }
        }
    ]
}

Record the Role ARN: arn:aws:iam::{aws-account-id}:role/SnowflakeIcebergRole
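The same scripted approach works for the trust policy. Because you generate it twice (the placeholder version now, the final version in Step 7), a small helper avoids hand-editing JSON; all argument values below are placeholders:

```python
import json

def build_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Return a trust policy that lets `principal_arn` assume the role,
    but only when it presents `external_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

# Placeholder version used at role-creation time (this step);
# in Step 7 you call it again with Snowflake's generated values.
placeholder = build_trust_policy("arn:aws:iam::123456789012:root", "000000")
print(json.dumps(placeholder, indent=4))
```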

Step 4: Set Up Snowflake Environment

-- Create a database for our learning
CREATE DATABASE IF NOT EXISTS iceberg_learning;
USE DATABASE iceberg_learning;

-- Create a schema
CREATE SCHEMA IF NOT EXISTS sales_analytics;
USE SCHEMA sales_analytics;

-- Create a warehouse for compute
CREATE WAREHOUSE IF NOT EXISTS iceberg_wh
WITH 
  WAREHOUSE_SIZE = 'X-SMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;

USE WAREHOUSE iceberg_wh;

Step 5: Create External Volume

-- Create external volume with your real AWS account and role
CREATE OR REPLACE EXTERNAL VOLUME iceberg_volume
STORAGE_LOCATIONS = (
    (
        NAME = 'iceberg-s3-storage'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://{s3-bucket-name}/iceberg-data/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::{aws-account-id}:role/SnowflakeIcebergRole'
        -- Don't specify STORAGE_AWS_EXTERNAL_ID - let Snowflake generate it
    )
);

Step 6: Get Snowflake's Generated IAM Values

-- Get the values Snowflake generated
DESCRIBE EXTERNAL VOLUME iceberg_volume;

Look for these two critical values:

  • STORAGE_AWS_IAM_USER_ARN: arn:aws:iam::SNOWFLAKE-ACCOUNT:user/snowflake-user-id

  • STORAGE_AWS_EXTERNAL_ID: generated-external-id

Record both values - you'll need them for the next step.
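If you run DESCRIBE EXTERNAL VOLUME through a driver rather than a worksheet, the storage-location details typically come back as a JSON string in the `property_value` column. A sketch of extracting the two values you need; the JSON below is a synthetic stand-in, not real output:

```python
import json

# Synthetic stand-in for the property_value returned for STORAGE_LOCATION_1.
property_value = json.dumps({
    "NAME": "iceberg-s3-storage",
    "STORAGE_PROVIDER": "S3",
    "STORAGE_BASE_URL": "s3://my-iceberg-bucket/iceberg-data/",
    "STORAGE_AWS_ROLE_ARN": "arn:aws:iam::123456789012:role/SnowflakeIcebergRole",
    "STORAGE_AWS_IAM_USER_ARN": "arn:aws:iam::999999999999:user/abc1-s-example",
    "STORAGE_AWS_EXTERNAL_ID": "MYORG_SFCRole=2_AbCdEfGh123=",
})

# Parse the JSON and pull out the two values needed for the trust policy.
location = json.loads(property_value)
iam_user_arn = location["STORAGE_AWS_IAM_USER_ARN"]
external_id = location["STORAGE_AWS_EXTERNAL_ID"]
print(iam_user_arn)
print(external_id)
```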

Step 7: Update IAM Role Trust Policy

Update your IAM role trust policy with the actual Snowflake values from Step 6:

AWS Console: IAM > Roles > SnowflakeIcebergRole > Trust relationships > Edit trust policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::SNOWFLAKE-ACCOUNT:user/snowflake-user-id"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "generated-external-id"
                }
            }
        }
    ]
}

Replace the placeholder values with the actual values from Step 6.

Step 8: Test the Connection

-- Test external volume
DESC EXTERNAL VOLUME iceberg_volume;

-- Create test table
CREATE OR REPLACE ICEBERG TABLE test_s3_connection (
    id INT,
    name STRING,
    created_at TIMESTAMP
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'iceberg_volume'
BASE_LOCATION = 'test_connection/';

-- Insert test data
INSERT INTO test_s3_connection VALUES (1, 'Test Record', CURRENT_TIMESTAMP());

-- Verify it worked
SELECT * FROM test_s3_connection;

Step 9: Create Production Table

-- Create organized production table
CREATE ICEBERG TABLE sales_transactions (
    transaction_id STRING,
    customer_id STRING,
    amount DECIMAL(10,2),
    transaction_date DATE,
    region STRING
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'iceberg_volume'
BASE_LOCATION = 'production/sales/transactions/';

-- Insert sample record
INSERT INTO sales_transactions VALUES (
    'txn_20240705_001',
    'customer_12345', 
    599.99,
    '2024-07-05',
    'US_WEST'
);

-- Verify the data
SELECT * FROM sales_transactions;

Step 10: Verify S3 Structure

Check your S3 bucket to confirm Snowflake created the Iceberg table structure:

aws s3 ls s3://{s3-bucket-name}/iceberg-data/production/sales/ --recursive

You should see a set of metadata and data files; the next section explains what each one is for.

Understanding Iceberg File Structure

When you check your S3 bucket, you'll see a structured directory layout that Iceberg uses to manage your table data and metadata efficiently. Based on your actual S3 structure, here's what each file represents:

Complete S3 Directory Structure

s3://{bucket}/iceberg-data/production/sales/
└── transactions.WsRKitWc/
    ├── metadata/
    │   ├── 00001-b34b6077-fd16-43ff-8a75-0a3f85c9214e.metadata.json
    │   ├── 00000-5537d66f-70e1-411e-9d87-87311f7118a9.metadata.json
    │   └── manifests/
    │       ├── 1751741215139000000-gWlRpVzHt9A3EM3-Y5yWKg.avro
    │       └── snap-1751741215139000000-930293cf-315c-4ad0-8531-71060b9df766.avro
    └── data/
        └── s5/snow_GyllAFnfFXc_gl70B05vTxg_0_1_002.parquet

Metadata Files (JSON) - 2 Files Visible

File 1: Current Metadata

  • 00001-b34b6077-fd16-43ff-8a75-0a3f85c9214e.metadata.json

  • Created: After your INSERT operation

  • Contains: Updated table schema, current snapshot info, file locations

File 2: Initial Metadata

  • 00000-5537d66f-70e1-411e-9d87-87311f7118a9.metadata.json

  • Created: When table was first created

  • Contains: Empty table state, initial schema definition

Manifest Files (Avro) - 2 Files Visible

File 1: Data File Manifest

  • 1751741215139000000-gWlRpVzHt9A3EM3-Y5yWKg.avro

  • Created: With your INSERT operation

  • Contains: List of data files, row counts, column statistics for efficient querying

File 2: Snapshot Manifest

  • snap-1751741215139000000-930293cf-315c-4ad0-8531-71060b9df766.avro

  • Created: With your INSERT operation

  • Contains: Snapshot metadata linking to manifest files, commit information

Data Files (Parquet) - 1 File Visible

Actual Data File

  • snow_GyllAFnfFXc_gl70B05vTxg_0_1_002.parquet

  • Created: When you inserted your record

  • Contains: Your actual business data (1 transaction record) in columnar format

  • Location: Stored in s5/ subdirectory for data organization

File Creation Timeline

  1. Table Creation - Creates initial metadata JSON (00000-5537d66f...)

  2. Single INSERT - Creates data file (snow_GyllAFnfFXc_gl70B05vTxg_0_1_002.parquet)

  3. Metadata Update - Creates new metadata and manifest files to track the change

Why 5 Files for 1 Insert?

  • 2 JSON files: Before and after states (enables time travel)

  • 2 Avro files: Efficient binary metadata for query optimization

  • 1 Parquet file: Your actual data (organized in s5/ subdirectory)

This structure enables Iceberg's powerful features:

  • Time Travel: Query table as it existed before or after the INSERT

  • ACID Transactions: Atomic updates with consistent metadata

  • Query Optimization: Manifest files help skip irrelevant data during queries

  • Schema Evolution: Track changes over time without data rewrites

Each subsequent INSERT, UPDATE, or schema change will create additional metadata files while keeping your complete history accessible.
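You can inspect these metadata files yourself: each `*.metadata.json` is plain JSON whose `current-snapshot-id` points into the `snapshots` list. A minimal sketch that reads a synthetic, heavily trimmed metadata document and finds the current snapshot (real files carry many more fields, such as schemas and partition specs):

```python
import json

# Heavily trimmed, synthetic example of an Iceberg table metadata file.
metadata_json = json.dumps({
    "format-version": 2,
    "table-uuid": "b34b6077-fd16-43ff-8a75-0a3f85c9214e",
    "current-snapshot-id": 1751741215139000000,
    "snapshots": [
        {"snapshot-id": 1751741215139000000,
         "summary": {"operation": "append", "added-records": "1"}},
    ],
})

metadata = json.loads(metadata_json)
current_id = metadata["current-snapshot-id"]
# Find the snapshot entry the table currently points at.
current = next(s for s in metadata["snapshots"] if s["snapshot-id"] == current_id)
print(current["summary"]["operation"])  # → append
```

Time travel works by resolving a different snapshot-id from the same `snapshots` list instead of the current one.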

Key Benefits of This Setup

  • Data Ownership: Your data lives in your S3 bucket

  • Cost Control: S3 storage costs appear on your AWS bill

  • Backup Control: You manage data retention and disaster recovery

  • ACID Transactions: Full Iceberg capabilities with time travel and schema evolution

  • Performance: Optimized for analytical workloads

Troubleshooting

"Access Denied" Error: Verify the IAM role trust policy uses the exact Snowflake IAM user ARN and external ID from DESCIBE EXTERNAL VOLUME.

"Role Not Found" Error: Confirm the role ARN in the external volume matches your actual IAM role.

"Bucket Not Found" Error: Verify S3 bucket name and ensure it exists in the correct region.

Next Steps

With your externally managed Iceberg tables set up, you can now adapt this foundation to your specific use case:

  • Build time travel queries and schema evolution

  • Scale to production workloads with full data ownership

Important Notes

  • Replace {aws-account-id} with your 12-digit AWS account ID

  • Replace {s3-bucket-name} with your actual S3 bucket name

  • The placeholder trust policy approach is crucial - Snowflake generates unique values that must be used in the final configuration

  • Snowflake automatically adds random suffixes to BASE_LOCATION paths for uniqueness

This approach ensures a secure, scalable setup where you maintain full control over your data while leveraging Snowflake's powerful Iceberg table capabilities.

References

Configure an external volume for Amazon S3 | Snowflake Documentation

Written by Chandrasekar(Chan) Rajaram