
Setting Up Snowflake Iceberg Tables with External S3 Storage: A Complete Guide

Learn how to configure Snowflake Iceberg tables with your own S3 storage using a proven step-by-step approach that handles AWS IAM complexities correctly.

Overview

This guide walks you through setting up externally managed Iceberg tables in Snowflake, where your data is stored in your own S3 bucket while Snowflake manages the table catalog. This approach gives you data ownership, cost control, and backup flexibility.

Prerequisites

  • Snowflake account with ACCOUNTADMIN privileges

  • AWS account with S3 and IAM access

  • AWS CLI configured (optional but helpful)

Step 1: Create S3 Bucket

# Create S3 bucket (replace with your desired name)
aws s3 mb s3://{s3-bucket-name}

Step 2: Create AWS IAM Policy

Create an IAM policy that grants Snowflake the necessary S3 permissions:

AWS Console: IAM > Policies > Create Policy > JSON

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetObjectVersion",
                "s3:GetBucketLocation",
                "s3:ListBucketVersions",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::{s3-bucket-name}",
                "arn:aws:s3:::{s3-bucket-name}/*"
            ]
        }
    ]
}

Save this policy as: SnowflakeIcebergS3Policy
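If you prefer scripting over the console, the policy document above can be generated programmatically. A minimal sketch in Python (the bucket name below is a hypothetical example, not a value from this guide):

```python
import json

# The S3 actions Snowflake needs, as listed in the policy above.
S3_ACTIONS = [
    "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket",
    "s3:GetObjectVersion", "s3:GetBucketLocation", "s3:ListBucketVersions",
    "s3:ListBucketMultipartUploads", "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
]

def build_s3_policy(bucket_name: str) -> dict:
    """Return an IAM policy document granting Snowflake access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": S3_ACTIONS,
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",    # bucket-level actions (e.g. ListBucket)
                f"arn:aws:s3:::{bucket_name}/*",  # object-level actions
            ],
        }],
    }

policy = build_s3_policy("my-iceberg-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=4))
```

Writing this JSON to a file lets you create the policy from the CLI with `aws iam create-policy --policy-name SnowflakeIcebergS3Policy --policy-document file://policy.json` instead of pasting it into the console.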

Step 3: Create AWS IAM Role with Placeholder Trust Policy

Create an IAM role with a temporary trust policy that we'll update later:

AWS Console: IAM > Roles > Create Role > Another AWS Account

  1. Account ID: Enter {aws-account-id} (your AWS account ID)

  2. External ID: Enter 000000 (placeholder)

  3. Attach Policy: Select SnowflakeIcebergS3Policy

  4. Role Name: SnowflakeIcebergRole

Initial Trust Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{aws-account-id}:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "000000"
                }
            }
        }
    ]
}

Record the Role ARN: arn:aws:iam::{aws-account-id}:role/SnowflakeIcebergRole
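The same scripted approach works for the trust policy. Because you generate it twice (the placeholder version now, the final version in Step 7), a small helper avoids hand-editing JSON; all argument values below are placeholders:

```python
import json

def build_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Return a trust policy that lets `principal_arn` assume the role,
    but only when it presents `external_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

# Placeholder version used at role-creation time (this step);
# in Step 7 you call it again with Snowflake's generated values.
placeholder = build_trust_policy("arn:aws:iam::123456789012:root", "000000")
print(json.dumps(placeholder, indent=4))
```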

Step 4: Set Up Snowflake Environment

-- Create a database for our learning
CREATE DATABASE IF NOT EXISTS iceberg_learning;
USE DATABASE iceberg_learning;

-- Create a schema
CREATE SCHEMA IF NOT EXISTS sales_analytics;
USE SCHEMA sales_analytics;

-- Create a warehouse for compute
CREATE WAREHOUSE IF NOT EXISTS iceberg_wh
WITH 
  WAREHOUSE_SIZE = 'X-SMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;

USE WAREHOUSE iceberg_wh;

Step 5: Create External Volume

-- Create external volume with your real AWS account and role
CREATE OR REPLACE EXTERNAL VOLUME iceberg_volume
STORAGE_LOCATIONS = (
    (
        NAME = 'iceberg-s3-storage'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://{s3-bucket-name}/iceberg-data/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::{aws-account-id}:role/SnowflakeIcebergRole'
        -- Don't specify STORAGE_AWS_EXTERNAL_ID - let Snowflake generate it
    )
);

Step 6: Get Snowflake's Generated IAM Values

-- Get the values Snowflake generated
DESCRIBE EXTERNAL VOLUME iceberg_volume;

Look for these two critical values:

  • STORAGE_AWS_IAM_USER_ARN: arn:aws:iam::SNOWFLAKE-ACCOUNT:user/snowflake-user-id

  • STORAGE_AWS_EXTERNAL_ID: generated-external-id

Record both values - you'll need them for the next step.
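If you run DESCRIBE EXTERNAL VOLUME through a driver rather than a worksheet, the storage-location details typically come back as a JSON string in the `property_value` column. A sketch of extracting the two values you need; the JSON below is a synthetic stand-in, not real output:

```python
import json

# Synthetic stand-in for the property_value returned for STORAGE_LOCATION_1.
property_value = json.dumps({
    "NAME": "iceberg-s3-storage",
    "STORAGE_PROVIDER": "S3",
    "STORAGE_BASE_URL": "s3://my-iceberg-bucket/iceberg-data/",
    "STORAGE_AWS_ROLE_ARN": "arn:aws:iam::123456789012:role/SnowflakeIcebergRole",
    "STORAGE_AWS_IAM_USER_ARN": "arn:aws:iam::999999999999:user/abc1-s-example",
    "STORAGE_AWS_EXTERNAL_ID": "MYORG_SFCRole=2_AbCdEfGh123=",
})

# Parse the JSON and pull out the two values needed for the trust policy.
location = json.loads(property_value)
iam_user_arn = location["STORAGE_AWS_IAM_USER_ARN"]
external_id = location["STORAGE_AWS_EXTERNAL_ID"]
print(iam_user_arn)
print(external_id)
```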

Step 7: Update IAM Role Trust Policy

Update your IAM role trust policy with the actual Snowflake values from Step 6:

AWS Console: IAM > Roles > SnowflakeIcebergRole > Trust relationships > Edit trust policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::SNOWFLAKE-ACCOUNT:user/snowflake-user-id"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "generated-external-id"
                }
            }
        }
    ]
}

Replace the placeholder values with the actual values from Step 6.

Step 8: Test the Connection

-- Test external volume
DESC EXTERNAL VOLUME iceberg_volume;

-- Create test table
CREATE OR REPLACE ICEBERG TABLE test_s3_connection (
    id INT,
    name STRING,
    created_at TIMESTAMP
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'iceberg_volume'
BASE_LOCATION = 'test_connection/';

-- Insert test data
INSERT INTO test_s3_connection VALUES (1, 'Test Record', CURRENT_TIMESTAMP());

-- Verify it worked
SELECT * FROM test_s3_connection;

Step 9: Create Production Table

-- Create organized production table
CREATE ICEBERG TABLE sales_transactions (
    transaction_id STRING,
    customer_id STRING,
    amount DECIMAL(10,2),
    transaction_date DATE,
    region STRING
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'iceberg_volume'
BASE_LOCATION = 'production/sales/transactions/';

-- Insert sample record
INSERT INTO sales_transactions VALUES (
    'txn_20240705_001',
    'customer_12345', 
    599.99,
    '2024-07-05',
    'US_WEST'
);

-- Verify the data
SELECT * FROM sales_transactions;

Step 10: Verify S3 Structure

Check your S3 bucket to confirm Snowflake created the Iceberg table structure:

aws s3 ls s3://{s3-bucket-name}/iceberg-data/production/sales/ --recursive

You should see a set of metadata and data files; the next section explains what each one is for.

Understanding Iceberg File Structure

When you check your S3 bucket, you'll see a structured directory layout that Iceberg uses to manage your table data and metadata efficiently. Based on your actual S3 structure, here's what each file represents:

Complete S3 Directory Structure

s3://{bucket}/iceberg-data/production/sales/
└── transactions.WsRKitWc/
    ├── metadata/
    │   ├── 00001-b34b6077-fd16-43ff-8a75-0a3f85c9214e.metadata.json
    │   ├── 00000-5537d66f-70e1-411e-9d87-87311f7118a9.metadata.json
    │   └── manifests/
    │       ├── 1751741215139000000-gWlRpVzHt9A3EM3-Y5yWKg.avro
    │       └── snap-1751741215139000000-930293cf-315c-4ad0-8531-71060b9df766.avro
    └── data/
        └── s5/snow_GyllAFnfFXc_gl70B05vTxg_0_1_002.parquet

Metadata Files (JSON) - 2 Files Visible

File 1: Current Metadata

  • 00001-b34b6077-fd16-43ff-8a75-0a3f85c9214e.metadata.json

  • Created: After your INSERT operation

  • Contains: Updated table schema, current snapshot info, file locations

File 2: Initial Metadata

  • 00000-5537d66f-70e1-411e-9d87-87311f7118a9.metadata.json

  • Created: When table was first created

  • Contains: Empty table state, initial schema definition

Manifest Files (Avro) - 2 Files Visible

File 1: Data File Manifest

  • 1751741215139000000-gWlRpVzHt9A3EM3-Y5yWKg.avro

  • Created: With your INSERT operation

  • Contains: List of data files, row counts, column statistics for efficient querying

File 2: Snapshot Manifest

  • snap-1751741215139000000-930293cf-315c-4ad0-8531-71060b9df766.avro

  • Created: With your INSERT operation

  • Contains: Snapshot metadata linking to manifest files, commit information

Data Files (Parquet) - 1 File Visible

Actual Data File

  • snow_GyllAFnfFXc_gl70B05vTxg_0_1_002.parquet

  • Created: When you inserted your record

  • Contains: Your actual business data (1 transaction record) in columnar format

  • Location: Stored in s5/ subdirectory for data organization

File Creation Timeline

  1. Table Creation - Creates initial metadata JSON (00000-5537d66f...)

  2. Single INSERT - Creates data file (snow_GyllAFnfFXc_gl70B05vTxg_0_1_002.parquet)

  3. Metadata Update - Creates new metadata and manifest files to track the change

Why 5 Files for 1 Insert?

  • 2 JSON files: Before and after states (enables time travel)

  • 2 Avro files: Efficient binary metadata for query optimization

  • 1 Parquet file: Your actual data (organized in s5/ subdirectory)

This structure enables Iceberg's powerful features:

  • Time Travel: Query table as it existed before or after the INSERT

  • ACID Transactions: Atomic updates with consistent metadata

  • Query Optimization: Manifest files help skip irrelevant data during queries

  • Schema Evolution: Track changes over time without data rewrites

Each subsequent INSERT, UPDATE, or schema change will create additional metadata files while keeping your complete history accessible.
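You can inspect these metadata files yourself: each `*.metadata.json` is plain JSON whose `current-snapshot-id` points into the `snapshots` list. A minimal sketch that reads a synthetic, heavily trimmed metadata document and finds the current snapshot (real files carry many more fields, such as schemas and partition specs):

```python
import json

# Heavily trimmed, synthetic example of an Iceberg table metadata file.
metadata_json = json.dumps({
    "format-version": 2,
    "table-uuid": "b34b6077-fd16-43ff-8a75-0a3f85c9214e",
    "current-snapshot-id": 1751741215139000000,
    "snapshots": [
        {"snapshot-id": 1751741215139000000,
         "summary": {"operation": "append", "added-records": "1"}},
    ],
})

metadata = json.loads(metadata_json)
current_id = metadata["current-snapshot-id"]
# Find the snapshot entry the table currently points at.
current = next(s for s in metadata["snapshots"] if s["snapshot-id"] == current_id)
print(current["summary"]["operation"])  # → append
```

Time travel works by resolving a different snapshot-id from the same `snapshots` list instead of the current one.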

Key Benefits of This Setup

  • Data Ownership: Your data lives in your S3 bucket

  • Cost Control: S3 storage costs appear on your AWS bill

  • Backup Control: You manage data retention and disaster recovery

  • ACID Transactions: Full Iceberg capabilities with time travel and schema evolution

  • Performance: Optimized for analytical workloads

Troubleshooting

"Access Denied" Error: Verify the IAM role trust policy uses the exact Snowflake IAM user ARN and external ID from DESCIBE EXTERNAL VOLUME.

"Role Not Found" Error: Confirm the role ARN in the external volume matches your actual IAM role.

"Bucket Not Found" Error: Verify S3 bucket name and ensure it exists in the correct region.

Next Steps

With your externally managed Iceberg tables set up, you can now adapt this foundation to your specific use case:

  • Build time travel queries and schema evolution

  • Scale to production workloads with full data ownership

Important Notes

  • Replace {aws-account-id} with your 12-digit AWS account ID

  • Replace {s3-bucket-name} with your actual S3 bucket name

  • The placeholder trust policy approach is crucial - Snowflake generates unique values that must be used in the final configuration

  • Snowflake automatically adds random suffixes to BASE_LOCATION paths for uniqueness

This approach ensures a secure, scalable setup where you maintain full control over your data while leveraging Snowflake's powerful Iceberg table capabilities.

References

Configure an external volume for Amazon S3 | Snowflake Documentation

Written by Chandrasekar(Chan) Rajaram