
Table of contents
- Setting Up Snowflake Iceberg Tables with External S3 Storage: A Complete Guide
- Overview
- Prerequisites
- Step 1: Create S3 Bucket
- Step 2: Create AWS IAM Policy
- Step 3: Create AWS IAM Role with Placeholder Trust Policy
- Step 4: Set Up Snowflake Environment
- Step 5: Create External Volume
- Step 6: Get Snowflake's Generated IAM Values
- Step 7: Update IAM Role Trust Policy
- Step 8: Test the Connection
- Step 9: Create Production Table
- Step 10: Verify S3 Structure
- Understanding Iceberg File Structure
- Key Benefits of This Setup
- Troubleshooting
- Next Steps
- Important Notes
- References

Setting Up Snowflake Iceberg Tables with External S3 Storage: A Complete Guide
Learn how to configure Snowflake Iceberg tables with your own S3 storage using a proven step-by-step approach that handles AWS IAM complexities correctly.
Overview
This guide walks you through setting up externally managed Iceberg tables in Snowflake, where your data is stored in your own S3 bucket while Snowflake manages the table catalog. This approach gives you data ownership, cost control, and backup flexibility.
Prerequisites
Snowflake account with ACCOUNTADMIN privileges
AWS account with S3 and IAM access
AWS CLI configured (optional but helpful)
Step 1: Create S3 Bucket
# Create S3 bucket (replace with your desired name)
# Note: the bucket should live in the same region as your Snowflake account
aws s3 mb s3://{s3-bucket-name}
Step 2: Create AWS IAM Policy
Create an IAM policy that grants Snowflake the necessary S3 permissions:
AWS Console: IAM > Policies > Create Policy > JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetObjectVersion",
        "s3:GetBucketLocation",
        "s3:ListBucketVersions",
        "s3:ListBucketMultipartUploads",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::{s3-bucket-name}",
        "arn:aws:s3:::{s3-bucket-name}/*"
      ]
    }
  ]
}
Save this policy as: SnowflakeIcebergS3Policy
Step 3: Create AWS IAM Role with Placeholder Trust Policy
Create an IAM role with a temporary trust policy that we'll update later:
AWS Console: IAM > Roles > Create Role > Another AWS Account
Account ID: Enter {aws-account-id} (your AWS account ID)
External ID: Enter 000000 (placeholder)
Attach Policy: Select SnowflakeIcebergS3Policy
Role Name: SnowflakeIcebergRole
Initial Trust Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{aws-account-id}:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "000000"
        }
      }
    }
  ]
}
Record the Role ARN: arn:aws:iam::{aws-account-id}:role/SnowflakeIcebergRole
Step 4: Set Up Snowflake Environment
-- Create a database for this walkthrough
CREATE DATABASE IF NOT EXISTS iceberg_learning;
USE DATABASE iceberg_learning;
-- Create a schema
CREATE SCHEMA IF NOT EXISTS sales_analytics;
USE SCHEMA sales_analytics;
-- Create a warehouse for compute
CREATE WAREHOUSE IF NOT EXISTS iceberg_wh
WITH
WAREHOUSE_SIZE = 'X-SMALL'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
USE WAREHOUSE iceberg_wh;
Step 5: Create External Volume
-- Create external volume with your real AWS account and role
CREATE OR REPLACE EXTERNAL VOLUME iceberg_volume
STORAGE_LOCATIONS = (
(
NAME = 'iceberg-s3-storage'
STORAGE_PROVIDER = 'S3'
STORAGE_BASE_URL = 's3://{s3-bucket-name}/iceberg-data/'
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::{aws-account-id}:role/SnowflakeIcebergRole'
-- Don't specify STORAGE_AWS_EXTERNAL_ID - let Snowflake generate it
)
);
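To confirm the volume was created (a full read/write check will only pass after the trust policy is updated in Step 7), you can list your external volumes:
-- List external volumes visible to your role
SHOW EXTERNAL VOLUMES;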
Step 6: Get Snowflake's Generated IAM Values
-- Get the values Snowflake generated
DESCRIBE EXTERNAL VOLUME iceberg_volume;
Look for these two critical values:
STORAGE_AWS_IAM_USER_ARN: arn:aws:iam::SNOWFLAKE-ACCOUNT:user/snowflake-user-id
STORAGE_AWS_EXTERNAL_ID: generated-external-id
Record both values - you'll need them for the next step.
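On recent Snowflake versions, DESCRIBE EXTERNAL VOLUME reports these values inside a JSON document in the STORAGE_LOCATION_1 row. Here's a minimal sketch for pulling them out with RESULT_SCAN, assuming that output shape (property and column names can vary by version):
-- Run immediately after the DESCRIBE above; RESULT_SCAN reads its output
SELECT
    PARSE_JSON("property_value"):STORAGE_AWS_IAM_USER_ARN::STRING AS iam_user_arn,
    PARSE_JSON("property_value"):STORAGE_AWS_EXTERNAL_ID::STRING  AS external_id
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "property" = 'STORAGE_LOCATION_1';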
Step 7: Update IAM Role Trust Policy
Update your IAM role trust policy with the actual Snowflake values from Step 6:
AWS Console: IAM > Roles > SnowflakeIcebergRole > Trust relationships > Edit trust policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::SNOWFLAKE-ACCOUNT:user/snowflake-user-id"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "generated-external-id"
        }
      }
    }
  ]
}
Replace the placeholder values with the actual values from Step 6.
Step 8: Test the Connection
-- Test external volume
DESC EXTERNAL VOLUME iceberg_volume;
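-- (Optional) Built-in connectivity check: Snowflake writes and reads a test
-- file in your bucket and returns a JSON result; if the trust policy from
-- Step 7 is wrong, this should fail with an error pointing at the broken step
SELECT SYSTEM$VERIFY_EXTERNAL_VOLUME('iceberg_volume');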
-- Create test table
CREATE OR REPLACE ICEBERG TABLE test_s3_connection (
id INT,
name STRING,
created_at TIMESTAMP
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'iceberg_volume'
BASE_LOCATION = 'test_connection/';
-- Insert test data
INSERT INTO test_s3_connection VALUES (1, 'Test Record', CURRENT_TIMESTAMP());
-- Verify it worked
SELECT * FROM test_s3_connection;
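If both statements succeed, the round trip to S3 works end to end. A minimal cleanup sketch, assuming you no longer need the test table:
-- Drop the connectivity test table
DROP ICEBERG TABLE test_s3_connection;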
Step 9: Create Production Table
-- Create organized production table
CREATE ICEBERG TABLE sales_transactions (
transaction_id STRING,
customer_id STRING,
amount DECIMAL(10,2),
transaction_date DATE,
region STRING
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'iceberg_volume'
BASE_LOCATION = 'production/sales/transactions/';
-- Insert sample record
INSERT INTO sales_transactions VALUES (
'txn_20240705_001',
'customer_12345',
599.99,
'2024-07-05',
'US_WEST'
);
-- Verify the data
SELECT * FROM sales_transactions;
Step 10: Verify S3 Structure
Check your S3 bucket to confirm Snowflake created the Iceberg table structure:
aws s3 ls s3://{s3-bucket-name}/iceberg-data/production/sales/ --recursive
You should see a metadata/ and data/ directory layout like the one detailed in the next section.
Understanding Iceberg File Structure
When you check your S3 bucket, you'll see a structured directory layout that Iceberg uses to manage your table data and metadata efficiently. Based on the structure created in this walkthrough, here's what each file represents:
Complete S3 Directory Structure
s3://{bucket}/iceberg-data/production/sales/
└── transactions.WsRKitWc/
├── metadata/
│ ├── 00001-b34b6077-fd16-43ff-8a75-0a3f85c9214e.metadata.json
│ ├── 00000-5537d66f-70e1-411e-9d87-87311f7118a9.metadata.json
│ └── manifests/
│ ├── 1751741215139000000-gWlRpVzHt9A3EM3-Y5yWKg.avro
│ └── snap-1751741215139000000-930293cf-315c-4ad0-8531-71060b9df766.avro
└── data/
└── s5/snow_GyllAFnfFXc_gl70B05vTxg_0_1_002.parquet
Metadata Files (JSON) - 2 Files Visible
File 1: Current Metadata
00001-b34b6077-fd16-43ff-8a75-0a3f85c9214e.metadata.json
Created: After your INSERT operation
Contains: Updated table schema, current snapshot info, file locations
File 2: Initial Metadata
00000-5537d66f-70e1-411e-9d87-87311f7118a9.metadata.json
Created: When table was first created
Contains: Empty table state, initial schema definition
Manifest Files (Avro) - 2 Files Visible
File 1: Data File Manifest
1751741215139000000-gWlRpVzHt9A3EM3-Y5yWKg.avro
Created: With your INSERT operation
Contains: List of data files, row counts, column statistics for efficient querying
File 2: Manifest List (Snapshot)
snap-1751741215139000000-930293cf-315c-4ad0-8531-71060b9df766.avro
Created: With your INSERT operation
Contains: Snapshot-level metadata that points to the manifest files above, plus commit information
Data Files (Parquet) - 1 File Visible
Actual Data File
snow_GyllAFnfFXc_gl70B05vTxg_0_1_002.parquet
Created: When you inserted your record
Contains: Your actual business data (1 transaction record) in columnar format
Location: Stored in the s5/ subdirectory for data organization
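You don't need to browse S3 to locate the current metadata file; Snowflake exposes it through a built-in function. A small sketch (the function returns JSON that includes the latest metadata location):
-- Returns JSON with the table's current metadata file location in your bucket
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('iceberg_learning.sales_analytics.sales_transactions');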
File Creation Timeline
Table Creation - Creates initial metadata JSON (00000-5537d66f...)
Single INSERT - Creates data file (snow_GyllAFnfFXc_gl70B05vTxg_0_1_002.parquet)
Metadata Update - Creates new metadata and manifest files to track the change
Why 5 Files for 1 Insert?
2 JSON files: Before and after states (enables time travel)
2 Avro files: Efficient binary metadata for query optimization
1 Parquet file: Your actual data (organized in the s5/ subdirectory)
This structure enables Iceberg's powerful features:
Time Travel: Query table as it existed before or after the INSERT
ACID Transactions: Atomic updates with consistent metadata
Query Optimization: Manifest files help skip irrelevant data during queries
Schema Evolution: Track changes over time without data rewrites
Each subsequent INSERT, UPDATE, or schema change will create additional metadata files while keeping your complete history accessible.
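For instance, here's a minimal time-travel sketch against this table; the offset and the query ID are placeholders to adapt from your own session:
-- Query the table as it looked 5 minutes ago
SELECT * FROM sales_transactions AT(OFFSET => -60*5);
-- Query the table as it looked just before a specific statement (e.g., the INSERT's query ID)
SELECT * FROM sales_transactions BEFORE(STATEMENT => '<query-id>');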
Key Benefits of This Setup
Data Ownership: Your data lives in your S3 bucket
Cost Control: S3 storage costs appear on your AWS bill
Backup Control: You manage data retention and disaster recovery
ACID Transactions: Full Iceberg capabilities with time travel and schema evolution
Performance: Optimized for analytical workloads
Troubleshooting
"Access Denied" Error: Verify the IAM role trust policy uses the exact Snowflake IAM user ARN and external ID from DESCIBE EXTERNAL VOLUME
.
"Role Not Found" Error: Confirm the role ARN in the external volume matches your actual IAM role.
"Bucket Not Found" Error: Verify S3 bucket name and ensure it exists in the correct region.
Next Steps
With your externally managed Iceberg tables set up, you can now extend the setup for your specific use case (a starting sketch follows the list below):
Build time travel queries and schema evolution
Scale to production workloads with full data ownership
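As a starting point, here's a minimal schema-evolution sketch; payment_method is a hypothetical column added purely for illustration:
-- Add a column without rewriting existing Parquet files;
-- Iceberg records the change in new metadata only
ALTER ICEBERG TABLE sales_transactions ADD COLUMN payment_method STRING;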
Important Notes
Replace {aws-account-id} with your 12-digit AWS account ID
Replace {s3-bucket-name} with your actual S3 bucket name
The placeholder trust policy approach is crucial - Snowflake generates unique values that must be used in the final configuration
Snowflake automatically adds random suffixes to BASE_LOCATION paths for uniqueness
This approach ensures a secure, scalable setup where you maintain full control over your data while leveraging Snowflake's powerful Iceberg table capabilities.
References
Configure an external volume for Amazon S3 | Snowflake Documentation