Amazon S3 on AWS: Your Scalable, Durable Object Storage

saumya singhsaumya singh
6 min read

If you build on AWS, you’ll touch S3—whether it’s static websites, data lakes, ML pipelines, or backups. This guide gives you a clear overview, hands-on setup, and best practices you can put to work today.


What is Amazon S3?

Amazon Simple Storage Service (S3) is object storage for any amount of data with 99.999999999% (11 nines) durability. You store data as objects inside buckets and access them using keys (paths). S3 scales automatically, supports event-driven workflows, and integrates with almost every AWS service.


Where S3 Fits: Quick Architecture Diagram

flowchart LR
    subgraph Client
      A[App / Browser / CLI]
    end

    subgraph S3[Amazon S3]
      B[(Bucket)]
      C[Object<br/>(Key + Metadata + Data)]
      D[Bucket Policy / ACL]
      E[Lifecycle Rules]
      F[Event Notifications]
    end

    subgraph Security[Security & Access]
      G[IAM Policy]
      H[KMS CMK for SSE-KMS]
      I[Block Public Access]
    end

    subgraph Integrations[Downstream]
      J[Lambda]
      K[SQS]
      L[EventBridge]
      M[Glue / Athena]
      N[CloudFront]
    end

    A -- HTTPS / SDK / REST --> B
    B --> C
    G --> B
    D --> B
    I --> B
    H --> B
    E --> B
    F --> J
    F --> K
    F --> L
    B --> M
    B --> N

Key Concepts (fast!)

  • Bucket: Top-level container (globally unique name, region-scoped).

  • Object: Data + metadata (up to 5TB per object).

  • Prefix: “Folder-like” key path (e.g., images/2025/), useful for organizing and performance.

  • Versioning: Keep multiple versions, protect against accidental deletes/overwrites.

  • Encryption: SSE-S3 (default in most regions), SSE-KMS (with your CMK), or client-side.

  • Access: Prefer IAM policies & bucket policies; avoid ACLs unless you really need them.

  • Eventing: Trigger Lambda/SQS/EventBridge on PUT/DELETE to build pipelines.


Create Your First Bucket (Console & CLI)

Console (quick)

  1. S3 → Create bucket

  2. Name: my-portfolio-assets-<random> (must be globally unique)

  3. Region: Choose where your users or compute live.

  4. Block Public Access: Keep ON (recommended). For static sites, you’ll use CloudFront.

  5. Bucket Versioning: Enable (recommended).

  6. Default encryption: SSE-S3 or SSE-KMS.

  7. Create!

CLI (scriptable)

# 1) Create bucket (replace region & name)
aws s3api create-bucket \
  --bucket my-portfolio-assets-12345 \
  --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1

# 2) Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-portfolio-assets-12345 \
  --versioning-configuration Status=Enabled

# 3) Default encryption (SSE-S3)
aws s3api put-bucket-encryption \
  --bucket my-portfolio-assets-12345 \
  --server-side-encryption-configuration '{
     "Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]
  }'

# 4) Upload a file
aws s3 cp ./logo.png s3://my-portfolio-assets-12345/images/logo.png

# 5) Sync a folder
aws s3 sync ./public s3://my-portfolio-assets-12345/site/

Storage Classes (pick the right tier)

  • S3 Standard: Hot, frequent access, low latency.

  • S3 Intelligent-Tiering: Auto-moves objects across Frequent/ Infrequent/ Archive tiers; great when access patterns change.

  • S3 Standard-IA / One Zone-IA: Infrequent access; lower cost, retrieval charged; One Zone trades availability zone redundancy for lower price.

  • S3 Glacier Instant Retrieval: Archival with ms retrieval.

  • S3 Glacier Flexible Retrieval: Minutes to hours, cheaper.

  • S3 Glacier Deep Archive: Lowest cost; hours retrieval.

  • S3 Express One Zone: High-throughput, single-AZ, low-latency for data-intense apps.

Rule of thumb: If you’re not sure, start with Intelligent-Tiering for organic cost optimization.


Security & Access: Non-negotiables

  • Block Public Access: Keep ON globally unless you’re fronting S3 with CloudFront OAC (origin access control).

  • IAM first: Use IAM roles and bucket policies; avoid static access keys.

  • Encryption by default: SSE-S3 or SSE-KMS. For audit/compliance, prefer SSE-KMS.

  • Least privilege: Scope permissions to bucket/prefix and allowed actions only.

  • Object Ownership: Set to Bucket owner enforced to disable ACLs and avoid cross-account confusion.

  • Logging: Enable S3 server access logs or CloudTrail data events for auditing.

Sample restrictive bucket policy (read/write only from a specific IAM role):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "OnlyAllowSpecificRole",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::111122223333:role/AppRole"},
      "Action": ["s3:GetObject","s3:PutObject","s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-portfolio-assets-12345",
        "arn:aws:s3:::my-portfolio-assets-12345/*"
      ]
    }
  ]
}

Lifecycle, Versioning & Replication

Lifecycle rules help you balance cost and retention:

  • Transition logs/* to Intelligent-Tiering after 30 days.

  • Transition to Glacier Deep Archive after 180 days.

  • Expire delete markers and incomplete multi-part uploads.

Example lifecycle JSON:

{
  "Rules": [
    {
      "ID": "logs-tiering",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [{"Days": 30, "StorageClass": "INTELLIGENT_TIERING"}],
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
    }
  ]
}

Replication (CRR/SRR) for compliance and DR:

  • Turn on versioning on source & destination.

  • Use KMS multi-Region keys if encrypting with SSE-KMS.

  • Add Replica Modifications if you want metadata/ACL changes replicated too.


Build Patterns You’ll Reuse

1) Static website hosting (the modern, secure way)

  • Keep bucket private.

  • Put your site in s3://my-bucket/site/.

  • Front with CloudFront + Origin Access Control (OAC).

  • Add S3 Bucket Policy allowing only CloudFront OAC.

  • Result: HTTPS, caching, and no public S3 URLs.

2) Data lake landing zone

  • Land raw data in raw/, curated in curated/, analytics in analytics/.

  • Trigger Lambda on PUT events to validate/transform.

  • Query with Athena/Glue without moving data.

3) Backup and archival

  • Use S3 Lifecycle to transition to Glacier tiers.

  • Enable Object Lock (WORM) for compliance if required.


Performance Tips

  • Use prefixes to increase request parallelism (S3 auto-scales, but prefixes help with downstream systems).

  • Multipart uploads for objects >100MB improves throughput.

  • Prefer S3 Transfer Acceleration for global uploads with high latency.

  • For analytics, store data in columnar formats (Parquet/ORC) and partitioned paths by date.


Cost Essentials

  • You pay for: storage, requests, data transfer out, and management features (e.g., Inventory, Replication).

  • Reduce cost with: Intelligent-Tiering, Lifecycle, CloudFront (cache hits), and avoiding unnecessary cross-region traffic.

  • Use S3 Storage Lens and Cost Explorer to monitor and optimize.


Common Pitfalls (and fixes)

  • Public data leak: Turn on Block Public Access and review policies; use CloudFront OAC instead of public buckets.

  • Access denied with KMS: Add both S3 and KMS permissions (key policy + IAM policy).

  • Unexpected costs: Large numbers of small files or frequent LIST/GET—consolidate, cache, and use Lifecycle.

  • “Missing” objects after overwrite/delete: Enable Versioning and use List Object Versions to recover.


Quick Checklist (copy/paste for new projects)

  • Bucket created in the right region, with a unique name

  • Versioning ON

  • Default encryption ON (SSE-KMS if compliance)

  • Block Public Access ON (use CloudFront OAC if web)

  • Object Ownership: Bucket owner enforced

  • Lifecycle rules set (tiering/expiry)

  • Event notifications wired (Lambda/SQS/EventBridge)

  • Logging/Audit enabled (S3 logs or CloudTrail data events)

  • Tags added for cost tracking


FAQs

Q: Should I use folders?
A: S3 has prefixes, not real folders—but using folder/like/keys/ is great for organization and lifecycle rules.

Q: Static website: public bucket or CloudFront?
A: CloudFront with OAC. It’s more secure, faster, and easier to add HTTPS and custom domains.

Q: Which storage class by default?
A: Intelligent-Tiering is a safe, cost-aware default for unpredictable access.


Wrap-up

S3 is the backbone of many AWS architectures: simple to start, powerful as you grow. With versioning, encryption, lifecycle rules, and the CloudFront pattern, you can run production-grade storage confidently.

10
Subscribe to my newsletter

Read articles from saumya singh directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

saumya singh
saumya singh

Welcome to my corner of the cloud, where ideas scale faster than servers and downtime is not an option! Here, I write about everything from spinning up VPCs to tearing down myths about the cloud. Whether you’re an engineer, a curious learner, or someone who just likes seeing words like 'serverless' and 'auto-scaling,' you’re in the right place. Consider this blog your high-availability zone for tips, tutorials, and tech thoughts—delivered with 99.99% uptime .