Stop Exposing Secrets: A Step-by-Step Guide to Securing Terraform State with AWS S3 and DynamoDB

Atif Farrukh

Every DevOps engineer knows this story: You’re building infrastructure with Terraform. Things move fast. But then you realize your terraform.tfstate file—containing IAM ARNs, secrets, database connection strings, and even cloud resource IDs—is sitting unprotected in a Git repo or some shared drive. This isn’t just a rookie mistake. I’ve seen it at scale—startups and enterprises both—leading to critical data leaks and security breaches.

Managing infrastructure as code is a non-negotiable part of modern DevOps, but leaving your state management as an afterthought is professional negligence. When you work alone, maybe you can get away with it. When you work on a team, it’s the first thing that will grind your workflow to a halt, creating merge conflicts from hell and state files that are perpetually out of sync. Let's fix this, permanently. This is my blueprint for a secure, scalable, and collaborative Terraform backend using AWS.

Architecture Context

Before we write a single line of code, let's understand the strategy. We're not just moving a file; we're building a system. By default, Terraform stores its state file locally. This is untenable for any serious project for two reasons:

  1. Collaboration: Without a shared state file, your teammates can't see your infrastructure changes, leading to conflicts and resource duplication.

  2. Security: Local state files often contain sensitive data in plain text—database passwords, application keys, you name it.

Our architecture will solve both problems by using two core AWS services:

  • AWS S3 (Simple Storage Service): This will be the durable, centralized home for our terraform.tfstate file. We'll lock it down, enable versioning for rollback capabilities, and enforce encryption.

  • AWS DynamoDB: This will act as our locking mechanism. When one engineer runs terraform apply, DynamoDB puts a lock on the state file. If another engineer tries to run it simultaneously, they'll get a clear "State Locked" message. This prevents race conditions and state corruption, which can be catastrophic in complex environments.

Here's what it looks like at a high level:

                      +----------------------+
                      |   Engineer / CI/CD   |
                      +----------------------+
                                 |
                                 | runs `terraform apply`
                                 v
                      +----------------------+
                      |    Terraform CLI     |
                      +----------------------+
                                 |
+--------------------------------|---------------------------------+
|                                v             AWS Cloud           |
|                                                                  |
|  1. Attempt lock ----------------->  +----------------------+    |
|     (fails if already locked)        |    DynamoDB Table    |    |
|  <---- lock acquired --------------  |      (for locks)     |    |
|  4. Release lock ----------------->  +----------------------+    |
|                                                                  |
|  2. Read state ------------------->  +----------------------+    |
|  <---- returns terraform.tfstate --  |      S3 Bucket       |    |
|  3. Write updated state ---------->  |   (for state file)   |    |
|                                      +----------------------+    |
+------------------------------------------------------------------+

(Between steps 2 and 3, Terraform computes the changes and applies them locally.)

Workflow Explained:

  1. Attempt Lock: Before performing any action, Terraform sends a request to the DynamoDB table to acquire a lock. If another process already holds the lock, this step fails, preventing concurrent runs.

  2. Read State: Once the lock is acquired, Terraform reads the current terraform.tfstate file from the S3 Bucket.

  3. Write State: After successfully applying the changes, Terraform writes the new, updated state file back to the S3 Bucket.

  4. Release Lock: Finally, Terraform removes the lock entry from the DynamoDB table, allowing other engineers or processes to run Terraform.

This setup is the industry standard for a reason: it's robust, secure, and built on battle-tested AWS services.

Implementation Details

Talk is cheap. Let's build it. We'll use Terraform to provision the very resources needed to manage our Terraform state. It's a bit meta, but it's the right way to ensure the entire setup is reproducible and managed as code from day one.
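
The snippets below assume a configured AWS provider. If you're starting the bootstrap project from scratch, a minimal scaffold might look like the following; the version constraints and region here are illustrative placeholders, not requirements of this guide.

# versions.tf -- minimal scaffold for the bootstrap project

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1" # example region; match what you use in the backend config in Step 3
}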

Step 1: Create the S3 Bucket for State Storage

First, we define the S3 bucket. This isn't just any bucket; it's a fortress. We're enabling server-side encryption by default, blocking all public access, and turning on versioning to facilitate recovery from unintended modifications.

Create a file named terraform_backend.tf:

# terraform_backend.tf

resource "aws_s3_bucket" "terraform_state" {
  bucket = "devops-unlocked-tfstate-bucket" # Use a unique name!

  # Prevent accidental deletion of the state file bucket
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state_versioning" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state_sse" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state_pab" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Run terraform init and terraform apply to create these resources. For now, Terraform will track them in a temporary local state file; we'll migrate that state to the new backend in the final step.

Step 2: Create the DynamoDB Table for State Locking

Next, we provision the DynamoDB table. The only requirement for a Terraform lock table is a primary key named LockID of type String. We don't need any fancy provisioning; on-demand capacity is perfect for this use case.

Add this to your terraform_backend.tf file:

# terraform_backend.tf (continued)

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "devops-unlocked-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
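
Optionally, expose the bucket and table names as outputs so they're easy to copy into the backend block in Step 3. A small convenience sketch:

# terraform_backend.tf (optional outputs)

output "state_bucket" {
  description = "Name of the S3 bucket holding Terraform state"
  value       = aws_s3_bucket.terraform_state.bucket
}

output "lock_table" {
  description = "Name of the DynamoDB table used for state locking"
  value       = aws_dynamodb_table.terraform_locks.name
}
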
💡 Architect's Note: Your Terraform state backend is a Tier 0, foundational service. It's the key to your entire kingdom. The IAM permissions you grant for accessing this S3 bucket and DynamoDB table should be the most scrutinized policies in your entire AWS organization. Create a dedicated IAM role with the principle of least privilege. It needs only s3:GetObject, s3:PutObject, and s3:DeleteObject on the state file objects, and dynamodb:GetItem, dynamodb:PutItem, and dynamodb:DeleteItem on the lock table. Nothing more. Do not give your CI/CD pipeline or engineers s3:* on this bucket. That's how state files get leaked.
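
To make that concrete, here's a minimal sketch of such a policy expressed in Terraform. The resource names are illustrative, and s3:ListBucket is included because HashiCorp's S3 backend documentation lists it as required alongside the object-level permissions.

# iam_state_access.tf (illustrative sketch; scope and names are examples)

data "aws_iam_policy_document" "terraform_state_access" {
  statement {
    sid       = "StateObjects"
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["${aws_s3_bucket.terraform_state.arn}/*"]
  }

  statement {
    sid       = "StateBucket"
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.terraform_state.arn]
  }

  statement {
    sid       = "StateLocking"
    actions   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
    resources = [aws_dynamodb_table.terraform_locks.arn]
  }
}

resource "aws_iam_policy" "terraform_state_access" {
  name   = "terraform-state-access"
  policy = data.aws_iam_policy_document.terraform_state_access.json
}

Attach this policy to the dedicated role that your engineers and CI/CD pipeline assume, and to nothing else.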

Step 3: Configure the Terraform Backend

Now that our S3 bucket and DynamoDB table exist, we can tell our main Terraform project to use them. In your primary Terraform project (not the one we just used to create the backend), add the backend configuration block.

Create a file named backend.tf:

# backend.tf

terraform {
  backend "s3" {
    bucket         = "devops-unlocked-tfstate-bucket" # Must match the bucket name you created
    key            = "global/terraform.tfstate"       # The path to the state file within the bucket
    region         = "us-east-1"                      # The region where you created the resources
    dynamodb_table = "devops-unlocked-terraform-locks"
    encrypt        = true
  }
}

After adding this block, run terraform init. Terraform will detect the local state file and the new backend configuration. It will ask if you want to migrate your state to the new S3 backend. Type yes.

The fundamental setup is complete. Your state now lives securely in S3, with locking managed by DynamoDB.

Pitfalls & Optimisations

Getting the basics right is a huge win, but senior engineers think about failure modes and optimization. Here's where people get tripped up:

  • Forgetting prevent_destroy: I've seen a junior engineer accidentally run terraform destroy on the state management infrastructure itself. The prevent_destroy = true lifecycle block is a simple but powerful guardrail.

  • Using a Single State File: The key in the backend configuration is your friend. Don't dump your entire organization's infrastructure into one massive state file. This is a performance and security nightmare. Break it down logically. A good pattern is key = "networking/vpc/terraform.tfstate" or key = "services/my-app/prod/terraform.tfstate". Use Terraform workspaces to manage environments (dev, staging, prod) within a single configuration; see the workspace sketch after this list.

  • IAM Misconfigurations: As mentioned in the Architect's Note, overly permissive IAM policies are the biggest threat. Audit these policies regularly. Assume the identity running Terraform could be compromised and limit the blast radius.

  • Cross-Region Disaster Recovery: For mission-critical systems, consider enabling S3 Cross-Region Replication on your state bucket. If your primary AWS region goes down, you have a read-only copy of your state file, which can be invaluable for recovery analysis. A replication sketch also follows this list.
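
To illustrate the workspace pattern from the list above, here's a sketch of one configuration serving multiple environments. The resource and variable names are made up for the example; only the terraform.workspace interpolation is standard Terraform.

# Example: one configuration, multiple environments via workspaces

variable "app_ami_id" {
  description = "AMI ID for the app instance (placeholder; supply your own)"
  type        = string
}

locals {
  # Size resources differently per workspace (values are examples)
  instance_type = terraform.workspace == "prod" ? "m5.large" : "t3.micro"
}

resource "aws_instance" "app" {
  ami           = var.app_ami_id
  instance_type = local.instance_type

  tags = {
    Environment = terraform.workspace
  }
}

By default, the S3 backend stores each non-default workspace's state under an env:/<workspace>/ prefix ahead of your configured key, so environment states stay cleanly separated.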
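
And for the disaster-recovery bullet, here's a hedged sketch of cross-region replication on the state bucket. It assumes you've already created a versioned replica bucket in a second region and an IAM role that S3 can assume for replication, both of which are omitted here; the names are placeholders.

# Illustrative: replicate state to a second region for DR

resource "aws_s3_bucket_replication_configuration" "state_replication" {
  bucket = aws_s3_bucket.terraform_state.id
  role   = aws_iam_role.replication.arn # assumed pre-existing replication role

  rule {
    id     = "state-dr"
    status = "Enabled"

    destination {
      bucket        = aws_s3_bucket.terraform_state_replica.arn # assumed replica bucket
      storage_class = "STANDARD_IA"
    }
  }

  # Replication requires versioning to be enabled on the source bucket first.
  depends_on = [aws_s3_bucket_versioning.terraform_state_versioning]
}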

Unlocked: Your Key Takeaways

  • Never Use Local State: Storing terraform.tfstate locally is insecure and prevents collaboration. It's a sign of an amateur setup.

  • S3 is for State, DynamoDB is for Locking: Use S3 for durable, encrypted storage of your state file. Use DynamoDB to prevent concurrent executions and state corruption.

  • Automate the Backend: Provision your backend resources using a separate, minimal Terraform project to ensure your entire setup is managed as code.

  • Lock Down IAM: Your state backend is your most sensitive piece of infrastructure. Apply the principle of least privilege with surgical precision to the IAM roles that can access it.

  • Structure Your State: Use a logical key structure and workspaces to avoid a monolithic state file. This improves performance, reduces blast radius, and makes your infrastructure easier to reason about.

Securing your Terraform state isn't just a best practice; it's a foundational requirement for building professional, production-grade infrastructure.

If your team is facing this challenge, I specialize in architecting these secure, audit-ready systems.

Email me for a strategic consultation: atif@devopsunlocked.dev

Explore my projects and connect on Upwork
