Picture this: You're deploying a production-ready EKS cluster for a high-security organization using Terraform. Everything is going smoothly. Your VPC is being created, subnets are spinning up, and then... your internet connection drops.

What happens next is a perfect storm of Terraform state management issues that every infrastructure engineer will face at some point.

The Perfect Storm

When network connectivity fails during a terraform apply, several things go wrong simultaneously:

State Lock Limbo: Terraform can't release the DynamoDB state lock, leaving it orphaned
State Upload Failure: Your S3 backend becomes unreachable, so state changes can't be persisted
Resource Creation Interruption: AWS resources may be partially created but not tracked properly
Local State Backup: Terraform saves an errored.tfstate file as a safety net
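Before touching anything else, it's worth confirming that the safety net actually exists. The helper below is a minimal sketch: the function name check_errored_state is hypothetical, and it assumes Terraform wrote errored.tfstate into the directory where the apply ran.

```shell
# Hypothetical helper: report whether Terraform left an errored.tfstate
# behind in the given working directory (defaults to the current one).
check_errored_state() {
  dir="${1:-.}"
  file="$dir/errored.tfstate"
  # -s is true only when the file exists and is non-empty
  if [ -s "$file" ]; then
    echo "found: $file"
    return 0
  fi
  echo "missing: $file"
  return 1
}

# Example use: keep an untouched copy before starting the recovery.
#   check_errored_state . && cp errored.tfstate errored.tfstate.bak
```

Keeping a copy costs nothing and means a mistake later in the recovery can't destroy the only record of what was created.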

Here's what the error looked like:

Error: Failed to save state

│ 
│ Error saving state: failed to upload state: operation error S3: PutObject,
│ https response error StatusCode: 0, RequestID: , HostID: , request send
│ failed, Put
│ "https://prod-infra-tf-state-2025.s3.us-west-2.amazonaws.com/main-infra/terraform.tfstate?x-id=PutObject":
│ dial tcp: lookup
│ prod-infra-tf-state-2025.s3.us-west-2.amazonaws.com: no such host

The Recovery Process

The beauty of Terraform's design shines in moments like these. Here's the systematic recovery:

Step 1: Force Unlock the State

terraform force-unlock xxx8b0e7e1-4f7c-4cxx-9c2a-xxx71a6a9b1c

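The lock ID is displayed in the "Error acquiring the state lock" message that Terraform prints the next time you run a command against the locked state. Passing it to force-unlock releases the orphaned lock from DynamoDB. Only do this when you are sure no other Terraform process still holds the lock.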
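Copying a UUID by hand is error-prone in the middle of an incident. Here is a minimal sketch for pulling it out of Terraform's output instead. It assumes the lock details are printed with a line of the form "ID: <lock-id>", which is how the Lock Info block usually looks; the function name extract_lock_id is hypothetical.

```shell
# Hypothetical helper: read Terraform output on stdin and print the lock ID.
# Assumes the Lock Info block contains a line of the form "ID: <lock-id>".
extract_lock_id() {
  awk '{
    # Match the standalone "ID:" field only, so "RequestID:" is skipped.
    for (i = 1; i < NF; i++) {
      if ($i == "ID:") { print $(i + 1); exit }
    }
  }'
}

# Example use (requires terraform on PATH and a locked state):
#   lock_id="$(terraform plan -no-color 2>&1 | extract_lock_id)"
#   terraform force-unlock "$lock_id"
```

The example leaves force-unlock's confirmation prompt in place on purpose, so there is still a human check before the lock is removed.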

Step 2: Recover the Local State

terraform state push errored.tfstate

This pushes the locally saved state back to your S3 backend, ensuring no infrastructure changes are lost.
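Terraform runs its own safety checks on state push and refuses the upload if the lineage differs or the remote serial is higher. If you'd like to see the numbers yourself first, here is a rough sketch. It assumes the pretty-printed JSON layout Terraform normally writes (a "serial": N line near the top of the file), and both function names are hypothetical. A real JSON parser such as jq would be more robust if you have one available.

```shell
# Hypothetical helper: print the top-level "serial" of a Terraform state file.
# Assumes pretty-printed JSON with a line like:   "serial": 7,
state_serial() {
  sed -n 's/^[[:space:]]*"serial":[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed only if the local file's serial is at least
# the remote copy's serial. Fails if either serial cannot be read.
safe_to_push() {
  local_serial="$(state_serial "$1")"
  remote_serial="$(state_serial "$2")"
  [ -n "$local_serial" ] && [ -n "$remote_serial" ] &&
    [ "$local_serial" -ge "$remote_serial" ]
}

# Example use (requires terraform on PATH):
#   terraform state pull > remote.tfstate
#   safe_to_push errored.tfstate remote.tfstate && terraform state push errored.tfstate
```

This compares serial numbers only. It's a quick sanity check, not a replacement for Terraform's lineage check.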

Step 3: Assess and Continue

terraform plan
terraform apply

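Review the plan to see what actually got created, then apply to complete the deployment. A resource that exists in AWS but is missing from the state has to be brought in with terraform import first, otherwise the apply will try to create it again.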
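If you want the assessment step to be scriptable, terraform plan accepts a -detailed-exitcode flag: 0 means no changes, 1 means the plan failed, and 2 means changes are pending. A small sketch of how that could be interpreted; the function name describe_plan_exit and the message wording are my own.

```shell
# Hypothetical helper: translate the exit code of
# `terraform plan -detailed-exitcode` into a short message.
describe_plan_exit() {
  case "$1" in
    0) echo "no changes: state matches the configuration" ;;
    1) echo "error: plan failed" ;;
    2) echo "changes pending: review the plan, then apply" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Example use (requires terraform on PATH):
#   terraform plan -detailed-exitcode -out=tfplan; describe_plan_exit "$?"
#   terraform apply tfplan
```

Saving the plan with -out and applying that file means what gets applied is exactly what you reviewed.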

Lessons Learned

Terraform's Resilience: The tool gracefully handles network failures by saving state locally
State Lock Transparency: Error messages provide all the information needed for recovery
Remote State Benefits: Using S3 + DynamoDB backends provides robust state management even during failures
Infrastructure as Code Reliability: Well-designed IaC can recover from unexpected interruptions

Prevention Tips

Use stable internet connections for critical deployments
Consider running Terraform from cloud instances with reliable connectivity
Monitor deployment progress and be prepared for recovery procedures
Always use remote state backends for production workloads
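The failure in this story surfaced as a DNS lookup error, so one cheap precaution is to check that the backend hostname resolves before starting a long apply. This is only a sketch: backend_resolves is a hypothetical name, it assumes a Linux system with getent available, and it can only catch an outage that has already started, not one that begins mid-apply.

```shell
# Hypothetical preflight: succeed only if the hostname resolves.
# Assumes getent is available (glibc-based Linux); other platforms
# would need a different lookup tool.
backend_resolves() {
  getent hosts "$1" >/dev/null 2>&1
}

# Example use, with the bucket hostname from the error message above:
#   backend_resolves prod-infra-tf-state-2025.s3.us-west-2.amazonaws.com \
#     && terraform apply
```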

Conclusion

Network failures during infrastructure deployments are inevitable, but Terraform's state management system makes recovery straightforward. What could have been a disaster—losing track of partially created AWS resources—becomes a minor inconvenience with the right recovery steps.

The production EKS cluster deployment continued successfully after this hiccup, proving that robust tooling and systematic recovery processes are essential for production infrastructure management.

This incident occurred during the deployment of a production EKS cluster with VPC, managed node groups, and security configurations using Terraform modules.

When the Internet Dies During Terraform Apply: A Recovery Story