When working with Terraform in a collaborative or production environment, managing the state file correctly is crucial. A widely adopted solution is to store the Terraform state remotely using AWS S3, and use DynamoDB for state locking. This setup ensures that multiple people or automation tools do not modify the same state simultaneously, which can lead to conflicts or corruption.

However, even with best practices in place, issues can still arise, most commonly stale locks left behind in DynamoDB.

What Is State Locking in Terraform?

Terraform uses a mechanism called state locking to prevent concurrent operations on the same state. Before any operation that could write state, such as terraform apply, Terraform creates a lock entry; with the S3 backend described here, that entry is an item in a DynamoDB table. The lock ensures that no one else can make changes to the same state until the operation is complete.

Once the operation finishes, Terraform automatically removes the lock.
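
The acquire-and-release cycle can be modelled in a few lines. The sketch below is not Terraform's code: InMemoryLockTable, LockHeldError, and state_lock are hypothetical names, and a plain dictionary stands in for the DynamoDB table. It only illustrates the idea of a write that succeeds when no lock entry exists yet and a release that runs when the operation ends.

```python
from contextlib import contextmanager


class LockHeldError(Exception):
    """Stand-in for the error raised when a lock entry already exists."""


class InMemoryLockTable:
    """Toy replacement for the lock table: at most one entry per lock ID."""

    def __init__(self):
        self._items = {}

    def put_if_absent(self, lock_id, info):
        # Conditional write: succeed only if no entry has this lock ID.
        if lock_id in self._items:
            raise LockHeldError(f"lock already held by {self._items[lock_id]}")
        self._items[lock_id] = info

    def delete(self, lock_id):
        self._items.pop(lock_id, None)

    def holder(self, lock_id):
        return self._items.get(lock_id)


@contextmanager
def state_lock(table, lock_id, who):
    table.put_if_absent(lock_id, {"Who": who})  # acquire
    try:
        yield
    finally:
        table.delete(lock_id)  # release once the operation ends


table = InMemoryLockTable()
LOCK_ID = "example-bucket/env/terraform.tfstate"  # hypothetical path

blocked = False
with state_lock(table, LOCK_ID, "alice"):
    try:
        # A second operation on the same state is rejected while alice holds the lock.
        with state_lock(table, LOCK_ID, "bob"):
            pass
    except LockHeldError:
        blocked = True
```

In this model the release sits in a finally block, so it runs on normal completion and on handled errors. If the process is killed outright, that cleanup never runs and the entry stays in the table.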

The Problem: ConditionalCheckFailedException

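Sometimes, you might encounter an error indicating a lock acquisition failure, often shown as a ConditionalCheckFailedException. Terraform writes the lock with a conditional put that succeeds only if no lock entry exists yet, so DynamoDB rejects the write whenever an entry is already present. If another operation is genuinely running, that is the lock doing its job. The error becomes a problem when: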

A Terraform operation was interrupted (for example, by closing the terminal or by an unexpected system crash).
A process holding the lock failed to release it.
The DynamoDB lock table still has an entry, even though no operation is in progress.

This is commonly referred to as a stale lock.
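
When diagnosing a suspected stale lock, it helps to see who created the entry and when. The sketch below assumes the lock item stores a JSON document in an Info attribute with fields such as ID, Who, Operation, and Created; check an item in your own table before relying on that layout. The describe_lock helper and the sample values are made up for illustration.

```python
import json


def describe_lock(item):
    """Summarise a lock item given in DynamoDB attribute-value form.

    Returns None when the item has no Info attribute, i.e. it is not a lock record.
    """
    info_attr = item.get("Info")
    if info_attr is None:
        return None
    info = json.loads(info_attr["S"])
    return {
        "lock_key": item["LockID"]["S"],
        "id": info.get("ID"),
        "who": info.get("Who"),
        "operation": info.get("Operation"),
        "created": info.get("Created"),
    }


# Hypothetical item, shaped like the result of a DynamoDB GetItem call.
sample_item = {
    "LockID": {"S": "example-bucket/env/terraform.tfstate"},
    "Info": {"S": json.dumps({
        "ID": "11111111-2222-3333-4444-555555555555",
        "Operation": "OperationTypeApply",
        "Who": "alice@build-host",
        "Created": "2024-01-01T00:00:00.123456789Z",
    })},
}

summary = describe_lock(sample_item)
```

A summary like this gives you a person or machine to check with before deciding that the lock is really stale.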

Why It Happens

Terraform does not always get a chance to clean up the lock if something goes wrong. When this occurs, the lock remains in the DynamoDB table, and any new attempts to use the state file will be blocked, even if no one is actively modifying it.

How It’s Resolved (Theoretically)

To resolve this issue:

The lock status must be checked in DynamoDB.
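Once you have confirmed that no active process is holding the lock, remove the stale entry, either with terraform force-unlock and the lock ID shown in the error message, or by deleting the item from the DynamoDB table manually or via automation.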
After cleanup, Terraform will be able to acquire the lock again and continue operations as expected.
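
For the path that deletes the item directly, the steps above could be sketched as follows. This is an illustration under assumptions, not a recommended tool: remove_lock_if_matches is a hypothetical helper, client is expected to behave like a boto3 DynamoDB client (only get_item and delete_item are used), and the Info attribute is assumed to hold JSON with an ID field. The helper deletes only when the stored lock ID matches the one you expect.

```python
import json


def remove_lock_if_matches(client, table_name, lock_key, expected_id):
    """Delete the lock item only if it still carries the expected lock ID.

    Returns "no-lock", "id-mismatch", or "removed".
    """
    key = {"LockID": {"S": lock_key}}

    # Step 1: check the current lock status with a strongly consistent read.
    response = client.get_item(TableName=table_name, Key=key, ConsistentRead=True)
    item = response.get("Item")
    if item is None or "Info" not in item:
        return "no-lock"

    # Step 2: refuse to touch a lock that is not the one confirmed as stale.
    raw_info = item["Info"]["S"]
    if json.loads(raw_info).get("ID") != expected_id:
        return "id-mismatch"

    # Step 3: conditional delete, so a lock replaced after the read is left alone.
    # With a real client, a failed condition surfaces as an exception.
    client.delete_item(
        TableName=table_name,
        Key=key,
        ConditionExpression="#info = :info",
        ExpressionAttributeNames={"#info": "Info"},
        ExpressionAttributeValues={":info": {"S": raw_info}},
    )
    return "removed"
```

Passing the expected lock ID, rather than deleting whatever is in the table, keeps the human confirmation step in the loop: the caller has to state which lock was judged stale.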

It’s important that such manual intervention is done with caution, as deleting a valid lock could result in multiple people making conflicting changes at the same time.

Best Practices to Avoid This

Always allow Terraform operations to finish gracefully.
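Use retry logic (for example, Terraform's -lock-timeout option, which keeps retrying lock acquisition for a set duration) or tooling that detects and clears stale locks automatically, but only where it is safe to do so.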
Communicate with your team when working on shared infrastructure states.
Periodically audit your DynamoDB lock table to ensure old locks don’t block progress.
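
A periodic audit could look roughly like the sketch below. It assumes you have already fetched the table's items (for instance with a scan) in DynamoDB attribute-value form, and that each lock's Info JSON carries a Created timestamp in RFC 3339 format. The names parse_created and find_old_locks, and the idea of an age threshold, are illustrative choices. An old lock is only a candidate for review, not proof that it is stale.

```python
import json
import re
from datetime import datetime, timedelta, timezone

_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_created(value):
    """Parse an RFC 3339 timestamp, truncating fractions finer than microseconds."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unrecognised timestamp: {value!r}")
    base, fraction, zone = match.groups()
    offset = "+0000" if zone == "Z" else zone.replace(":", "")
    parsed = datetime.strptime(base + offset, "%Y-%m-%dT%H:%M:%S%z")
    micros = int((fraction or "").ljust(6, "0")[:6])
    return parsed.replace(microsecond=micros)


def find_old_locks(items, now, max_age):
    """Return (lock_key, who, age) for every lock older than max_age."""
    old = []
    for item in items:
        if "Info" not in item:
            continue  # skip items that are not lock records
        info = json.loads(item["Info"]["S"])
        age = now - parse_created(info["Created"])
        if age > max_age:
            old.append((item["LockID"]["S"], info.get("Who"), age))
    return old
```

A typical call would pass datetime.now(timezone.utc) as now and a threshold such as timedelta(hours=2), chosen to be comfortably longer than your slowest legitimate run.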

Final Thoughts

Terraform state management is powerful but sensitive. While remote backends with S3 and DynamoDB offer reliability and safety, issues like stale locks are still possible. Understanding how the locking mechanism works helps you resolve issues confidently without affecting the integrity of your infrastructure.

By learning to recognize and handle these scenarios, you are doing more than troubleshooting: you are building a more resilient and collaborative DevOps practice.

Understanding Terraform State Locking with S3 and DynamoDB: A Common Issue Explained