Terraform State: Why I had to break up with my monolith (and how you can do it too!)

Jan Tymiński

Let’s talk about something every DevOps engineer eventually faces - the moment you realize your Terraform state file has become a monster.
You know, that creeping feeling when terraform plan takes longer than your coffee to brew, and running terraform apply feels like launching a rocket - except you’re not sure where it’ll land.

I’ve been there. Early on, I loved the simplicity of a single state file.
One place to rule them all! But as my infrastructure grew, so did the headaches.
Here’s why I (and probably you, too) needed to split up that Terraform state - whether you’re still running everything in one file, or you’ve already split things up and are wondering if you need to go further.

The monolith: Why one state file feels good… Until it doesn’t

At first, having everything in one state file feels like freedom.
All your resources, all your environments, one command to manage them all.
But then:

  • Performance Tanks: Suddenly, terraform plan is slow.
    So slow, you start checking your WiFi.
    But it’s not your connection - it’s the state file, bloated with hundreds (or thousands) of resources.

  • The “Oh Crap” Moment: Ever accidentally deleted a production resource while working on dev?
    With one state file, the blast radius of a mistake is the whole infrastructure.
    Definitely not fun.

  • Team Gridlock: Terraform locks the state file during changes.
    So if Victoria is updating a subnet and Leon is tweaking a security group, someone’s waiting.
    And waiting.
    And waiting…
    And… You get the point ;)

  • Secrets Everywhere: That one state file?
    It’s got all your secrets.
    Anyone who can read it can see everything.
    Not ideal when you want to keep prod secrets, well, secret.

  • Environments Collide: Mixing dev, staging, and prod in one state is a recipe for disaster.
    One wrong move, and your “test” change hits prod.

So, what do you do?
You start splitting.

Already split? Here’s why you might need to split again

Maybe you’ve already broken things up - a state file for networking, another for compute, maybe one per environment.
But the pain isn’t gone:

  • Still Too Big: Even after splitting, some state files keep growing.
    That “networking” state now covers VPCs, subnets, gateways, peering, and more.
    Time to split again - maybe VPC in one, subnets in another.

  • Team Ownership: As teams grow, so do ownership boundaries.
    The DB team doesn’t want to wait for the network team to finish their changes.
    If teams and responsibilities are already separate, give each team its own state and let them move fast.

  • Parallel Deployments: CI/CD pipelines are happiest when they can run in parallel.
    Multiple state files mean multiple pipelines, all running at once, no waiting.

  • Refactoring Time: Infrastructure evolves.
    Maybe you want to turn that old-school monolith into shiny new modules.
    Moving resources into their own state files makes this possible (and safe).

  • Compliance & Auditing: Sometimes, it’s not about what you want - it’s what the auditors want.
    Need to separate PCI or GDPR workloads?
    Split the state.

  • Complex Dependencies: When you’re referencing resources across modules, things get messy.
    With split states, one state consumes another only through explicit outputs, which keeps dependencies visible and manageable.

My rule of thumb

Start simple.
But as soon as you feel the pain - slow plans, team friction, scary blast radius - split.
And don’t be afraid to split again as your infra grows.
It’s not a sign of bad design - it’s a sign that you’re scaling.
In Poland we say: “Kto to Panu tak spierdolił?” (roughly: “Who screwed this up for you so badly?”)
Don’t think like that…
Something that now looks poorly designed worked in the past and delivered value - it just doesn’t necessarily do so anymore, and it’s time for a refactor.

If you’re using Terragrunt (like I sometimes do), it makes this even easier.
Each module, each environment, its own state.
Clean, fast, and safe.
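
To make the "own state per module and environment" idea concrete, here is a minimal sketch of a root Terragrunt configuration. It assumes an S3 backend; the bucket name and region are hypothetical placeholders. The key is derived from each unit's directory path, so every directory that includes this file ends up with a separate state object.

```hcl
# Root terragrunt.hcl - a sketch, not a drop-in configuration.
# Assumptions: S3 backend, hypothetical bucket name and region.
remote_state {
  backend = "s3"

  # Let Terragrunt write the backend block into each unit.
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket  = "my-terraform-states" # hypothetical
    region  = "eu-west-1"           # hypothetical
    encrypt = true

    # One state object per directory,
    # e.g. prod/networking/terraform.tfstate
    key = "${path_relative_to_include()}/terraform.tfstate"
  }
}
```

Each child directory then only needs an include block pointing at this root file, and its state key follows from where it sits in the tree.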

TL;DR

Splitting Terraform state files isn’t just a best practice - it’s a survival tactic as your infrastructure grows.
It keeps your team moving, your secrets safe, and your weekends free from “terraform panic” moments.

Been there, done that, got the (split) state files to prove it.

You convinced me! So how do I split a Terraform state?

Prerequisites

  1. Ensure your S3 bucket for holding states has versioning enabled.
    You don’t want to discover that you have no backup after messing up the state.

  2. Have a state locking mechanism to avoid external changes in the middle of the process.
    If you use Atlantis, you’re probably on the safe side.
    If you orchestrate Terraform with a different CI/CD solution, make sure it locks the states.

  3. Be aware that Terraform state may contain sensitive information.
    You will download this state locally.
    Be very careful not to commit a local state file to the repository!
    Ensure you remove all the local state files after you finish.
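
For the first prerequisite, a minimal sketch of enabling versioning, assuming the state bucket is itself managed with Terraform and the AWS provider is version 4 or newer. The bucket name and resource labels are hypothetical. If your bucket was created by hand, the same setting is available in the S3 console instead.

```hcl
# Sketch only. Assumptions: AWS provider v4+, hypothetical bucket name.
resource "aws_s3_bucket" "terraform_states" {
  bucket = "my-terraform-states" # hypothetical
}

# Keeps previous versions of every state object, so an earlier
# state can be restored if the split goes wrong.
resource "aws_s3_bucket_versioning" "terraform_states" {
  bucket = aws_s3_bucket.terraform_states.id

  versioning_configuration {
    status = "Enabled"
  }
}
```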

Steps to move resources between Terraform states

Before you begin, a short clarification: I use current state to refer to the state you already have before the split, which will most likely still exist afterwards, just with fewer resources. I use new state to refer to the state you create and move resources into.
If something is not clear, leave me a comment and I will improve the write-up.

  1. Create a new state directory in your configuration, with terraform.tf pointing to the new state location in your S3 bucket for states.
    Be sure you don’t change the state path of the current state!

  2. [OPTIONAL] If you use Atlantis, add the new directory to the atlantis.yml file.

  3. Open a PR that will lock both states and ensure nobody will interact with these states during migration.

    1. If you use Atlantis, it will create the locks. Just make it clear in the PR that nobody should remove the lock (e.g. add DO NOT TAKE LOCK to the PR title if you don’t have any mechanism that would guarantee nobody takes the lock).

    2. If you don’t use Atlantis, check how your CI/CD solution can lock the states you work on.

    3. If you don’t use any CI/CD for Terraform execution, then clearly communicate to all your Terraform contributors that you are splitting this particular state and that they cannot work with it until you finish the split.

  4. Move the Terraform resource, module and output definitions to the new state directory, as needed.

  5. In the current state’s configuration, create a terraform_remote_state data source pointing to the new state - the same path you used in the new terraform.tf.

  6. In the current state, refer to the outputs of this new terraform_remote_state wherever you used the resources that you moved.

  7. Pull both states locally by running terraform state pull > state.backup in the current and the new state directory, respectively.
    Remember: Terraform states may contain highly sensitive information - work with caution!

  8. In both directories make copies of pulled states with cp state.backup modified.state

  9. Now you can move your resources between states with:

     For regular resources:
     terraform state mv -state /path/to/current/state/modified.state -state-out /path/to/new/state/networking/modified.state aws_vpc.my_vpc aws_vpc.my_vpc
    
     For modules:
     terraform state mv -state /path/to/current/state/modified.state -state-out /path/to/new/state/networking/modified.state module.my_module module.my_module
    

    Moving modules is actually awesome here, as you can move an entire module without moving its internal resources one by one - this simplifies a lot!

  10. Push both states, in both directories, with terraform state push modified.state

  11. Plan the new state - you should see no changes to resources if everything went well - but you will see new outputs to be created.
    If not, go back to step 9 and move what you missed.

  12. Apply the new state so it has the outputs - the current state refers to them.

  13. Plan the current state - you should see no changes if everything went well so far.
    If not, go back to step 9 and move what you missed.

  14. Apply the current state.

  15. At this point everything is good, so merge your PR.

  16. Now the process is completed.
    Moving resources produced some modified.state.*.backup files in both directories - you can now remove them, along with the modified.state and state.backup files.

  17. Search your repositories for other states that reference the current state.
    On GitHub you can search through your entire Organization.

    1. If any state refers to the moved resources, add a terraform_remote_state data source pointing to the new state, as in step 5.

    2. Point the relevant resources to that new terraform_remote_state data source.

    3. Plan that state - you should see no changes.
      If not, re-check the previous steps for anything you missed or got wrong.

    4. Apply that state.

    5. Repeat for every additional state that referred to the state that was split.

  18. The process is over, you have cleaned up the state 💪
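
The configuration side of steps 1, 4, 5 and 6 could look roughly like the sketch below. Only aws_vpc.my_vpc and the networking directory come from the commands in step 9. The bucket name, region, state key, lock table, the vpc_id output and the aws_security_group.app consumer are hypothetical names chosen for illustration. The snippets belong in separate files, marked by the comment headers.

```hcl
# ---- networking/terraform.tf (new state, step 1) ----
terraform {
  backend "s3" {
    bucket = "my-terraform-states"          # hypothetical
    key    = "networking/terraform.tfstate" # new key; leave the current state's key untouched
    region = "eu-west-1"                    # hypothetical

    # Hypothetical lock table, only relevant if you lock via DynamoDB.
    dynamodb_table = "terraform-locks"
  }
}

# ---- networking/outputs.tf (new state, step 4) ----
# Expose what the current state still needs from the moved resources.
output "vpc_id" {
  value = aws_vpc.my_vpc.id
}

# ---- current state directory: remote_state.tf (step 5) ----
# Same bucket, key and region as in networking/terraform.tf.
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "my-terraform-states"
    key    = "networking/terraform.tfstate"
    region = "eu-west-1"
  }
}

# ---- current state directory: a resource that stayed behind (step 6) ----
# Previously this would have used aws_vpc.my_vpc.id directly.
resource "aws_security_group" "app" {
  name   = "app"
  vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
}
```

Because the current state reads vpc_id from the new state's outputs, the new state has to be applied first, which is why step 12 comes before step 13.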

If you want to dive deeper, or need help with your own Terraform breakup story, hit me up - always happy to talk about infra (or scuba stories 🤿).

You can also hire me as a consultant if you feel you need support - just drop me a message!


Written by

Jan Tymiński

I started my professional career in 2012 as a Systems Administrator and continued in that role until 2018, when I became a DevOps Engineer. I have worked with AWS since 2016 and am a five-time certified AWS Specialist.