[EN] Be careful before applying immediate modifications in AWS RDS

tl;dr

  • "Apply Immediately" applies everything in the pending modifications queue, not just your change.

  • Always check the pending modifications and pending maintenance actions first.

  • Don't accidentally trigger a db-upgrade unless you're ready for downtime.

What just happened?

Today I was preparing for a planned maintenance, just an hour before the scheduled time. We planned to upgrade one of our RDS instances to the latest minor version. The time came for me to change the Parameter Group. It looked like a simple change that shouldn’t cause any downtime, except when you reboot the instance to apply the new parameters. I have done this several times before, and it has always behaved that way.

I checked my modifications again to make sure I hadn’t misclicked anything. Auto minor version upgrade was disabled, and the maintenance window wasn’t scheduled for today. I chose “Apply Immediately”, assuming it would apply only my intended modification. Instead, I got unexpected downtime. AWS started showing the “Upgrading” status instead of “Available”, and the database went offline for a few minutes. I tried to stay calm while I investigated what had happened. It turned out that my modification had triggered the pending mandatory engine upgrade to run immediately.

So, how did that happen?

While the instance was still in the Upgrading state, I noticed a required db-upgrade action in the Maintenance & Backups tab. It wasn’t scheduled for today, so why did AWS apply it now? It turns out that when you choose to apply a modification immediately, AWS RDS effectively thinks, “Oh, the user is changing something. This is a good time to apply any other pending changes as well”, instead of applying only the change I wanted.

According to the official AWS documentation:

If you don't choose to apply changes immediately, RDS puts the changes into the pending modifications queue. During the next maintenance window, RDS applies any pending changes in the queue. If you choose to apply changes immediately, your new changes and any changes in the pending modifications queue are applied.

So if you have any required maintenance actions queued up, like a minor version upgrade, they will be applied instantly along with your changes, even if you didn’t intend to touch the engine version. And yes, a db-upgrade restarts your instance, which means downtime.
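To make the quoted behavior concrete, here is a minimal sketch of the alternative: queuing a change for the next maintenance window instead of applying it right away. The wrapper name rds_queue_change is hypothetical; the only real pieces are the aws rds modify-db-instance command and its --no-apply-immediately flag. Keep in mind that some settings take effect immediately regardless of this flag, so check the documentation for the specific setting you are changing.

```shell
# Hypothetical wrapper (the name is illustrative, not part of the AWS CLI).
# It forwards any extra arguments to modify-db-instance and always asks RDS
# to queue the change rather than apply it immediately.
rds_queue_change() {
  local db_id
  db_id="$1"
  shift

  aws rds modify-db-instance \
    --db-instance-identifier "$db_id" \
    --no-apply-immediately \
    "$@"
}

# Example usage (instance name and class are placeholders):
#   rds_queue_change your-db-name --db-instance-class db.t3.medium
```

A change submitted this way should show up under the instance's pending modifications until the maintenance window arrives.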

Check before you fall into the trap

tl;dr: Don’t Apply Immediately Blindly.

Before you make any change that will be applied immediately, always check the pending queue for that RDS instance! There may be a pending modification or a pending maintenance action. Here’s how to do it:

From Console

  1. Go to the RDS console and find your instance.

  2. Check the Maintenance column.

  3. Click the “Maintenance & Backups” tab.

  4. Look for any changes listed under “Pending maintenance” and “Pending modifications”.

From AWS CLI

  1. Look for pending modifications:

     aws rds describe-db-instances \
       --db-instance-identifier your-db-name \
       --query "DBInstances[*].PendingModifiedValues"
    
  2. Look for your instances in this list of pending maintenance actions:

     aws rds describe-pending-maintenance-actions --region <your region>
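The two CLI checks can also be combined into one pre-flight function to run before any immediate change. This is only a sketch under a few assumptions: the function name rds_preflight and its exit codes are made up for illustration, it assumes PendingModifiedValues is an empty object when nothing is queued, and it relies on the db-instance-id filter of describe-pending-maintenance-actions to narrow the list to one instance.

```shell
# Hypothetical pre-flight check (name and exit codes are illustrative).
# Returns 0 when nothing is pending, 1 when something is queued,
# and 2 when an AWS CLI call fails.
rds_preflight() {
  local db_id mods actions
  db_id="$1"

  # Count the keys in PendingModifiedValues (0 when the queue is empty).
  mods=$(aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query "length(keys(DBInstances[0].PendingModifiedValues))" \
    --output text) || return 2

  # Count the pending maintenance actions for this instance,
  # for example a db-upgrade.
  actions=$(aws rds describe-pending-maintenance-actions \
    --filters "Name=db-instance-id,Values=$db_id" \
    --query "length(PendingMaintenanceActions[].PendingMaintenanceActionDetails[])" \
    --output text) || return 2

  echo "pending modifications: $mods, pending maintenance actions: $actions"

  if [ "$mods" -ne 0 ] || [ "$actions" -ne 0 ]; then
    echo "Something is already queued for $db_id; review it before using Apply Immediately." >&2
    return 1
  fi
}

# Example usage (instance name is a placeholder):
#   rds_preflight your-db-name && echo "Nothing pending"
```

A non-zero exit code does not mean the change is forbidden, only that you should look at what is queued and decide whether you are ready for it to run now.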
    

Undo or cancel changes

So far, I haven’t found a way to cancel or defer the required update, or to make an immediate change without also triggering whatever is in the queue. However, you can cancel some of the non-required pending modifications by following this answer from Stack Overflow.

Final thoughts

AWS managed services, such as RDS, take a lot of the burden off an engineer’s shoulders. But they require you to understand how they work. Otherwise, you might run into a problem similar to the one I experienced. In my opinion, AWS should let you choose which modifications to apply instead of applying all pending modifications at once. Or it could at least show a list of the modifications that will be made (including those in the queue) before I click that final “Modify”.

Show the list here

Luckily, in my case, it was nighttime and only a few people were using the database, so the incident didn’t cause significant harm.

Written by

Mochammad Syaifuddin

Started my IT career as a Technical Support at an Indonesian web hosting provider, then progressed through various roles as a Linux SysAdmin, Network Engineer, Product Designer, and DevOps Engineer. I moved to a SaaS company and since then I’ve built hands-on experience mainly with AWS and GCP and work daily with popular cloud native tools.