How to Minimize Downtime Caused by Table Locks During PostgreSQL Schema Changes?

Arindam GhoshArindam Ghosh
2 min read

🚨 Ever faced an unexpected downtime from a seemingly harmless database deployment?
Imagine you're adding a new column to an existing table (e.g., users) in PostgreSQL. You decided to skip specifying a default value, thinking it would be a metadata-only change. So no rows need updating, and the operation should finish in the blink of an eye. Easy-peasy, right?

But then… the query runs for an unexpectedly long time, blocking other queries, and before you know it, your application grinds to a halt.😨

🔍 Why does a metadata change take longer time and disrupt everything?

PostgreSQL DDL statements, like ALTER TABLE, acquire an ACCESS EXCLUSIVE lock on the target table. This lock is the most restrictive, conflicting with all other lock types.🔒

Here’s the chain of chaos:

1️⃣ A long-running query (like a SELECT) already holds an ACCESS SHARE lock on the users table.

2️⃣ Your ALTER TABLE query tries to acquire an ACCESS EXCLUSIVE lock and has to wait. The DDL statement can simply wait patiently until it's able to acquire its ACCESS EXCLUSIVE lock, right? Here’s the problem.

3️⃣ While waiting, all other queries needing locks on the table queue up behind it—even simple SELECT queries! 🚦

If that long-running query drags on, the entire table becomes inaccessible for reads and writes until the ALTER TABLE completes, causing widespread downtime.

🛠️ How to prevent this?

Here’s lock_timeout to the rescue!🦸‍♂️ It defines how long a statement will wait to acquire a lock before giving up.

Here’s an example:

SET lock_timeout = '5s'; -- Set lock timeout to 5 seconds
ALTER TABLE users ADD COLUMN last_login TIMESTAMP;
RESET lock_timeout; -- Restore to default

With lock_timeout set ⏱️ -

  • If the operation can’t acquire its lock within specified time, it gracefully exits. Adding retries with exponential backoff can help handle these transient conflicts.✅

  • This ensures other queries like SELECT and UPDATE can run without being stuck in the queue for an extended period.💡

📌 The takeaway

Setting a reasonable lock_timeout for potentially locking operations is a simple yet effective way to mitigate the impact of DDL changes on your database's availability. 🕒✨

Next time you’re about to make such database change, don’t forget to safeguard your app’s availability with this simple trick! 🚀

0
Subscribe to my newsletter

Read articles from Arindam Ghosh directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Arindam Ghosh
Arindam Ghosh

Database reliability engineer 👨‍💻 with expertise in designing, implementing, and maintaining highly available and scalable database systems 💻 Enjoy automating various database tasks ⚙️ and implementing database DevOps 🚀 🙏