Deleted the Database: A Tech Pro's Guide to Handling Major Oops Moments
Last week, a little past 8 PM, my phone pinged with a Slack message from a former mentee, Alex. Alex, who has both ASD and ADHD, had some initial hiccups fitting into the workplace, and I had mentored him through those early days. Even though he has been on track for a while now, seeing a message from him made me check immediately, and boy, was it a shocker! While trying to help another team, Alex had accidentally deleted a production database. His first message was a panicked "Am I going to be fired?" I reassured him quickly and then dove into the problem. Here's a breakdown of how we tackled this crisis, which might help you navigate similar waters.
1. Keep Your Cool (Equanimity)
A junior engineer's first reaction to a mishap like this is usually doom-laden: "That's it, I've messed up big time." But it's crucial not to treat the mistake as yours alone. In this instance, the database had no protection against deletion, and Alex, an external helper, had been given write access to production, which is a foundational flaw. Software development engineers (SDEs) are sometimes made scapegoats, but most reputable companies don't operate that way.
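To make the two missing safeguards concrete (deletion protection and restricted production access), here is a minimal sketch. The `Database` and `Actor` classes, the `deletion_protection` flag, and the `can_write_production` permission are hypothetical names invented for this illustration; they are not any particular cloud provider's API, and they are not how Alex's team set things up.

```python
from dataclasses import dataclass


@dataclass
class Actor:
    name: str
    can_write_production: bool = False  # hypothetical permission flag


@dataclass
class Database:
    name: str
    environment: str  # e.g. "production" or "staging"
    deletion_protection: bool = True  # hypothetical safeguard, on by default


class DeletionBlocked(Exception):
    """Raised when a delete request fails a safety check."""


def check_delete_allowed(db: Database, actor: Actor) -> None:
    """Raise DeletionBlocked unless every safeguard permits the delete."""
    if db.deletion_protection:
        raise DeletionBlocked(f"{db.name}: deletion protection is enabled")
    if db.environment == "production" and not actor.can_write_production:
        raise DeletionBlocked(f"{actor.name} lacks production write access")
```

With either check in place, a well-meaning helper from another team would hit an error instead of an outage, which is why the failure belongs to the system as much as to the person.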
2. Assess the Impact (Evaluation)
Take a moment with your team to evaluate the extent of the damage. Work out which services are affected and how: whether they are simply unavailable, or whether there are other negative repercussions.
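If your team keeps even a rough map of which services depend on which resources, the first pass of this assessment can be mechanical. The sketch below assumes such a map exists; the service names and the `dependents` structure are made up for illustration.

```python
from collections import deque


def affected_services(dependents: dict[str, list[str]], failed: str) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    affected.discard(failed)  # guard against cycles that lead back to the start
    return affected


# Hypothetical map: resource -> services that depend on it directly.
dependents = {
    "orders-db": ["orders-api"],
    "orders-api": ["checkout", "reporting"],
    "checkout": ["storefront"],
}

print(sorted(affected_services(dependents, "orders-db")))
# ['checkout', 'orders-api', 'reporting', 'storefront']
```

A list like this only tells you who is affected, not how badly. You still need to check each service for the kind of damage it has suffered.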
3. Communicate Promptly (Engage)
Maintaining open lines of communication is critical. Alex faced a dilemma about whether to inform his manager, especially since it meant waking them up over an issue that arose while helping another team. However, it's always better for your manager to hear about such issues from you rather than through the grapevine.
4. Focus on the Fix (Execute)
The most crucial part of crisis management is focusing on solutions. Make sure every step you take is one you are confident about. Alex's initial instinct was to quickly rebuild the database from scratch, but this approach had its pitfalls: new data would start populating the empty database, making it harder to reconcile with the old data later. The right move was to restore the database from a backup, even though it took longer, to avoid data inconsistencies.
Conclusion: Dealing with workplace mistakes requires more than technical agility; it demands mental resilience. Remember, every crisis is an opportunity for growth. Learn from each one, and you'll become increasingly indispensable.
Written by
Denny Wang
I'm Denny, a seasoned senior software engineer and AI enthusiast with a rich background in building robust backend systems and scalable solutions across accounts and regions at Amazon AWS. My professional journey, deeply rooted in the realms of cloud computing and machine learning, has fueled my passion for the transformative power of AI. Through this blog, I aim to share my insights, learnings, and the innovative spirit of AI and cloud engineering beyond the corporate horizon.