SQL Detective: Precision Deletion with Self-Joins and Subqueries

📌 Problem Overview
You're given a Person
table containing email addresses. Some emails appear more than once, and your task is to delete duplicates, keeping only the row with the smallest id
for each email.
🗂️ Table Schema
sql
Person(id INT, email VARCHAR)
🧪 Sample Input
id | |
1 | john@example.com |
2 | bob@example.com |
3 | john@example.com |
✅ Expected Output
id | |
1 | john@example.com |
2 | bob@example.com |
🔁 Method 1: Self-Join Delete
sql
DELETE p1
FROM Person p1
JOIN Person p2
ON p1.email = p2.email AND p1.id > p2.id;
🧠 Why p1.id
>
p2.id
?
This condition ensures:
Only the row with the higher ID gets deleted.
The first occurrence (lowest ID) is preserved.
No duplicate logic is missed.
👀 Preview Before Deleting
sql
SELECT p1.*
FROM Person p1
JOIN Person p2
ON p1.email = p2.email AND p1.id > p2.id;
Use this to audit which rows will be deleted before running the actual DELETE
.
🧠 Method 2: Subquery-Based Delete
sql
DELETE FROM Person
WHERE id NOT IN (
SELECT MIN(id)
FROM Person
GROUP BY email
);
🔍 Explanation:
GROUP BY email
: groups duplicates.MIN(id)
: keeps the lowest ID per email.NOT IN (...)
: deletes all other rows.
👀 Preview Before Deleting
sql
SELECT *
FROM Person
WHERE id IN (
SELECT id
FROM Person
GROUP BY email
HAVING COUNT(*) > 1
);
This shows all rows involved in duplication—great for annotation or tagging.
🧵 Bonus: Annotated Duplicate Detection with ROW_NUMBER()
sql
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM Person
)
SELECT *
FROM ranked
WHERE rn > 1;
This flags all but the first occurrence—ideal for tagging, deletion, or dashboard logic.
🧠 Final Thoughts
This challenge is a masterclass in:
Self-joins for targeted deletion
Subqueries for clean logic
Previewing before mutation
Whether you're building annotation pipelines or cleaning production datasets, these patterns are essential. Add this to your SQL toolkit and your data will thank you.
Subscribe to my newsletter
Read articles from Ardhendu Ghosh directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Ardhendu Ghosh
Ardhendu Ghosh
🧵 Stitching logic into data, culture into pixels, and clarity into workflows. I’m Ardhendu Ghosh—a systems architect in the making, blending SQL precision, annotation logic, and creative restoration into scalable, story-driven solutions. I build: 🧠 Reusable query modules for streaks, duplicates & classification 🎨 Emotionally attentive image enhancements (yes, even bindis matter) 📊 Dashboard-ready logic for annotation pipelines & public sharing ✍️ SEO-optimized content for Hashnode, LinkedIn & beyond If it’s repeatable, teachable, and culturally meaningful—I’m building it.