SQL Detective: Precision Deletion with Self-Joins and Subqueries

Ardhendu GhoshArdhendu Ghosh
2 min read

📌 Problem Overview

You're given a Person table containing email addresses. Some emails appear more than once, and your task is to delete duplicates, keeping only the row with the smallest id for each email.

🗂️ Table Schema

sql

Person(id INT, email VARCHAR)

🧪 Sample Input

✅ Expected Output

🔁 Method 1: Self-Join Delete

sql

DELETE p1
FROM Person p1
JOIN Person p2
ON p1.email = p2.email AND p1.id > p2.id;

🧠 Why p1.id > p2.id?

This condition ensures:

  • Only the row with the higher ID gets deleted.

  • The first occurrence (lowest ID) is preserved.

  • No duplicate logic is missed.

👀 Preview Before Deleting

sql

SELECT p1.*
FROM Person p1
JOIN Person p2
ON p1.email = p2.email AND p1.id > p2.id;

Use this to audit which rows will be deleted before running the actual DELETE.

🧠 Method 2: Subquery-Based Delete

sql

DELETE FROM Person
WHERE id NOT IN (
    SELECT MIN(id)
    FROM Person
    GROUP BY email
);

🔍 Explanation:

  • GROUP BY email: groups duplicates.

  • MIN(id): keeps the lowest ID per email.

  • NOT IN (...): deletes all other rows.

👀 Preview Before Deleting

sql

SELECT *
FROM Person
WHERE id IN (
    SELECT id
    FROM Person
    GROUP BY email
    HAVING COUNT(*) > 1
);

This shows all rows involved in duplication—great for annotation or tagging.

🧵 Bonus: Annotated Duplicate Detection with ROW_NUMBER()

sql

WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM Person
)
SELECT *
FROM ranked
WHERE rn > 1;

This flags all but the first occurrence—ideal for tagging, deletion, or dashboard logic.

🧠 Final Thoughts

This challenge is a masterclass in:

  • Self-joins for targeted deletion

  • Subqueries for clean logic

  • Previewing before mutation

Whether you're building annotation pipelines or cleaning production datasets, these patterns are essential. Add this to your SQL toolkit and your data will thank you.

10
Subscribe to my newsletter

Read articles from Ardhendu Ghosh directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Ardhendu Ghosh
Ardhendu Ghosh

🧵 Stitching logic into data, culture into pixels, and clarity into workflows. I’m Ardhendu Ghosh—a systems architect in the making, blending SQL precision, annotation logic, and creative restoration into scalable, story-driven solutions. I build: 🧠 Reusable query modules for streaks, duplicates & classification 🎨 Emotionally attentive image enhancements (yes, even bindis matter) 📊 Dashboard-ready logic for annotation pipelines & public sharing ✍️ SEO-optimized content for Hashnode, LinkedIn & beyond If it’s repeatable, teachable, and culturally meaningful—I’m building it.