SQL Self-Joins with Examples

After learning about different joins, it’s now time to dive into self-joins! At first glance, self-joins might sound complicated, but they’re not. Think of it as a table joining itself, like a table of Amazon transactions comparing itself for insights. It’s simpler than it seems! 😉

In this tutorial, we’ll explore self-joins using the Goodreads dataset to create personalized book recommendations.


Self-Joins Example: Personalizing Book Suggestions

Imagine you’re part of the Goodreads team designing a recommendation system. Your goal is to suggest books based on a user’s preferences. For example, if a user loves romance, you’ll suggest more romantic books.

Finding Similar Books Within the Same Genre

Here’s a query that generates meaningful recommendations:

SELECT
  b1.genre,
  b1.book_title AS current_book,
  b2.book_title AS suggested_book
FROM goodreads AS b1
INNER JOIN goodreads AS b2
  ON b1.genre = b2.genre
WHERE b1.book_id != b2.book_id
ORDER BY b1.book_title;

Breakdown:

  • Self-Join on Genre: The ON clause matches rows with the same genre.

  • Exclude Identical Books: The WHERE clause ensures no book is matched with itself.

  • Output: A list of genres, current books, and suggested books.

Example Output:

current_genrecurrent_booksuggested_book
Non-FictionAce the Data Science InterviewData Engineering Cookbook
Non-FictionAce the Data Science InterviewBuilding Data-Intensive Apps with Flask and SQLAlchemy
Non-FictionAce the Data Science InterviewBlink: The Power of Thinking Without Thinking
Non-FictionAce the Data Science InterviewThe Power of Habit
Non-FictionAce the Data Science InterviewStorytelling with Data: A Data Visualization Guide

Cool, right? 🤓 You’re crafting personalized recommendations for book enthusiasts!


Expanding the Suggestions: A Second Book

What if you wanted to recommend not one but two additional books? Here’s how you can level up your query:

Taking Book Recommendations to the Next Level

SELECT
  b1.genre,
  b1.book_title AS current_book,
  b2.book_title AS suggested_book_1,
  b3.book_title AS suggested_book_2
FROM goodreads AS b1
INNER JOIN goodreads AS b2
  ON b1.genre = b2.genre
INNER JOIN goodreads AS b3
  ON b1.genre = b3.genre
WHERE b1.book_id != b2.book_id
  AND b1.book_id != b3.book_id
  AND b2.book_id != b3.book_id
ORDER BY b1.book_title
LIMIT 50;

Breakdown:

  • Additional Self-Joins: The query includes two more self-joins to fetch a second suggestion.

  • Avoid Repeated Suggestions: Conditions ensure that all three books in the recommendation set are distinct.

  • Performance Tip: The LIMIT clause prevents performance issues with large datasets.

Example Output:

current_genrecurrent_booksuggested_book_1suggested_book_2
Non-FictionAce the Data Science InterviewPython for Data AnalysisDesigning Data-Intensive Applications
Non-FictionAce the Data Science InterviewPython for Data AnalysisData Engineering with Python
Non-FictionAce the Data Science InterviewPython for Data AnalysisFundamentals of Data Engineering
Non-FictionAce the Data Science InterviewPython for Data AnalysisEducated: A Memoir
Non-FictionAce the Data Science InterviewPython for Data AnalysisData Science for Business

And there you have it! Self-joins in action to make recommendations more insightful.


Self-Join SQL Interview Question

Well-Paid Employees

Here’s a classic SQL interview question: Identify employees who earn more than their direct managers.

Problem Setup:

You’re given a table employee:

employee_idnamesalarydepartment_idmanager_id
1Emma Thompson380016
2Daniel Rodriguez223017
3Olivia Smith700018
4Noah Johnson680029
5Sophia Martinez1750111
6Liam Brown130003NULL
7Ava Garcia125003NULL
8William Davis68002NULL

Query:

SELECT
  e1.employee_id,
  e1.name AS employee_name
FROM employee AS e1
INNER JOIN employee AS e2
  ON e1.manager_id = e2.employee_id
WHERE e1.salary > e2.salary;

Breakdown:

  • Self-Join on Manager ID: Matches employees with their managers.

  • Condition: Filters rows where the employee’s salary is greater than their manager’s salary.

Example Output:

employee_idemployee_name
3Olivia Smith

Olivia Smith earns $7,000, which is more than her manager, William Davis, who earns $6,800.


Summary

  • Self-Joins allow a table to join itself, often to find relationships within the same dataset.

  • They’re useful for tasks like recommendations and hierarchical data analysis.

  • Practice with queries like book suggestions or comparing employee salaries to master self-joins confidently.

9
Subscribe to my newsletter

Read articles from Anubhav Kumar Gupta directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Anubhav Kumar Gupta
Anubhav Kumar Gupta

I'm Anubhav Kumar Gupta, a passionate Data Engineer at Infometry Inc. with expertise in Python, SQL, and Data Engineering. I love solving complex problems, optimizing data pipelines, and integrating back-end technologies.