SQL JOIN for Beginners Part 1: Splitting Tables

Suin LeeSuin Lee
3 min read

SQL Joins explained using Sets

Hello! This is my very first blog post. I’ll be sharing knowledge about computer science and programming development here from now on :)

In this post, I want to walk you through the concept of the SQL JOIN clause. I’ve written it to be as clear and easy to understand as possible — though you’ll need a basic understanding of SQL syntax and CRUD operations to follow along.

Before we get into JOIN, let's first take a look at an example table

💡 Computer Table

Take a look at the computer table below. It stores various information about computers:

At first glance, it looks fine — but you’ll soon notice that some columns contain lots of duplicate values, such as:

  • software_name

  • category

  • os_name

  • latest_update_version

Let’s say you want to update Xcode to Xcode2. You’d have to go through every row where Xcode appears and update it manually.

That might be manageable for 4 rows, but what if you have 5,000 or 10,000 rows? It’s not only time-consuming but also error-prone.

📦 Solution: Splitting the Table (Normalization)

The solution is to split the table into multiple related tables, removing the redundancy. This process is called normalization.

We can split the table. In other words, we can separate the duplicated parts into their own tables. Let’s extract the repeated fields from the Computer table and create a new table.

We removed the columns with duplicated data and added a software_id. From here, we can eliminate all redundant entries.

After cleaning up the data…

Software table is now complete!

In the Computer table, we only need to store the software_id

Now, if we ever need to update "Xcode" to "Xcode2", we only have to update it once in the Software table, and it will reflect across all referencing records. Convenient, right?

However, there are still some columns with duplicate values. Let’s go ahead and normalize the rest of the data just like we did earlier.

We added os_id to the Software table.

Now, even if os_name or latest_update_version needs to be updated, we only have to do it in one place!

✅Benefits of Splitting Tables

  • Eliminates redundancy

  • Makes updates easier

  • Ensures data consistency

⚠️ What’s the Downside?

The main drawback is that it becomes harder to view all data at a glance since it's now spread across multiple tables.

🔗 The Role of JOIN

To solve that, we use JOIN.
JOIN is a SQL clause that combines data from multiple tables based on shared values.

For example, if computer.software_id matches software.software_id, JOIN can retrieve all the relevant data in one result set.

(Data retrieved using JOIN)

In the next chapter, we’ll dive deeper into the JOIN syntax!

👉Check out the JOIN SQL queries example here

👉Check out the spreadsheet JOIN example here

1
Subscribe to my newsletter

Read articles from Suin Lee directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Suin Lee
Suin Lee