Entity-relationship Data Models

Lord AbiollaLord Abiolla
9 min read

We are already aware that a database is a collection of structured data. When the database is relational, the tables have certain relationships with each other known as relations.

To organize and manipulate this data, we use a relational database management system (RDBMS). Each relation store an instance of entity (The things we store information about in tables).

The relations furthermore have columns known as attributes and rows referred to as records.

Entity-relationship data models serve as conceptual frameworks used to design and visually represent the structure of a database.

The three main types of relations that entities can have with each other include:

  • One-to-One relationship: Means that a row in one table relates to another one row in another table. e.g., a single customer in the customer’s table would have one unique address in the addresses table, and therefore, we would like the two tables using one to one relationship. This relationship is represented visually using one relation line on both ends of the line to represent one row on the first table and another on the second table.

  • One-to-Many relationship: This means that one row in a table can be linked to multiple rows in another and vice verse. e.g., a customer may buy from a company multiple times generating multiple invoices This means that one unique row in the customer’s table may be linked to multiple rows in the invoices table.

  • Many-to-Many relationship: These refers to rows that can be linked either individually, or in groups to individuals or groups in another table. e.g., invoices and products, a products can be listed on multiple invoices because the same product can be bought by different people multiple times, equally, multiple products can be listed in one invoices creating a many to many relationship.

  • Self-referencing table: This relationship occurs when an attribute in a table has a relation to another attribute in the same table.

Entity-Relationship Diagrams (ERD)

This is a graphical representation of the relationship among entities in a database. ERD is used in data modelling to visualize and communicate the structure of a database system.

An ERD is drawn at up to three levels of abstraction, depending on the purpose it is created for and the audience it is meant to target. These include;

  1. Conceptual data model

  2. Logical data model

  3. Physical data model

Conceptual data model

This is the highest level of abstraction in the data modelling process. It represents the overall picture of a database system, by primarily focusing on the identification of entities and relationships and cardinalities only.

Logical data model

This is an expansion of the Conceptual data model. Logical data model focuses on describing the attributes in more details without taking into account which attribute in one table relates to another in another table.

Physical data model

It bridges the gap between the logical design and the actual implementation of this model in an RDBMS. It retains entities, attributes and cardinalities from the logical data model, and also has specific key implementation details such as keys and data types.

Database Keys

Database keys are fields or combination of fields in a database that uniquely identify each records in a table. They are essential for maintaining data integrity, preventing duplicates, and establishing relationships between different tables.

The three main types of keys include:

  1. Primary Keys

  2. Foreign Keys

  3. Composite Keys

A primary Key is a column that uniquely identifies each row in a table. It cannot have missing or duplicate values because it is a unique identifier.

Finally, a table cannot have more than one primary.

A Foreign Key can refer to a column or set of columns that correspond to the primary key in another table. It establishes a relationship between two tables know as Parent Child relationship with the foreign key in the child table referencing the primary key in the parent table, creating a link between the two tables*.*

This relationship ensures referential integrity, meaning that the data in the child table is consistent with the data in the parent table.

A composite Key is a combination of two or more columns that together uniquely identify a record in a table. This is useful when a single column cannot be sufficient to ensure uniqueness for each row.

SQL Join and Set Operations

Set Theory

A set is a well defined collection of distinct objects, each of which is an element of a set. Sets are denoted by listing the elements within curly brackets {}.

Sets can contain any number of elements, and when a set contains zero elements, it is referred to as an empty set.

A subset is a set that is entirely contained in another set. It is denoted by a subset symbol, which is a ‘C’ with a dash below it.

Sets are related to databases in that, a table can viewed as a set with the rows representing the set elements. Through SQL queries, we can select subset of a set and present them to users as collections of rows and columns.

Venn Diagram

This is a graphical representation used to showcase larger correlations between sets. Each set is depicted as a circle, with the area within the circle representing the set elements.

The primary purpose of Venn diagrams is to visualize the similarities and differences between sets allowing for easy comparison and analysis.

If two circles do not overlap at all, it indicates that the corresponding sets have no elements in common. Overlapping portions of the circles represent elements that belong to multiple sets, the more circles overlap, the more elements sets have in common.

The area enclosing all the circles represents the Universal set which includes all elements relevant to the context of the Venn diagram.

Set Operators

These are symbols or operations used in set theory to manipulate and combine sets.

In set operations:

  1. The tables being compared must have same number of columns.

  2. The columns used in set operations must have compatible data types.

These set operators include:

  1. Union Operator

This operator is used to combine records from multiple tables, culminating in a single unified table. It is represented by the Union symbol (U). Meaning, to combine A and B, we write: A U B. Union operator also eliminates duplicates records that may exist between A and B ensuring each record in the result set is unique.

  1. Intersect Operator

This operator is used to return a table that contain records that are common in all tables involved.It is represented by the intersect symbol (n)

  1. Except Operator

It is represented by the minus symbol (-). It is used to retrieve unique records in the first table, not the common records of both tables. e.g., A - B.

SQL Join

Join are used to combine data from two or more tables based on relate column(s) between them. The result is always a table including columns from both tables arranged side by side.

Types of Join Operations

  • INNER JOIN

  • LEFT JOIN (or LEFT OUTER JOIN)

  • RIGHT JOIN (or RIGHT OUTER JOIN)

  • FULL JOIN (or FULL OUTER JOIN)

  1. INNER JOIN

An INNER JOIN is a join that returns only the rows from both tables where there is a match between the specified columns in each table. It filters out rows that do not have corresponding matches in both tables.

Syntax

SELECT
    columns
FROM
    table1
INNER JOIN
    table2
ON
    table1.column_name = table2.column_name;

For instance; To create dataset with columns Country_name, Population, and Water_consumed_m3, containing only the information for countries where there is data available in both the Countries and Water-usage tables, we use INNER JOIN as shown below;

SELECT
    Countries.Country_name,
    Countries.Population,
    Water-usage.Water_consumed_m3
FROM
    Countries
INNER JOIN
    Water_usage
ON
    Countries.Country_id = Water_usage.Country_id;

Output:

  1. LEFT JOIN

LEFT JOIN is also known as LEFT OUTER JOIN. It returns all rows from the left table and only the matching rows from the right table. If there is no match in the right table, it returns NULL values for columns of the right table.

SELECT
    columns
FROM
    table1
LEFT JOIN
    table2
ON
    table1.column_name = table3.column_name;

Example: Lets create a merged dataset with columns Country_name, Population, and Water_consumed_m3, with details for all countries in the Countries table, and incorporating data from Water_usage table (when available), you can use the LEFT JOIN operation.

SELECT
    Countries.Country_name,
    Countries.Population,
    Water-usage.Water_consumed_m3
FROM
    Countries
LEFT JOIN
    Water_usage
ON
    Countries.Country_id = Water_usage.Country_id;

Output

  1. RIGHT JOIN

This is also known as RIGHT OUTER JOIN, and it returns all the rows from right table and the matching rows from the left table. If there are no matches, NULL values are returned for the columns of the table table.

SELECT
    columns
FROM
    table1
RIGHT JOIN
    table2
ON
    table1.column_name = table3.column_name;

Example; We can create dataset comprising the columns Country_name, Population, and Water_consumed-m3, which will include data usage records from Water-usage table, along with any available associated country details from Countries table.

SELECT
    Countries.Country_name,
    Countries.Population,
    Water-usage.Water_consumed_m3
FROM
    Countries
RIGHT JOIN
    Water_usage
ON
    Countries.Country_id = Water_usage.Country_id;

Output:

  1. FULL OUTER JOIN

This one returns all the rows from both tables, including unmatched rows from both the left and right tables. If no matches are found, NULL values are returned for the columns of the respective table.

SELECT
    columns
FROM
    table1
FULL OUTER JOIN
    table2
ON
    table1.column_name = table3.column_name;

Example: Let’s create dataset that combines columns Country_name, Population, and Water_consumed_m3, incorporating data for all countries from Countries table and water usage records from the Water-usage table.

SELECT
    Countries.Country_name,
    Countries.Population,
    Water-usage.Water_consumed_m3
FROM
    Countries
FULL OUTER JOIN
    Water_usage
ON
    Countries.Country_id = Water_usage.Country_id;

Output:

UNION

UNION operator is used to combine results of two or more SELECT queries into a single result set. It merges rows from all queries while removing duplicates.

Syntax

SELECT
    columns
FROM
    table1
UNION
SELECT
    columns
FROM
    table2;

The difference between FULL OUTER JOIN and UNION is that FULL OUTER JOIN combines rows from two tables, keeping all records and matching them when there is a common key, while including NULLs for non matching rows. UNION combines results of multiple SELECT queries, removing duplicates and returning only distinct rows. It doesn’t include all unmatched rows from both sources like FULL OUTER JOIN does.

For instance, we can combine data from Water_usage table in a combine result set while reserving the structure of the Countries table, with NULL values for Country_name and Population in place of specific data from the Countries table.

SELECT
    Country_id, Country_name,Population
FROM
    Countries
UNION
SELECT
    Country_id, NULL, NULL
FROM
    Water_usage;

Output:

INTERSECT

It’s used to combine results of two or more SELECT queries and returns only the rows that appear in all the result sets. It retrieves common rows between the queries while eliminating any duplicate rows.

Example: Let’s used INTERSECT to find common rows between two datasets.

SELECT
    Country_id
FROM
    Countries
INTERSECT
SELECT
    Country_id
FROM
    Water_usage;

EXCEPT

EXCEPT is used to retrieve rows that appear in the first SELECT query but not in the second SELECT query. It is used to find the difference between the results of two queries.

Example: We can use EXCEPT to get distinct Country_id values from the Countries table that are absent in the Water_usage table.

SELECT
    Country_id
FROM
    Countries
EXCEPT
SELECT
    Country_id
FROM
    Water_usage;

SUMMARY TABLE

0
Subscribe to my newsletter

Read articles from Lord Abiolla directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Lord Abiolla
Lord Abiolla

Passionate Software Developer with a strong enthusiasm for data, technology, and entrepreneurship to solve real-world problems. I enjoy building innovative digital solutions and currently exploring new advancements in data, and leveraging my skills to create impactful software solutions. Beyond coding, I have a keen interest in strategic thinking in business and meeting new people to exchange ideas and collaborate on exciting projects.