Database Design Gone Bad
It is important that you understand the contents of this article. Together with the next article, it forms the knowledge you need to do good database designs. So, please make sure you understand the things discussed here before you proceed.
Data Redundancy
Database design aims to minimize redundancy without losing data. Redundancy is where the same piece of data exists in more than one place in the database. This cause all sort of data quality issues.
Data redundancy is caused by anomalies. An anomaly is something that deviates from an expected standard. I know it does not say much, but by the end of the article, you will understand what an anomaly is.
We can drastically improve the quality of our data by preventing these anomalies. In this article we are going to look at three types of anomalies:
insert anomalies,
delete anomalies, and
update anomalies.
We will also look at Null values.
The Mighty Spreadsheet
The spreadsheet is still the number one option when developing simple but powerful databases. We will follow suit and look at an Employee database built with Google Sheets or Microsoft Excel. After six months of hard work, HR (human resources) ended up with the example below:
Row # | Name | Department | Job Title |
1 | John Smith | IT | Programmer |
2 | Peter King | Information Technology | Network Administrator |
3 | Jane Peters | IT | DBA |
4 | Sue Jenkins | I.T. | Database Administrator |
5 | J. Smith | IT | Programmer |
6 | Peters, Jane | InfoTech | D.B.A. |
7 | Jenkins, S | IT | Databse Administrator |
We have been called to assist in making this database better (whatever better means at this point). So let us work through this database, row by row.
Insert Anomalies
After a quick look, we realized that the database was a total mess! Let's see why.
Row 1
Row # | Name | Department | Job Title |
1 | John Smith | IT | Programmer |
This is the first entry, so everything seems fine for now.
Row 2
Row # | Name | Department | Job Title |
2 | Peter King | Information Technology | Network Administrator |
Why does the department say Information Technology and not IT like in row 1? And if Information Technology is the preferred term, why does Row 1 still say IT? This is an example of an Insert Anomaly.
Row 3
Row # | Name | Department | Job Title |
3 | Jane Peters | IT | DBA |
Everything seems fine on the condition that IT and not Information Technology, as in row 2, is the standard.
Row 4
Row # | Name | Department | Job Title |
4 | Sue Jenkins | I.T. | Database Administrator |
In row 4, we have I.T. instead of IT or Information Technology. This is again an example of an Insert Anomaly. But, hang on, DBA in row 3 is short for Database Administrator in row 4. Another Insert Anomaly.
Row 5
Row # | Name | Department | Job Title |
5 | J. Smith | IT | Programmer |
Is J. Smith in row 5 and John Smith in row 1 by any chance the same person entered twice? I mean, both work in IT and both are programmers. Another Insert Anomaly maybe. However, never assume it is a duplicate entry. Confirm with HR!
Row 6
Row # | Name | Department | Job Title |
6 | Peters, Jane | InfoTech | D.B.A. |
By now, everything seems to fall apart. Row 6 and row 3 probably refer to the same person. Also, now IT is called InfoTech, and DBA is D.B.A.
Row 7
Row # | Name | Department | Job Title |
7 | Jenkins, S | IT | Databse Administrator |
Rows 4 and 7 refer to the same person. Also, the Job Title has a spelling error.
Update Anomalies
Data change over time. For example, after our initial assessment, HR made some corrections. Also, people move to other departments and the database needs to be updated accordingly. We have once again been called in to assess the database. This is what we saw:
Row # | Name | Department | Job Title |
1 | John Smith | IT | Programmer |
2 | Peter King | IT | Network Administrator |
3 | Jane Peters | IT | Database Administrator |
4 | Sue Jenkins | I.T. | Database Administrator |
5 | J. Smith | IT | Programmer |
6 | Peters, Jane | IT | Database Administrator |
7 | Jenkins, S | IT | Database Administrator |
The company decided to standardize the term IT, which by the way is an important improvement. Always standardize terms! HR also updated rows 2 and 6 used to read Information Technology and InfoTech respectively. However, they missed row 4, which still reads I.T. This is an example of an update anomaly. Our brain knows that IT and I.T. is most probably the same thing. The computer does not know that though. So, this is going to become a problem at some point in the future and needs fixing.
Jane Peters and Sue Jenkins were entered twice, so HR deleted rows 3 and 7, leaving us with Sue Jenkins (row 3) and Peters, Jane (row 5).
Row # | Name | Department | Job Title |
1 | John Smith | IT | Programmer |
2 | Peter King | IT | Network Administrator |
3 | Sue Jenkins | IT | Database Administrator |
4 | J. Smith | IT | Programmer |
5 | Peters, Jane | IT | Database Administrator |
We now have one record for each employee, the Department is standardized on IT, and there are no typos in the Job Title anymore. Things have certainly improved.
However, there is inconsistency with the naming conventions of the Name column. For example, in rows 1, 2 and 3, the order is First Name, Last Name. In row 5, the order is Last Name, First Name. In row 4, the order is Initials and Last Name.
Also, after confirming with HR, it turned out that John Smith in row 1 and J. Smith in row 4 refer to different people. As they both work in IT and are both Programmers, we need to be able to distinguish between them. We will fix this problem in the next article.
Delete Anomalies
One day Peter King decided he had had it and resigned. HR removed him from the database and ended up with the example below:
Row # | Name | Department | Job Title |
1 | John Smith | IT | Programmer |
2 | Sue Jenkins | IT | Database Administrator |
3 | J. Smith | IT | Programmer |
4 | Peters, Jane | IT | Database Administrator |
This leaves the company with a potential problem though. According to the database, Peter King never worked for the company. Also, looking at the database, there is no indication anymore that the company had a Network Administrator. Again, we will fix this problem in a future article.
I will give you a hint for now. Never delete data!
Null Values
Mystery Employee
Bob Hansen joined our company, and after HR updated the database, we ended up with the example below:
Row # | Name | Department | Job Title |
1 | John Smith | IT | Programmer |
2 | Sue Jenkins | IT | Database Administrator |
3 | J. Smith | IT | Programmer |
4 | Peters, Jane | IT | Database Administrator |
5 | Bob Hansen | IT |
So what exactly does Bob do in IT? We don’t know because HR left the Job Title blank. We call this absence of a value a Null Value.
The word null has a particular meaning in programming and databases. It means there is nothing, absolutely nothing! A good example is when you have an active short-term insurance policy. There are two important dates, the Inception Date and the Cancellation Date. Because the policy is active, the Inception Date will have a value, and, because the policy is still active, the Cancellation Date will not have a value. Thus the Cancelation Date will be null. Once the policy is canceled, the Cancellation Date will be updated and will not be null anymore.
Null is Not Zero!
Let's look at another thing that sometimes confuses newcomers to null. Here we have a little database used by a local grocer to sell fruits:
Fruit | Price |
Apples | $2.50 |
Pears | $0.00 |
Bananas |
Apples cost $2.50. Pears are free. However, we have not determined what bananas cost, as the price is not zero. It is null. There is a big difference between zero and null in the programming world. As we progress, the concept of null will become clear.
Conclusion
It is clear from this short discussion that data can become close to useless if you do not follow a proper database design philosophy. We looked at some things that can (and will) go wrong. In the next post, we will improve the design of this database. See you there!
Subscribe to my newsletter
Read articles from Deon Slabbert directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Deon Slabbert
Deon Slabbert
I am a fullstack software developer with 35 years experience building enterprise applications for the financial sector and have a passion for learning & teaching others the tools of the trade