What is Normalization in DBMS? Explain All Normal Forms With a Real-Time Example?
After reading the above heading you might have got an understanding of what we are gonna cover today.
Still, Let me make you comfortable with the topic and related terms.
Lets, Dive in.
DBMS stands for the Database Management System.
Now, You’ll ask what Database Management System is? Well, the name itself specifies the meaning. DBMS is a system that is used to manage the database.
Now, When I say to manage a database, I mean it is used for creating, storing, and manipulating a database.
There is various software available that helps us to create, store and manipulate databases.
But there is one important concept about databases that very few are fully understood.
And today I will explain the Normalization concept with a real-time example explaining all three forms of normalization.
What is Normalization?
Normalization is a process that minimizes redundancy. Redundancy in relation may cause insertion, deletion, and update anomalies.
So, it helps to minimize the redundancy in relations.
What is redundancy?
Redundancy means having multiple copies of the same data in the database.
For example: In your phone, if you have a contact number saved with the same name. Obviously, it will cause ambiguity about which one to choose.
Same with database, When you have multiple copies of the same information, ambiguity occurs. This problem arises when a database is not normalized.
For information to be precise, we need unique information stored in the database. So, for this, we use normalization which consists of different normal forms.
Normal forms are used to eliminate or reduce redundancy in database tables.
Mainly, there are 3 types of normal forms
So, today’s goal is to create a database from scratch and normalize it so that it has minimal redundancy.
- Understand Database Requirements.
- Create ER diagram of all entities
- Convert to 1NF
- Convert to 2NF
- Convert to 3NF
First, we will represent our data in the form of an ER diagram. Then we will move further to normalizing our data
What is an ER diagram?
ER diagram stands for Entity Relationship Diagram. ER diagram is a high-level conceptual data model diagram. ER Diagrams contain various symbols which are used to represent relationships among tables.
Why is it necessary to use an ER diagram?
ER diagrams are a kind of flow chart that helps us clearly understand how databases are connected or how they need to be arranged.
To create ER diagram, I have used LucidChart, which is a tool where we can create such diagrams.
Following are the notations and ways we are going to today.
Step 1: Understand Database Requirements.
Before creating any database analysis of the problem statement is crucial.
Let’s look at the problem statement
“ Create a database to help universities keep track of
students that have enrolled, what subjects they are taking,
and teachers of those subjects.”
Initially, our first task is to identify the table to be created. For this, we need to look for nouns. In this case, we have nouns like university, student, subject, teacher.
Identifying these nouns helps us create entities for databases.
so, our entities are fixed now.
Step 2: Create ER diagram of all entities
Following are the tables and necessary fields needed to build a strong database.
Step 3: Convert to 1NF
First, we will convert it into 1NF in this it must have a unique field that uniquely identifies every tuple. Also, it must not include any composite attribute.
Every table must have a primary key. Primary Key is a field that uniquely identifies each tuple.
Each table has the following primary key:
An attribute composed of many other attributes is called a composite attribute.
For example, the Address attribute of student Entity type consists of Street, City, State, and Country, etc.
Hence we have divided the address into fields like
This is the result of the above statements :
By this, our database is in 1NF.
Step 4: Convert to 2NF
Now, let's move to 2nd step ie to convert into 2NF.
In this, we will remove the partial dependency. Partial Dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key.
Many subjects can be under the same category in the subject table, and if one category name changes, all instances need to be updated.
Hence we’ll create another table of categories and link them using foreign keys. Using this every change will be reflected in respective tables,
Now, a category has many subjects therefore category ID will be a foreign key in the subject table.
A foreign key is a key that is used to link two tables using a common entity present in both tables.
In the subject table, the student name is a field that needs to be in 2NF many students can have many subjects. therefore it is a many-to-many relationship.
But we cannot use such relationships in real-time. Hence we’ll break them into 2 tables of one-to-many relationships.
In the teacher's table, the subject taught is the subject taught by the teacher and to be related it must be the primary key which currently is.
Therefore, we should refer to the subject column where the teacher has many subjects. Here we will add the teacher’s primary key into the subject table as a foreign key.
in the university table, it is not related to any table. but it has a unique ID and also the university has many subjects. Hence we will add university ID to the subject table as a foreign key,
Now, our database is in 2NF.
Step 5: Convert to 3NF
Let’s look at our last step ie to convert into 3NF.
In 3NF, we have to remove the transitive dependency.
Transitive dependency means every non-key attribute in the table must be dependant only on PK
For Example A->B B->C which means A->C is a transitive dependency.
In our ER diagram address is being repeated over almost all the tables and this address is not dependent on the primary key.
Therefore it must have a separate table. In that table, we must include address ID to uniquely identify that tuple.
Now, our database is in 3NF.
There are more normal forms as you go advanced in this. for example., BCNF(Boyce-Codd normal form) will be cover in the further article.
As you can see now, our database is less redundant and can be implemented in real-time.
I hope you have a clear understanding of what normalization is.
Thank you all for reading!
Note: The example mentioned above is taken from one of the courses I have learned.