Normalization 1,2,3NF
Normalization of Database
Database Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy(repetition) and undesirable characteristics like Insertion, Update and Deletion Anamolies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.
Normalization is used for mainly two purposes,
- Eliminating reduntant(useless) data.
- Ensuring data dependencies make sense i.e data is logically stored.
1st Normal Form Definition
A database is in first normal form if it satisfies the following conditions:
- Contains only atomic values
- There are no repeating groups
An atomic value is a value that cannot be divided. For example, in the table shown below, the values in the [Color] column in the first row can be divided into "red" and "green", hence [TABLE_PRODUCT] is not in 1NF.
A repeating group means that a table contains two or more columns that are closely related. For example, a table that records data on a book and its author(s) with the following columns: [Book ID], [Author 1], [Author 2], [Author 3] is not in 1NF because [Author 1], [Author 2], and [Author 3] are all repeating the same attribute.
1st Normal Form Example
How do we bring an unnormalized table into first normal form? Consider the following example:
This table is not in first normal form because the [Color] column can contain multiple values. For example, the first row includes values "red" and "green."
To bring this table to first normal form, we split the table into two tables and now we have the resulting tables:
Now first normal form is satisfied, as the columns on each table all hold just one value
Second Normal Form (2NF)
- In the 2NF, relational must be in 1NF.
- In the second normal form, all non-key attributes are fully functional dependent on the primary key
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID | SUBJECT | TEACHER_AGE |
---|---|---|
25 | Chemistry | 30 |
25 | Biology | 30 |
47 | English | 35 |
83 | Math | 38 |
83 | Computer | 38 |
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID | TEACHER_AGE |
---|---|
25 | 30 |
47 | 35 |
83 | 38 |
TEACHER_SUBJECT table:
TEACHER_ID | SUBJECT |
---|---|
25 | Chemistry |
25 | Biology |
47 | English |
83 | Math |
83 | Computer |
Third Normal Form (3NF)
- A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
- 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
- If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.
A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial function dependency X → Y.
- X is a super key.
- Y is a prime attribute, i.e., each element of Y is part of some candidate key. Example:
EMPLOYEE_DETAIL table:
EMP_ID | EMP_NAME | EMP_ZIP | EMP_STATE | EMP_CITY |
---|---|---|---|---|
222 | Harry | 201010 | UP | Noida |
333 | Stephan | 02228 | US | Boston |
444 | Lan | 60007 | US | Chicago |
555 | Katharine | 06389 | UK | Norwich |
666 | John | 462007 | MP | Bhopal |
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the rule of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMP_ID | EMP_NAME | EMP_ZIP |
---|---|---|
222 | Harry | 201010 |
333 | Stephan | 02228 |
444 | Lan | 60007 |
555 | Katharine | 06389 |
666 | John | 462007 |
EMP_ZIP | EMP_STATE | EMP_CITY |
---|---|---|
201010 | UP | Noida |
02228 | US | Boston |
60007 | US | Chicago |
06389 | UK | Norwich |
462007 | MP | Bhopal |