Normalization is a fundamental concept in relational database design that helps to reduce redundancy and avoid undesirable anomalies, such as insertion, update, and deletion anomalies. The process of normalization involves organizing data into multiple related tables. These tables are designed based on certain rules called normal forms. In this article, we will explore the different normal forms—1NF, 2NF, 3NF, and BCNF—and how to apply them in SQL database design.
Normalization is the process of decomposing a database into smaller, more manageable tables while preserving data integrity. It helps to eliminate redundant data and ensure that data dependencies are logical and well-organized. There are several stages of normalization, each building on the previous one to ensure that the database structure is as efficient and non-redundant as possible.
1NF is the first step of normalization. A table is in 1NF if it meets the following conditions:
In simpler terms, 1NF ensures that each column holds only one value per row, and each row is unique.
Example: Consider the following table, which violates 1NF because the phone_numbers
column contains multiple phone numbers for a single person:
CREATE TABLE employees ( employee_id INT PRIMARY KEY, employee_name VARCHAR(100), phone_numbers VARCHAR(255) ); -- Sample Data (Not in 1NF) INSERT INTO employees VALUES (1, 'John Doe', '555-1234, 555-5678'), (2, 'Jane Smith', '555-8765');
To bring this table into 1NF, we split the phone_numbers
column into multiple rows:
-- Corrected table (1NF) CREATE TABLE employees ( employee_id INT, employee_name VARCHAR(100), phone_number VARCHAR(15), PRIMARY KEY (employee_id, phone_number) ); -- Sample Data (Now in 1NF) INSERT INTO employees VALUES (1, 'John Doe', '555-1234'), (1, 'John Doe', '555-5678'), (2, 'Jane Smith', '555-8765');
Now, each row contains a single phone number, satisfying 1NF.
A table is in 2NF if it is in 1NF and all non-key columns are fully dependent on the primary key. In other words, a table is in 2NF if it has no partial dependency (i.e., no non-key column depends on part of a composite primary key).
Example: Consider the following table where a composite primary key is used:
CREATE TABLE order_items ( order_id INT, product_id INT, product_name VARCHAR(100), quantity INT, PRIMARY KEY (order_id, product_id) ); -- Sample Data (Not in 2NF) INSERT INTO order_items VALUES (101, 1, 'Laptop', 2), (101, 2, 'Mouse', 1), (102, 1, 'Laptop', 1);
The product_name
column is dependent only on product_id
, not on the entire composite key (order_id
, product_id
). This is a partial dependency, which violates 2NF.
To bring this table into 2NF, we split the data into two tables: one for orders and one for products:
-- Corrected tables (2NF) CREATE TABLE orders ( order_id INT PRIMARY KEY, order_date DATE ); CREATE TABLE products ( product_id INT PRIMARY KEY, product_name VARCHAR(100) ); CREATE TABLE order_items ( order_id INT, product_id INT, quantity INT, PRIMARY KEY (order_id, product_id), FOREIGN KEY (order_id) REFERENCES orders(order_id), FOREIGN KEY (product_id) REFERENCES products(product_id) );
Now, product_name
is stored in the products
table, and the order_items
table only contains information related to the order and product IDs, fulfilling 2NF.
A table is in 3NF if it is in 2NF and there is no transitive dependency. A transitive dependency occurs when a non-key column depends on another non-key column.
Example: Consider the following table, which violates 3NF because the employee_address
depends on employee_id
, and employee_state
depends on employee_address
:
CREATE TABLE employees ( employee_id INT PRIMARY KEY, employee_name VARCHAR(100), employee_address VARCHAR(255), employee_state VARCHAR(100) ); -- Sample Data (Not in 3NF) INSERT INTO employees VALUES (1, 'John Doe', '123 Main St', 'California'), (2, 'Jane Smith', '456 Oak St', 'Texas');
To bring the table into 3NF, we separate the address-related columns into a new table:
-- Corrected tables (3NF) CREATE TABLE employees ( employee_id INT PRIMARY KEY, employee_name VARCHAR(100), employee_address_id INT, FOREIGN KEY (employee_address_id) REFERENCES addresses(address_id) ); CREATE TABLE addresses ( address_id INT PRIMARY KEY, address VARCHAR(255), state VARCHAR(100) );
By removing the transitive dependency, the database design is now in 3NF.
A table is in BCNF if it is in 3NF and if, for every functional dependency, the left-hand side is a superkey (a set of attributes that uniquely identifies a record in a table).
Example: Consider the following table, which is in 3NF but violates BCNF because course_name
determines instructor_name
, but course_name
is not a superkey:
CREATE TABLE courses ( course_id INT PRIMARY KEY, course_name VARCHAR(100), instructor_name VARCHAR(100) ); -- Sample Data (Not in BCNF) INSERT INTO courses VALUES (1, 'Math 101', 'Dr. Smith'), (2, 'History 101', 'Dr. Johnson');
To bring the table into BCNF, we split it into two tables where course_name
becomes part of the primary key in the second table:
-- Corrected tables (BCNF) CREATE TABLE courses ( course_id INT PRIMARY KEY, course_name VARCHAR(100) ); CREATE TABLE instructors ( course_name VARCHAR(100), instructor_name VARCHAR(100), PRIMARY KEY (course_name) );
Now, the course_name
is part of the primary key in the instructors
table, satisfying BCNF.
Normalization is a key concept in relational database design. By applying 1NF, 2NF, 3NF, and BCNF, you can eliminate redundancy, prevent anomalies, and improve the integrity and efficiency of your database. While normalization is important for the structural integrity of the database, denormalization may be considered in specific situations where performance is critical, especially in read-heavy systems.