Working with DISTINCT to Remove Duplicates in SQL

The DISTINCT keyword removes duplicate rows from the result set of a query, so that each unique value (or combination of values) is returned only once.

1. Basic Syntax of DISTINCT

The DISTINCT keyword is used in a SELECT statement to filter out duplicate records. It applies to all columns specified in the query, meaning that only unique combinations of those columns will appear in the results.

Syntax:

      SELECT DISTINCT column1, column2, ...
      FROM table_name;

DISTINCT is written once, immediately after SELECT, and applies to the entire select list rather than only to the first column.

2. Removing Duplicates from a Single Column

When you use DISTINCT with a single column, SQL will return only unique values from that column, eliminating any duplicates.

Example: Removing Duplicates from a Single Column

      -- Select all unique job titles from the Employees table
      SELECT DISTINCT JobTitle
      FROM Employees;

This query retrieves the distinct job titles from the 'Employees' table. If there are multiple employees with the same job title, it will only return that title once.
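To see the effect on actual rows, here is a minimal sketch that runs the same query with and without DISTINCT. It assumes an in-memory SQLite database accessed through Python's built-in sqlite3 module; the 'Name' column and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical sample data; only the table name and JobTitle column come from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Asha", "Engineer"), ("Ben", "Analyst"), ("Chen", "Engineer"), ("Dev", "Manager")],
)

# Without DISTINCT: one row per employee, so 'Engineer' appears twice.
all_titles = sorted(row[0] for row in conn.execute("SELECT JobTitle FROM Employees"))

# With DISTINCT: each job title appears once.
unique_titles = sorted(row[0] for row in conn.execute("SELECT DISTINCT JobTitle FROM Employees"))

print(all_titles)     # ['Analyst', 'Engineer', 'Engineer', 'Manager']
print(unique_titles)  # ['Analyst', 'Engineer', 'Manager']
```

The results are sorted in Python only to make the printed output predictable; DISTINCT by itself does not guarantee any row order.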

3. Removing Duplicates from Multiple Columns

When you use DISTINCT with multiple columns, SQL will return unique combinations of the values from all specified columns. A combination is considered unique if no other row has the same values across all the specified columns.

Example: Removing Duplicates from Multiple Columns

      -- Select all unique combinations of department and job title
      SELECT DISTINCT Department, JobTitle
      FROM Employees;

This query retrieves unique pairs of 'Department' and 'JobTitle' from the 'Employees' table. If multiple employees share the same department and job title, that pair will appear only once in the result set.
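The following sketch shows how a value can repeat in one column while the combination is still unique. It assumes an in-memory SQLite database via Python's sqlite3 module, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Asha", "Sales", "Manager"),
        ("Ben", "Sales", "Analyst"),
        ("Chen", "Sales", "Analyst"),  # same pair as Ben, so it is collapsed
        ("Dev", "IT", "Analyst"),      # same title as Ben, but a different department
        ("Eli", "IT", "Manager"),
    ],
)

pairs = sorted(conn.execute("SELECT DISTINCT Department, JobTitle FROM Employees"))

for department, job_title in pairs:
    print(department, job_title)
# IT Analyst
# IT Manager
# Sales Analyst
# Sales Manager
```

Five employees produce four rows: 'Analyst' still appears twice because it is paired with two different departments.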

4. DISTINCT with COUNT

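You can use DISTINCT inside aggregate functions such as COUNT to count the number of unique values in a column. Counting distinct combinations of several columns is less portable: MySQL accepts COUNT(DISTINCT column1, column2), but many other databases require a subquery that first selects the distinct combinations and then counts the resulting rows.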

Example: Counting Distinct Values

      -- Count the number of unique job titles
      SELECT COUNT(DISTINCT JobTitle)
      FROM Employees;

This query returns the number of different job titles in the 'Employees' table. Rows where 'JobTitle' is NULL are not counted, because COUNT with a column argument ignores NULL values.
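The difference between the COUNT variants is easiest to see side by side. This is a minimal sketch assuming an in-memory SQLite database via Python's sqlite3 module, with hypothetical sample rows that include one missing job title.

```python
import sqlite3

# Hypothetical sample data; None is stored as SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Asha", "Engineer"), ("Ben", "Engineer"), ("Chen", "Analyst"), ("Dev", None)],
)

total_rows, non_null_titles, distinct_titles = conn.execute(
    "SELECT COUNT(*), COUNT(JobTitle), COUNT(DISTINCT JobTitle) FROM Employees"
).fetchone()

print(total_rows)       # 4  (every row)
print(non_null_titles)  # 3  (rows with a non-NULL JobTitle)
print(distinct_titles)  # 2  ('Engineer' and 'Analyst'; the NULL is ignored)
```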

5. DISTINCT with ORDER BY

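You can combine DISTINCT with the ORDER BY clause to remove duplicates and sort the remaining rows. In many databases (for example PostgreSQL and SQL Server), the columns used in ORDER BY must also appear in the select list when DISTINCT is present.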

Example: Using DISTINCT with ORDER BY

      -- Select unique job titles and sort them alphabetically
      SELECT DISTINCT JobTitle
      FROM Employees
      ORDER BY JobTitle;

This query retrieves unique job titles from the 'Employees' table and sorts them in ascending alphabetical order.
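As a quick illustration of both sort directions, the sketch below runs the query in ascending and descending order. It assumes an in-memory SQLite database via Python's sqlite3 module, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Asha", "Manager"), ("Ben", "Analyst"), ("Chen", "Engineer"), ("Dev", "Analyst")],
)

# ASC is the default direction; DESC reverses it.
ascending = [row[0] for row in conn.execute(
    "SELECT DISTINCT JobTitle FROM Employees ORDER BY JobTitle")]
descending = [row[0] for row in conn.execute(
    "SELECT DISTINCT JobTitle FROM Employees ORDER BY JobTitle DESC")]

print(ascending)   # ['Analyst', 'Engineer', 'Manager']
print(descending)  # ['Manager', 'Engineer', 'Analyst']
```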

6. Using DISTINCT with NULL Values

When DISTINCT is used, all NULL values are treated as duplicates of one another. If a column contains multiple NULL values, the result set contains a single NULL row. This differs from ordinary comparisons, where NULL = NULL does not evaluate to true.

Example: DISTINCT with NULL Values

      -- Select all unique departments, including NULL values
      SELECT DISTINCT Department
      FROM Employees;

This query retrieves all unique departments, including NULL. If some employees do not have a department assigned (i.e., their 'Department' field is NULL), NULL appears exactly once in the result, alongside the other distinct department values.
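The NULL behavior can be checked directly. This is a minimal sketch assuming an in-memory SQLite database via Python's sqlite3 module; the sample rows are hypothetical, with two employees who have no department.

```python
import sqlite3

# Hypothetical sample data; None is stored as SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Asha", "Sales"), ("Ben", None), ("Chen", "IT"), ("Dev", None), ("Eli", "Sales")],
)

departments = [row[0] for row in conn.execute("SELECT DISTINCT Department FROM Employees")]

# The two NULLs collapse into one row, so there are three rows in total.
null_rows = departments.count(None)
print(len(departments))  # 3
print(null_rows)         # 1
```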

7. Performance Considerations when Using DISTINCT

While DISTINCT is very useful for eliminating duplicates, it can be expensive on large datasets. To find duplicates, the database typically has to sort or hash the rows, which costs extra time and memory. Use DISTINCT only when the query can actually return duplicates that you need removed, and consider other ways to optimize your queries if performance becomes an issue.

Example: Performance Considerations

      -- Select all unique customer cities from the Customers table
      SELECT DISTINCT City
      FROM Customers;

This query retrieves the distinct 'City' values from the 'Customers' table. If the table is large, the database may need to sort or hash every row to identify the unique values, which can slow the query down. An index on 'City' may allow the database to avoid some of that work.
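One way to investigate is to compare the query plan before and after adding an index. The sketch below assumes an in-memory SQLite database via Python's sqlite3 module; the 'CustomerId' column, the index name idx_customers_city, and the generated rows are hypothetical. EXPLAIN QUERY PLAN is SQLite-specific, its wording varies between versions, and other databases have their own equivalents, so treat the printed plans as something to inspect rather than a fixed result.

```python
import sqlite3

# Hypothetical sample data: 3000 customers spread over three cities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, City TEXT)")
cities = ["Delhi", "Mumbai", "Pune"]
conn.executemany(
    "INSERT INTO Customers (City) VALUES (?)",
    [(cities[i % 3],) for i in range(3000)],
)

query = "SELECT DISTINCT City FROM Customers"

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

cities_before = sorted(row[0] for row in conn.execute(query))
plan_before = plan(query)

# Hypothetical index name; an index on City gives the engine pre-sorted values.
conn.execute("CREATE INDEX idx_customers_city ON Customers (City)")

cities_after = sorted(row[0] for row in conn.execute(query))
plan_after = plan(query)

print(cities_before == cities_after)  # True: the index changes the plan, not the result
for step in plan_before:
    print("without index:", step)
for step in plan_after:
    print("with index:   ", step)
```

Whether the index actually speeds things up depends on the database engine, the data distribution, and the rest of the query, so measure on your own system before relying on it.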

8. Conclusion

The DISTINCT keyword is an essential tool in SQL for removing duplicates from your query results. Whether you're working with a single column or multiple columns, DISTINCT ensures that each unique value or combination is returned only once, and inside COUNT it lets you count unique values. Keep in mind that all NULL values collapse into a single row and that deduplication has a cost on large tables. By understanding how DISTINCT behaves, you can write SQL queries that are both correct and reasonably efficient.
