The DISTINCT keyword in SQL removes duplicate rows from the result set of a query. It ensures that each row in the output appears only once, which is useful when you want to eliminate redundancy and see only the distinct values present in your data.
DISTINCT is placed directly after SELECT and filters out duplicate rows. It applies to the whole select list, not just the first column, so only unique combinations of the selected columns appear in the results.
SELECT DISTINCT column1, column2, ... FROM table_name;
In this syntax, the query will return only unique combinations of the values in the specified columns.
When you use DISTINCT with a single column, SQL will return only unique values from that column, eliminating any duplicates.
-- Select all unique job titles from the Employees table
SELECT DISTINCT JobTitle FROM Employees;
This query retrieves the distinct job titles from the 'Employees' table. If several employees share the same job title, that title is returned only once.
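To try this outside a live database, here is a minimal sketch that runs the query with Python's built-in sqlite3 module against an in-memory table. The Employees schema and sample rows are hypothetical, made up for illustration. The results are sorted in Python because SQL does not guarantee row order without ORDER BY.

```python
import sqlite3

# Hypothetical Employees table in an in-memory SQLite database (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Ana", "Sales", "Manager"),
        ("Ben", "Sales", "Clerk"),
        ("Cy", "IT", "Clerk"),
        ("Di", "IT", "Engineer"),
    ],
)

# Without DISTINCT every row's title comes back; with DISTINCT each title appears once.
all_titles = sorted(row[0] for row in conn.execute("SELECT JobTitle FROM Employees"))
unique_titles = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT JobTitle FROM Employees")
)

print(all_titles)     # ['Clerk', 'Clerk', 'Engineer', 'Manager']
print(unique_titles)  # ['Clerk', 'Engineer', 'Manager']
```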
When you use DISTINCT with multiple columns, SQL will return unique combinations of the values from all specified columns. A combination is considered unique if no other row has the same values across all the specified columns.
-- Select all unique combinations of department and job title
SELECT DISTINCT Department, JobTitle FROM Employees;
This query retrieves unique pairs of 'Department' and 'JobTitle' from the 'Employees' table. If multiple employees share the same department and job title, that pair will appear only once in the result set.
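The sketch below, which uses Python's built-in sqlite3 module and hypothetical sample rows, shows that uniqueness is judged on the whole (Department, JobTitle) pair. 'Clerk' occurs in two departments, so it appears twice, while the repeated Sales/Clerk pair collapses into a single row.

```python
import sqlite3

# Hypothetical Employees table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Ana", "Sales", "Clerk"),
        ("Ben", "Sales", "Clerk"),  # same pair as Ana: collapses into one row
        ("Cy", "IT", "Clerk"),      # same title, different department: kept
        ("Di", "IT", "Engineer"),
    ],
)

# Sort in Python because SQL row order is unspecified without ORDER BY.
pairs = sorted(
    conn.execute("SELECT DISTINCT Department, JobTitle FROM Employees").fetchall()
)
print(pairs)  # [('IT', 'Clerk'), ('IT', 'Engineer'), ('Sales', 'Clerk')]
```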
You can use the DISTINCT keyword in conjunction with aggregate functions such as COUNT to count the number of unique values in a column or combination of columns.
-- Count the number of unique job titles
SELECT COUNT(DISTINCT JobTitle) FROM Employees;
This query returns the number of different job titles in the 'Employees' table. Each title is counted once no matter how many employees hold it, and NULL job titles are not counted at all, because COUNT(DISTINCT ...) ignores NULL values.
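As a rough check, this sqlite3 sketch compares COUNT(*) with COUNT(DISTINCT JobTitle) on a hypothetical table. One row has a NULL job title (None in Python) to show that NULL is not counted as a distinct value.

```python
import sqlite3

# Hypothetical Employees table; Di has no job title (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Ana", "Manager"), ("Ben", "Clerk"), ("Cy", "Clerk"), ("Di", None)],
)

# COUNT(*) counts rows; COUNT(DISTINCT JobTitle) counts different non-NULL titles.
total_rows, distinct_titles = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT JobTitle) FROM Employees"
).fetchone()

print(total_rows)       # 4
print(distinct_titles)  # 2 ('Manager' and 'Clerk'; the NULL title is ignored)
```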
You can combine the DISTINCT keyword with the ORDER BY clause to remove duplicates and then sort the result set according to specific criteria.
-- Select unique job titles and sort them alphabetically
SELECT DISTINCT JobTitle FROM Employees ORDER BY JobTitle;
This query retrieves unique job titles from the 'Employees' table and sorts them in ascending alphabetical order.
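A short sqlite3 sketch with hypothetical data shows the ordered output, and adds DESC to show that ascending is only the default direction.

```python
import sqlite3

# Hypothetical Employees table with a repeated 'Clerk' title.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Ana", "Manager"), ("Ben", "Clerk"), ("Cy", "Engineer"), ("Di", "Clerk")],
)

# Duplicates are removed, then the remaining titles are sorted.
ascending = [
    row[0]
    for row in conn.execute("SELECT DISTINCT JobTitle FROM Employees ORDER BY JobTitle")
]
descending = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT JobTitle FROM Employees ORDER BY JobTitle DESC"
    )
]

print(ascending)   # ['Clerk', 'Engineer', 'Manager']
print(descending)  # ['Manager', 'Engineer', 'Clerk']
```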
When DISTINCT is used, NULL values are treated as equal to one another for the purpose of removing duplicates. If a column contains multiple NULL values, they collapse into a single NULL in the result, just like any other repeated value.
-- Select all unique departments, including NULL values
SELECT DISTINCT Department FROM Employees;
This query retrieves all unique departments. If some employees do not have a department assigned (i.e., their 'Department' field is NULL), NULL appears exactly once in the result as its own row, no matter how many employees lack a department.
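The following sqlite3 sketch, using a hypothetical table where two employees have no department, shows the NULLs collapsing into one row. ORDER BY is added only to make the output predictable: SQLite sorts NULL before other values in ascending order, but other databases may place NULLs last. The final query shows that COUNT(DISTINCT ...) still skips NULL.

```python
import sqlite3

# Hypothetical Employees table; Ben and Cy have no department (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Ana", "Sales"), ("Ben", None), ("Cy", None), ("Di", "IT")],
)

# The two NULL departments collapse into a single NULL row (None in Python).
departments = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Department FROM Employees ORDER BY Department"
    )
]
print(departments)  # [None, 'IT', 'Sales']

# COUNT(DISTINCT ...) ignores NULL, so it reports 2 rather than 3.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT Department) FROM Employees"
).fetchone()[0]
print(distinct_count)  # 2
```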
While DISTINCT is very useful for eliminating duplicates, it has a cost, especially on large datasets: to find duplicates, the database must compare rows, typically by sorting or hashing the result, which takes extra processing time and memory. Use DISTINCT only when duplicates can actually occur, and if performance becomes an issue, consider other optimizations, such as adding an index on the selected columns or fixing the joins that produce the duplicates in the first place.
-- Select all unique customer cities from the Customers table
SELECT DISTINCT City FROM Customers;
This query retrieves the distinct 'City' values from the 'Customers' table. If the table is large, the database may need to sort or hash every City value to identify the unique ones, which can slow the query down; an index on City can often let the database skip that extra step.
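One way to see this work is to ask the database for its query plan. The sketch below is SQLite-specific (other databases have their own EXPLAIN syntax and output) and uses a hypothetical Customers table and a hypothetical index name, idx_customers_city. It prints the plan before and after creating the index; the exact plan wording varies between SQLite versions, so only the query results are checked.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN output."""
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Hypothetical Customers table with repeated cities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, City TEXT)")
conn.executemany(
    "INSERT INTO Customers (City) VALUES (?)",
    [("Oslo",), ("Lima",), ("Oslo",), ("Pune",), ("Lima",)],
)

query = "SELECT DISTINCT City FROM Customers"
before = sorted(row[0] for row in conn.execute(query))
print(plan(conn, query))  # typically mentions a temporary B-tree used for DISTINCT

# Hypothetical index name; an index on City lets SQLite read cities in order.
conn.execute("CREATE INDEX idx_customers_city ON Customers (City)")
after = sorted(row[0] for row in conn.execute(query))
print(plan(conn, query))  # typically scans the covering index instead

print(before == after)  # True: the index changes the plan, not the result
```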
The DISTINCT keyword is an essential tool in SQL for removing duplicates from your query results. Whether you're working with a single column or multiple columns, DISTINCT ensures that each returned row is unique. It's especially useful when you need to count distinct values or remove redundancy. By understanding how DISTINCT works, including how it treats NULL values and what it costs on large tables, you can write more accurate and efficient SQL queries.