Statistical Functions in SQL

SQL provides aggregate functions for performing statistical analysis on data stored in relational databases. These functions compute measures such as counts, sums, averages, and variances. In this article, we will explore some commonly used statistical functions in SQL with examples.

1. COUNT()

The COUNT() function counts rows. COUNT(*) counts every row the query returns, while COUNT(column) counts only the rows where that column is not NULL. It is particularly useful for counting the number of records in a table or, combined with a WHERE clause, the number of records that satisfy certain criteria.

Example: Counting the Number of Employees

If you want to count the total number of employees in the employees table:

        SELECT COUNT(*) AS total_employees
        FROM employees;

This query will return the total number of rows (employees) in the table.
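
The query above has no condition, and COUNT() behaves differently depending on its argument. As a rough sketch, assuming the employees table also has the department and salary columns used later in this article, and treating 'Sales' as a purely hypothetical department value:

        -- COUNT(*) counts rows; COUNT(salary) skips rows where salary is NULL;
        -- COUNT(DISTINCT salary) counts each non-NULL salary value once.
        SELECT COUNT(*)               AS sales_employees,
               COUNT(salary)          AS with_known_salary,
               COUNT(DISTINCT salary) AS distinct_salaries
        FROM employees
        WHERE department = 'Sales';  -- 'Sales' is an illustrative value

If the three numbers differ, the department has employees with a missing salary or employees who share the same salary.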

2. SUM()

The SUM() function calculates the total of a numeric column, ignoring NULL values. It is useful when you need to calculate total sales, total expenses, or any other additive aggregate.

Example: Calculating Total Sales

To calculate the total sales from the sales table:

        SELECT SUM(sales_amount) AS total_sales
        FROM sales;

This query will return the sum of all the values in the sales_amount column.
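
One detail worth knowing is that SUM() returns NULL rather than 0 when no rows qualify. The following is a minimal sketch of guarding against that with COALESCE(); the threshold of 1000 is an assumption chosen only for illustration:

        -- SUM() skips NULL values and returns NULL (not 0) when no rows qualify,
        -- so COALESCE() supplies a default of 0 for an empty result.
        SELECT COALESCE(SUM(sales_amount), 0) AS large_sales_total
        FROM sales
        WHERE sales_amount >= 1000;  -- threshold chosen only for illustration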

3. AVG()

The AVG() function computes the arithmetic mean of a numeric column, ignoring NULL values. It is often used to determine the average salary, average score, or average price from a set of data.

Example: Calculating the Average Salary

To calculate the average salary of employees in the employees table:

        SELECT AVG(salary) AS average_salary
        FROM employees;

This query will return the average salary of all employees in the table.
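
AVG() is often more informative per group than over the whole table. As a sketch, assuming the employees table has the department column used later in this article:

        -- AVG() ignores NULL salaries, so it divides by the number of known
        -- salaries, not by the number of rows in each department.
        SELECT department,
               AVG(salary)   AS average_salary,
               COUNT(salary) AS salaries_averaged
        FROM employees
        GROUP BY department;

In some systems, such as SQL Server, averaging an integer column yields an integer, so a cast to a decimal type may be needed to keep the fractional part.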

4. MIN() and MAX()

The MIN() and MAX() functions are used to find the minimum and maximum values in a column, respectively. These functions are useful for determining the smallest and largest values in your dataset.

Example: Finding the Minimum and Maximum Salary

To find the lowest and highest salary in the employees table:

        SELECT MIN(salary) AS lowest_salary,
               MAX(salary) AS highest_salary
        FROM employees;

This query will return the minimum and maximum salaries from the salary column.
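
MIN() and MAX() return only the values, not the rows they came from. One common way to retrieve the matching rows is a subquery, sketched here under the assumption that the employees table has the employee_name column used later in this article:

        -- MAX() returns only the value; the subquery finds it, and the outer
        -- query fetches every employee whose salary equals it.
        SELECT employee_name, salary
        FROM employees
        WHERE salary = (SELECT MAX(salary) FROM employees);

If several employees share the highest salary, this query returns all of them.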

5. VARIANCE() and STDDEV()

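Variance and standard deviation measure the spread, or dispersion, of data around its mean. The VARIANCE() function calculates the variance, while the STDDEV() function calculates the standard deviation, which is the square root of the variance. Whether these names return the sample or the population statistic depends on the database: in PostgreSQL and Oracle they are the sample versions, while in MySQL they are the population versions. SQL Server uses the names VAR() and STDEV() instead.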

Example: Calculating Variance and Standard Deviation of Salary

To calculate the variance and standard deviation of the employees' salaries:

        SELECT VARIANCE(salary) AS salary_variance,
               STDDEV(salary) AS salary_stddev
        FROM employees;

This query will return the variance and standard deviation of the values in the salary column.
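
To avoid depending on what a given database means by VARIANCE() and STDDEV(), the explicit function names can be used. The sketch below is written for PostgreSQL or MySQL 8 and uses a small made-up data set (sample_values is a hypothetical name, not a table from this article) whose mean is 5 and whose squared deviations sum to 32:

        WITH sample_values (x) AS (
            SELECT 2 UNION ALL SELECT 4 UNION ALL SELECT 4 UNION ALL SELECT 4
            UNION ALL SELECT 5 UNION ALL SELECT 5 UNION ALL SELECT 7 UNION ALL SELECT 9
        )
        SELECT VAR_POP(x)     AS population_variance,  -- 32 / 8 = 4
               VAR_SAMP(x)    AS sample_variance,      -- 32 / 7, about 4.57
               STDDEV_POP(x)  AS population_stddev,    -- sqrt(4) = 2
               STDDEV_SAMP(x) AS sample_stddev         -- sqrt(32 / 7), about 2.14
        FROM sample_values;

The population versions divide by the number of values, while the sample versions divide by one less, which is why the two results differ on small data sets.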

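6. MEDIAN()

Most databases do not have a built-in MEDIAN() function (Oracle is one exception). However, you can compute the median using window functions. The median is the middle value in a dataset when arranged in ascending order; when the number of values is even, it is the average of the two middle values.

Example: Calculating Median Salary

To calculate the median salary, you can use a query like this:

        SELECT AVG(salary) AS median_salary
        FROM (SELECT salary,
                     ROW_NUMBER() OVER (ORDER BY salary) AS row_num,
                     COUNT(*) OVER () AS total_rows
              FROM employees
              WHERE salary IS NOT NULL) AS ranked_salaries
        -- keeps the middle row (odd count) or the two middle rows (even count)
        WHERE 2 * row_num BETWEEN total_rows AND total_rows + 2;

This query numbers the salaries in ascending order, keeps the middle row or the two middle rows, and averages them. Comparing 2 * row_num against total_rows avoids dividing by 2, which matters because some databases (PostgreSQL, for example) truncate integer division while others (MySQL, for example) return a fraction that no row number can equal.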
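
Where the database supports ordered-set aggregates (PostgreSQL and Oracle do, for instance), the row-numbering approach above can be replaced by a single function call. A minimal sketch, assuming one of those systems:

        -- PERCENTILE_CONT(0.5) returns the middle value, interpolating between
        -- the two middle values when the number of rows is even.
        SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
        FROM employees;

SQL Server offers PERCENTILE_CONT() only as a window function with an OVER clause, so this exact form does not carry over there unchanged.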

7. GROUP_CONCAT() / STRING_AGG() (Concatenation)

While not strictly a statistical function, the GROUP_CONCAT() or STRING_AGG() function is often used to aggregate data into a single string, useful for creating comma-separated lists or grouping values.

Example: Concatenating Employee Names

If you want to create a list of all employee names in each department, you can use GROUP_CONCAT() in MySQL or STRING_AGG() in PostgreSQL and SQL Server. The MySQL form is:

        SELECT department,
               GROUP_CONCAT(employee_name) AS employees
        FROM employees
        GROUP BY department;

This query will return one row per department, with the employee names joined into a comma-separated string.
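
For comparison, here is a sketch of the same result written with STRING_AGG() in PostgreSQL syntax. The ORDER BY inside the call is an optional addition, not part of the MySQL example:

        -- PostgreSQL form: STRING_AGG() requires an explicit separator.
        SELECT department,
               STRING_AGG(employee_name, ', ' ORDER BY employee_name) AS employees
        FROM employees
        GROUP BY department;

SQL Server also provides STRING_AGG(), but it expresses the ordering with a separate WITHIN GROUP (ORDER BY ...) clause after the function call.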

Conclusion

SQL provides a wide range of statistical functions that can be used to perform data analysis and gain insights from your data. These functions help to calculate totals, averages, counts, and other statistical measures. By combining these functions with filtering, grouping, and sorting, you can perform complex data analysis directly within your SQL queries.
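
As a brief sketch of such a combination, the query below groups, filters, and sorts in one statement. The threshold of 5 employees is an assumption chosen only for illustration:

        -- Departments with at least 5 employees, highest average salary first.
        SELECT department,
               COUNT(*)    AS employee_count,
               AVG(salary) AS average_salary
        FROM employees
        GROUP BY department
        HAVING COUNT(*) >= 5          -- threshold is illustrative
        ORDER BY average_salary DESC;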
