Last Updated:

SQL Group By Statement | Database Management

Create a table with data

For our example, we'll create a table that stores sales records of different products at different locations.

We'll call the table sales. This will be a simple representation of sales in stores: location name, product name, price and time of sale.

 

If we were to create such a table for a real application, we would use foreign keys to other tables (for example, locations or products). But to show the work of GROUP BY, we will create simple TEXT columns.

 

Let's create our table and enter some sales data into it:

CREATE TABLE sales( location TEXT, product TEXT, price DECIMAL, sold_at TIMESTAMP ); INSERT INTO sales(location, product, price, sold_at) VALUES (‘HQ’, ‘Coffee’, 2, NOW()), (‘HQ’, ‘Coffee’, 2, NOW() — INTERVAL ‘1 hour’), (‘Downtown’, ‘Bagel’, 3, NOW() — INTERVAL ‘2 hour’), (‘Downtown’, ‘Coffee’, 2, NOW() — INTERVAL ‘1 day’), (‘HQ’, ‘Bagel’, 2, NOW() — INTERVAL ‘2 day’), (‘1st Street’, ‘Bagel’, 3, NOW() — INTERVAL ‘2 day’ — INTERVAL ‘1 hour’), (‘1st Street’, ‘Coffee’, 2, NOW() — INTERVAL ‘3 day’), (‘HQ’, ‘Bagel’, 3, NOW() — INTERVAL ‘3 day’ — INTERVAL ‘1 hour’);

We have three locations: HQ, Downtown, and 1st Street.

We also have two products: Coffee and Bagel (coffee and bagels). We make sales with different values sold_at to show how many products were sold on different days and different times.

We had sales today, yesterday and the day before yesterday.

Writing GROUP BY sentences

 

The GROUP BY clause is written very simply. We use the group BY keywords and specify the fields by which the grouping should occur:

SELECT … FROM sales GROUP BY location;

This simple query groups the data in the sales table by the location column.

Okay, we've grouped them together, but what do we put in our SELECT?

 

Obviously, we need to make a selection of the location. We group the data by this column and at least want to see the names of the created groups:

SELECT location FROM sales GROUP BY location;

The result will be three of our locations:

location ———— 1st Street HQ Downtown (3 rows)

If we look at the raw data in our table (SELECT * FROM sales;), we can see that we have four rows with the location HQ, two rows with the location Downtown and two more rows with the location 1st Street:

product | location | price | sold_at ———+————+——-+—————————- Coffee | HQ | 2 | 2020-09-01 09:42:33.085995 Coffee | HQ | 2 | 2020-09-01 08:42:33.085995 Bagel | Downtown | 3 | 2020-09-01 07:42:33.085995 Coffee | Downtown | 2 | 2020-08-31 09:42:33.085995 Bagel | HQ | 2 | 2020-08-30 09:42:33.085995 Bagel | 1st Street | 3 | 2020-08-30 08:42:33.085995 Coffee | 1st Street | 2 | 2020-08-29 09:42:33.085995 Bagel | HQ | 3 | 2020-08-29 08:42:33.085995 (8 rows)

By grouping the data by the location column, our database takes these input rows and identifies unique locations among them. These unique locations serve as "groups".

What about the rest of the columns in the table?

If we try to select a product column that we didn't group on,

SELECT location, product FROM sales GROUP BY location;

we'll get this error:

ERROR: column «sales.product» must appear in the GROUP BY clause or be used in an aggregate function

The problem is that we took eight lines and tried to squeeze them into three.

 

We can't just return the remaining columns as usual, because we used to have eight rows, and now there are only three.

What to do with the remaining five rows of data? What data from the eight rows should be displayed in three rows?

There is no clear and precise answer to these questions.

To use the rest of the table data, we have to separate the data from the remaining columns into our three location groups.

This means that we have to aggregate this data or perform some calculations to get some summary information about the remaining data.

Aggregate functions (COUNT, SUM, AVG)

 

If we decide to group the data, we can aggregate the data of the remaining columns. for example, we can count the number of rows in each group, sum the individual values in the group, or derive some average value (also by group).

To begin with, let's find the number of sales for each location.

Because each record in the sales table is a record of a single sale, the number of sales by location will be equal to the number of rows in each group (grouped by location).

To get the desired result, we need to apply the aggregate function COUNT() - so we will calculate the number of rows in each group.

SELECT location, COUNT(*) AS number_of_sales FROM sales GROUP BY location;

We use COUNT(*) to read all the incoming rows in the group.

(COUNT() also works with expressions, but has slightly different behavior.)

Here's how this database fulfills our query:

  • FROM sales — First, get all the records from the sales table.
  • GROUP BY LOCATION — Then define unique groups when grouped by location (i.e., unique locations).
  • SELECT ... — Finally, select the name of the location and count the number of rows in this group.

To make the output more readable, we give the number of lines an alias - using AS number_of_sales. It looks like this:

 

location | number_of_sales ————+—————— 1st Street | 2 HQ | 4 Downtown | 2 (3 rows)

 
 

The 1st Street location has two sales, HQ has four, and Downtown has two.

As you can see, here we took the data of a column that was not grouped, and from eight separate rows we isolated useful summary information for each location, namely, the number of sales.

SUM

Instead of counting the number of rows in a group, we could summarize the information for the group. For example, get the total amount of money raised for each location.

To do this, we will use the SUM() function:

SELECT location, SUM(price) AS total_revenue FROM sales GROUP BY location;

Instead of counting the number of lines in each group, we added up the number of dollars received as a result of each sale, and derived the total income by location:

location | total_revenue ————+————— 1st Street | 5 HQ | 9 Downtown | 5 (3 rows)

AVG

The AVG() function allows you to find the average value (AVG from Average is average). Let's find the average amount of revenue by location. To do this, simply replace the SUM() function with the AVG() function:

SELECT location, AVG(price) AS average_revenue_per_sale FROM sales GROUP BY location;

Work with multiple groups

So far, we have worked with one group - by location. What if we need to break down the resulting groups into subgroups?

Recall the example of the scenario given at the beginning of the article, with a grouping of people by eye color and country of origin. Let's try to find the number of sales of each product in each individual location (for example, how many coffee sales there were, and how many were bagels on 1st Street, HQ and Downtown).

To do this, we need to add a second grouping condition to our GROUP BY offer:

SELECT … FROM sales GROUP BY location, product;

By adding the name of another column to our GROUP BY offering, we have divided our location groups into product subgroups.

Since we are now also grouping by the product column, we can return the result using our SELECT!

(For ease of reading, I've also added ORDER BY clauses to the request.)

SELECT location, product FROM sales GROUP BY location, product ORDER BY location, product;

In the results of our new grouping, we see unique combinations of locations and products:

location | product ————+——— 1st Street | Bagel 1st Street | Coffee Downtown | Bagel Downtown | Coffee HQ | Bagel HQ | Coffee (6 rows)

Okay, we have our groups, but what are we going to do with the data in the other columns?

We can find the number of sales of a particular product in each location using all the same aggregate functions:

SELECT location, product, COUNT(*) AS number_of_sales FROM sales GROUP BY location, product ORDER BY location, product; location | product | number_of_sales ————+———+—————— 1st Street | Bagel | 1 1st Street | Coffee | 1 Downtown | Bagel | 1 Downtown | Coffee | 1 HQ | Bagel | 2 HQ | Coffee | 2 (6 rows)

(Task "with an asterisk": find the total revenue (amount) for each product in each location).

 

Syntax GROUP BY

SELECT column_name(s) FROM table_name WHERE condition GROUP BY column_name(s) ORDER BY column_name(s);

Example SQL GROUP BY

The following SQL statement shows the number of users in each country:

Example:

SELECT COUNT(use_id), country FROM users GROUP BY country;

 

The following SQL statement shows the number of users in each country, sorted by high and low:

Example:

SELECT COUNT(user_id), country FROM users GROUP BY country ORDER BY COUNT(user_id) DESC;

GROUP BY с JOIN

The following SQL statement indicates the number of orders shipped by each delivery service:

Example:

SELECT delivery.name, COUNT(invoice.delivery_id) AS orders FROM invoice LEFT JOIN delivery ON invoice.delivery_id = delivery.delivery_id GROUP BY name;

Group data using the group by query

And at the very beginning, let's look at the syntax of group by, i.e. where to write this construction:

Syntax:

Select Aggregate Functions

From источник

Where Team Conditions

Group by поля группировки

Having Conditions for Aggregate Functions

Order by поля сортировки

Now, if we need to sum up all the funds of an employee without using a grouping, we will send this request:

SELECT SUM(summa)as summa FROM test_table WHERE name=’Ivanov

 

And if you need to sum up another employee, then we just change the condition. Agree, if there are a lot of such employees, why summarize each, and this is somehow not clear, so the operator group by comes to our aid. We write a request:

SELECT SUM(summa)as summa, name FROM test_table GROUP BY name

 

As you noticed, we do not write any conditions, and we display all employees with a sum amount of money at once, which is more visual.

Note:Just to note that how many fields we write in the query (i.e. grouping fields), in addition to aggregate functions, the same number of fields we write in the group by construct. In our example, we output one field, so in group by we specified only one field (name), if we displayed several fields, then they would all have to be specified in the group by construct (in subsequent examples you will see this).

You can also use other functions, for example, to calculate how many times money has been received to an employee with the total amount of funds received. To do this, in addition to the sum function, we will also use the count function.

SELECT SUM(summa)as [Total cash], COUNT(*) as [Number of receipts], Name [Employee] FROM test_table GROUP BY name

 

But let's say for the boss this is not enough, they still ask, to sum up as well, but also with the grouping on the basis, i.e. what kind of money it is (salary or bonus), for this we simply add another field to the grouping, and for better perception we will add sorting by employee, and we will get the following:

SELECT SUM(summa)as [Total cash], COUNT(*) as [Number of receipts], Name [Employee] , Priz [Source] FROM test_table GROUP BY name, priz ORDER BY name

 

Now we have everything displayed, i.e. how much money was received by the employee, how many times, as well as from what source.

 

And now for consolidation, let's write an even more complex request with a grouping, but we will also add the names of this source, since you will agree on the identifiers of the feature it is not clear from which source the funds came. To do this, we use the case construct.

SELECT SUM(summa) AS [Total Cash], COUNT(*) AS [Number of Income], Name [Employee], CASE WHEN priz = 1 then 'Salary' WHEN priz = 2 then 'Bonus' ELSE 'No Source' END AS [Source] FROM test_table GROUP BY name, priz ORDER BY name

 

Now everything is quite clear and not so difficult, even for beginners.

Also, let's touch on the conditions for the final results of the aggregate functions (having). In other words, we add a condition not for the selection of the rows themselves, but for the final value of the functions, in our case it is sum or count. For example, we need to withdraw all the same, but only those who have "total cash" more than 200. To do this, add the having condition:

SELECT SUM(summa)as [Total Cash], COUNT(*) as [Number of Income], Name [Employee], CASE WHEN priz = 1 then 'Salary' WHEN priz = 2 then 'Bonus' ELSE 'No Source' END AS [Source] FROM test_table GROUP BY name, priz - group HAVING SUM(summa) > 200 - select ORDER BY name - sort

 

Now we have all the sum(summa) values that are greater than 200, it's simple.

Using the GROUP BY construct with rollup operation

How to use the GROUP BY construct to obtain subtotals has already been shown. By using the GROUP BY construct with the ROLLUP operation, however, it is possible to obtain both intermediate totals and totals and therefore generate intermediate aggregate values at any level. In other words, the ROLLUP operation allows you to get aggregate values for each group at individual levels. The intermediate summary rows and the final summary rows are called superaggregate rows.

The following listing shows an example of how to use the GROUP BY construct with the ROLLUP operation.

SQL> SELECT Year,Country,SUM(Sales) AS Sales FROM Company_Sales GROUP BY ROLLUP (Year,Country); YEAR COUNTRY SALES ——— ——— ——- 1997 France 3990 1997 USA 13090 1997 17080 1998 France 4310 1998 USA 13900 1998 18210 1999 France 4570 1999 USA 14670 1999 19240 54530 /* Так выглядит конечное суммарное значение */ SQL>

Using the GROUP BY construct with the CUBE operation

The CUBE operation can be considered an extension of the ROLLUP operation because it helps extend the standard capabilities of the GROUP BY construct in Oracle. It calculates all possible subtotals in a GROUP BY operation. In the previous example, the ROLLUP operation returned interim totals by year. By using the CUBE operation, it is possible to obtain totals not only for years, but also for the whole country. Here's a simple example:

SQL> SELECT department_id, job_id, SUM(salary) 2 FROM employees 3 GROUP BY CUBE (department_id, job_id); DEPARTMENT_ID JOB_ID SUM(SALARY) ————- ——— ———— 10 AD_ASST 44000 20 MK_MAN 130000 20 MK_REP 60000 30 PU_MAN 110000 30 PU_CLERK 139000 . . . SQL>

Using the GROUP BY construct with the GROUPING operation

As shown earlier, the ROLLUP operation allows you to obtain super-aggregate intermediate and summary total data. The GROUPING operation in the GROUP BY construct helps distinguish between columns with superaggregate intermediate and summary totals and other data in rows.

Using GROUP BY with GROUPING SETS

 

The GROUPING SETS operation allows you to group many sets of columns when calculating aggregates such as sums. The following is an example that demonstrates how to use this operation to calculate aggregates and then distribute them to three such groups: (year, region, item), (year, item), and (region, item). The GROUPING SETS operation eliminates the need for inefficient UNION ALL operations.

SQL> SELECT year, region, item, sum(sales) FROM regional_salesitem GROUP BY GROUPING SETS (( year, region, item), (year, item), (region, item));

Using the GROUP BY construct with the HAVING operation

The HAVING operation allows you to limit or exclude the results of the GROUP BY operation, that is, in fact, to impose a WHERE condition on the GROUP BY result set. In the following example, the HAVING operation limits query results to only those departments where the maximum salary exceeds 20,000:

SQL> SELECT department_id, max(salary) 2 FROM employees 3 GROUP BY department_id 4* HAVING MAX(salary)>20000; DEPARTMENT_ID MAX(SALARY) ————- ———— 90 24000 SQL>

Grouping with aggregate functions

Aggregate functions COUNT, SUM, AVG, MAX, MIN are used to calculate the corresponding aggregate value for the entire set of rows for which a certain column is shared.

Example 4. Display the number of books issued by each author. The request will be as follows:

SELECT Author, COUNT(*) AS InUse FROMBookinuseGROUP BY Author

The result of the query will be the following table:

AuthorInUse
NULL1
Gogol1
Ilf and Petrov1
Mayakovsky1
Parsnip2
Pushkin3
Thick3
Chekhov5

Example 5. Display the number of workbooks issued to each user. The request will be as follows:

SELECT Customer_ID, COUNT(*) AS InUse FROMBookinuseGROUP BY Customer_ID

The result of the query will be the following table:

User_IDInUse
181
313
474
652
1203
2053

.

Updating Aggregate Functions

Often— when you're working with a database — you don't want to see the actual data in the database. Instead, you may need information about the data. For example, you can find out the number of unique products your business sells, or the maximum score on the leaderboard.

SQL has several built-in functions that allow you to get this information. These are called aggregate functions.

For example, suppose you want to know how many employees are trading partners, you can use the COUNT function. The COUNT function counts and returns the number of rows that match a specific set of criteria. Other aggregate features include SUM, AVG, MIN, and MAX.

When should I use GROUP BY in SQL?

The GROUP BY clause is only needed when you want to get more information than what the aggregate function returns. We discussed this a little bit earlier.

If you want to know the number of your customers, you only need to fulfill a regular query. Here's an example of a query that will return this information:

SELECT COUNT(name) FROM customers;

Our query groups the result and returns:

count
7

(Row 1)

If you want to know how many customers are included in each of your loyalty plans, you will need to use the GROUP BY operator. Here's an example of a query that can get a list of loyalty plans and the number of customers for each plan:

SELECT loyalty_plan, COUNT(loyalty_plan)
FROM customers
GROUP BY loyalty_plan;

Our request collects data. Our query then returns:

loyalty_plancount
Gold1
No one3
Silver1
Bronze2

(4 rows)

SQL Group By in multiple columns

If we wanted to, we could run GROUP BY on multiple columns. For example, suppose we wanted to get a list of employees with specific positions in each branch. We could get this data using the following query:

SELECT branch, title, COUNT(title)
FROM employees
GROUP BY branch, title;

Our query result set shows:

See also: PostgreSQL array per string

ramifytitlecount
StamfordSales Officer1
AlbanyVice President, Sales1
San FranciscoSales Officer1
San FranciscoSenior Sales Specialist1
AlbanyChief Marketing Officer1
BostonSales Officer2

(6 rows)

Our request creates a list of titles that each employee owns. We see the number of people holding that title. Our data is grouped by the industries in which each employee works and their positions.