Using GROUP BY and HAVING

SQL Order of Operations

In a previous post, we discussed how to write a SELECT statement in T-SQL. We also had posted on the WHERE and ORDER BY statements. To finish this series on writing T-SQL, we will look at the GROUP BY, and HAVING statements.

Writing a SELECT statement

Before we get to those statements, we will review writing a SELECT statement and see some errors we could get when using aggregate functions. To start, here is a basic SELECT statement that is retrieving three columns from the Sales.SalesOrderDetail table in the AdventureWorks database.

USE AdventureWorks2016
GO

SELECT SalesOrderID, UnitPrice, OrderQty
FROM Sales.SalesOrderDetail

And then we see the records that are returned.

Results of SELECT statement FROM Sales.SalesOrderDetail
group by having statements

Now let’s say our boss comes in and she says, “I would like to see the total for each order!” How would we do that? Let’s take this step by step and change the query to multiply the UnitPrice and OrderQty. We will also provide this calculation with the OrderTotal alias.

SELECT SalesOrderID, (UnitPrice * OrderQty) as OrderTotal
FROM Sales.SalesOrderDetail
SQL Results from Sales.SalesOrderDetail
group by having statements

Adding Aggregate Functions

As we look at the result set, we notice that we now have a line total for each individual item that was purchased within each sales order. But it is not a total of each of the sales orders. To get the total for the entire order we will need to use the SUM aggregate function.

SELECT SalesOrderID, SUM(UnitPrice * OrderQty) as OrderTotal
FROM Sales.SalesOrderDetail
Aggregate Error

Oh No! We received an error. This is because once we added an aggregate function for one column in a result set and that column is going to use a series of values to return a single result. However, the other column, in this case, the SalesOrderID column, will still want to return the result of every record. Which will make the SalesOrderID invalid in the select list.

According to the error message, we can fix this in one of two ways. Either include the SalesOrderID field in an aggregate function or in a Group By clause. Let’s use the Count aggregate function on the SalesOrderID first.

SELECT COUNT(SalesOrderID), SUM(UnitPrice * OrderQty) as OrderTotal
FROM Sales.SalesOrderDetail
Results of COUNT and SUM aggregate function

Well, at least this time we did not receive an error. This code gave us a count of ALL the order details and the total of ALL orders. However, our boss wanted the total for each order. More specifically, we want to group our results by each SalesOrderID and return the sum for each of those orders.

Adding a GROUP BY statement

This is where our Group By clause is used. So, let’s remove the Count function and place the SalesOrderID in the Group By clause.

SELECT SalesOrderID, SUM(UnitPrice * OrderQty) as OrderTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
T-SQL Results with a Group BY statement. Using group by having statements.

Adding a HAVING statement

Finally, we received the correct results. This is the total for each order. Now let’s take this one step further and filter this result set by using the HAVING statement. But first, we should discuss the difference between the WHERE clause and the HAVING clause. The WHERE statement will filter records BEFORE the records are grouped or aggregated. The HAVING statement will filter the results AFTER the records have been grouped or aggregated.

The previous code statement returned 31,465 records as that was the number of orders that were being grouped. If we only wanted to see the first 12 orders, we would use the WHERE SalesOrderID < 43671 to filter out the records before they were grouped.

However, if we want to filter the records AFTER they have been grouped, we use the HAVING statement. In this case of the first twelve records, we want to see how many orders were over 9,000. (We will end up with six records.)

SELECT SalesOrderID, SUM(UnitPrice * OrderQty) as OrderTotal
FROM Sales.SalesOrderDetail
WHERE SalesOrderID < 43670
GROUP BY SalesOrderID 
HAVING SUM(UnitPrice * OrderQty) > 10000
ORDER BY OrderTotal
Using group by having statements.

NOTE: We are required to write out the SUM aggregate function in the HAVING statement and cannot use the OrderTotal alias from the select list. This is because the SELECT statement is processed after the HAVING statement as noted at the top of this post. However, we can use the OrderTotal alias in the ORDER BY statement as it is processed after the SELECT statement.

Be the first to comment on "Using GROUP BY and HAVING"

Leave a comment

Your email address will not be published.


*


This site uses Akismet to reduce spam. Learn how your comment data is processed.