Using the GROUP BY and HAVING Statements

SELECT statement processing order

In a previous post, we discussed how to write a SELECT statement in T-SQL. We also had posted on the WHERE and ORDER BY statements. To finish this series on writing T-SQL, we will take a look at the GROUP BY, and HAVING statements.

Before we get to those statements, we will review writing a SELECT statement and also see some errors we could get when using aggregate functions. To start, here is a basic SELECT statement that is retrieving three columns from the Sales.SalesOrderDetail table in the AdventureWorks database.

SELECT statement in T-SQL. group by having statements

And then we see the records that are returned.

Results of SELECT statement FROM Sales.SalesOrderDetail
group by having statements

Now let’s say our boss comes in and she says, “I would like to see the total for each order!” How would we do that? Let’s take this step by step and change the query to multiply the UnitPrice and OrderQty.

SELECT Statement in SQL group by having statements
SQL Results from Sales.SalesOrderDetail
group by having statements

As we look at the result set, we notice that we now have a line total for each individual item that was purchased within each sales order. But it is not a total of each of the sales order. To get the total for the entire order we will need to use the SUM aggregate function.

SQL Results and Aggregate Function from Sales.SalesOrderDetail
group by having statements
SalesOrderID is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Using group by having statements.

Oh No! We received an error. This is because once we add an aggregate function for one column in a result set, that column is going to take a series of values to return a single value. However, the other columns, in this case, the SalesOrderID column, will still want to return the result of every record. Which will make the SalesOrderID invalid in the select list.

According to the error message, we can fix this in one of two ways. Either include the SalesOrderID field in an aggregate function or in a Group By clause. Let’s use the Count aggregate function on the SalesOrderID first.

Aggregate functions in a SELECT Statement in SQL
group by having statements,
Using group by having statements.
Results of COUNT and SUM aggregate function

Well, at least this time we did not receive an error. This code gives us a count of all the order details and the total of ALL orders. But, our boss wanted the total for each order. More specifically, we want to group our results by each SalesOrderID and return the sum for each of those orders. This is where our Group By clause is used. So let’s remove the Count function and place the SalesOrderID in the Group By clause.

Using an Aggregate function and GROUP BY in T-SQL.
Using group by having statements.
T-SQL Results with a Group BY statement. Using group by having statements.

Finally, we received the correct results. This is the total for each order. Now let’s take this one step further and filter this group by using the HAVING statement. But first, we should discuss the difference between the WHERE clause and the HAVING clause. The WHERE statement will filter records BEFORE the records are grouped or aggregated. The HAVING statement will filter the results AFTER the records have been grouped or aggregated.

The previous code statement returned 31,465 records as that was the number of orders that were being grouped. If we only wanted to see the first 12 orders we would use the WHERE SalesOrderID < 43671 to filter out the records before they were grouped.

However, if we want to filter the records AFTER they have been grouped, we use the HAVING statement. In this case of the first 12 records, we want to see how many orders were over 9,000. (We will end up with six records.)

SELECT Statement using GROUP BY and HAVING. Using group by having statements.
Using group by having statements.

NOTE: We are required to write out the SUM aggregate function in the HAVING statement and can not use the OrderTotal alias from the select list. This is because the SELECT statement is processed after the HAVING statement as noted at the top of this post.

Be the first to comment on "Using the GROUP BY and HAVING Statements"

Leave a comment

Your email address will not be published.


*


This site uses Akismet to reduce spam. Learn how your comment data is processed.