In a previous post, we discussed how to write a SELECT statement in T-SQL. We also had posted on the WHERE and ORDER BY statements. To finish this series on writing T-SQL, we will look at the GROUP BY, and HAVING statements.
Writing a SELECT statement
Before we get to those statements, we will review writing a SELECT statement and see some errors we could get when using aggregate functions. To start, here is a basic SELECT statement that is retrieving three columns from the Sales.SalesOrderDetail table in the AdventureWorks database.
USE AdventureWorks2016 GO SELECT SalesOrderID, UnitPrice, OrderQty FROM Sales.SalesOrderDetail
And then we see the records that are returned.
Now let’s say our boss comes in and she says, “I would like to see the total for each order!” How would we do that? Let’s take this step by step and change the query to multiply the UnitPrice and OrderQty. We will also provide this calculation with the OrderTotal alias.
SELECT SalesOrderID, (UnitPrice * OrderQty) as OrderTotal FROM Sales.SalesOrderDetail
Adding Aggregate Functions
As we look at the result set, we notice that we now have a line total for each individual item that was purchased within each sales order. But it is not a total of each of the sales orders. To get the total for the entire order we will need to use the SUM aggregate function.
SELECT SalesOrderID, SUM(UnitPrice * OrderQty) as OrderTotal FROM Sales.SalesOrderDetail
Oh No! We received an error. This is because once we added an aggregate function for one column in a result set and that column is going to use a series of values to return a single result. However, the other column, in this case, the SalesOrderID column, will still want to return the result of every record. Which will make the SalesOrderID invalid in the select list.
According to the error message, we can fix this in one of two ways. Either include the SalesOrderID field in an aggregate function or in a Group By clause. Let’s use the Count aggregate function on the SalesOrderID first.
SELECT COUNT(SalesOrderID), SUM(UnitPrice * OrderQty) as OrderTotal FROM Sales.SalesOrderDetail
Well, at least this time we did not receive an error. This code gave us a count of ALL the order details and the total of ALL orders. However, our boss wanted the total for each order. More specifically, we want to group our results by each SalesOrderID and return the sum for each of those orders.
Adding a GROUP BY statement
This is where our Group By clause is used. So, let’s remove the Count function and place the SalesOrderID in the Group By clause.
SELECT SalesOrderID, SUM(UnitPrice * OrderQty) as OrderTotal FROM Sales.SalesOrderDetail GROUP BY SalesOrderID
Adding a HAVING statement
Finally, we received the correct results. This is the total for each order. Now let’s take this one step further and filter this result set by using the HAVING statement. But first, we should discuss the difference between the WHERE clause and the HAVING clause. The WHERE statement will filter records BEFORE the records are grouped or aggregated. The HAVING statement will filter the results AFTER the records have been grouped or aggregated.
The previous code statement returned 31,465 records as that was the number of orders that were being grouped. If we only wanted to see the first 12 orders, we would use the WHERE SalesOrderID < 43671 to filter out the records before they were grouped.
However, if we want to filter the records AFTER they have been grouped, we use the HAVING statement. In this case of the first twelve records, we want to see how many orders were over 9,000. (We will end up with six records.)
SELECT SalesOrderID, SUM(UnitPrice * OrderQty) as OrderTotal FROM Sales.SalesOrderDetail WHERE SalesOrderID < 43670 GROUP BY SalesOrderID HAVING SUM(UnitPrice * OrderQty) > 10000 ORDER BY OrderTotal
NOTE: We are required to write out the SUM aggregate function in the HAVING statement and cannot use the OrderTotal alias from the select list. This is because the SELECT statement is processed after the HAVING statement as noted at the top of this post. However, we can use the OrderTotal alias in the ORDER BY statement as it is processed after the SELECT statement.