In a previous post, we discussed how to write a SELECT statement in T-SQL. We also had posted on the WHERE and ORDER BY statements. To finish this series on writing T-SQL, we will take a look at the GROUP BY, and HAVING statements.
Before we get to those statements, we will review writing a SELECT statement and also see some errors we could get when using aggregate functions. To start, here is a basic SELECT statement that is retrieving three columns from the Sales.SalesOrderDetail table in the AdventureWorks database.
And then we see the records that are returned.
Now let’s say our boss comes in and she says, “I would like to see the total for each order!” How would we do that? Let’s take this step by step and change the query to multiply the UnitPrice and OrderQty.
As we look at the result set, we notice that we now have a line total for each individual item that was purchased within each sales order. But it is not a total of each of the sales order. To get the total for the entire order we will need to use the SUM aggregate function.
Oh No! We received an error. This is because once we add an aggregate function for one column in a result set, that column is going to take a series of values to return a single value. However, the other columns, in this case, the SalesOrderID column, will still want to return the result of every record. Which will make the SalesOrderID invalid in the select list.
According to the error message, we can fix this in one of two ways. Either include the SalesOrderID field in an aggregate function or in a Group By clause. Let’s use the Count aggregate function on the SalesOrderID first.
Well, at least this time we did not receive an error. This code gives us a count of all the order details and the total of ALL orders. But, our boss wanted the total for each order. More specifically, we want to group our results by each SalesOrderID and return the sum for each of those orders. This is where our Group By clause is used. So let’s remove the Count function and place the SalesOrderID in the Group By clause.
Finally, we received the correct results. This is the total for each order. Now let’s take this one step further and filter this group by using the HAVING statement. But first, we should discuss the difference between the WHERE clause and the HAVING clause. The WHERE statement will filter records BEFORE the records are grouped or aggregated. The HAVING statement will filter the results AFTER the records have been grouped or aggregated.
The previous code statement returned 31,465 records as that was the number of orders that were being grouped. If we only wanted to see the first 12 orders we would use the WHERE SalesOrderID < 43671 to filter out the records before they were grouped.
However, if we want to filter the records AFTER they have been grouped, we use the HAVING statement. In this case of the first 12 records, we want to see how many orders were over 9,000. (We will end up with six records.)
NOTE: We are required to write out the SUM aggregate function in the HAVING statement and can not use the OrderTotal alias from the select list. This is because the SELECT statement is processed after the HAVING statement as noted at the top of this post.