Table Structures in SQL Server

Heap vs Clustered Indexes

As mentioned in a previous post, table structures in SQL Server will store records in data pages in either an unsorted order (Heap) or a sorted order (Clustered Index). If the data has not been provided a clustering key, the data would be unsorted, and the data structure would be considered a Heap. If a clustering key is provided, the data rows will be sorted by that key and would be considered a Clustered Index. Let’s take a closer look at this in SQL Server.

First, we will create a table named Accounting.BankAccounts and INSERT four records. You will notice that we have not created a Primary Key or any type of index for this table. Additionally, the values for the AcctID field have been inserted in random order.

If you are following along, make sure that the AcctID is set to NOT NULL to make it easier to add a Primary Key later in this demo.

CREATE SCHEMA Accounting Authorization dbo
CREATE TABLE BankAccounts
 (AcctID int NOT NULL,
  AcctName char(15),
  Balance money,
  ModifiedDate date)
GO

INSERT INTO Accounting.BankAccounts
VALUES(3,'Jack',500, GETDATE()),
	  (2, 'Diane', 750, GETDATE()),
	  (4, 'John', 650, GETDATE()),
	  (1, 'Dora', 725, GETDATE())
GO

When we run the SELECT statement you will see that the records are returned in an unsorted order. This is because the data is stored in a heap and the data is returned in the order they were inserted.

SELECT AcctID, AcctName, Balance, ModifiedDate
FROM Accounting.BankAccounts
SELECT records from a Heap

There are two other ways we can see that this data is in a heap. We can use two undocumented DBCC commands to look inside the actual data pages and or Execution Plan Table Operators which are discussed in a separate post. We will save the execution plans for another blog post. So, for now, let us focus on the Database Console Commands (DBCC).

We need to start with the DBCC TRACEON (3604) statement so that the results will be returned to the results pane. Then we run the DBCC IND statement to locate the data pages for the Accounting.BankAccounts table.

The first parameter for the DBCC IND statement is the database name. We put a 0 to indicate the current database. Next is the table name, Accounting.BankAccounts and the last parameter is how to view the data. We are using a -1 to show all of the indexes for this table.

DBCC TRACEON (3604) 
DBCC IND (0,'Accounting.BankAccounts',-1)

The first row of the DBCC IND displays the following information:

DBCC IND command
  • PageFID: Indicates this is the first page in the database.
  • PagePID: Indicates the Page ID of the page.
  • IAMFID: Displays the location of the IAM page for this file
  • IAMPID: Displays the location of the IAM page for this page
  • ObjectID: This is the Object ID value of the dbo.Accounting table.
  • IndexID: A value of 0 indicates the table is in a heap. A value of 1 indicates the table is in a clustered index. Any value greater than 1 indicates pages for a non-clustered index.

Two take-aways from this result is that the IndexID is 0 indicating the data is stored in a heap. Secondly, the data page has an ID of 320 that we will use for the DBCC Page command. This is a very small table of only four rows, so only one 8kb data page was needed.

Now let’s use the DBCC PAGE command to look inside data page 320. Again, the first parameter is the current database, the second parameter is the File ID, the third argument is the page number, and the final argument is to display everything inside the data page.

DBCC TRACEON (3604) 
DBCC PAGE (0, 1, 320, 3)
DBCC Pages in a Heap

In a blog post, it is not easy to slide up and down in a results pane; so, I just pulled out the relevant data above. Slot 0 inside a data page indicates the first row and you can see Column 1 or the AcctID field has a value of 3. Slot 1 indicates that the second record has a value of 2 in the AcctID field and Slot 2 indicates that the third record has a value of 4 for the first column.

The key point is that the data is not in sorted order, therefore the data is in a heap. Now we will add a Primary Key that will create a Clustering Key. This will store the data in the order of the clustering key, therefore, making it a Clustered Index. (Side Note: Just because a table has a Primary Key does not always mean the table is in a Clustered Index, but it is the default when you create the Primary Key.)

ALTER TABLE Accounting.BankAccounts
ADD CONSTRAINT pk_acctID PRIMARY KEY (AcctID)

When we run the SELECT statement again, we will notice that the records are returned in order by the AcctID field. This is because the table was rebuilt with a Clustered Index.

SELECT AcctID, AcctName, Balance, ModifiedDate
FROM Accounting.BankAccounts
SELECT results with Clustered Key

Now that we have added the Primary Key, we will re-run the DBCC IND and DBCC PAGE commands again. You will notice that the IndexID is now a 1 indicating that the data is stored in a Clustered Index. Additionally, the page numbers and ObjectID changed because the table was rebuilt to create the clustered index and put the data in sorted order.

DBCC TRACEON (3604) 
DBCC IND(0,'Accounting.BankAccounts',-1)
DBCC IND with Clustered Index

And when we run the DBCC PAGE Command on page 352 you will notice the rows are in order by Column 1 or AcctID field. Slot 0 has an AcctID of 1, Slot 1 has an AcctID of 2, Slot 2 has an AcctID of 3, etc…

DBCC TRACEON (3604) 
DBCC PAGE (0, 1, 352, 3)
DBCC Pages in a Clustered Index

Be the first to comment on "Table Structures in SQL Server"

Leave a comment

Your email address will not be published.


*


This site uses Akismet to reduce spam. Learn how your comment data is processed.