Thursday, December 4, 2008

An Introduction to Clustered and Non-Clustered Index Data Structures

When I first started using SQL Server as a novice, I was initially confused as to the differences between clustered and non-clustered indexes. As a developer, and new DBA, I took it upon myself to learn everything I could about these index types, and when they should be used. This article is a result of my learning and experience, and explains the differences between clustered and non-clustered index data structures for the DBA or developer new to SQL Server. If you are new to SQL Server, I hope you find this article useful.

As you read this article, if you choose, you can cut and paste the code I have provided in order to more fully understand and appreciate the differences between clustered and non-clustered indexes.

Part I: Non-Clustered Index

Creating a Table
To better explain SQL Server non-clustered indexes; let’s start by creating a new table and populating it with some sample data using the following scripts. I assume you have a database you can use for this. If not, you will want to create one for these examples.

Create Table DummyTable1
(
EmpId Int,
EmpName Varchar(8000)
)

When you first create a new table, there is no index created by default. In technical terms, a table without an index is called a “heap”. We can confirm the fact that this new table doesn’t have an index by taking a look at the sysindexes system table, which contains one for this table with an of indid = 0. The sysindexes table, which exists in every database, tracks table and index information. “Indid” refers to Index ID, and is used to identify indexes. An indid of 0 means that a table does not have an index, and is stored by SQL Server as a heap.

Now let’s add a few records in this table using this script:

Insert Into DummyTable1 Values (4, Replicate ('d',2000))
GO

Insert Into DummyTable1 Values (6, Replicate ('f',2000))
GO

Insert Into DummyTable1 Values (1, Replicate ('a',2000))
GO

Insert Into DummyTable1 Values (3, Replicate ('c',2000))
GO

Now, let’s view the contests of the table by executing the following command in Query Analyzer for our new table.

Select EmpID From DummyTable1
GO

Empid

4
6
1
3

As you would expect, the data we inserted earlier has been displayed. Note that the order of the results is in the same order that I inserted them in, which is in no order at all.

Now, let’s execute the following commands to display the actual page information for the table we created and is now stored in SQL Server.

dbcc ind(dbid, tabid, -1) – This is an undocumented command.

DBCC TRACEON (3604)
GO

Declare @DBID Int, @TableID Int
Select @DBID = db_id(), @TableID = object_id('DummyTable1')

DBCC ind(@DBID, @TableID, -1)
GO

This script will display many columns, but we are only interested in three of them, as shown below.

PagePID
IndexID
PageType

26408
0
10

26255
0
1

26409
0
1

Here’s what the information displayed means:

PagePID is the physical page numbers used to store the table. In this case, three pages are currently used to store the data.

IndexID is the type of index,

Where:
0 – Datapage
1 – Clustered Index
2 – Greater and equal to 2 is an Index page (Non-Clustered Index and ordinary index),
PageType tells you what kind of data is stored in each database,
Where:
10 – IAM (Index Allocation MAP)
1 – Datapage
2 – Index page
Now, let us execute DBCC PAGE command. This is an undocumented command.

DBCC page(dbid, fileno, pageno, option)
Where:
dbid = database id.
Fileno = fileno of the page. Usually it will be 1, unless we use more than one file for a database.
Pageno = we can take the output of the dbcc ind page no.
Option = it can be 0, 1, 2, 3. I use 3 to get a display of the data. You can try yourself for the other options.
Run this script to execute the command:
DBCC TRACEON (3604)
GO
DBCC page(@DBID, 1, 26408, 3)
GO
The output will be page allocation details.
DBCC TRACEON (3604)
GO
dbcc page(@DBID, 1, 26255, 3)
GO

The data will be displayed in the order it was entered in the table. This is how SQL stores the data in pages. Actually, 26255 & 26409 both display the data page.

I have displayed the data page information for page 26255 only. This is how MS SQL stores the contents in data pages as such column name with its respective value.

Record Type = PRIMARY_RECORD
EmpId = 4

EmpName = ddddddddddddddddddddddddddddddddddddddddddddddddddd
ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd

Record Type = PRIMARY_RECORD
EmpId = 6

EmpName = ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

Record Type = PRIMARY_RECORD
EmpId = 1

EmpName = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

This displays the exact data storage in SQL, without any index on table. Now, let’s go and create a unique non-clustered index on the EmpID column.

Creating a Non-Clustered Index
Now, we will create a unique non-clustered index on the empid column to see how it affects the data, and how the data is stored in SQL Server.

CREATE UNIQUE NONCLUSTERED INDEX DummyTable1_empid
ON DummyTable1 (empid)
GO

Now, execute the DBCC ind (dbid, tabid, -1)

DBCC TRACEON (3604)
GO

Declare @DBID Int, @TableID Int
Select @DBID = db_id(), @TableID = object_id('DummyTable1')
DBCC ind(@DBID, @TableID, -1)
GO

Here are the results:

PagePID
IndexID
PageType

26408
0
10

26255
0
1

26409
0
1

26411
2
10

26410
2
2

Now, we see two more rows than before, which now contains index page details. Page 26408 displays the page allocation details, and pages 26255 and 26409 display the data page details, as before.

In regard to the new pages, page 26411 displays the page allocation details of an index page and page 26410 displays the index page details.

MS SQL generates a page (pagetype = 10) for an index and explains the page allocation details for an index. It shows the number of index page have been occupied for an index.

Let us see what would be the output for page 26411, that is page type = 10



IAM: Single Page Allocations @0x308A608E

-----------------------------------------
Slot 0 = (1:26410)

Let us view page 26410 to see the index page details.

DBCC TRACEON (3604)
GO

DBCC page(10, 1, 26410, 3)
GO

SQL populates the index column data in order. The last column (?) is pointed to the row locator.

Here are the results, using two different methods:

Method I FileID
PageID
EMPID
?

1
26410
1
0x8F66000001000200

1
26410
3
0x2967000001000000

1
26410
4
0x8F66000001000000

1
26410
6
0x8F66000001000100

The row location display in one of two ways:

If the table does not have a clustered index, the row locator will be combination of fileno, pageno and the no of rows in a page.
If the table does have clustered index, the row location will be clustered index key value.
Non-clustered indexes are particularly handy when we want to return a single row from a table.

For example, to search for employee ID (empid = 3) in a table that has a non-clustered index on the empid column, SQL Server looks through the index to find an entry that lists the exact page and row in the table where the matching empid can be found, and then goes directly to that page and row. This greatly speeds up accessing the record in question.

Select EmpID, EmpName From DummyTable1 WHERE EMPID = 3

Now, let’s insert some more rows in our table and view the data page storage of our non-clustered index.

Insert Into DummyTable1 Values (10, Replicate ('j',2000))
GO

Insert Into DummyTable1 Values (2, Replicate ('b',2000))
GO

Insert Into DummyTable1 Values (5, Replicate ('e',2000))
GO

Insert Into DummyTable1 Values (8, Replicate ('h',2000))
GO

Insert Into DummyTable1 Values (9, Replicate ('i',2000))
GO

Insert Into DummyTable1 Values (7, Replicate ('g',2000))
GO

Now, let’s view the data in our table.

No comments: