Wrox Home  
Search P2P Archive for: Go

  Return to Index  

sql_language thread: Clustering Index


Message #1 by "Ray" <ray@l...> on Mon, 4 Nov 2002 16:20:14 -0600
Excellent Jeff.

Sat the MS course 2 weeks ago, and this explains it a hell of a lot better
than the instructor did!

Cheers,
Andrew

-----Original Message-----
From: Jeff Mason [mailto:je.mason@a...]
Sent: Tuesday, 5 November 2002 10:17 AM
To: sql language
Subject: [sql_language] RE: Clustering Index


This discussion is about SQL Server, even though this is a generic SQL
language list.  I can't speak about other products and how they may (or may
not) handle this sort of thing.

All indexes are structured as B-Trees.  As such the keys of an index are
organized into multiple levels, starting with a root level.  The entries in
the root level, the keys of the index, point to lower intermediate levels
which in turn point to entries in still lower levels and so on until at the
lowest level, called the leaf nodes, the entries contain a pointer into the
data page which stores the row data for the index entry.  There is one entry
in the leaf node for every index key value and thus for every data row.  At
higher index levels there is one entry for each index page at the next lower
level and this works its way up until you get to the single root level index
page.

The number of levels involved is a function of the number of rows of data in
the table, and the size of the index key.  The bigger the key, the fewer the
number of entries in an index page, and the more rows, the more leaf nodes
are required.  For every index, all index entries are stored in key order.

When a clustered index exists for a table, the leaf nodes are the actual
data pages.  Thus, the data is stored within the index structure, so once
you traverse the tree looking for a key, when you find it, the data is right
there.  For all other indexes, the leaf nodes just contain the clustered
index key.

So, the first observation we can make is that access via the clustered index
is quite a bit faster than by a non clustered index.  This is because for
nonclustered indexes, the entry stored in the leaf node is the clustered
index key, so to retrieve a given data row, we must first traverse the index
tree to obtain the clustered index key value, and then traverse the
clustered index to get the data row.  For the clustered index, though, we're
right there.

The next observation is that since the keys of an index are stored in key
order, the data rows in the clustered index are stored physically in
clustered key order.  This means that an ORDER BY on the clustered index
will be fast.  It also means that range queries (e.g. key_value BETWEEN 1
AND 1000) are particularly efficient.

By default, the PRIMARY KEY of a table will be set as the clustered index.
This may or may not be a good thing.  For example, many people use an
IDENTITY column as the primary key.  Since identity values are monotonically
increasing, new rows inserted into a table will all be inserted into the
last data page.  On a heavily loaded system, this will result in a "hot
spot" on that data page, and performance will suffer, as connections bump
into the locks acquired by each other.  In such a case, it may be better to
define the clustered index on some other column, which more uniformly
distributes the updates across the domain of key values.

If there is only one index on the table, it almost always should be
clustered, but deciding which columns should comprise an index, and how many
indexes there should be is a bit of an art, as careful analysis of the
selectiveness of queries and update/insert/delete needs of the application
must be done.  Not enough indexes means queries can be slow, too many and
update/insert/delete will slow down because it takes time to maintain all
the indexes.

--
Jeff Mason              Custom Apps, Inc.
Jeff@c...


-----Original Message-----
From: Ray [mailto:ray@l...]
Sent: Monday, November 04, 2002 5:20 PM
To: sql language
Subject: [sql_language] Clustering Index


Can someone give me a good definition and some guidelines to Clustering
Indexes? I have read in three different books about them and they conflict
in some places and generally leave me confused. When should I use them? When
should I not?

Thanks
Ray






IMPORTANT - PLEASE READ ******************** 
This email and any files transmitted with it are confidential and may 
contain information protected by law from disclosure. 
If you have received this message in error, please notify the sender 
immediately and delete this email from your system. 
No warranty is given that this email or files, if attached to this 
email, are free from computer viruses or other defects. They 
are provided on the basis the user assumes all responsibility for 
loss, damage or consequence resulting directly or indirectly from 
their use, whether caused by the negligence of the sender or not.

  Return to Index