Clustered Index Scan

bmains · March 30th, 2004, 03:12 PM

Hello,

I'm looking at the estimated execution plan for one of my stored procedures. I've got a Clustered Index Scan occurring, where the number of rows exceeds 1800. The clustered PK on this table is an ID field, which is an int identity column.

Does anybody know how to increase performance on this table? I'm being told by the DBAs that I have to fix the table scan.

Thanks,

Brian

puneetmittal1974 · April 1st, 2004, 12:27 PM

I think you want to improve the performance of your select queries.
Put a nonclustered index on the column that most of your select
queries filter on.

I hope this will improve the performance of your select queries.

Puneet

Jeff Mason · April 1st, 2004, 04:03 PM

A clustered index scan isn't altogether bad. It is, after all, a scan of your data (the clustered index is where your data columns are). If you execute a query which says, "Give me all rows where this condition is true", you must of necessity look at each row in turn to see if the row matches the condition.

Now, if you do this search a lot, you can place an index on the column(s) involved in the search which might speed things up. This really only holds true if the column data you are testing has a reasonable level of specificity (usually called selectivity). The less specific the values are, the less utility an index will give you. For example, if your query condition is something like "WHERE Gender='M'" an index on this column isn't likely to do you much good, and in fact may actually be slower than the clustered index scan. There are just too many duplicates, so scanning may actually be faster than bouncing back and forth between index and data. And, of course, the presence of the index will slow down updates/inserts/deletes.
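
One rough way to put a number on that before creating an index is to divide the distinct values in a column by the total row count. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module because it runs anywhere, not SQL Server, and the People table, its columns and the selectivity helper are all made up for the example.

```python
import sqlite3

# Hypothetical table, not from the original post; 1800 rows to match the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, Gender TEXT, Email TEXT)")
rows = [(i, "M" if i % 2 else "F", f"user{i}@example.com") for i in range(1, 1801)]
conn.executemany("INSERT INTO People VALUES (?, ?, ?)", rows)

def selectivity(column):
    # Distinct values divided by total rows: close to 0 means many duplicates,
    # 1.0 means every value is unique.
    # The column name is interpolated directly, so only pass trusted names.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM People"
    ).fetchone()
    return distinct / total

gender_sel = selectivity("Gender")
email_sel = selectivity("Email")
print(f"Gender: {gender_sel:.4f}  Email: {email_sel:.4f}")
```

With two distinct values in 1800 rows, Gender comes out at roughly 0.001, which is the kind of column where an index is unlikely to pay for itself. A column where nearly every value is different sits close to 1.0 and is a much better candidate.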

On the other hand, if your condition is something like "WHERE OrderDate > '1/1/2004'" then you may gain a significant improvement with the index, especially if you have 2 zillion orders dated before this year. The table scan will require reading all those rows, whereas the index will allow you to find this year's data quickly.
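
To see the difference for yourself, compare the plan for the same query before and after adding the index. This is a minimal sketch under some assumptions: it uses SQLite via Python's sqlite3 as a stand-in, so the plan wording (SCAN and SEARCH) is SQLite's rather than SQL Server's scan and seek operators, and the Orders table, the index name and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT)"
)
# Mostly old orders plus a few from 2004. ISO dates are used so that
# text comparison agrees with date order in SQLite.
orders = [(i, i % 50, "2003-06-15") for i in range(1, 1001)]
orders += [(i, i % 50, "2004-02-01") for i in range(1001, 1011)]
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", orders)

QUERY = "SELECT * FROM Orders WHERE OrderDate > '2004-01-01'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(QUERY)  # no usable index yet, so the whole table is read
conn.execute("CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate)")
plan_after = plan(QUERY)   # the planner can now search the index by date

matching = conn.execute(QUERY).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
print("rows returned:", len(matching))
```

In SQL Server the equivalent check is to look at the execution plan again after creating the index and see whether the Clustered Index Scan has been replaced by an index seek.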

Generally (very generally) speaking, it is a good idea to place an index on those columns involved in heavy selections, or on those columns involved in JOINs. Note that a PRIMARY KEY or UNIQUE constraint already creates an index, but in SQL Server a FOREIGN KEY constraint does not, so foreign key columns used in JOINs usually need an index of their own.
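
Which constraints bring an index along with them is engine-specific, so it can be worth checking rather than assuming. The sketch below does that check in SQLite through Python's sqlite3, purely as a runnable stand-in; the tables, the index_names helper and the index name are hypothetical, and on SQL Server you would look at the table's indexes in its own catalog instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Email TEXT UNIQUE)")
conn.execute(
    """CREATE TABLE Orders (
           OrderID INTEGER PRIMARY KEY,
           CustomerID INTEGER REFERENCES Customers (CustomerID)
       )"""
)

def index_names(table):
    # PRAGMA index_list returns one row per index; the name is the second column.
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]

customer_indexes = index_names("Customers")  # the UNIQUE constraint made one
order_indexes = index_names("Orders")        # the foreign key made none

# Index the join column explicitly so lookups by CustomerID can use it.
conn.execute("CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID)")
order_indexes_after = index_names("Orders")

print(customer_indexes, order_indexes, order_indexes_after)
```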

You may consider creating what's called a "covering index" on those columns involved in SELECTs, JOINs, and/or WHERE clauses. A covering index contains all the columns in these clauses. If there are a lot of such columns, forget it, as the overhead of the index will outweigh the benefits. A covering index can give you significant performance benefit, because the conditions of the clause(s) can be satisfied entirely from the index, and the data (clustered index) never has to be referred to.
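
Here is a small sketch of the covering idea, again using SQLite via Python's sqlite3 as a stand-in rather than SQL Server. The Orders table, the column list and the index names are assumptions for illustration. The query only needs OrderDate and CustomerID, so an index holding both columns can answer it without touching the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,"
    " OrderDate TEXT, Notes TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [(i, i % 50, f"2004-01-{i % 28 + 1:02d}", "n/a") for i in range(1, 501)],
)

QUERY = "SELECT OrderDate, CustomerID FROM Orders WHERE OrderDate > '2004-01-15'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Index on the search column only: CustomerID still has to come from the table.
conn.execute("CREATE INDEX IX_Date ON Orders (OrderDate)")
narrow_plan = plan(QUERY)

# Index holding every column the query touches: the table is never read.
conn.execute("DROP INDEX IX_Date")
conn.execute("CREATE INDEX IX_Date_Customer ON Orders (OrderDate, CustomerID)")
covering_plan = plan(QUERY)

print("narrow:  ", narrow_plan)
print("covering:", covering_plan)
```

SQLite marks the second plan with the words COVERING INDEX. The trade-off is the one described above: each column added makes the index wider and more expensive to maintain on every insert and update.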

It may or may not be a good idea to create the clustered index on the identity ID column. Doing so means your data is stored in insertion order, which is almost never the order your queries want, and if you insert rows a lot (though with only 1800 rows this is unlikely to be the case) you can end up with a "hot spot" right at the end of the clustered index, where all the activity is taking place. If there is a more "natural" key, it might be better to place the index on that. A lot depends on your situation, though, since if the ID is used a lot in JOINs, the clustered index may in fact be better placed on it.

There are no hard and fast rules on this performance tuning stuff - it all depends on your unique situation. Perhaps if you post some more details about the table structures and the offending query, we might be able to offer additional insights...

Jeff Mason
Custom Apps, Inc.
www.custom-apps.com

bmains · April 2nd, 2004, 02:07 PM

That's what I thought; I just wanted to verify it. Thanks for your help.

peace2007 · April 6th, 2009, 06:25 AM

That helped me a lot, thanks.

RandomGuy · June 7th, 2010, 11:35 AM

A very clear and easy to understand explanation. Thank you.
I shall be quoting this when working with our junior programmers.