Multiple Columns in an Index

owain · June 16th, 2003, 07:20 AM

What is the advantage of including several columns in an Index, for example:

Code:

CREATE INDEX CustomerIndex
ON Customers (LastName, FirstName)

as opposed to creating 2 indexes, one for each column?

Regards
Owain Williams

ram2098 · June 16th, 2003, 09:13 AM

Having more indexes on any table will reduce the performance if you are using any DML Statements,since it has to update the indexes as well.

When creating the indexes it is always good to have the columns (one or multiple) which are unique.

If you think both firstname and lastname are separately unique (I don't think it is) and the table is only for retrieval, you can go ahead and create separate indexes. In any other case, I suggest to have the combination.

owain · June 16th, 2003, 10:10 AM

FirstName and LastName were just examples, albeit bad ones :D

I was wondering what the advantage of using more than one column in a single index was. Does it have anything to do with query performance if you use both the columns a lot in your queries, or does it not matter whether the columns are in separate indexes or not?

Also, is it easer for the DBMS to maintain a few indexes with several columns in each, or to maintain several indexes with just one column in each?

This is all just hypothetical, I am just curious to the pros and cons of using indexes with several columns.

Regards
Owain Williams

ram2098 · June 16th, 2003, 10:23 AM

It does not matter a few indexes with several columns or more indexes with one column each.... as long you create good indexes and the table is for retrieval only. SQL Server Combines the indexes (even if you have separate indexes) based on the where clause of your query. But, the order of the columns is important with in the index (for performace tuning).

If you are using any DML statements on the table, it is always good to keep the index count low.

bluckcuck · June 16th, 2003, 10:43 AM

It will be a benefit to have multi-column indexes if you do reference criteria for each of the columns in your query.

In your original example index, the query:

Code:

SELECT * FROM Customers
WHERE FirstName = 'John' AND LastName = 'Smith'

Will cause the database to do an index scan of the single multi-column index and return a result without much database processing.

However, if there were only single-column indexes (one each for FirstName and LastName), then the database will perform two index scans, and then perform a dynamic hash join of the two index scans.

Will you notice the difference with a small database? Probably not. But with a large table and a lot of users you will see a substantial performance increase (or at least a lot less load on the server).

Bruce Luckcuck
Director, Applications & Support Services
Wiley Publishing, Inc.

chrislepingwell · June 16th, 2003, 10:49 AM

Another point to consider is the case of compound primary or foreign keys, where a multi-column index covering all the index fields is a must and, as was already mentioned here, column order is very important.

ram2098 · June 16th, 2003, 10:50 AM

I agree with you Bruce...Good Explanation.

Jeff Mason · June 16th, 2003, 10:58 AM

You create an index for performance reasons because you either use its columns in JOINS, as predicates in a WHERE clause, or as columns in an ORDER BY clause. The optimizer will notice that the columns being used are indexed and will then construct its query plan to take advantage of that fact to arrive at a more efficiently executed query.

For example, if you run a query like:

SELECT FirstName FROM Customers WHERE FirstName='Fred';

The only way to find all the rows which satisfy the WHERE condition is to look at all of them, each in turn, and find the ones where the condition is true. If you have a zillion rows in this table, it may take quite a while to scan the entire table looking for the rows which have a FirstName column value equal to "Fred".

Similarly, a query like:

SELECT LastName, FirstName FROM Customers ORDER BY FirstName;

would require that these zillion rows all be placed into a temporary table and then sorted by FirstName.

Now consider the case where an index exists on this FirstName column. The rows satisfying the selection criteria are now easily found by simply looking in the index for the desired value. The ORDER BY query requires no sorting at all, since the index orders the rows explicitly by FirstName.

Indexes are also very helpful when JOINs are being processed. When an index is present, it is easy to determine whether there is a matching row in the second table. If there is no index, a table scan will have to be done, in a manner similar to a WHERE clause.

There are no guarantees that an index will be used in any given situation, however. The optimizer is free to ignore their presence if it thinks it has a better way to construct the query.

There is a significant difference between an index on two columns as (LastName, FirstName) versus two indexes, one on each column. If you have a query such as:

SELECT LastName, FirstName FROM Customers WHERE LastName='Flintstone' AND FirstName='Fred'

and you have a two column index, the query can be very quickly satisfied by searching the index for the combined values of LastName and FirstName. If you have two separate indexes on the two columns, then only one index can be practically used. The optimizer may choose to use the LastName index and find all the rows which match the LastName selection value, then scan those matching rows looking for the FirstName value. Note that the optimizer may instead choose the other index, find all the rows whose FirstName is 'Fred', then scan those rows looking for a LastName match. You cannot easily tell the optimzer which way to use the two indexes, nor would you generally want to - it knows better than you which way to go.

Note that the two column index is totally useless for the FirstName only query. Since a multicolumn index orders its entries in the specified column order, in this case the two column index orders its entries first by LastName, then by FirstName. Thus for all intents and purposes it might as well be in a random order if all you are looking for is a FirstName, i.e. the second column of the index.

Depending on the nature of your specific queries, it might make sense to have both the two column index and the FirstName index. It doesn't make sense to have all three though, since the two column index where LastName is the first column is functionally equivalent to a single column index on that value.

Do not create indexes 'just because'. The index imposes overhead on update, insert, and delete queries, as the index contents have to be maintained as rows are changed. Obviously, the more indexes you have on a table, the more work will need to be done when you change/insert/delete a row, as each index in turn may have to be updated. You have to analyze your queries and evaluate the proportion of SELECT versus update/insert/delete queries, and decide where you will take the overhead and/or performance hit. There is no magic bullet, and query performance analysis is somewhat of an art, not a science...

Jeff Mason
Custom Apps, Inc.
www.custom-apps.com

owain · June 17th, 2003, 03:44 AM

Thank you all for your helpful and detailed responses to my question. Once again Jeff's knowledge and ability to explain has totally blown me out of the water :)

Regards
Owain Williams