You create an index for performance reasons because you either use its columns in JOINS, as predicates in a WHERE clause, or as columns in an ORDER BY clause. The optimizer will notice that the columns being used are indexed and will then construct its query plan to take advantage of that fact to arrive at a more efficiently executed query.
For example, if you run a query like:
SELECT FirstName FROM Customers WHERE FirstName='Fred';
The only way to find all the rows which satisfy the WHERE condition is to look at all of them, each in turn, and find the ones where the condition is true. If you have a zillion rows in this table, it may take quite a while to scan the entire table looking for the rows which have a FirstName column value equal to "Fred".
Similarly, a query like:
SELECT LastName, FirstName FROM Customers ORDER BY FirstName;
would require that these zillion rows all be placed into a temporary table and then sorted by FirstName.
Now consider the case where an index exists on this FirstName column. The rows satisfying the selection criteria are now easily found by simply looking in the index for the desired value. The ORDER BY query requires no sorting at all, since the index orders the rows explicitly by FirstName.
Indexes are also very helpful when JOINs are being processed. When an index is present, it is easy to determine whether there is a matching row in the second table. If there is no index, a table scan will have to be done, in a manner similar to a WHERE clause.
There are no guarantees that an index will be used in any given situation, however. The optimizer is free to ignore their presence if it thinks it has a better way to construct the query.
There is a significant difference between an index on two columns as (LastName, FirstName) versus two indexes, one on each column. If you have a query such as:
SELECT LastName, FirstName FROM Customers WHERE LastName='Flintstone' AND FirstName='Fred'
and you have a two column index, the query can be very quickly satisfied by searching the index for the combined values of LastName and FirstName. If you have two separate indexes on the two columns, then only one index can be practically used. The optimizer may choose to use the LastName index and find all the rows which match the LastName selection value, then scan those matching rows looking for the FirstName value. Note that the optimizer may instead choose the other index, find all the rows whose FirstName is 'Fred', then scan those rows looking for a LastName match. You cannot easily tell the optimzer which way to use the two indexes, nor would you generally want to - it knows better than you which way to go.
Note that the two column index is totally useless for the FirstName only query. Since a multicolumn index orders its entries in the specified column order, in this case the two column index orders its entries first by LastName, then by FirstName. Thus for all intents and purposes it might as well be in a random order if all you are looking for is a FirstName, i.e. the second column of the index.
Depending on the nature of your specific queries, it might make sense to have
both the two column index
and the FirstName index. It doesn't make sense to have all three though, since the two column index where LastName is the first column is functionally equivalent to a single column index on that value.
Do not create indexes 'just because'. The index imposes overhead on update, insert, and delete queries, as the index contents have to be maintained as rows are changed. Obviously, the more indexes you have on a table, the more work will need to be done when you change/insert/delete a row, as each index in turn may have to be updated. You have to analyze your queries and evaluate the proportion of SELECT versus update/insert/delete queries, and decide where you will take the overhead and/or performance hit. There is no magic bullet, and query performance analysis is somewhat of an art, not a science...
Jeff Mason
Custom Apps, Inc.
www.custom-apps.com