Identity attribute

David Cameron · June 19th, 2003, 06:58 PM

Good points.

I also like the word surrogate better.

regards
David Cameron

owain · June 24th, 2003, 09:38 AM

One of the biggest advantages of relational data models is the reduced amount of repeated data, therefore reduced storage requirements.

If you are referencing several 'natural' keys as opposed to one IDENTITY key then you are repeating data unnecessarily.

Of course the counter argument is if you use an IDENTITY key when a natural one will do, then you are adding unnecessary data to the database, however over millions of rows, with several child tables and more then one or two natural keys, the repetition increases.

I know that these points have been argued in the above posts, I merely wanted to outline that although a natural key(s) should exist there is a reason why someone would want to

Quote:

quote:use these [IDENTITY columns] as primary keys.

Regards
Owain Williams

Jeff Mason · June 24th, 2003, 10:57 AM

Quote:

quote:Originally posted by owain
One of the biggest advantages of relational data models is the reduced amount of repeated data...

One of the biggest advantages of normalization (and not an RDBMS per se) is the reduced amount of repeated data. Normalization is the concept of "one fact, one time, one place, one way". That's the goal because experience (and a lot research) has shown us that a properly normalized database is better protected against data anomalies which can arise because of improper or insufficient inserts, updates, or deletes of data.

The reality is that normalizing data often means placing the data representing the entity being modeled into multiple tables. When it comes time to bring that data back together for processing, to again fully represent the entity being modeled, there is no choice but to JOIN the tables together. This means that there must be sufficient information in the tables being JOINed to relate them to each other, and that (typically) means that the primary key in one table must be present as a foreign key in another.

This means that the relating data must be present in both tables, and that is a duplication that cannot be avoided.

So the question is whether the relating data should be some sort of artificial meaningless value like an IDENTITY value, or whether it ought to be the "natural" value that already is the real world attribute that distinguishes one entity from another. Why make up a value when nature has already provided one?

We've discussed one reason - the "natural" key is unwieldy (composite), so we invent a strictly internal value which stands in for the natural key. Another reason may be primary key volatility - if, for example, an account number can be changed because of entry error or other circumstances, then it may be that the account number is not a good candidate for the primary key.

I have a hard time understanding any other reasons... :)

Jeff Mason
Custom Apps, Inc.
www.custom-apps.com