As you've discovered, there is no one right answer, and there are religious
wars which erupt over this issue.
A primary key is that column, or set of columns, which uniquely identify a
row. A row is a representation of an entity or process. So the first
question that needs to be asked here, is what business entity is being
modeled by the rows in this table, and what attributes of that entity can be
used to identify each row and discern them one from the other.
What, exactly, is a 'BizID' and 'TermID'? What attributes of what entity or
process do these values represent?
I think you are arguing technical approaches here without first analyzing
entities being modeled. Perhaps you've already done that analysis, but I
cannot tell from your presentation of the issues.
When you can specify the real world attributes which identify and
distinguish entities from one another, you have identified a set of
potential candidate keys. There will then (if you are lucky) be one from
the set which stand out as the natural key for the row, and that is the key
you should use as the primary key.
Now, that having been said, there are many reasons why you may choose to
assign a surrogate key in place of the natural key, using perhaps an
identity column attribute to define that key. The extent to which rows are
joined using the natural key to other tables may argue in favor of a
surrogate, especially if the natural key is lengthy; the volatility of the
natural key values may argue in favor of a surrogate.
Just remember that it is very rare that the *only* way to distinguish one
row from another is by having an identity-type artificial key. I could
religiously argue that anybody who thinks there can be is still thinking of
records in a file system instead of rows in a relational database. ;-)
Be aware that the identity attribute (in SQL Server, say) is decidedly *not*
portable, though many other systems have a similar concept.
--
Jeff Mason Custom Apps, Inc.
Jeff@c...
-----Original Message-----
From: Christian Cryder [mailto:christianc@g...]
Sent: Wednesday, January 29, 2003 12:23 PM
To: sql language
Subject: [sql_language] Identity fields in Primary Keys
Hi folks,
Ok, I'm hoping someone can help shed some light on a DB design issue
relating to primary keys and identity fields. Let's say I have a table that
contains some columns like this:
BizID, TerminalID, ...
Option 1: pk defined by both, term id unique within biz
--------
A, 0, ...
A, 1, ...
A, 2, ...
B, 0, ...
B, 1, ...
B, 2, ...
Option 2: pk defined by both, term id unique across table
--------
A, 0, ...
A, 1, ...
A, 2, ...
B, 3, ...
B, 4, ...
B, 5, ...
Option 3: pk defined by term id, biz defined as foreign key
--------
0, ...A(FK),...
1, ...A(FK),...
2, ...A(FK),...
3, ...B(FK),...
4, ...B(FK),...
5, ...B(FK),...
Some within our team argue that Option 1 is preferred since it is more
"technically pure"...a row's identity is established by the entire pk, and
nothing less. Down side is that it requires us to assign terminal ids,
rather than allowing the DB to handle that for us. And that in turn requires
manually locking the table on update (which itself is db specific).
2nd and 3rd options allow us to declare the TerminalID as identity, and have
the db create the value for us. Much more portable, easier to implement,
etc. But opt 2 is not as technically pure because there are actually two
portions of the row that make it unique - biz + terminal, or just terminal.
Opt 3 would also be technically correct, in that pk uniqueness comes simply
from the term id (identity), and the link to biz is simply defined as a
foreign key. The downside to 3 is that if we are always doing lookups by biz
+ term id, we end up having to manually define an index for biz+term.
Now, searching the web for info on this type of thing reveals there seem to
be some fairly heated debates over what approach is the "right" one, with no
clear consensus over which is best.
My inclination is to lean towards option 2 for the following reasons:
a) we push the identity assignment down into the db itself, which should be
more efficient, portable, and easy to implement for us
b) by keeping the primary key as being both biz+term, we end up getting a
commonly used index for free without having to explicitly define it; plus it
doesn't seem like the double uniqueness really hurts anything
So I guess my question here is, is there any compelling reason NOT to use
identity fields in a composite primary key? Is there any reason why it would
somehow be advantageous to pursue opt 1, given the fact that it requires us
to assign the term id values ourself, and that in turn requires us to manage
table locking on updates.
Any insights into these matters would be greatly appreciated...
Christian
----------------------------------------------
Christian Cryder [christianc@a...]
Internet Architect, ATMReports.com
Barracuda - http://barracudamvc.org
----------------------------------------------
"Coffee? I could quit anytime, just not today"