Wrox Programmer Forums

Need to download code?

View our list of code downloads.

Go Back   Wrox Programmer Forums > Database > BOOK: Beginning Database Design Solutions ISBN: 978-0-470-38549-4
Password Reminder
Register
Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read
BOOK: Beginning Database Design Solutions ISBN: 978-0-470-38549-4
This is the forum to discuss the Wrox book Beginning Database Design Solutions by Rod Stephens; ISBN: 9780470385494
Welcome to the p2p.wrox.com Forums.

You are currently viewing the BOOK: Beginning Database Design Solutions ISBN: 978-0-470-38549-4 section of the Wrox Programmer to Programmer discussions. This is a community of tens of thousands of software programmers and website developers including Wrox book authors and readers. As a guest, you can read any forum posting. By joining today you can post your own programming questions, respond to other developersí questions, and eliminate the ads that are displayed to guests. Registration is fast, simple and absolutely free .
DRM-free e-books 300x50
Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old April 15th, 2012, 08:08 AM
Authorized User
Points: 143, Level: 2
Points: 143, Level: 2 Points: 143, Level: 2 Points: 143, Level: 2
Activity: 0%
Activity: 0% Activity: 0% Activity: 0%
 
Join Date: Mar 2012
Posts: 32
Thanks: 9
Thanked 0 Times in 0 Posts
Default No apparent combination of keys to make a compound primary key unless we take all col

I believe that it is silly when inexperienced people like myself say things like "there is no obvious natural primary key" or suggest tables to go without a primary key. I begin to understand that this is the sign of a faulty database design. Still...

I was wrong giving up my example, which is much more on the limb, to work with your example which is a lot more clearer but at the price of simplicity. It was excellent that we worked upon it because I was making a mistake anyway. I fixed the tables in my database where the solution presented worked smoothly. However, there are tables where I found myself in the situation where I am unsure once more what to do.

We got to this solution for your example:

Code:
users
-----
id (PK) X
                   
userskills       
---------      
id (PK1) (FK1) X
skill (PK2) (FK2) Y
                       
skills                
-----                
skill (PK) Y
Now my initial example was one having the tables: "users", "education" and "experience". Unlike "Users" and "Skills" that have the solution you presented, here, let's take "Users" and "Experience" as working example, in my opinion, are not alike.

Code:
users
-----
userid (PK)

experience
------------
start
end
position
company
industry
userid
"Start" and "End" refer to a period of time when the user was having the "position" in the "company" belonging to the "industry". An example will be: (2010-10-10, 2012-03-02, "Web Designer", "Adobe", "IT/Software/Web Design, 12"). The "experience.userid" is a foreign key for "users.userid". Also the "experience.industry" will be a foreign key for a lookup table that will contain the list of industries as constrain (as in the "skills" example).

Now i should determine a primary key for this table (experience) and I see no obvious candidates. Unlike the skill table where a compound key with the "userid" and "skill" resolve the problem making an unique record along the lines that a user should have many skills, but not one skill twice, here nothing pops-out.

If we take "userid" in combination with "position" and/or "company", will be ok for most of the cases. However, it is not unheard for a person to leave a place of work for a better one, than to come back for the same position at the same company for various reasons like: the company discovered they cannot do it without him and they offered the position back with an increase in the salary, the move was a bad decision and he decides to come back, the new position was great but the company sunk quick...etc. So we may well have data like the following:

Code:
2010-10-10, 2012-03-02, "Web Designer", "Adobe", "IT/Software/Web Design, 12
2010-08-10, 2010-10-03, "Web Designer", "BitWise", "IT/Software/Web Design, 12
2006-08-10, 2010-08-10, "Web Designer", "Adobe", "IT/Software/Web Design, 12
It seems that the only real way we can assure data integrity is if the entire set (taken as a whole) is unique.

Now my questions are: should we make a compound key from all ? Isn't 6 columns marked as PK a little too much?

How about if we make the table much heavier by adding stuff like "country", "state", "city", "responsibilities"... Now maybe not the entire set will make it for the compound key, but the location thing will probably make it as a company may have two offices in two different cities of the same state...etc. Now we have 9 columns that should make the compound primary key... If the logic dictates it we make it 9 or we change the way we're thinking (as maybe it's not that efficient, even if it's logical correct). What is the rule of thumb in such scenarios?

A few days back I was asking a question about the "date" as key, I was interested in a general reasoning, but what I really had in mind was an example like this one when the date may be part of a compound key (or not).

Wouldn't be better to simply create an artificial key for a situation like this?

The rules are never meant to be broken? If this is not a case of braking the rules, can you please provide me one to think about it and have it for comparison in the future?

Thank you for your help.

Last edited by masterlayouts; April 15th, 2012 at 08:30 AM.
Reply With Quote
  #2 (permalink)  
Old April 15th, 2012, 11:06 AM
Rod Stephens's Avatar
Wrox Author
Points: 2,999, Level: 22
Points: 2,999, Level: 22 Points: 2,999, Level: 22 Points: 2,999, Level: 22
Activity: 0%
Activity: 0% Activity: 0% Activity: 0%
 
Join Date: Jan 2006
Location: , , .
Posts: 601
Thanks: 1
Thanked 84 Times in 83 Posts
Default

Quote:
I believe that it is silly when inexperienced people like myself say things like "there is no obvious natural primary key" or suggest tables to go without a primary key. I begin to understand that this is the sign of a faulty database design. Still...
It's not silly. Many real world problems lead naturally to table designs that are not normalized. With enough experience you tend to normalize then as you write them down but until then you get the hang of it you need to work through those issues one at a time.

Quote:
If we take "userid" in combination with "position" and/or "company", will be ok for most of the cases. However, it is not unheard for a person to leave a place of work for a better one, than to come back for the same position at the same company for various reasons
You're right. That's not enough. For this table you might be able to use userid + start. That assumes that a user only has one job at a time.

Ir the user might have more than one job at the same time, you would need userid + start + company. That assumes a user has only one job at a time at any single company.

If the user could have more than one job at a single company, you would need userid + start + company + position.

As you can see, the primary key can include many fields, not just one or two.

If the user could have more than one job at the same company and the same position (i.e. is the manager for two departments), then you need to add more data to the table. For example, you could add the department name as a new field. Or you could make the positions be "Manager IT" and "Manager HR" instead of just "Manager."

Or if you don't care what departments the user managed, you could add a Count field so you could indicate that the user was a manager for 2 departments at the same time. That would be pretty unusual.

All of these would be pretty unusual. My guess is that most programs would assume that a user will have only one job at a time in a particular company so start + company will be enough. That seems like the most intuitive combination.

To figure out which fields to use, ask yourself, "What will always make two records distinguishable?" or "What combination will always be different for any two records?" Here I think you may work at the same company twice but probably not at the same time.

Quote:
Isn't 6 columns marked as PK a little too much?
That is a lot and if you have such a large key then you might want to think hard about the design to see if there's a way to simplify things. But I can imagine there may be times when you'd want such a large key. Or when it might be better to denormalize the table a little to make things simpler. Then the table might not strictly follow all of the normalization rules but it would make things easier.

Quote:
How about if we make the table much heavier by adding stuff like "country", "state", "city", "responsibilities"...
I don't think you need to add the new fields to the key, although you could if you needed to. If the user could hold more than one position in multiple parts of the company, you might need to add the part's name or ID to make the key unique.

Note, however, that in this case there would be a lot of data duplicated across records. For example, every user that worked in the London office would have all of the same data for that office: OfficeName, OfficeAddress, Country, City, etc. That means that data should be pulled out into an OfficeAddress table and then linked to the user's data with an OfficeId (which would be a foreign key constraint).

When you start trying to cover every possibility like this, the model can grow very complex. That's why it's important not to store data that you will never need. You could add fields to hold a user's **************** size but you probably won't need it (unless you're a show manufacturer) and it will make things more complicated.

Quote:
A few days back I was asking a question about the "date" as key, I was interested in a general reasoning, but what I really had in mind was an example like this one when the date may be part of a compound key (or not).
Here I think it makes sense to use the start date as part of the primary key. Because you are probably only thinking about the date, I would only store that and not the time. You probably won't need to know the exact millisecond when the user started work.

You may want some extra data checks, too, to ensure that a user doesn't have two jobs that overlap. For example, one job goes from Feb 1, 2011 to Apr 1, 2011 and another goes from Jan 1, 2011 to May 1, 2011. Putting the start date in the primary key guarantees the user cannot start two jobs on the same date but it doesn't prevent this kind of overlap.

I don't think most databases can make that check for you easily. You might have to write a trigger or perform the check in code when you create or edit records.

Quote:
The rules are never meant to be broken? If this is not a case of braking the rules, can you please provide me one to think about it and have it for comparison in the future?
There are times when you want to break the rules but before you do you should think about the effect on the database. In this case, adding an artificial record ID number doesn't add any useful information and prevents the database from using its uniqueness guarantee for any good purpose.

One drawback to primary keys is that most (all?) database products do not let you modify them after a record is created.

So what if you need to modify the fields in the key? Most big databases like credit card or big store customer databases assign you a customer ID (your credit card number) and then you can never change it. That's an artificial key so you should ideally never need to change it. If you do (for example, your customer data was stolen and you need to get rid of those credit card numbers), you need to create new records for the customer, copy the data over, and destroy the original records.

So now back to this example. Suppose you use start + company as the primary key. Will you ever need to change a user's start date or company in one of those records? Probably not. If you do, then you probably made a mistake creating the original record and to fix it you'll need to create a new record and delete the old one.

In the credit card case, are there any customer fields that will never change that you could use as a primary key? Probably not. Customers can move and even change their names. And often customers in the same city may have the same name. Here you kind of need an artificial ID field.

The classic example of breaking a rule to make things easier is storing ZIP code and City in a customer record. (In countries other than the USA there are probably similar issues with cities and postal codes.) A particular ZIP code determines the City so if you know the ZIP code then you should be able to look up the City. That means storing both City and ZIP code in the same record would be redundant and normalization says you should just store the ZIP code.

But looking up ZIP codes all the time slows things down. It's a lot easier and more intuitive to just store both. You'll only get in trouble if the City assigned to a ZIP code changes and that is pretty rare.

So there are two examples where the rules are often broken:

With credit cards and other huge customer databases, there is no good combination of fields that are unique and all of them could change for a particular user so they create an artificial customer ID (account) number.

With City and ZIP code, storing both fields is redundant so violates the normalization rules but people do it anyway for the convenience because when you use one you almost always use them both (printing bills and so forth).
__________________
Rod

Rod Stephens, Microsoft MVP

Essential Algorithms: A Practical Approach to Computer Algorithms

(Please post reviews at Amazon or wherever you shop!)
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are Off
Pingbacks are On
Refbacks are Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
SQL Design: Foreign Key to Multiple Primary Keys? kalel_4444 BOOK: ASP.NET 2.0 Website Programming Problem Design Solution ISBN: 978-0-7645-8464-0 10 May 8th, 2008 04:14 PM
When to define primary keys and foregin keys? method SQL Server 2000 1 August 26th, 2005 09:14 AM
Thoughts on Primary keys rohan_man Access 3 February 9th, 2005 05:18 PM
how to post a key combination successfully? Greywacke VB How-To 0 October 14th, 2003 01:09 PM



All times are GMT -4. The time now is 02:58 AM.


Powered by vBulletin® Version 3.7.0
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
© 2013 John Wiley & Sons, Inc.