Wrox Programmer Forums

Need to download code?

View our list of code downloads.

Go Back   Wrox Programmer Forums > Database > BOOK: Beginning Database Design Solutions ISBN: 978-0-470-38549-4
Password Reminder
Register
Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read
BOOK: Beginning Database Design Solutions ISBN: 978-0-470-38549-4
This is the forum to discuss the Wrox book Beginning Database Design Solutions by Rod Stephens; ISBN: 9780470385494
Welcome to the p2p.wrox.com Forums.

You are currently viewing the BOOK: Beginning Database Design Solutions ISBN: 978-0-470-38549-4 section of the Wrox Programmer to Programmer discussions. This is a community of tens of thousands of software programmers and website developers including Wrox book authors and readers. As a guest, you can read any forum posting. By joining today you can post your own programming questions, respond to other developers’ questions, and eliminate the ads that are displayed to guests. Registration is fast, simple and absolutely free .
DRM-free e-books 300x50
 
 
Thread Tools Display Modes
  #1 (permalink)  
Old May 3rd, 2012, 07:26 AM
Authorized User
Points: 143, Level: 2
Points: 143, Level: 2 Points: 143, Level: 2 Points: 143, Level: 2
Activity: 0%
Activity: 0% Activity: 0% Activity: 0%
 
Join Date: Mar 2012
Posts: 32
Thanks: 9
Thanked 0 Times in 0 Posts
Default Artificial key, is it fair to use it to save programming time?

Thank you for your constant help. I think I may have a good question today. I confess that I learned a lot from your book and I have to say one of the common mistakes I did until I read your book it was to use extensively artificial primary keys. Now understanding the reason why it is not such a great idea I am using natural primary keys (as composite where needed). But I have a question related to the programming methods. It is somehow obvious that there may be situations when using an artificial key will be easier and in fact there are frameworks that are doing precisely this thing.

My situation it's the following:

Code:
Users
-----
UserID PK

Language
---------
UserID 
NameLanguage FK
LevelLanguage FK

Languages
----------
NameLanguage FK
It seems natural to user Language.UserID and Language.NameLanguage as primary key for this table. Now comes the tricky thing. When I had an artificial primary key I was able to modify and delete records by simply passing that key as parameter in URL.

.../modify.php?language=<?php echo $LanguageID;?>

where the variable holds the value from the artificial key "Languages.LanguageID". Simply and efficient.

Now when I have a composite key it seems that a lot of programming may be required. What one should do in this circumstances?

I believe that UserID may be taken from the $_SESSION['id'] and what we pass as parameter in the URL will be the Language.NameLanguage, hence not really a lot of work.

But I guess there will be situations when one of the values of a compound key will not be so easily available like the UserID, in which case what do we need to do? Pass both values as parameter? How about when we have a compound key with 3 or more keys? Wouldn't be easier to make an artificial key that will save us a lot of time and trouble? As far as the database will not have consistency problems, is it fair as programmer to save time by using artificial primary keys?

(No, I am not obsessed with breaking the normalization rules... )

-------------------------------------------------------------------------------------------------------------------

PS: such difficult of implement example you were kind to discuss in a previous thread where the answer was:

Quote:
If the user could have more than one job at a single company, you would need userid + start + company + position.
for the table example:

Code:
experience
------------
start
end
position
company
industry
userid
Here an artificial key will make things soooo much easier...

-------------------------------------------------------------------------------------------------------------------

I think I can top the previous example:

Code:
education
------------
userid
start
end
school
industry
degree char(1) Y/N
What combination of keys will make the primary key?
There are students that are taking two degrees in the same time at the same university, and until they get their degree the degree will say "no". The userid, start, end, degree, school, even industry may be the same.

For example an user may be first year at an university looking for a degree in software (anything) and one in hardware (anything). The input will look like this:

2011-2012 | University of Pensilvania | IT | no
2011-2012 | University of Pensilvania | IT | no

When the web site is big it is normal to overlap some industries/domains like "IT". What needs to be done in this example?

----------

For completeness I should add the following: there are several standards, one of them is for European countries, called "Europass". Their fields are the following:
Education and training

Dates (beginning - end)
Title of qualification awarded
Principal subjects/occupational skills covered
Name and type of organization providing education and training
Level in national or international classification

This is a real example and on their web site, to my surprise, "Level in national or international classification" is a text field. "Name and type or organization" it's split on usual name + address fields. "Title or qualification" is a textfield. "Skills" is a textarea (surprise). Dates are dates.

I guess the primary key at first glance seems UserID+Qualification. But how about the cases when somebody is taking the same qualification in two different geographical areas (countries in Europe or states in United States) as when required to pass the qualification for that country or state. Even simpler, how about a college student who is working for two degrees at the same university and did not get any license because it's undergoing (or to be mean, because did not passed the exams).

Also I guess the level could be involved as well in discussion... What is your opinion for this, I will say, difficult situation?

Applying what you teach us in a previous thread when we were talking about the "Experience":
Quote:
If the user might have more than one job at the same time, you would need userid + start + company. That assumes a user has only one job at a time at any single company.
If the user could have more than one job at a single company, you would need userid + start + company + position.
I guess I will take userid + end + qualification
or
userid + end + qualification + institution

Not even here an artificial key will make sense? One small educationID and all problems are solved and at the programming level we pass this id instead 3 extra parameters.

I apologies for the long thread...

Last edited by masterlayouts; May 3rd, 2012 at 10:23 AM.
  #2 (permalink)  
Old May 3rd, 2012, 11:14 AM
Rod Stephens's Avatar
Wrox Author
Points: 3,153, Level: 23
Points: 3,153, Level: 23 Points: 3,153, Level: 23 Points: 3,153, Level: 23
Activity: 0%
Activity: 0% Activity: 0% Activity: 0%
 
Join Date: Jan 2006
Location: , , .
Posts: 643
Thanks: 2
Thanked 96 Times in 95 Posts
Default

First, remember that the normalization rules are there to protect you against inconsistent data and to provide flexibility.

For example, if you have duplicate data, then if you update it you have to be sure you update all instances of the data or you have inconsistencies.

For another example, if you hard code license classes (from another thread) A, B, C, ... F into a user's record, then it's difficult to change the data if you need to add a new license class. (This can also make queries more awkward. To query for someone with a particular license class, you need to custom build the query in code instead of something like SELECT * FROM Users, LicenseClasses WHERE Users.UserId = LicenseClasses.UserId AND LicenseClasses.LicenseType = "C".)

But if your code would be simplified by creating an artificial key and you're willing to give up the protection that the normalization rules give you, then you're welcome to do that. Overall, however, I'd encourage you to think hard about what you will really save. In most cases the code isn't much simpler with the denormalized version and you add more potential confusion later when some other programmer takes over the code and doesn't understand how and why you broke the rules.

To your specific questions.

Yes, you might need to pass multiple values to make up the primary key if a table uses a compound key. It's not so bad.

Sometimes every key in a table will be part of the primary key. That may be required to get uniqueness. Your Language table is a good example.

I'm not sure artificial keys will help as much as you think. For example, consider the Language table. Most of the time, I won't be searching for a particular record where I don't know all of the values. Usually I'll be searching for all records for a particular user. SELECT * FROM Language WHERE UserId = 1337.

To update those records, I'll probably often simply delete the old records and then add new ones.

One case where you definitely need an artificial key is when you *need* to update a field that should be part of the primary key. Most databases won't let you modify the primary key so you would need the artificial key. That's why credit card companies, phone companies, and other big companies use a customer ID. In a big database, you will get two customers with the same name and occasionally you'll need to change a customer's name.

(It's only fairly recently that these companies have had large numbers of account numbers stolen and they need to change customer account numbers such as credit card numbers. I think that was a bit of a hassle for them but I presume they're getting more experienced at it. The "correct" solution is to have a separate credit card number and customer ID so you can change the credit card number without changing the customer ID. What a headache!)

For the education table, you need to think about what data is important for your application. If you're a typical employer, you only want to know when the employee graduated and what degree (if any) they got. In that case you could probably use:

EmployeeId
GraduationDate (which could be the date of last attendance if they didn't graduate)
Degree (which could be none)

If you are really focused on education, you might want a lot more information. In either case I think adding a Degree field ("BA Mathematics" or "MS Engineering") would give you the data you need. Then if you work on two degrees at the same time, you would have two records.

For the Europass example. They have lots of free-form text fields that ideally should not be. Instead they should have a list of acceptable values in a foreign key constraint so you can't enter "Masters of Information Sciences" in one record and "Mast. Iunf. Sci." in another. Otherwise it's very hard to search the table.

But perhaps they don't want to search the table. If they only want to see your qualifications when they pull up your record, then they don't need to be able to search the fields. (Changing the table would still save them time if they want to only see candidates with a certain level of degree.)

In fact, this may be a case where they really don't need *to use* the primary key at all. If the only way they will only *ever* want to list the degrees for a selected person, then they can just pick all records with the give person's ID. You will still probably want to make a primary key (1) to guarantee uniqueness and (2) some databases require it.

To handle almost all of these situations, your "what if" question usually points to the answer.

"How about the cases when somebody is taking the same qualification in two different geographical areas?" You need to add the geographical location to the table. (In the U.S. I think you need to pass exams in any stater where you want to practice law or medicine so this would be an issue if you want to hire a doctor in Colorado, for example.)

"How about a college student who is working for two degrees at the same university and did not get any license because it's undergoing?" You need to add a field indicating that the degree is not finished.

I'm not sure how adding an artificial key would help in these cases anyway. When you search for a candidate's data, you'll search by UserId and not any of the other fields. If you want a candidate with a particular qualification, you'll search for a particular field (like Qualification = "Bar Exam") and not the artificial key.

How would you use they artificial key?
__________________
Rod

Rod Stephens, Microsoft MVP

Essential Algorithms: A Practical Approach to Computer Algorithms

(Please post reviews at Amazon or wherever you shop!)
The Following User Says Thank You to Rod Stephens For This Useful Post:
masterlayouts (May 3rd, 2012)
  #3 (permalink)  
Old May 3rd, 2012, 12:52 PM
Authorized User
Points: 143, Level: 2
Points: 143, Level: 2 Points: 143, Level: 2 Points: 143, Level: 2
Activity: 0%
Activity: 0% Activity: 0% Activity: 0%
 
Join Date: Mar 2012
Posts: 32
Thanks: 9
Thanked 0 Times in 0 Posts
Default

Now I understand the issue with the driving license from yesterday. I will fix it asap.

For today's example, I think it is safe to say there are two issues.

The first problem as I see it is that when the primary key is compound and we have to deal with 4-5 keys in order to modify or delete a record we have to pass all these values as parameters, which is at least more inconvenient than passing an id as something that behaves like an artificial key for that table.

Passing a series of parameters as string, especially free text input from the user it is less secure and more difficult to handle than numbers. For a string that we pass as parameter we will have to sanitize it than look if the input is in an array of allowed values, than operate with it. Instead, having the parameter passed in the URL as number all we have to do is typecast it.

We have
is_integer and (int) VS urlencode, urldecode, sanitize, in_array

When using an artificial key instead of natural keys there are no benefits.
My question is how about we add a field "ThisTableID" and we set it as AUTOINCREMENT. In this case we have the uniqueness of the record that we want to edit, so instead of passing 5 parameters as string, we pass only one as integer. "ThisTableID" doesn't have to be set as PK and carry no semantic value. It has only a functional value.

-----------------------------------------------------
The second issue is that I am still on the limb with this table. I took out the fields from the mentioned Europass that it is supposed to be a standard for Europe.

UserID
AdmissionDate
EmployeeId
GraduationDate
Degree = Title or qualification awarded
Skills = Principal subjects/occupational skills covered
Institution = Name and type of organization providing education and training
Level = Level in national or international classification

I guess "Skills" can be left out, especially that is defined as text field.
We can probably leave "AdmissionDate".

However, everything else seems to me that has to go in the Primary Key.
More than this, those that do not have yet obtained the diploma or certification will still have to put something there (as you mentioned, it can be 'none'). How this can be worked out?
  #4 (permalink)  
Old May 5th, 2012, 10:25 AM
Rod Stephens's Avatar
Wrox Author
Points: 3,153, Level: 23
Points: 3,153, Level: 23 Points: 3,153, Level: 23 Points: 3,153, Level: 23
Activity: 0%
Activity: 0% Activity: 0% Activity: 0%
 
Join Date: Jan 2006
Location: , , .
Posts: 643
Thanks: 2
Thanked 96 Times in 95 Posts
Default

Well, it's not like you have to write the parameters by hand and carry them across town yourself. The computer doesn't care whether they are strings or numbers. It's pretty good at dealing with a long series of parameters of different types. Yes if you're passing these as URLs then you get some long URLs but that's okay.

You can use auto-increment to get unique IDs but unless you really need them they don't really add anything. They do make things a tiny bit easier for you when you're passing values back and forth but keep in mind that they also add another index structure to the table and that takes up some space and the database needs more time to update the table when you make changes because it has to update the index for the primary key that you don't really use.

Note that this is all different if you need to update all of the fields in the record. Because most databases won't let you modify the primary key after it is assigned, if you need to update any of the fields in a compound primary key they you need some sort of artificial key to you can update the other fields.

There are a couple of issues with that table.

If Skills are related to the degree (i.e. everyone with a particular degree has certain skills), then they should be linked to the degree not the person. If every person has skills independent of other people, then they could be listed separately in a manner similar to the degrees. There would be a Skill table that lists skills for people.

The other issue here is that this is a weak entity. It doesn't really make sense for a Europass entry to exist without a corresponding User. You'll also probably never want to find a Europass entry without finding the corresponding user. For example, would you ever do SELECT * FROM Europass WHERE AdmissionDate=#4/4/2012# AND Degree="BA Mathematics"? Probably not. You'd want to join this to the User table to find out who has that degree.

So yes you'd need just about every field to guarantee uniqueness but you probably wouldn't use that key to find records. Instead you're probably going to use UserID almost always and sometimes add an extra field like Degree if you're looking for a specific degree.

In this case the primary key doesn't really help you find records but it does guarantee uniqueness. You need to guarantee uniqueness anyway so the program doesn't accidentally add duplicated records so you can add the primary key for that.

That may also help when searching for a particular UserID (with some databases and if you make UserID the first field in the primary key). If not, then you may want to add another index on UserID because that's how you will almost always search this table.

You could add an artificial primary key but it would (1) add extra overhead to the table, (2) not guarantee uniqueness so you'd have to add a separate uniqueness constraint, and (3) not really help with querying.

You could try it to see what happens. Perhaps the database can manage a uniqueness constraint plus an index on UserID more effectively than a primary key with UserID as the first field. You'd have to try it to see.
__________________
Rod

Rod Stephens, Microsoft MVP

Essential Algorithms: A Practical Approach to Computer Algorithms

(Please post reviews at Amazon or wherever you shop!)
The Following User Says Thank You to Rod Stephens For This Useful Post:
masterlayouts (May 5th, 2012)
 


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are Off
Pingbacks are On
Refbacks are Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
How to save public key in keystore using keytool? boy18nj BOOK: Beginning Cryptography with Java 10 June 23rd, 2011 12:59 AM
I don’t think we save any process time if… carewithl BOOK: ASP.NET 2.0 Website Programming Problem Design Solution ISBN: 978-0-7645-8464-0 6 October 21st, 2009 02:57 PM
save time umeshtheone Pro VB 6 1 October 23rd, 2007 09:01 AM
Using "now()" to save date and time to database misskaos Classic ASP Basics 12 October 11th, 2006 05:01 PM
You can't save this record at this time avd Access 4 June 12th, 2006 08:29 AM



All times are GMT -4. The time now is 07:37 PM.


Powered by vBulletin®
Copyright ©2000 - 2018, Jelsoft Enterprises Ltd.
© 2013 John Wiley & Sons, Inc.