Thanks for the kind words. Post a review when you have a chance! (Just pasting in your first sentence would make a good review.)
(A long one!
“Countries” and “States” tables
You said that if we have a validation list to convert it into a foreign key constraint. So I assume countries and states qualify, so I create two separate tables. Now both have a (full) name and an abbreviation (standard).
The idea behind the lookup table isn’t necessarily to match long names with abbreviations. It’s to make a list of all allowed entries. For States, you could use a table that only has a single field: the states’ full names. Then the main table (Customer or Order or whatever) would have a foreign key constraint to validate a record’s State field so the user can’t enter “Taxas” or “Funzona.”
put the abbreviation in the main table if you wanted to, but let’s think about how much disk space that would save.
Suppose the longest state has a 20-letter name. Using the abbreviation in the main record would save 18 characters. Assuming only 1 byte per character (which may not be the case with Unicode) and assuming you have 1,000 records, you save around 18KB of disk space. That’s nothing. Suppose you have 1 million records. In that case you save about 18MB. That’s more but still not all that much. Back in the 1970s and 1980s when disk space cost hundreds of dollars per MB, you would go to great lengths to save that much space. But now that disk space costs about $0.10 per gigabyte, it’s not worth spending much effort to save 18MB.
So I would probably decide which value (full name or abbreviation) that you want the user to see and store that value in the main record. You may still want to store them both in the lookup table for other purposes. For example, the USPS prefers that you use the abbreviation in the address so if you let the user see the full name, you may still want to be able to look up the abbreviation to print address labels and such.
You also need to consider things like how hard would it be to change the name of a state leaving the abbreviation alone. It would be moderately hard (although a cascading update will take care of it for you in many databases) but it’s a very unlikely scenario.
The second question will be if the Country and States tables must have a primary key? I assume that the foreign key will be State (and Country respectively), but should I also add a primary key like bellow?
All tables should have a primary key. If there is no combination of fields that is guaranteed unique (which is required by a primary key), then there’s something wrong with the table design.
Use the field that you use to look up records most often because the primary key should have the fastest indexing.
The primary key requires uniqueness, which is something you want in a lookup table because, for example, you don’t want two states with the same name. You also don’t want two states with the same abbreviation, however, so you should put a uniqueness constraint on whichever field is not the primary key in this example. (I think I would use full name as the primary key because it will be hit every time you look up a main record and give abbreviation the uniqueness constraint. Unless you leave out the abbreviation, in which case that’s moot.)
Assuming that we want to display the States/Provinces for several countries, let say for Canada and United States. In many places I’ve seen both put together in the (UI Form) Select. I guess this is adhoc and probably goes in what you called “mishmashed “ table and you advise against.
An excellent question. If you put states and provinces in the same table with no indication that they are in different countries, then that is a mismashed table because it doesn’t acknowledge the relationship between state and country. To look at it in another way, there’s a missing piece of information. California and Quebec are almost but not exactly the same sort of thing. And if that’s all you store in that table, then the user could select incorrect combinations such as Ontario USA.
I think a lot of programs let you make that invalid selection and then the user interface validates the data when you accept the form. That’s a workable solution but I prefer not to let the user enter invalid data/combinations if possible.
From a DB design point of view, you could simply include the country name in the table. So at this point, you could have a table with the fields:
You could make a lookup table listing the countries if you want and use it as a foreign key constraint in this table, but this may be the only place you need to use country, in which case I think many developers would skip the extra table. (The downside is if you need to rename a country or add a new state record, then you need to be careful that you do it correctly and don’t misspell Mexicho.)
From a user interface point of view, what does the user select to make an address? You might make the user pick the country first and then make a combo box show the list of allowed states/provinces. A lot of web pages do that and it’s not too bad.
An alternative would be to let the user select the state/province and then automatically fill in the country by looking it up in the table. That would work if you don’t have too many states/provinces. The US plus Canada might work (although Canadians would need to look through a lot more entries than usual). If you have lots of countries, it would probably be better to do it the other way because this could be a huge list.
Allowing type-ahead would help, too. For example, you type “Was” and it jumps to “Washington.”
To really do this right, you’d also need a postal code lookup table somewhere to prevent the user from entering an invalid country/state/city/postal code combination.
(Job/recruiting records) Some have a finite number of options: ContractType (full time, part time, per diem, per project…), Availability (local, state wide, nation wide, international), CareerLevel, CivilStatus, Education, Experience. My first question that pops my mind is if it feasible to have so many columns linked to other tables for meaning.
Yes I think that’s feasible. The database really only takes a hit when you try to create or modify a record and it needs to check the lookup tables to verify that your values are correct. I don’t think that will be a problem for modern databases. Fetching the data takes a bit longer but shouldn’t be too big a burden as long as you’re not trying to fetch a million records, and in that case it’s going to take a while no matter what.
If you’re worried about a particular program, I would try it and see. Most of the time, this won’t be a problem.
[quote]To create the UI I assume that I will have to join all in a query to create the dropdown menus. Wouldn’t be better if I do not make this thing at this tier level, but in the programming language basically hard-coding the values?[quote]
Yes and no. I would probably handle validation in the UI but you don’t need to hard code the values. When a form loads, it can do a “SELECT StateName FROM States ORDER BY StateName” and use the results to build the items in the states combo box. That way the data is all in the database and not hard coded but the combo box can still perform validation in the UI getting the latest values from the database.
The program can rebuild the combo box when the user picks a country. (Actually the combo box would probably be disabled until the user picked a country, at which point the program would populate the combo box and then enable it.)
Another easy thing that bothers me is the “sex” column. I am tempted to use a char(1) and m/f; but when it is true when the Employee register that one can be either Male or Female, when the Employer post the ad he has in fact 3 options: male, female and both. How this issue should be handled?
I think different developers handle this one in various ways. I think most hard code M/F and leave it at that. Unless the application focuses on such issues, I don’t think anyone spends any effort other genders.
For the employer who can say M, F, or both/either, you could use M/F/B and write a little ugly code to handle it. Or you could give the Job table two fields AllowMale and AllowFemale and set one or both of them. That might make the query a little simpler.
SELECT * FROM Job WHERE
((Employee.Gender = ‘M’ AND Job.AllowMale) OR
(Employee.Gender = ‘F’ AND Job.AllowFemale))
Another general question is if all these lists go to separate tables, how it will be better to have something similar to the abbreviation in the State/Country in the main table (here Ads) and the full name in the linked table (for example ContractType)? I assume I can even move the meaning entirely in the linked table using in the main table placeholders and defining the column as int(1). Now I will have “1” in the main table, “1” FK in one of the columns of the linked table and “Full Time” (meaning) as another column in the linked?
I think this depends on why you’re using the lookup table. If you’re just using it to list all of the available possibilities, then you probably don’t need to have the shorter abbreviation or code. Instead you can store the full value in the main table.
That makes it a little harder to change the value. For example, if you decide that “Full Time” should be changed to “Exempt,” then you need to update more records. The database should ensure that this update occurs correctly, however, via cascading updates, which are supported by the more advanced databases at least. And hopefully that shouldn’t occur very often.
The case is less clear with a “heavier” table such as InventoryItems. That table might store an item’s name, description, and price. In that case you may want to store item IDs in the main table and look them up in the InventoryItems table instead of storing the item’s name everywhere.
Then you can do things like update the item’s name or description without updating the main OrderItems table.
Even there you may want to take care, however, because those records probably represent past sales, too. For example, if you change the price, you would effectively be changing the price retroactively on past sales and that could be confusing.
To avoid that, you might store the actual price with the past orders (which adds redundant data, but which I think is the most common approach) or you could add another level of tables storing pricing history over time (which is probably “correct” but more confusing).
The most difficult aspect it is probably related to the Domain column that I left it last. Here I have a list of domains from which the Employer can select how many he wants. I assume I will have another two tables besides the “Ads” table: one to hold the Employee selections and one to hold the list of domains available. The same questions above apply here: do I place placeholders in the main table and SelectedDomains table, explain the meaning in the last table DomainsList?
I think so, if I understand the question. This is a many-to-many relationship so you need to add an intermediate table.
You would want a lookup table to list the domains, perhaps called Domains. Then you would need a JobDomains (or SelectedDomains) table to list the domains selected for each job. It would have JobId as a link to the Jobs table and DomainId as a link to the Domains table. (DomainId could be the domain’s name if you like, just as the States table could include only the state’s name and not an abbreviation. This might depend on how long the domain names are and how often then change.)
Last question, I took this table as being a compound object. Do I interpret correctly the problem? I would have been tempted to take out some of the columns and leave the filtering at the account level when the Employer sort the information; However what I want is to implement and enforce them at the level of ad. Still it remains the questions if it will be better to split this table further or not.
Do you mean would it better to keep all of the fields in one huge record or split out the filtering fields and then link them to the body of the data such as the long job description?
In the old days, the clear answer would be to split the record because shuffling around huge chunks of data was slow. Now days the choice is less clear. Databases are pretty good at only using the data they need not the entire record. So if you query on a huge record but are only looking at a date field, the database should only really look at the date field. There will still probably be some issues with the disk drive taking longer to spin across the bigger records but modern disk drives are pretty fast so I doubt it would be a problem.
This should only be a problem if you have a really huge database, which CareerBuilder, Monster, and Dice probably do. In that case, they may get better performance by using a distributed database so the data is stored on several different servers that can all search their data at the same time.
Even for those databases, I’m not sure there will be a problem. Modern databases use typically B+trees and B*trees to perform searches and they are amazingly fast. They don’t actually need to look at the entire record to just look at key fields so they don’t really use the huge record until they find the records they want.
What will really kill performance is a poor database design so you need to search non-indexed fields. For example, searching for all jobs with the word “fish” in the description would require a linear search of every record and would be horribly slow. That’s why this type of application usually requires keyword lists. Those are indexed and much faster to search.
And if things really are too slow, it’s not too terribly hard to split a table later. It’s not fun but possible. If you start with a split table and then decide that was unnecessary, you will be unlikely to try to join it back together and rewrite all of the logic that puts the pieces together.
In general my rule in all programming (user interface, database, algorithms, whatever) is to first make the program make logical sense. Do what’s most natural for you. Then make it more complicated only after you prove
there is a problem. If you suspect there might be a problem, perform some tests to see if you’re right before you make the code unnecessarily complicated. I have seen projects fail because their logic was so convoluted that no one could get them to work. I’ve rarely seen a project fail because it was too slow and couldn’t be speeded up.
I hope I’ve answered all of your questions. If not, please let me know!