You are currently viewing the BOOK: Beginning Database Design section of the Wrox Programmer to Programmer discussions. This is a community of tens of thousands of software programmers and website developers including Wrox book authors and readers. As a guest, you can read any forum posting. By joining today you can post your own programming questions, respond to other developers’ questions, and eliminate the ads that are displayed to guests. Registration is fast, simple and absolutely free .
I am quite new to database design and would like some advice on the following problem.
The project involves cataloguing the articles published in my employers inhouse magazine, with a view to putting them all online in the future.
I could go the simple path of connecting the authors to their works with a simple two model table:
AUTHORS
authorID [PK]
fName
sName
ARTICLES
ArticleID [PK]
AuthorID [FK]
Header
Body
Summary
I thought, however, that it would be more interesting to show the authors role in the company at the time of publication.
(There are 13 locations and a number of job titles they may have been associated with during their time at the company.)
I therefore intend to use an 'author_context' table to link the 'author' table to the 'article table'.
My question is this: is it better to combine location, function and authorid into a composite key or create a Author_Context_ID field?
AUTHOR_CONTEXT
Location [PK]
Function [PK]
AuthorID [PK]
DateStart
DateEnd
or...
AuthorContext [PK]
Location
Function
AuthorID
DateStart
DateEnd
The use of a 3 field composite key would mean introducing those 3 fields into the ARTICLES table which seems messy/inefficient.
However, because the combined three fields will always be unique I dont know if it is good practice to introduce a uniquekey to reference them.
What's best, or am I missing something more logical?
My personal feeling is that I always want a specific ID that I can go after to return a record. I like having the AuthorContext [PK] field in the DB. I suspect that this may also reduce the complexity of the database and the fields you're using. When you hit the database to return one of the records in that table, I believe it's easier to look up one ID and return the result, than to look up an author in the table, then look up the locations of those authors and then, etc. etc. This also simplifies the SQL you write because you only need to SELECT * WHERE AuthorContext = @AuthorContext, instead of SELECT * WHERE (Author = @Author AND Location = @Location AND Function = @Function) This may also eliminate a couple parameters you have to feed into your DbCommand object depending on what other parameters you require.
That said, if you need to ensure that the combination of Author, Location, and Function remain unique then there is still absolutely value in creating a UNIQUE key relationship (as opposed to a Primary key) for them to preserve the integrity of your data. Note that this isn't a construct you'd create for data access purposes, but only to eliminate a certain type of invalid data from being stored in the DB. All your queries and nonqueries in your SQL text and stored procedures will be reference the easier AuthorContext PK.
__________________
-------------------------
Whatever you can do or dream you can, begin it. Boldness has genius, power and magic in it. Begin it now.
-Johann von Goethe
When Two Hearts Race... Both Win.
-Dove Chocolate Wrapper
Chroniclemaster1, Founder of www.EarthChronicle.com
A Growing History of our Planet, by our Planet, for our Planet.
I think I will go the uniqueID path for the reasons you mention.
Incidentally, I searched the web just after posting and it seems that this is a long ongoing debate: Surrogate Keys vs Primary/Composite keys. I would have expected there to be a consensus of opinion on something that lies at the very foundation of good database design, yet the difference in opinions and the intensity of them is unexpected to say the least.
Personally I feel, as you do, that it reduces complexity and for someone such as myself who can be muddleheaded at the best of times, this is a big advantage!
Oldpedant:
I never actually considered separate tables for location and function. This may have been intuitive because as desirable as normalized data is these fields will play such a limited role in the finished product that I fear normalizing them may bring unnecessary complexity with no tangible gain.
I give my reasons below. I would value your opinion because I have been warned about the possibility of unforeseen complications occurring further down the development and deployment paths!
I do not need to store additional information about the functions, or the locations (not even the addresses) other than their names. If I were to isolate them then it would mean introducing two tables, each consisting of two fields only, the field info itself and its referencing key. The queries would be more complicated and quick glances at the raw data would be full of numeric references to function and location as opposed to their real world names. I could reduce the tables down to one field, their 'names', but that seems a little odd. Of course, in the future as new locations and functions are added it could technically be trickier to input the first entries that reference them but I intend to hardwire the values into drop-down lists, and not populate these lists from the database. The functions and locations themselves are only added to on a longterm basis.
I think I will go the uniqueID path for the reasons you mention.
I would have expected there to be a consensus of opinion on something that lies at the very foundation of good database design, yet the difference in opinions and the intensity of them is unexpected to say the least.
Hahaha. :) Yeah, you'll find that on a number of issues. As much as programming is about dealing with technical (in some sense scientific) details you'd think these things were "solved" just like you solve a math problem. The thing is that different programmers have different priorities, so a solution that makes total sense to one person because it solves their priorities, seems totally ridiculous to someone else who thinks differently. :) That's why people argue about Mac vs. PC, Windows vs. Linux, MS vs. Apple vs. Open Source, OOP vs. Procedural, frameworks vs. programming, etc., etc., etc.
Quote:
Originally Posted by guynorton
Personally I feel, as you do, that it reduces complexity and for someone such as myself who can be muddleheaded at the best of times, this is a big advantage!
:D This is almost word for word from a great book on programming, Steve McConnell's Code Complete 2. The number one rule in computer programming is to minimize complexity. McConnell has a lot of excellent tips and techniques for achieving that too, which is why it's part of my quick reference shelf, but in particular I think this rule is right on.
There are all kinds of guys who are absolutely brilliant; they're SO much better than I am. I've seen the code they write. It's completely amazing. I would never be smart enough to write code like that. But here's the problem... nor would I ever be smart enough to maintain code like that. And because two equally smart developers still think differently there aren't many OTHER people who can either.
That's why I agree you should minimize the complexity of your code. I try to program defensively, which means that if your code isn't clear, obvious, simple to understand, people will find ways to misunderstand it.
Quote:
Originally Posted by guynorton
Oldpedant:
I never actually considered separate tables for location and function. This may have been intuitive because as desirable as normalized data is these fields will play such a limited role in the finished product that I fear normalizing them may bring unnecessary complexity with no tangible gain.
I actually like the separate tables. It is a little more complicated but not much, and it ensures that a simple misspelling doesn't ruin your database data (because that's all it takes to lose data).
Think of it this way, if you're putting everything in one table, it's just like a spreadsheet. That's not just a problem, it was THE problem which inspired people to develop the relational database in the first place. With all the names inserted, you have data that is right, data that is wrong, and data that is almost right. That's the problem with spreadsheets, because it's easy to fix the stuff that's wrong, it's finding the stuff that "looks" right but isn't. In a database where you refer to the ID numbers, you dump out a query and the things which are wrong stick out like a sore thumb because you've effectively eliminated the category of "almost right" from the equation. You either have the correct ID or you don't.
__________________
-------------------------
Whatever you can do or dream you can, begin it. Boldness has genius, power and magic in it. Begin it now.
-Johann von Goethe
When Two Hearts Race... Both Win.
-Dove Chocolate Wrapper
Chroniclemaster1, Founder of www.EarthChronicle.com
A Growing History of our Planet, by our Planet, for our Planet.