Wrox Programmer Forums

Need to download code?

View our list of code downloads.

Go Back   Wrox Programmer Forums > Database > SQL Language
Password Reminder
Register
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read
SQL Language SQL Language discussions not specific to a particular RDBMS program or vendor.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the SQL Language section of the Wrox Programmer to Programmer discussions. This is a community of tens of thousands of software programmers and website developers including Wrox book authors and readers. As a guest, you can read any forum posting. By joining today you can post your own programming questions, respond to other developers’ questions, and eliminate the ads that are displayed to guests. Registration is fast, simple and absolutely free .
DRM-free e-books 300x50
Reply
 
Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old July 3rd, 2003, 06:26 PM
Registered User
 
Join Date: Jul 2003
Location: Sacramento, CA, USA.
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
Default The Alias Problem

By Alias, I mean "otherwise called; otherwise known as" (thanks Merriam-Webster online) for a person.

My problem is this. I am designing a database, and I am unsure of which way to model the data.

The Real World: A person is arrested, and he is "booked" or processed into a jail or released. As is often the case, that person uses a fake name or alias. Eventually, if that person is a habitual offender (a modern term for career criminal) he or she will have a list of known fake names or aliases.

Now comes the database design part.

I have approached this two ways. Both ways seperate the person (at booking time) from their name used. In conceptual database modeling, the relationship appears as such:

+--------+ +-------+
| person |-|----0<| alias |
+--------+ +-------+

I am thinking of doing the following for the alias


+--------+ +-----------+ +------+
| person |-|----0<| aliasList |>0----|-| name |
+--------+ +-----------+ +------+

This would minimize the replication of common names such as "Edward" or "Jennifer", at the cost of more complex queries and joins. The AliasList table would contain the two primary keys (both artifical keys) to link the Person to their Names. For the name table, I am wondering if it would have:
  (1) LastName, FirstName, MiddleName, Suffix
or should it have:
  (2) Name, NameType [Last|First|Middle|Nickname]

The First solution was the first design, and one which a Sr. DBA did. The second design could handle "George 'Machine Gun' Kelly" better and result in much smaller list of names.

I ask you, forum members, if you have any better suggestions on how to implement this. I am wary of complicated joins, having been burnt in the past by database vendor performance verses database design theory.

Oh, and of course the project has no specs, no explicit performance requirements, and no database vendor decided on. The usual development process. :-)

--Matthew




Reply With Quote
  #2 (permalink)  
Old July 3rd, 2003, 07:11 PM
Friend of Wrox
 
Join Date: Jun 2003
Location: Hudson, MA, USA.
Posts: 839
Thanks: 0
Thanked 1 Time in 1 Post
Default

If I were doing it, I think I would opt for solution number 1. It's simpler, and results in far simpler queries. Furthermore, if you think about the code you'll have to write to maintain the many-to-many join your second solution requires, I think you will agree that it is far more complex. The user interface for such things can get convoluted, too.

I base my opinion mostly on the following: Disk space is cheap, user time waiting for joins to finish isn't.

Joins are a necessary part of life, but let's not go overboard.

How many times, really, is "Edward" likely to be repeated in your database? And even if it is repeated 100,000 times, you've used up a bit over a half a meg of space to store all of them. Big deal. If you consider the entire database, how many names are likely to repeated overall? What is the actual probability that any given name is likely to be repeated?

Let's do a bit of math. Suppose you have 100 million names in your database. Let's be generous and say each name on average requires 30 characters. You need only 3 Gigs to store those names. How many duplicates are in those 100 million? 30%? I doubt its anywhere near that much, but let's say 1/2 the names are duplicated. So, the space required to store the artificial keys in the many-to-many table would be 4 bytes times 100 million on one "side" of the table and 4 bytes times 50 million on the other, and the space reduction by removing the duplicates would be 1.5 gigs. So we have 1.5 Gigs to store the unique names plus 600MB for the many-to-many table plus some overhead in the table itself which I'll ignore. So fully representing the names takes 3 Gigs, and the many-to-many approach avoiding duplicates takes a bit over 2 Gigs. Now, things are a good deal more complex than I've represented, as presumably some names would result in a higher savings. Thus, if there a lot of criminals using the alias "John Smith", then the savings would be higher. But it would only be higher on the 'alias' side of the many-to-many table. You still have the fixed "overhead" of 100 million people on the 'people' side.

But all in all we're still talking a few gigabytes one way or the other. I think that's a worthwhile tradeoff for the complexity of the queries and the code to maintain the data in the structure.

I'm not sure how the second solution handles Mr. Kelley "better" than the first.

If you insist on using the many-to-many table, I would recommend staying away from your alternative 2. Constructing somebody's fullname would require four self-joins, possibly 3 of them outer joins, for each name you want to present. Yuck.

Listen to your Senior DBA. He knows what he's talking about.

Jeff Mason
Custom Apps, Inc.
www.custom-apps.com
Reply With Quote
  #3 (permalink)  
Old July 3rd, 2003, 07:25 PM
Registered User
 
Join Date: Jul 2003
Location: Adrift on Loch Ruadh, Any Day of the Week, Ireland.
Posts: 2
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Well, if I have what you are asking, my first say would be that it really comes down to how often you are likely to search for bits of names. Is there statistical significance in searching for all the criminals named George, for instance? Probably not. There is only merit in splitting fields up if there is merit in searching the individual subsections separately.

However, since criminals have quite limited imaginations, and tend to name one another after their modus operandi ,is there perhaps not merit in being able to pull out all the "Machineguns" in the list together: so that one might view George "Machinegun" Kelly alongside his more deadly and infamous compatriot Oonagh "Machinegun" Molloy - in such a way that we are not hampered with results concerning the far less threatening Oonagh "Inflatableclawhammer" Molloy? That way, we could distinguish whether the recently rumored Maxwell "Machinegun" Hosenbaum, said to be touring eastside making holes in people, is not in fact, more likely to be an alias for the recently-escaped George "Machinegun" Kelly, than the newly incarcerated Oonagh "Machinegun" Molloy (she being a woman, and all - and behind bars, too boot).

Oomoo

--------------
Feline Tiik Design
To err is human... Only, some of us are more human than others.
Reply With Quote
  #4 (permalink)  
Old July 3rd, 2003, 07:28 PM
Friend of Wrox
 
Join Date: Jun 2003
Location: Sydney, NSW, Australia.
Posts: 215
Thanks: 0
Thanked 0 Times in 0 Posts
Default

I concur with Jeff. This would unnecessarily complicate things.

regards
David Cameron
Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are Off
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
XSL namespace-alias problem cheez XSLT 4 August 22nd, 2006 03:20 AM
date formate in alias mateenmohd SQL Server 2000 1 November 7th, 2004 02:58 AM
Access alias question badgolfer Access 2 December 15th, 2003 04:47 AM
Access alias question badgolfer VB Databases Basics 1 December 12th, 2003 11:56 PM



All times are GMT -4. The time now is 04:30 PM.


Powered by vBulletin®
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
© 2013 John Wiley & Sons, Inc.