Advance query?

masterlayouts · April 12th, 2012, 12:03 PM

I have the following database scenario:

Code:

companies
-------
companyID (PK)
CompanyName

jobs
-------
jobID (PK)
companyID (FK)
description

domains
-----
domainsID (PK)
jobID (FK)
domains ('retail', 'it, 'management'...etc.)

benefits
------
benefitsID (PK)
jobID (FK)
benefits ('free transport', 'health insurance'...etc.)

Each job in jobs is posted by 1 company only, here I would like to limit the result to 1.
Each job may have multiple domains.
Each job may have multiple benefits.

What would be the easiest way to retrieve the information for a specific job in jobs table? The name of the company from companies table, all the domains associated with the jobID in jobs and all benefits associated with the same jobID in benefits table?

What I would like to obtain (using variable named as columns in the tables) is a multidimensional array like:

Code:

[jobID] array =>
0 => string ['description']
1 => string ['companyName']
2 => array [benefits] =>
     0 => string 'some benefits'
     1 => string 'another benefit'
     ... so on for all benefits associated to that jobID
3 => array [domains] =>
     0 => string 'some domain'
     1 => string 'another domain'
     ... so on for all domains associated to that jobID

What gives me problems is the fact that one table has 1 record only and I would like to see it enforced, where the domains and benefits table have a variable number of rows. If I join all I will have a number of records equal to 1 + number of domains with that jobID + number of benefits with that jobID. I can extract the information the way I want it, still I guess I am making a mistake and I have the feeling that it may be a better way to achieve the same result.

What query is best fitted to this situation?
Is there a better solution (maybe using views) for this complex example?

Rod Stephens · April 12th, 2012, 01:21 PM

I'm not completely sure I understand your questions so let me know if I miss something.

I don't think you need the domainsID and benefitsID fields because they aren't really used for anything. Instead I would make the combination of jobID and domain (or benefit) be the primary key. That would guarantee uniqueness so a particular job cannot have the same domain (or benefit) twice. (For example, a particular job could not have "retail" listed in two records.)

Quote:

If I join all I will have a number of records equal to 1 + number of domains with that jobID + number of benefits with that jobID.

I can think of two main approaches.

1. This is the method you mention where you just fetch everything in one big query. It's fairly brute force but may not be all that bad. In this example, the duplicate data is the company and job data and there isn't much of that here so you're not wasting too much space.

However, in a real application there might be a lot more information in the company and job tables and then fetching duplicates of that data for every row would add a lot of unnecessary data to be fetched and transferred from the database. You could just fetch the minimum of data from the company and job tables and then get more data later if you need it.

2. A second approach would be to have the program fetch the company and job data and then perform separate queries to fetch the corresponding domain and benefits data. This would require three separate queries (four if you fetch the company and job data separately--that would depend on how you fetch the job). There will be some extra overhead in each fetch but it shouldn't be too bad.

Some of this goes back to how the user selects the job. For example, you could list the companies and let the user pick one, then list the jobs and let the user pick one, and then finally list the domains and benefits of the job. In that scenario the program doesn't have the information it needs to fetch all of the data at once anyway so it can naturally fetch more information as the user selects details. In this case, this approach works very well.

For another example, suppose the user enters some data such as desired start date, salary, and domains. You could execute a query to get the job data for jobs that match those criteria and then display a list of the matching jobs. The user could then pick one to see the detail. Again the program doesn't need to fetch all of the data until the user makes some choices.

In contrast, suppose you want to select every job and process them all in some way. In that case it would probably be better to get all of the data at once instead of performing a huge number of separate fetches.

Unfortunately I donât think standard SQL syntax does a good job of supporting this kind of hierarchical data where one set of job data corresponds to many rows of domain and benefit data. Relational databases are really designed to select table-like results.

You may be able to get some mileage from XML queries. I havenât done much of that so Iâm not sure if it will be a big help or whether it will slow things down. For more information, see:

XML Support in Microsoft SQL Server 2005

I usually use the approach where the user makes a series of selections to drill down to the desired data. I would probably either use that or try selecting everything in one huge result and see if thereâs a problem. My guess is that it should work without causing you too many problems.

masterlayouts · April 12th, 2012, 01:57 PM

First observation, very good point, I already eliminated the ids for the benefit and domain tables.

Yes, the real example has indeed many columns, but for brevity I have included only a small set.

Indeed, the problems is that each job have many rows of domain and benefit data. I used the (1) solution, the big query. I guess it is not that bad, but looks inelegant.

I will probably move to the (2) solution if all I have it's (1) and (2) and craft the pages that display the information to accommodate. I am thinking that the (1) will be after all too big considering the number of columns in the real situation.

For example I can possibly give up to 'domains' when displaying the information. Similarly I can give up to 'benefits' on the search page. It feels a little odd, but it looks that it has to be some kind of trade-off to make it work. Thank you again for your help, really appreciate it.

I do not think I may be able to move or understand XML in the time I have at my disposal to finish the application I'm working on. I have to work with the shaky skills I have.

May I ask if you have any plans of writing another book on the subject that will take the matters further? I am a big fan of your book... now when I'm reading other books on the subject having your book as the gold standard I do not feel the same enthusiasm to say at least.

Rod Stephens · April 13th, 2012, 12:53 PM

Quote:

I used the (1) solution, the big query. I guess it is not that bad, but looks inelegant.

It feels a bit inelegant to me, too, but that's the way relational databases do things, in big table-like blocks of data. Usually they're pretty good at it and this kind of duplication isn't a big problem. (If you do a lot of network programming, working with graphs, math, geometry, etc, it does seem inelegant.

Quote:

I will probably move to the (2) solution if all I have it's (1) and (2) and craft the pages that display the information to accommodate. I am thinking that the (1) will be after all too big considering the number of columns in the real situation.

If possible, I would do a quick test using a realistic amount of data to see how bad it is. Memory is so cheap these days that if the data adds up to under a gigabyte it's probably not worth worrying about.

Quote:

I do not think I may be able to move or understand XML in the time I have at my disposal to finish the application I'm working on. I have to work with the shaky skills I have.

It's not too bad but I understand the time constraint. Something to look into for the future.

Quote:

May I ask if you have any plans of writing another book on the subject that will take the matters further? I am a big fan of your book... now when I'm reading other books on the subject having your book as the gold standard I do not feel the same enthusiasm to say at least.

I would like to and the publisher is interested but I'm not sure what would go into a new edition. More complete examples might be useful, or more exercises. I think the kinds of discussions we get in this forum are interesting but I'm not sure how well they would fit in a book.

I'm certainly open to suggestions! If you have ideas about topics that you would like added, please let me know.

And post a review at Amazon, Barnes & Noble, or wherever else you like to buy books when you have a chance. Those reviews are crucial to book sales and that will determine whether the publisher will want to create a new edition.