Another complex query and the use of UNION

masterlayouts · April 13th, 2012, 09:34 AM

In this scenario, I would like to fetch all the jobs posted by an unique employer, than fetch all employees that applied for each particular job posted by that employee (employeesJobs table).

Code:

jobs
----
jobsID (PK)
employeeID (FK)
... more

employees
----------
employeeID (PK)
... more

employers
----------
employerID (PK)
... more

EmployeesJobs
-------------
employeeID (PK)
jobsID (PK)

What it would be more efficient (real example has all tables with more columns) to use. Two queries or one query containing a subquery? Is this a case where making use of views may considerably help?

Regarding the question I post yesterday, it was mentioned as solutions (1) a big query or (2) subsequent queries. Isn't possible to use UNION and put together two queries in one load? Does this option present any advantage to a series of queries?

Rod Stephens · April 13th, 2012, 01:04 PM

Quote:

What it would be more efficient (real example has all tables with more columns) to use. Two queries or one query containing a subquery? Is this a case where making use of views may considerably help?

If you're going to fetch all of the data anyway, I usually start with one query and see if there's a problem. Using multiple queries just makes the database perform more round trips and makes it construct and optimize the queries separately, and all of that takes time. You can help reduce the time by doing things like trying to fetch on primary keys and using precompiled queries (the details vary depending on what database system you're using) but there will always be extra overhead in multiple queries.

Views are mostly to restrict the fields in a query's results. That can be useful if you want a user to see only some fields in a table (for example, see customer contact info but not credit card info).

It can also simplify the code. For example, you can define a view that performs some sort of query and then the code can simply fetch using the view so it doesn't need to know how the view is built.

But I don't think either of those really helps with performance. (Unless perhaps they can be precompiled somehow. That would probably depend on the database system.)

Quote:

Regarding the question I post yesterday, it was mentioned as solutions (1) a big query or (2) subsequent queries. Isn't possible to use UNION and put together two queries in one load? Does this option present any advantage to a series of queries?

It's possible that a single query that uses a union might be faster than multiple queries, particularly if you would need to (for example) perform one sub-query for each of a large number of main queries. (I.e. for each company, fetch the jobs posted by that company.)

I *think* what the database could do would be to compile the sub-query and then execute it separately for each of the returned companies. That would be faster than performing the sub-queries from scratch each time.

But I don't know whether a particular database would be smart enough to figure that out (I suspect most would). And you could do the same thing yourself, which would improve your performance. There would still probably be an advantage to making only one round trip, but even there it may depend on the amount of data. If the whole thing returned 20 GB, you might want to fetch and process it in chunks rather than trying to fetch everything all in one go.

I think these sorts of issues are no problem for 95% of typical databases. For the others where this might be an issue, I would prefer to make some quick tests on realistic sized databases to see what actually happens. Some databases are surprisingly good at handling situations that your intuition may say would cause a problem. If you can find out for certain quickly, you may be able to avoid a lot of unnecessary work.

masterlayouts · April 13th, 2012, 03:15 PM

Quote:

Originally Posted by Rod Stephens

If you're going to fetch all of the data anyway, I usually start with one query and see if there's a problem. Using multiple queries just makes the database perform more round trips and makes it construct and optimize the queries separately, and all of that takes time. You can help reduce the time by doing things like trying to fetch on primary keys and using precompiled queries (the details vary depending on what database system you're using) but there will always be extra overhead in multiple queries.

I am using MySQL and I searched for precompiled queries and did not find anything. Could you please point me in the right direction? This precompiled queries are somehow the result of optimization from working with database settings or should be compiled at the time of database build? What language is used?

...

Quote:

Originally Posted by Rod Stephens

It can also simplify the code. For example, you can define a view that performs some sort of query and then the code can simply fetch using the view so it doesn't need to know how the view is built.

Exactly this I had in mind when I asked the question. An increase in performance, if I understand correctly, will not be; but as far as it will not be a significant decline, simplifying the code it's a plus. So your advise will be to use views in this particular example?

Quote:

Originally Posted by Rod Stephens

It's possible that a single query that uses a union might be faster than multiple queries, particularly if you would need to (for example) perform one sub-query for each of a large number of main queries. (I.e. for each company, fetch the jobs posted by that company.)

The downside in my mind is that it is a little bit more difficult to read, also in one big chunk you cannot give up to parts of it for certain page where you can live without having it all and lastly I am thinking that error reporting and code logic, at least it seems to me, cleaner if using several queries. What would be your advise for this situation?

Quote:

Originally Posted by Rod Stephens

I *think* what the database could do would be to compile the sub-query and then execute it separately for each of the returned companies. That would be faster than performing the sub-queries from scratch each time.

I will definitely want to find out more about this subject. Do you have anything written on this subject?

Rod Stephens · April 13th, 2012, 03:43 PM

In MySQL, look at "prepared statements." See this link:

Prepared Statements

In particular read the "benefits of Prepared Statements" section at the beginning to see what they can do for you.

Quote:

An increase in performance, if I understand correctly, will not be; but as far as it will not be a significant decline, simplifying the code it's a plus. So your advise will be to use views in this particular example?

It may depend on your point of view. I usually do more complicated selection within my program's code so I tend to put things there. If you work more with the database, then you may prefer to put things there in a view or stored procedure.

Ask yourself, "If the fields needed here change, where would be it easier to change the view? In the program code or the database?"

Quote:

The downside in my mind is that it is a little bit more difficult to read, also in one big chunk you cannot give up to parts of it for certain page where you can live without having it all and lastly I am thinking that error reporting and code logic, at least it seems to me, cleaner if using several queries. What would be your advise for this situation?

My general advice for many design decisions is to do it the way that makes the most sense for you and that will be easiest. Then if you find there is a problem, you can fix it later. (That doesn't mean you need to be stupid. If you suspect there may be a problem, you should test to get evidence before you build the easy solution.)

But all things being equal, if it makes more sense to you and you understand it better, you are more likely to be able to implement it without making mistakes and it'll be easier to debug and modify later as needed.

Quote:

I will definitely want to find out more about this subject. Do you have anything written on this subject?

I don't think so. Try that link I gave above.

You should also read a bit about the query optimizer. Before a query is executed, the optimizer makes a plan to try to find the information in the most efficient way. For example, when joining companies to job postings, would it be better to fetch all companies and then find their jobs or fetch all the jobs and then find their customers?

For example, take a look at: