Enzo, there was an excellent explanation here 2 weeks ago which touched on
this topic and more by Jeff Mason (you legend you). I have copied it below.
I hope you don't mind Jeff. Note point 4.
Regards,
Nicholas Trevatt
"In order to understand what is going on in your HAVING clause, it might be
good to review how a SELECT statement works. What I'm going to describe
here is what happens in theory; any decent optimizer would never do it the
way I describe. But this is the model so it's helpful to think that things
work this way.
1. The first thing that happens is that all the tables in the FROM clause
are combined together into an intermediate working table. This table
contains all the columns from all of the tables being joined.
2. The WHERE clause is evaluated for each row in the intermediate table,
and those rows that do not pass criteria, that is, that do not test to TRUE
are discarded.
3. If there is a GROUP BY clause, groups are made, and each group is
reduced to a single row. This result replaces the original working table.
The columns in the rows of this intermediate grouped table must be either a
grouping column (mentioned in the GROUP BY clause) or an aggregate functions
on a column.
4. Apply the HAVING clause to this grouped table in a manner similar to how
it's done for the WHERE clause. If there was no GROUP BY clause, then the
entire intermediate table is treated as one group.
5. Next, the SELECT clause is evaluated, and only those columns mentioned in
the SELECT clause are retained from the intermediate table. If there are
any expressions in the SELECT clause, these results are constructed. This
means that any scalar subqueries, function calls and expressions in the
SELECT are done after all the other clauses are done. The AS operator can
give a name to expressions in the SELECT list, too. These new names come
into existence all at once, but after the WHERE clause has been executed;
you cannot use them in the SELECT list or the WHERE clause for that reason.
If there is a SELECT DISTINCT, then all redundant duplicate rows are removed
from the result.
6. Lastly, the ORDER BY clause is evaluated. The results of an ORDER BY is
actually a cursor, and not part of a SELECT statement. The result set has
to be completed, then passed to the cursor, so that it can be converted into
a sequential set of rows to be returned to the caller. This means that the
only columns names the cursor can see are the ones in the SELECT list.
So, looking at your query:
SELECT EmployeeID, COUNT(OrderID) AS NumOrders
FROM Orders
GROUP BY EmployeeID
HAVING COUNT(OrderID) > 100
we can see that the intermediate table is the entire contents of the
'Orders' table. Since there is a GROUP BY clause, this will be replaced by
a new intermediate table which consists of one row for each unique
EmployeeID. Each row contains two columns: the (unique) EmployeeID, and the
aggregate function COUNT(OrderID). This aggregate counts the number of
unique, non-null values of OrderID in each group (each employee). Note that
this would be the same as COUNT(*), unless there were rows in the original
intermediate table which had null values for OrderID. Count(*) counts the
rows which were used to construct the group; COUNT(OrderID) counts the
number of distinct OrderID values.
Since there is a HAVING clause, only those rows which have a COUNT(OrderID)
greater than 100 are retained in the resultset.
While we can't make any assumptions about what a given optimizer will do,
it's fairly safe to say that the expression 'COUNT(OrderID) AS NumOrders' is
constructed while the group is being constructed, and the HAVING clause is
being evaluated at the same time, so the chances are that the optimizer will
notice that there is a common subexpression and only evaluate things once.
But who knows? And frankly, who cares? If you need to know how many
distinct OrderIDs are associated with each employee, but only if that number
exceeds 100, this is the only sane, set based way to do it. It could
probably also be done with a cursor and some procedural code, but why?
--
Jeff Mason Custom Apps, Inc.
Jeff@c...
"
-----Original Message-----
From: Enzo Zaragoza [mailto:enzaux@g...]
Sent: Friday, November 15, 2002 5:30 PM
To: sql language
Subject: [sql_language] Difference IN Using HAVING and WHERE together
with GROUP BY
Before I thought that HAVING clause is used when the GROUP clause is used
in
replacement for WHERE clause. But it seems that WHERE clause works with
GROUP clause,
so it confuses me what are the difference of using WHERE between HAVING in
use together
with GROUP clause.
Thanks,
Enzo
YahooID: yackydidakdak