The reason why your query is not giving you the results you want has to do with a bit of a misunderstanding you seem to have of how (OUTER) JOIN clauses and WHERE clauses work, or, more accurately, of the relative
timing of the execution of these two clauses.
Here is a description of how a SELECT clause works. It's important to realize that no self respecting query processor would actually do things the way I describe, but it is helpful to think of things as happening this way:
1. The first thing is that a temporary intermediate table is constructed which is the result of all the rows resulting from all the JOINS in the FROM clause. Any JOIN predicates (ON clauses) are applied as the rows are inserted into this intermediate table. Thus, the rows which end up in the intermediate table are only those rows which meet the ON condition(s).
2. If any tables in the FROM clause are given an alias, then from this point forward, the columns from that table must include the alias given whenever they are referenced anywhere else in the query. This is because from this point on, the column values used elsewhere in the query actually refer to the columns in the intermediate table and
not to the columns in the "base" tables.
3. The conditions in the WHERE clause are now applied to the rows in the intermediate table. Only those rows for which the entire WHERE clause is true are retained in the intermediate table.
4. Next, the GROUP BY clause is processed, and all rows in the intermediate table are combined according to the GROUP BY clause, and one row is retained for each distinct group. Group aggregate
function values are computed at this time and added as columns to the intermediate table.
5. Next, the HAVING clause is processed, in a manner identical to that for the WHERE clause. Thus, the difference between the two is that the WHERE clause operates on individual rows
before grouping, whereas the HAVING clause operates on rows (groups)
after the grouping operation takes place.
6. Next, the columns specified in the SELECT clause are retained from the intermediate table, and the other columns are discarded. Any aliases assigned to columns come into existence at this time,
all at once. This is why you can't refer to SELECT clause aliases in a WHERE clause - the WHERE clause is processed before those aliases come into existence. The resultant columns comprise the
query's resultset.
7. If there is an ORDER BY clause, the resultset is ordered according to that clause.
Let's rework your query to see if we can understand what is happening. I'll remove the grouping and aggregate functions to simplify things, and I'll add some more columns so we can easier see what's going on:
Code:
SELECT Products.ProductID, ProductName, price, transdate
FROM Products LEFT OUTER JOIN Transactions
ON Products.ProductID = Transactions.ProductID
If you run this query, you'll get 11 rows given your data in your original post:
Code:
ProductID ProdName price trandate
----------- ---------- ----------- ----------
1 Apples 5 20030615
1 Apples 10 20030620
1 Apples 15 20030625
1 Apples 20 20030705
1 Apples 25 20030707
2 Pears NULL NULL
3 Oranges 8 20030622
3 Oranges 16 20030702
3 Oranges 24 20030703
3 Oranges 32 20030710
4 Peaches NULL NULL
as we can see, the OUTER JOIN is correctly setting the price and date to NULL for those products for which there are no transactions.
But look what will happen when we apply your WHERE clause. The clause will select rows for which the condition is true, i.e. transactions in July:
Code:
ProductID ProdName price trandate
----------- ---------- ----------- ----------
1 Apples 20 20030705
1 Apples 25 20030707
3 Oranges 16 20030702
3 Oranges 24 20030703
3 Oranges 32 20030710
and this is why you get no pears or peaches - you've removed them because your WHERE clause will only retain rows whose date is in July, and those two rows have a NULL date, which isn't in any month.
You can modify your WHERE clause to allow for the NULL possibility:
Code:
WHERE TransDate IS NULL
OR (DATEPART(month, TransDate) = 7
AND DATEPART(year, TransDate) = 2003)
or you could move the date selection into the ON clause:
Code:
...
FROM Products LEFT OUTER JOIN Transactions
ON Products.ProductID = Transactions.ProductID
AND DATEPART(month, TransDate) = 7
AND DATEPART(year, TransDate) = 2003
...
this works because the OUTER JOIN retains
all rows from the left hand table and only those rows from the right which meet the ON condition.
Either way works, although I marginally prefer the first since to me it is clearer what is going on.
Now, if you run your query with the modification to the WHERE clause as I've suggested, you'll see that the Pears and Peaches rows have a transaction count of 1, which really isn't correct. This is because, as I posted earlier, COUNT(*) counts
rows in a group
and in this case those groups each contain 1 row (of NULLs).
If we replace the COUNT(*) with COUNT(TransDate), you'll get a more accurate result. COUNT(TransDate) will count the number of non-null
TransDates in each group ignoring NULL values. Thus, your query should read:
Code:
SELECT Products.ProductID, ProductName,
COUNT(TransDate) AS MTrans, SUM(Price) AS MRev
FROM Products LEFT OUTER JOIN Transactions
ON Products.ProductID = Transactions.ProductID
WHERE TransDate is NULL
OR (DATEPART(month, TransDate) = 7
AND DATEPART(year, TransDate) = 2003)
GROUP BY Products.ProductID, ProductName
ORDER BY MTrans DESC
Note that you can't use COUNT(MTrans) because, as I describe above, the MTrans column alias doesn't come into existence until
after the COUNT value is calculated. Note that you can use it in the ORDER BY clause (in fact you must) because by the time the ORDER BY clause is processed, the alias has come into existence, and the resultset column name becomes that alias, and the ORDER BY clause only "knows" the columns in the resultset, and not in the intermediate/base table.
Hope this helps...
Jeff Mason
Custom Apps, Inc.
www.custom-apps.com