Wrox Programmer Forums
|
SQL Language SQL Language discussions not specific to a particular RDBMS program or vendor.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the SQL Language section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old July 27th, 2003, 04:29 PM
Registered User
 
Join Date: Jun 2003
Posts: 8
Thanks: 0
Thanked 0 Times in 0 Posts
Default SQL Syntax

I'm trying to get the correct SQL syntax to get a desired result. I'd appreciate your help in coming up with the correct query -

I'm trying to join two tables, Products and Transactions. The resulting table (MonthTrans) should be a table of all Products, with the count of Transactions and the sum of prices for each in a selected month, ordered by the count of transactions. I can't seem to get the resulting table to include all products (the ones with no transactions in the selected month are not included).

Here's an example:

Products Table:

ID Name
-- ----
1 Apples
2 Pears
3 Oranges
4 Peaches


Transactions Table:

Prod Trans
 ID Price Date
---- ----- -----
1 5 6/15/2003
1 10 6/20/2003
3 8 6/22/2003
1 15 6/25/2003
3 16 7/02/2003
3 24 7/03/2003
1 20 7/05/2003
1 25 7/07/2003
3 32 7/10/2003


MonthTrans table (the desired result) for July 2003:

Prod Trans
Name Count Revenue
------- ----- -------
Oranges 3 72
Apples 2 45
Pears 0 0
Peaches 0 0


I've tried the following SQL statement, but get only the Oranges and Apples lines; not the Pears and Peaches lines:

SELECT Products.ProductID, ProductName,
 COUNT(*) AS MTrans, SUM(Price) AS MRev
 FROM Products LEFT OUTER JOIN Transactions
 ON Products.ProductID = Transactions.ProductID
 WHERE DATEPART(month, TransDate) = 7
 AND DATEPART(year, TransDate) = 2003
 GROUP BY Products.ProductID, ProductName
 ORDER BY MTrans DESC


Thanks for any help/suggestions that you can provide.

- Jeff
 
Old July 28th, 2003, 12:47 AM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 215
Thanks: 0
Thanked 0 Times in 0 Posts
Default

I did a little messing around and found that you need to specify the name of the column in COUNT for it to include NULLs in the count:

Code:
SELECT P.ProductID, P.ProductName,
    COUNT(T.ProductID) AS MTrans, SUM(T.Price) AS MRev
FROM Products P
    LEFT OUTER JOIN Transactions T ON
    P.ProductID = T.ProductID
WHERE DATEPART(month, T.TransDate) = 7
    AND DATEPART(year, T.TransDate) = 2003
GROUP BY P.ProductID, T.ProductName
ORDER BY MTrans DESC
regards
David Cameron
 
Old July 28th, 2003, 06:58 AM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 839
Thanks: 0
Thanked 1 Time in 1 Post
Default

Quote:
quote:Originally posted by David Cameron
 I ... found that you need to specify the name of the column in COUNT for it to include NULLs in the count

Are you sure about this?

As I understand it, the difference between COUNT(*) and COUNT(somecolumn) is that COUNT(*) counts the rows in the query, whether or not there may be any NULL values, whereas COUNT(somecolumn) counts the number of nonnull values of <somecolumn>.

Jeff Mason
Custom Apps, Inc.
www.custom-apps.com
 
Old July 28th, 2003, 12:16 PM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 839
Thanks: 0
Thanked 1 Time in 1 Post
Default

The reason why your query is not giving you the results you want has to do with a bit of a misunderstanding you seem to have of how (OUTER) JOIN clauses and WHERE clauses work, or, more accurately, of the relative timing of the execution of these two clauses.

Here is a description of how a SELECT clause works. It's important to realize that no self respecting query processor would actually do things the way I describe, but it is helpful to think of things as happening this way:

1. The first thing is that a temporary intermediate table is constructed which is the result of all the rows resulting from all the JOINS in the FROM clause. Any JOIN predicates (ON clauses) are applied as the rows are inserted into this intermediate table. Thus, the rows which end up in the intermediate table are only those rows which meet the ON condition(s).

2. If any tables in the FROM clause are given an alias, then from this point forward, the columns from that table must include the alias given whenever they are referenced anywhere else in the query. This is because from this point on, the column values used elsewhere in the query actually refer to the columns in the intermediate table and not to the columns in the "base" tables.

3. The conditions in the WHERE clause are now applied to the rows in the intermediate table. Only those rows for which the entire WHERE clause is true are retained in the intermediate table.

4. Next, the GROUP BY clause is processed, and all rows in the intermediate table are combined according to the GROUP BY clause, and one row is retained for each distinct group. Group aggregate
function values are computed at this time and added as columns to the intermediate table.

5. Next, the HAVING clause is processed, in a manner identical to that for the WHERE clause. Thus, the difference between the two is that the WHERE clause operates on individual rows before grouping, whereas the HAVING clause operates on rows (groups) after the grouping operation takes place.

6. Next, the columns specified in the SELECT clause are retained from the intermediate table, and the other columns are discarded. Any aliases assigned to columns come into existence at this time,
all at once. This is why you can't refer to SELECT clause aliases in a WHERE clause - the WHERE clause is processed before those aliases come into existence. The resultant columns comprise the
query's resultset.

7. If there is an ORDER BY clause, the resultset is ordered according to that clause.

Let's rework your query to see if we can understand what is happening. I'll remove the grouping and aggregate functions to simplify things, and I'll add some more columns so we can easier see what's going on:
Code:
SELECT Products.ProductID, ProductName, price, transdate
 FROM Products LEFT OUTER JOIN Transactions
 ON Products.ProductID = Transactions.ProductID
If you run this query, you'll get 11 rows given your data in your original post:
Code:
ProductID   ProdName   price       trandate   
----------- ---------- ----------- ---------- 
1           Apples     5           20030615  
1           Apples     10          20030620  
1           Apples     15          20030625  
1           Apples     20          20030705  
1           Apples     25          20030707  
2           Pears      NULL        NULL
3           Oranges    8           20030622  
3           Oranges    16          20030702  
3           Oranges    24          20030703  
3           Oranges    32          20030710  
4           Peaches    NULL        NULL
as we can see, the OUTER JOIN is correctly setting the price and date to NULL for those products for which there are no transactions.

But look what will happen when we apply your WHERE clause. The clause will select rows for which the condition is true, i.e. transactions in July:
Code:
ProductID   ProdName   price       trandate   
----------- ---------- ----------- ---------- 

1           Apples     20          20030705  
1           Apples     25          20030707  
3           Oranges    16          20030702  
3           Oranges    24          20030703  
3           Oranges    32          20030710
and this is why you get no pears or peaches - you've removed them because your WHERE clause will only retain rows whose date is in July, and those two rows have a NULL date, which isn't in any month.

You can modify your WHERE clause to allow for the NULL possibility:
Code:
WHERE TransDate IS NULL
    OR (DATEPART(month, TransDate) = 7
    AND DATEPART(year, TransDate) = 2003)
or you could move the date selection into the ON clause:
Code:
...
FROM Products LEFT OUTER JOIN Transactions
    ON Products.ProductID = Transactions.ProductID
    AND DATEPART(month, TransDate) = 7
    AND DATEPART(year, TransDate) = 2003
...
this works because the OUTER JOIN retains all rows from the left hand table and only those rows from the right which meet the ON condition.

Either way works, although I marginally prefer the first since to me it is clearer what is going on.

Now, if you run your query with the modification to the WHERE clause as I've suggested, you'll see that the Pears and Peaches rows have a transaction count of 1, which really isn't correct. This is because, as I posted earlier, COUNT(*) counts rows in a group
and in this case those groups each contain 1 row (of NULLs).

If we replace the COUNT(*) with COUNT(TransDate), you'll get a more accurate result. COUNT(TransDate) will count the number of non-null TransDates in each group ignoring NULL values. Thus, your query should read:
Code:
SELECT Products.ProductID, ProductName,
 COUNT(TransDate) AS MTrans, SUM(Price) AS MRev
 FROM Products LEFT OUTER JOIN Transactions
 ON Products.ProductID = Transactions.ProductID
 WHERE TransDate is NULL
 OR (DATEPART(month, TransDate) = 7
 AND DATEPART(year, TransDate) = 2003)
 GROUP BY Products.ProductID, ProductName
 ORDER BY MTrans DESC
Note that you can't use COUNT(MTrans) because, as I describe above, the MTrans column alias doesn't come into existence until after the COUNT value is calculated. Note that you can use it in the ORDER BY clause (in fact you must) because by the time the ORDER BY clause is processed, the alias has come into existence, and the resultset column name becomes that alias, and the ORDER BY clause only "knows" the columns in the resultset, and not in the intermediate/base table.

Hope this helps...

Jeff Mason
Custom Apps, Inc.
www.custom-apps.com
 
Old July 28th, 2003, 06:28 PM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 215
Thanks: 0
Thanked 0 Times in 0 Posts
Default

I am completely sure ... that I was wrong. I should check things a little more carefully, thanks for pointing that out. Specifying the colum name means that the NULLs go to 0 in the count, using COUNT(*) means you get back a 1 whether there is one row in the table that was outer joined, or even if there were no rows (if that makes sense).

regards
David Cameron
 
Old July 28th, 2003, 06:35 PM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 839
Thanks: 0
Thanked 1 Time in 1 Post
Default

Quote:
quote:
Specifying the column name means that the NULLs go to 0 in the count, using COUNT(*) means you get back a 1 whether there is one row in the table that was outer joined, or even if there were no rows (if that makes sense).
That last part confuses me. COUNT(*) returns the number of rows in the query (or Group). The row can have NULLs in it and it will be still counted, regardless of where the NULLS came from, be it via an OUTER JOIN, or because that's what's stored in the column in the base table. COUNT(somecolumn) returns the number of rows (in the query or group) which have a value (i.e. not NULL) in 'somecolumn'.

Jeff Mason
Custom Apps, Inc.
www.custom-apps.com
 
Old July 28th, 2003, 06:41 PM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 215
Thanks: 0
Thanked 0 Times in 0 Posts
Default

That is what I meant, only you have said it in an understandable way. This is what initially threw me with the OUTER JOIN problem.

regards
David Cameron





Similar Threads
Thread Thread Starter Forum Replies Last Post
Need Help with SQL syntax DeannaF829 VB Databases Basics 2 April 30th, 2007 12:12 PM
SQL-syntax grstad SQL Language 9 January 30th, 2007 02:40 PM
syntax of sum in SQL heghtera Access VBA 1 March 10th, 2006 03:17 AM
SQL Syntax yves SQL Server 2000 6 February 2nd, 2006 08:26 PM
SQL Syntax Cinderella Classic ASP Basics 3 July 21st, 2004 01:06 PM





Powered by vBulletin®
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
Copyright (c) 2020 John Wiley & Sons, Inc.