Yes, this is a very trimmed down version of the real problem. I'm
rewritting a program that updates inventory tables (among others) with data
from the orders tables. The original program was a large plate of
spaghetti that would merrily chug along updating inventory one order at a
time until finished (do until rs.eof: CallABunchOfSubRoutines: rs.update:
rs.movenext: loop). I am hoping to simplify the program by using a more
set-based approach.
The actual tables involved have complex keys. I will be using a more
complex WHERE clause, and I do have indexes on the appropriate columns.
I'm going to do some more testing but I am probablly not going to worry
about this too much as the application will *probably never* update more
than 200 rows at any given time. And given that the old program was so
poorly designed, either method will represent a HUGE increase in
performance to the end user. Just wanted to keep scalablity in mind:-)
Thanks again for your input.
-Greg
-----Original Message-----
From: Jeff Mason [SMTP:jeffm.ma.ultranet@r...]
Sent: Thursday, August 22, 2002 1:36 PM
To: sql language
Subject: [sql_language] RE: Update with aggregate
It looks like your analysis is accurate; whether its conclusions are valid
or not is another question ;-)
You should probably inspect the query execution plans for both approaches.
You have given no information about the exact definition of either table.
Such information as column constraints and indexes are vital to determining
the efficiency of a query plan. I assume that your example is a stripped
down version of your real tables; the presence or absence of WHERE clauses
on other columns can have significant input into the decision about what
indexes go where.
You should probably make sure that indexes exist on the key and join
columns. The question of whether an index should be created on a
particular
column is not a simple one; many factors, such as the frequency and
distribution of updates, deletes, and inserts versus queries come into
play.
Maintaining an index involves overhead, so you may optimize one operation
at
the expense of another. Whether this is a good or a bad thing depends upon
your situation.
You cannot control any aspect of the table creation with SELECT ... INTO.
It's the primary reason I never use the thing.
--
Jeff Mason Custom Apps, Inc.
Jeff@c...
-----Original Message-----
From: Greg Normandin [mailto:gnormandin@s...]
Sent: Thursday, August 22, 2002 1:42 PM
To: sql language
Subject: [sql_language] RE: Update with aggregate
Thank you Jeff, that crossed my mind after I posted the question, Now next
the question is efficiancy. It appears, when I test this on small amounts
of data (<200 rows updated), that the "subquery option" greatly
out-performs the "temp table option", however when I test against large
datasets (>150,000 rows updated) the "temp table option" out-performs:
SMALL Datasets (<200 rows updated):
SUBQUERY:
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 31 ms.
TEMP TABLE:
Creating the temp table:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 22 ms.
Executing the update using the created temp table:
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 43 ms.
LARGE Datasets (>150,000 rows updated):
SUBQUERY:
SQL Server Execution Times:
CPU time = 27671 ms, elapsed time = 46168 ms.
TEMP TABLE:
Creating the temp table:
SQL Server Execution Times:
CPU time = 2563 ms, elapsed time = 5031 ms.
Executing the update using the created temp table:
SQL Server Execution Times:
CPU time = 6279 ms, elapsed time = 6704 ms.
So my questions: Is this anylisis accurate? If so, what other factors
will effect the results, such as the complexity of the "joins", number of
fields involved in the update, etc.? And, finally, where is the mystical
point where the temp table option becomes more efficiant that the subquery
option? I will be doing some more detailed tests, but if anyone has any
experience in this your comments are appreciated.
One more tought, when creating a temp table with the "SELECT ... INTO"
statement is ther a way to define an index?
-Greg
-----Original Message-----
From: Jeff Mason [SMTP:jeffm.ma.ultranet@r...]
Sent: Thursday, August 22, 2002 11:18 AM
To: sql language
Subject: [sql_language] RE: Update with aggregate
"This works, however it seems inefficiant as it will UPDATE *every* row in
TestTable1."
You just gave yourself a hint. You are correct that your update will
update
*every row*, so instead, why not restrict the rows to be updated:
UPDATE H SET QtyOh = QtyOh - COALESCE((SELECT SUM(L.OrderQty) FROM
TestTable2 L WHERE L.fk1 = H.K1 AND L.Post = -1),0)
FROM TestTable1 H
WHERE H.k1 IN (SELECT distinct(fk1) FROM TestTable2);
--
Jeff Mason Custom Apps, Inc.
Jeff@c...
-----Original Message-----
From: Greg Normandin [mailto:gnormandin@s...]
Sent: Thursday, August 22, 2002 11:16 AM
To: sql language
Subject: [sql_language] Update with aggregate
Consider these tables:
TestTable1
k1 QtyOh
-- ----
1 1000
2 1200
3 500
4 100
TestTable2
k1 fk1 Post OrderQty
-- --- ---- ----
1 1 -1 100
2 1 -1 50
3 2 -1 500
4 2 -1 100
5 4 0 100
I need to:
UPDATE H SET QtyOh = QtyOh - SUM(L.OrderQty)
FROM TestTable1 H INNER JOIN TestTable2 L ON H.k1 = L.fk1
WHERE L.Post = -1
That doesn't work, of course, because you can't have an aggregate in the
set
clause.
so I tried:
UPDATE H SET QtyOh = QtyOh - COALESCE((SELECT SUM(L.OrderQty) FROM
TestTable2 L WHERE L.fk1 = H.K1 AND L.Post = -1),0)
FROM TestTable1 H
This works, however it seems inefficiant as it will UPDATE *every* row in
TestTable1.
The only other way I can think of is to use a temp table:
SELECT fk1, SUM(OrderQty) as sOrderQty INTO #TemTestTable2
FROM TestTable2
WHERE Post = -1
GROUP BY fk1
UPDATE H SET QtyOh = QtyOh - L.sOrderQty
FROM TestTable1 H INNER JOIN #TempTestTable2 L ON H.k1 = L.fk1
DROP TABLE #TemTestTable2
Does anyone know of another way to do this or am I stuck with using a temp
table?
-Greg