Sequential Numbers

Jeff Moden · October 31st, 2006, 10:29 PM

And, please... no more talk of (yeeech) Cursors (haaaaacckk!! Pattoooooi!) :D

--Jeff Moden

Jeff Mason · November 1st, 2006, 07:37 AM

OK. I tried it again and indeed the update works. I've been running SQL Server 2005 lately, and I'm still getting used to the new SQL Server Management console and I guess I didn't correctly refresh the table display after I ran my test query.

The column with NULL values does update. My bad. Sorry.

However, I believe that the premise is still flawed. The technique assumes that the rows will be updated in a certain order via the clustered index, and I do not believe you can depend on that. It may work in this instance, but there is no guarantee it will work in all cases, for all tables whatever their size and on all servers regardless of their load. If the table is very large, the query processor may very well process pages out of "order", if it happens that some pages have been cached by other query activity.

This is similar to the attempt by some to use "aggregate string concatenation", the problem where you want to return column values from rows concatenated together, as "Name1, Name2, Name3" where the individual names are in different rows, as:

Code:

SELECT @Names = @Names + ', ' + NameColumn
  FROM yourtable

This also appears to "work", but in fact is officially undocumented behavior and thus undefined because there can be no guarantee of the order of the rows processed.

I believe the update technique is a similar situation.

But what do I know. I've been wrong before. See above.

And what's wrong with cursors? :) In their place, they are a valuable tool.

It's just that few know their place. And there ain't many places, either.

Jeff Mason
Custom Apps, Inc.
[email protected]

Jeff Moden · November 1st, 2006, 10:10 AM

Heh, heh... yeah, I still occasionally forget the primary rule of using EM in 2000 myself... refresh, refresh, then refresh, again.

I believe if you do a DBCC ReIndex on the table just before you run such a query, that will reorg the pages because it is a clustered index. Haven't had a problem, yet (I use a similar method for grouped-running totals), on larger tables, but like you said, who knows? I'll have to run a larger test (say, 20 million rows) and see what happens. Thanks for the advise.

I will say that I agree... when in doubt of the result because of SQL "anomolies", find a different way. A cursor is hardly ever that way. In the case of numbering a table as we've done in this thread, an IDENTTY column would be the way to go as we both know.

What's wrong with cursors? Other than being much slower and suffering a fair bit more locking contention than set based, I'd have to say that most people give up on the search for a set based solution because they know they can fall back on a cursor. True, firehose cursors don't produce that type of locking but then you've still failed to produce a higher speed set based solution. The only time I'd even consider using a cursor is when I have to use a 2 or 3rd party solution (like, maybe, sp_SpaceUsed) in the form of a call to a stored procedure. But, even for sp_SpaceUsed, I'll copy the code and change it to a set based solution before I'll even consider using a cursor. I know of no cursor solution that has ever beat the equivelent set based solution.

Most folks also abuse functions the same way to overcome having to use a cursor. Because it's convenient and easy, they write something like a CSV splitter into a function and think they did a great job of avoiding the cursor. What they don't realize is that they haven't avoided processing by RBAR (pronounced "ree-bar" and is a "Modenism" for "Row By Agaonizing Row :D ). They don't understand that the same thing can be done in a set based fashion that will separate the whole CSV column for the entire table (with or without criteria), along with the Primary Key, into another table (temp or otherwise) and then join to that table. For some reason, they think the function will be faster than creating an entire new table.

So, do I think all cursors are bad? For the most part, yes. Do I think all functions are bad... absolutely not but it depends. They have incredible utility for standardizing complex algorithms for a development team but they shouldn't be used to overcome a lack of knowledge of the set based principle that is the basis of RDBMS's in general.

But, like you've said, what do I know :) ... all I know is that every 8 to 12 hour run that I've converted to a 10 to 30 minute run has been done without the use of cursors and with the occasional exception due to 3rd party garbage code, usually contains no loops and has very few, if any, UDF's involved.

There are a lot of folks that say speed is not important... accuracy is. While I agree that accuracy is paramount in all code, speed is also important... it usually means the code is more well written than it's slower cousin but even that isn't THAT important in the face of accuracy. What IS important about speed is that it usually means you've used the resources available in a more effecient manner and you can stuff more jobs into the same timeframe (usually overnight) AND, if something goes wrong and you need to do a rerun, you don't have to wait for the rerun to occur over an 8 to 12 hour period in a crisis situation. One more thing... anyone who says speed isn't important, hasn't had to stay up the night baby-sitting such code when they could be doing other more fun things like sleeping, hugging the wife, playing with the kids, etc.

--Jeff Moden

ricespn · November 1st, 2006, 11:04 AM

Thanks guy's...

I really apreciate all the help, I'm a novice at this SQL stuff but I just learn a lot from all your comments.:)

=======================
Strange and crazy, but everything is possible

Jeff Mason · November 1st, 2006, 12:08 PM

Quote:

quote:Originally posted by Jeff Moden
[snip]
I believe if you do a DBCC ReIndex on the table just before you run such a query, that will reorg the pages because it is a clustered index. Haven't had a problem, yet (I use a similar method for grouped-running totals), on larger tables, but like you said, who knows? I'll have to run a larger test (say, 20 million rows) and see what happens. Thanks for the advise.

I don't doubt that it will "work", in the sense that it will probably virtually always give you the results you think you ought to get.

That does not change the fact that if you depend on the order of any query where you do not explicitly state what the order should be, you are living dangerously.

In fact, the query processor is free to process rows in any order it wants, and it is free to pick any index it thinks will help it, clustered or not. It is free to use pages it finds already in memory if it thinks that will help - it will process pages already in the cache before it has to go out to the disk. Thus, you may find your query returning rows in a different order depending on what else is going on with the system, how up-to-date the table and index statistics are, and the phases of the moon.

And, you are spot-on with your comments about cursors and (especially) functions. I agree completely.

I can't recall the last time I used a cursor in production code. I admit to using them when doing database conversions or test setup because of application upgrades, etc. Sometimes they're the easiest way to deal with a problem in run-once situations.

Enjoyed the discussion ...

Jeff Mason
Custom Apps, Inc.
[email protected]