"almost" a running sum I guess

RobCarter · March 13th, 2007, 06:15 AM

Hi

I have to get a result set that basically counts the number of records from one column that corresponds with a value in another e.g.

idCount contact_no
------- ----------
1 15648
2 15648
1 15649
2 15649
3 15649
1 15650

etc

so for contact_no 15649 there are 3 different entries

I can't think how to go about getting it.

Can someone help please.

Rob

Rob Carter

Jeff Mason · March 13th, 2007, 06:38 AM

What distingushes one row with a given contact_no from another with the same contact_no value?

That is, how did you number your counts as you did, rather than, say, this way:

idCount contact_no
------- ----------
2 15648
1 15648
2 15649
3 15649
1 15649
1 15650

I ask because there must be some way to order these rows to get what you want...

Jeff Mason
Custom Apps, Inc.
[email protected]

RobCarter · March 13th, 2007, 06:50 AM

I missed out a column I guess.

idCount contact_no idNo start_date
------- ---------- ------- ----------
1 15648 d56789 20061212
2 15648 d56789 20061215
1 15649 d78945 20061212
2 15649 d78945 20061218
3 15649 d78945 20061226
1 15650 d12345 20061202

I can't post any actual data as it is healthcare related but this is as close as I can get. The analogy goes: patient d78945 had a hospital stay (15649) and while they did they had 3 ITU stays starting on those dates (there is an ITU stay ID in the data set too if that is more useful than dates (what isn't more useful than dates? :) ))

RobCarter · March 13th, 2007, 06:52 AM

As an addendum, the counts are what I want to derive from my dataset. They dont currently exist anywhere, I don't know if I was clear enough on my original request.

Rob Carter

RobCarter · March 13th, 2007, 10:16 AM

table:

stayID idNo start_date CC_localID
------ ------- ---------- ----------
15648 d56789 20061212 4567
15648 d56789 20061215 7854
15649 d78945 20061212 1246
15649 d78945 20061218 6589
15649 d78945 20061226 1235
15650 d12345 20061202 4687
15651 d78945 20070121 4258

structure (important fields)

stayID char Overall Hospital stay
idNo char Patient Identifier
StartDate char Date of ITU spell Start
CC_LocalID char ID of ITU Spell

I need to derive the idCount column shown in previous entries in this thread to show a running count of the number of ITU spells each patient has during their hospital stay. It is important that there is a row for each ITU spell in each patient's hospital stay (so no grouping by and coming up with 1 row per patient saying the total number of ITU spells). Also, if only to complicate matters - the final row in my sample data contains a new hospital stay ID for a patient that has already had a spell. As this is a new hospital stay the count of ITU spells per hospital stay would begin again at 1.

I hope this clarifies my requirements a little (or, more to the point, I hope it hasn't confused you utterly.)

Rob Carter

Jeff Mason · March 14th, 2007, 07:24 AM

The way to solve this is to partition your data into subsets, where each subset contains all the rows for a given idNO and contact_no combination whose start_date value is less than the start_date value for each given idNO and contact_no row in turn. You then count the rows in each subset.

We'll use a correlated subquery to do that:

Code:

SELECT (SELECT COUNT(*) + 1 FROM yourtable T2
           WHERE T2.idNo = T1.idNo
             AND T2.contact_no = T1.contact_no
             AND T2.start_date < T1.start_date) as idCount, contact_no, idNo, start_date
  FROM yourtable T1
 ORDER BY idNo, contact_no, idCount;

Jeff Mason
Custom Apps, Inc.
[email protected]

RobCarter · March 14th, 2007, 09:20 AM

Thanks for the answer you gave. It works really well until we get to patients where one start_date is the same as the other, then the count doesn't work. e.g if we have

stayID idNo start_date CC_localID
------ ------- ---------- ----------
15648 d56789 20061212 4567
15648 d56789 20061215 7854
15649 d78945 20061212 1246
15649 d78945 20061218 6589
15649 d78945 20061218 1235
15650 d12345 20061202 4687
15651 d78945 20070121 4258

The column becomes

ITUCount | | | |
WithinStay |stayID |idNo |start_date|CC_localID
-----------|-------|--------|----------|----------
1 | 15648 | d56789 | 20061212 | 4567
2 | 15648 | d56789 | 20061215 | 7854
1 | 15649 | d78945 | 20061212 | 1246
3 | 15649 | d78945 | 20061218 | 6589
3 | 15649 | d78945 | 20061218 | 1235
1 | 15650 | d12345 | 20061202 | 4687
1 | 15651 | d78945 | 20070121 | 4258

Are there any suggestions as to how I can overcome this?

Jeff Mason · March 14th, 2007, 10:05 AM

Well then, back to my original question.

You must have something that unambiguously distingushes one row from another and allows you to order them in some way. This ordering is what allows you to compute, in effect, a row number. Apparently the start_date does not do that.

The subquery uses the start_date to construct subsets of rows "less than" (an ordering) of the row currently being processed. You have a tie on start_date there, so, though the count is correct (there ARE that many rows less than the current row) if fails as a row number.

You'll need to add further qualification to the less than test in the subquery so as to guarantee that each subset contains unique, distingushable rows which you can count.

Jeff Mason
Custom Apps, Inc.
[email protected]

Peso · March 16th, 2007, 02:00 AM

Solved here
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=80508