"In" and "Not In"

niallhannon · January 14th, 2004, 10:48 AM

Hi,

Has anyone experienced the following behaviour in an sql statement?

I want to return records in TableA that exist in TableB based on checking a field as follows

select * from TableA where serial in (select serial from TableB)

TableA has 100 records and the result to the above query says that 45 of them exist in TableB. So that would tell me that 55 do not exist in TableB.

Yet when I run the following query no records are returned.
select * from TableA where serial not in (select serial from TableB).

This is on SQL 2000. I know an inner join is more efficient than a subquery and there are others ways using "exist" to do what I want but has anyone found similar things using "in" and "not in"?

Any ideas?

Thanks
Niall

Jeff Mason · January 14th, 2004, 11:08 AM

Do either of these tables have any rows with NULL as a value for 'serial'?

A NOT IN predicate will test all the rows in the subquery for inequality, and if any values are NULL, the test will fail, so the predicate will be FALSE, and no rows are returned.

And you are generally correct that a JOIN is faster than a subquery, but you may find that the optimizer will generate identical plans in many cases. For example, a correlated subquery is almost always indistingushable from a JOIN.

And you'd generally be better off using EXISTS instead on IN with a subquery, since EXISTS may be better able to make use of indexes, whereas IN almost always results in a table scan or the creation of a temporary representing the subquery, which will not have any indexes and so may be slower.

Jeff Mason
Custom Apps, Inc.
www.custom-apps.com

pgtips · January 14th, 2004, 11:10 AM

but do the 100 records in TableA each have a non-NULL value for 'serial'?

niallhannon · January 14th, 2004, 11:31 AM

Hi,

Well that was annoying, I had just typed out a long reply which got discarded due to me not typing in my username. Here goes again.

Thanks for replies. I meant to say that the serial fields do not have any nulls, they all have values.

TableA was created from a Select * into from OPENROWSET statement using an excel file as its source. The thing I have just noticed is that the "not in" works fine when comparing TableA to itself and its works fine for comparing other user tables to each other.

However it does not do what you'd expect when comparing the user tables with TableA. So, do you think there is something different with TableA that was created from the OPENROWSET statement that causes the "not in" to behave oddly. Something to do with collations perhaps? Is there something I can check?

I just can't figure this out.

Thanks
Niall

Jeff Mason · January 14th, 2004, 01:22 PM

The source of the data via OPENROWSET shouldn't have any effect.

Are there any duplicates for the 'serial' value in either table? - this might be throwing your count off...

Jeff Mason
Custom Apps, Inc.
www.custom-apps.com

niallhannon · January 14th, 2004, 01:31 PM

Hi,

No, there are no duplicates in TableA. There are some duplicates in TableB which is a much larger table but I dont think the duplicates in TablesB would make any difference?

I checked and both Tables have the same collation anyways. I also tested creating another table from the select * into statement using excel as a source but the "not in" worked fine. So its not consistent and perhaps its just one of them ones which has no explanation. Or might not be worth the effort to find the explanation!

Thanks
Niall