Wednesday, August 29, 2007

When a query with IN subquery may not return the expected rows

One of the most frequent misunderstandings with the IN clause is caused by the presence of NULL values on both sides of the comparison.

Let's build a simple test case:

`create table test1 (a number, d varchar2(30))/create table test2 (b number)/insert into test1 values(null, 'a is null ')/insert into test1 values(1, 'a = 1')/insert into test2 values(null)/insert into test2 values(1)/commit/select * from test1where a in (select b from test2);`
result:
`A D- -----1 a = 1`
If for some reason you're assuming that null=null should return TRUE, well, you're taking a wrong assumption for two simple reasons:

1. NULL = NULL always returns FALSE.
2. NULL must be dealt with using the IS operator, not the equal (=) sign.

If you imagine that the IN clause could be regarded as a series of equality comparisons, you can easily make out why it can fail if values on either side of the comparison are NULLs.

In case for some reason you want to force a = b to return TRUE even in case of NULLs, then you can try this sub-optimal technique:
`select * from test1where nvl(a,-1) in (select nvl(b,-1) from test2);A D- ---------a is null1 a = 1`
applying NVL to column a however causes the optimizer to rule out any index on table test1, which will probably result in a full table scan and the consequent poor performance of the query if table test1 is big.

These types of problems can be further complicated when you use n-tuples of values.

`create table test3 (a number, b number, d varchar2(30))/create table test4 (c number, d number)/insert into test3 values(null, null, 'a and b are null ')/insert into test3 values(1, null, 'a = 1 and b is null')/insert into test3 values(1, 1, 'a = 1 and b = 1')/insert into test4 values(null, null)/insert into test4 values(1, null)/insert into test4 values(1, 1)/commit/select * from test3where (a,b) in (select c,d from test4);`
Result:
`A B D- - ---------------1 1 a = 1 and b = 1`

Conclusion:
having NULLs in the columns you retrieve may lead to unexpected results when using the IN clause, so you need to know very well the data you are working on.

You may think that this sort of tongue-in-cheek advice, however when you're called on to explain why a query doesn't work as expected on a table populated by someone else in a foreign application and without much possibility of inspecting the data yourself, you need to be prepared for the worst. 