Thursday, December 10, 2009

An (IN)famous case of runaway query aka there is more than one way to do the same thing

Always check out the original article at for latest comments, fixes and updates.

I had to find out some records whose primary key was not referenced in a secondary table and for some reason i mindlessly executed the following (typical) query:
select *
from qq_messages
where messageid not in
select messageid
from qq_message_recipients
Unfortunately each table contained several hundreds of thousands of rows, so I had to stop the query after wasting 5 minutes with the CPU constantly at 98%.

Who knows why we always come up with the worst queries first.

In these cases you can bet there is a better way of designing your SQL, especially after looking at the astronomical cost reported by the optimizer.

Indeed i rewrote the statement using a NOT EXISTS clause and i immediately got a better looking plan:

select *
from qq_messages a
where not exists
select 1
from qq_message_recipients b
where b.messageid = a.messageid

This query returned 7 rows in 0.3 seconds.

But is there any other way to get this result?
Oh yes.
If you like set algebra, then here is a query that does exactly the same job in a different fashion in about 1.6 seconds, that is 5 times slower than the best one, but considerably faster than the worst scenario:
select * from qq_messages
select * from qq_messages a
where a.messageid in
select b.messageid
from qq_message_recipients b

Again, there can be some improvement by replacing IN with EXISTS:
select * from qq_messages
select * from qq_messages a
where exists (select 1 from qq_message_recipients b
where b.messageid = a.messageid);

This query took 1.2 seconds. Notice the cost of 13620 in contrast with 13651. Very similar figures for a query that runs 25% faster.

The scenario above should turn out to be quite useful for those who are not (yet) mastering Oracle SQL, because it teaches at least three important lessons:
  • the NOT IN clause is not the solution for all situations, no matter if it is the most natural;
  • the explain plan gives you a quick qualitative estimation of how good your SQL is;
  • there is usually an alternate solution for the same problem, so, unless you are really satisfied with your first shot, you'd better off looking at alternatives.

While the number returned as the total cost is not to be taken as a forecast of the time required to execute the query, it can certainly be considered as an indicator of the resources required to carry out the statement. This however doesn't mean that two queries with similar costs will take the same time to execute, as we have seen.
Cost misinterpretation is an old standing issue in the community of developers.

It's easy to think that a low number means fast and a high number means slow, whatever fast and slow mean in your specific case, so the temptation of skipping a real test basing purely on the cost indication of the queries is always around the corner. The problem is also in that a database is not a static thing, so the total number of records in each table may vary over time, even by large numbers, and the execution plan will change as well, if the statistics are up to date, so don't forget to think about how/when it's the best time to refresh the stats, that is as important as developing "good" procedures.

Even after carefully evaluating different alternatives, your job isn't finished yet. A further step, in case results are difficult to interpret may be to run TKPROF and find out which SQL is the most resource intensive. And after that you have to try it out in the real world, with concurrent users, if that applies, and see if your program survives user acceptance test. If not, it could mean that you found an optimal solution for a suboptimal architecture, which means redesigning parts of your database objects.

But that is definetely another story.


tylermuth said...

Curious how this one stacks up:

with msg_ids as(select messageid from qq_messages
select messageid from qq_message_recipients)
select q.*
from qq_messages q, msg_ids m
where m.messageid = q.messageid

Byte64 said...

Hello Tyler,
unfortunately the table now contains more data than before, a new object column which is forcing me to change all the queries i made yesterday for the article and it doesn't contain any more spare rows, however here are the results:

query #1: 36471350
query #2: 2266
query #3: 883
query #4: 2266

your query: 2292

Funnily enough, with the query selecting only the messageid column, the best seems to be now #3 (MINUS with the IN clause).

This is an excellent example of why one is supposed to re-run all the performance tests he made whenever a query has been modified, something that the average customer doesn't seem to understand when asking for minor application changes. :-)


yes you can!

Two great ways to help us out with a minimal effort. Click on the Google Plus +1 button above or...
We appreciate your support!

latest articles