Thursday, January 07, 2010

Dispatcher process taking 99% of CPU on Oracle XE after an Apex request

Always check out the original article at http://www.oraclequirks.com for the latest comments, fixes and updates.

If you are an enthusiastic user of Apex in combination with Oracle XE, as I am, you may have occasionally run into the following issue: a dispatcher process (xe_dnnn) suddenly starts consuming 99% of CPU time for no apparent reason.
Unfortunately I have not found a way to reproduce the problem systematically and, even if I had, Oracle XE is not supported, so there is no chance of getting a patch from Oracle.

Although I said there is no apparent reason for this, I have the feeling that the problem is triggered by some odd HTTP request made by browsers like Safari and Google Chrome. I say this because I am fairly sure I was using one of those browsers each time this happened, and it never happened with Firefox.

Today I managed to find a sort of workaround that, at least, doesn't require bouncing the instance to get things back to normal.
I don't know whether this problem also affects the Windows version, as XE is running on top of Ubuntu Linux in my case.

I am assuming that you have already realized that something odd is going on in the database server.
The graphical system monitor of Ubuntu only reports that some oracle process is taking 99% of the CPU, without indicating which particular process is the culprit.
You must turn to the console to find out which one is actually consuming most of the CPU:

ps -eo "user,pid,ppid,pcpu,cmd" --sort pcpu | grep xe_


You can also run the following command to get an auto-refreshing snapshot of the situation:

top -u oracle
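
As a rough sketch, the ps output can be narrowed down to the busiest dispatcher with a small shell function. It assumes that dispatcher processes are named xe_d followed by digits and possibly a suffix (for example xe_d000_XE); the function name busiest_dispatcher is made up for this illustration.

```shell
#!/bin/sh
# Print "pid pcpu name" for the dispatcher process with the highest CPU share.
# Reads lines in "pid pcpu command" format on stdin, so it can be tried on
# canned input as well as on live ps output.
# Assumption: dispatchers show up as xe_d<digits>..., which also keeps out
# other background processes whose names merely start with xe_d.
busiest_dispatcher() {
    awk '$3 ~ /^xe_d[0-9]+/ && $2 + 0 >= max {
             max = $2 + 0
             line = $1 " " $2 " " $3
         }
         END { if (line != "") print line }'
}

# Possible use on the database host (the empty "=" headers suppress the
# header line of ps):
#   ps -e -o pid= -o pcpu= -o args= | busiest_dispatcher
```

The first field of the result is the operating system process id, which is what a kill command would need later on.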

Now, assuming that the row with the highest CPU usage refers to the oracle process called xe_d000 and provided we have to live with this bug, what can we do to keep the ball rolling?

First of all, we (or the DBA) can shut down the dispatcher process:
ALTER SYSTEM SHUTDOWN IMMEDIATE 'D000'; -- the number depends on the output of the ps command
Note: if this command doesn't work, which can happen when the process is unresponsive, on a Linux/Unix box you can kill the related O/S process:
kill -9 pid
where pid is the operating system process id of D000, as reported by the ps command.
I fear that on Windows you cannot kill a single thread as easily.
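
To avoid typing the dispatcher name by hand, the shutdown statement could be derived from the process name shown by ps. This is only a sketch: shutdown_stmt is a hypothetical helper, and it assumes that the digits following xe_d in the process name are the dispatcher number.

```shell
#!/bin/sh
# Build the ALTER SYSTEM statement for a dispatcher process name as shown
# by ps, e.g. xe_d000_XE -> D000.
# Returns a non-zero status if the name does not look like a dispatcher.
shutdown_stmt() {
    name=$(printf '%s\n' "$1" | sed -n 's/^xe_d\([0-9][0-9]*\).*$/D\1/p')
    [ -n "$name" ] || return 1
    printf "ALTER SYSTEM SHUTDOWN IMMEDIATE '%s';\n" "$name"
}

# Possible use, assuming O/S authentication as SYSDBA works on the host:
#   shutdown_stmt xe_d000_XE | sqlplus -s "/ as sysdba"
```

The function only prints the statement, so it can be reviewed before anything is sent to the database.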

As long as no dispatcher process is active, all web clients will receive HTTP 404 errors when attempting to load an Apex page.
We can then issue the command to recreate at least one dispatcher process:
ALTER SYSTEM
SET DISPATCHERS =
'(PROTOCOL=TCP)(DISPATCHERS=1)(INDEX=0)';
The default number of dispatchers on Oracle XE is 1, but it can be increased if required, in order to support a higher number of concurrent connections.
You can find some information about shared server processes and dispatchers in the Administrator's Guide for Oracle 10gR2, while waiting for Oracle to release the new Express Edition based on Oracle 11gR2 (hopefully soon?).

Updated March 9, 2010:
thanks to Jens who reported a missing parenthesis and provided a "roadmap" for restoring the service. See the comments section.

7 comments:

Anonymous said...

Hi Flavio,
thanks a lot for your post. This bug is really annoying. It totally freaks you out, when you see it the first time. Your workaround works well on our SuSE 10.3 system. I have a few comments though:

Sorry, you've missed a bracket ;-) ALTER SYSTEM SET DISPATCHERS = '(PROTOCOL=TCP)(DISPATCHERS=1)(INDEX=0)';

On our system, the process shutdown takes a few tries. We had to wait a few seconds and re-execute the statements several times while checking with top and ps.

You may guarantee no loss of service, if you always keep at least one dispatcher process running. On our system we are running 3 processes. So our sequence looks as follows:

1. shutdown D000
2. shutdown D001
3. SET DISPATCHERS = '... (DISPATCHERS=3)...' (this will leave any running process untouched)
4. shutdown D002
5. again, SET DISPATCHERS = '... (DISPATCHERS=3)...'

Obviously, this works also for one running process if you temporarily increase the number to 2
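
The sequence above could be generated for any number of dispatchers with a few lines of shell. This is a sketch only: rolling_restart_sql is a hypothetical name, the generalization to counts other than 3 is an assumption, and the full SET DISPATCHERS string is taken from the statement quoted earlier with only the count changed. The statements are merely printed, so that they can be run one at a time, since each shutdown may need a few retries.

```shell
#!/bin/sh
# Print the statements for restarting N dispatchers (N >= 2) in two groups:
# shut down all but the last one, recreate them, then shut down the last
# one and recreate it too, so that at least one dispatcher keeps running.
rolling_restart_sql() {
    n=$1
    # Accept only plain decimal numbers of 2 or more.
    case $n in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$n" -ge 2 ] || return 1
    set_stmt="ALTER SYSTEM SET DISPATCHERS = '(PROTOCOL=TCP)(DISPATCHERS=$n)(INDEX=0)';"
    last=$((n - 1))
    i=0
    while [ "$i" -lt "$last" ]; do
        printf "ALTER SYSTEM SHUTDOWN IMMEDIATE 'D%03d';\n" "$i"
        i=$((i + 1))
    done
    printf '%s\n' "$set_stmt"
    printf "ALTER SYSTEM SHUTDOWN IMMEDIATE 'D%03d';\n" "$last"
    printf '%s\n' "$set_stmt"
}

# Possible use: print the five statements for three dispatchers.
#   rolling_restart_sql 3
```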

Jens

Christoph Rueprich said...

Excellent post! Your clear instructions were a life saver. I had exactly the same problem on a 11.2.0.3 database running Apex on the PL/SQL gateway. The system has very low usage. I'm quite curious what caused the dispatcher process to go crazy like that.
The alter system shutdown didn't work, so I did kill -9.
Thanks again,
Christoph

Byte64 said...

Hi Christoph,
I was hoping that in 11.2 that bug had been fixed (by the way, is that a full-fledged 11gR2 or is it XE?).
I have been monitoring my production instances for nearly 2 years now and I still can't find the "smoking gun": sometimes they keep running for a week without a hitch, sometimes I have to kill the runaway process twice in a day, and there is no clear indication of what may be causing this odd behavior. At some point I thought it might be sparked by some buffer getting full or a memory leak, so that restarting the database periodically would minimize the occurrences, but a few weeks ago it happened soon after restarting the instance, so my fancy theory was broken.
Note that by enabling multiple shared servers, as suggested in the comments, you can minimize the impact of this bug. I enabled three and the site has never been taken down; some users probably get an error, but if they reload the page, the request should be handled by one of the remaining servers.

Thanks
Flavio

CsG said...

Thanks for the post.
Another way can be to disable the shared server processes totally.
In this case there is no dispatcher process at all, so there is no CPU consuming bug. I hope.
For example:
alter system set dispatchers = '' scope = both;
alter system set shared_servers = 0 scope = both;
and restart
The dedicated mode will work without any problems in case of the application connections.
But normally APEX wants to use shared server processes.
I have tried to insert
USE_DEDICATED_SERVER = ON
into the sqlnet.ora to solve this problem, but APEX does not like it.
APEX can be switched to normal TNS connections somehow, but that is not actually working in our environment.
I am searching for the solution.
But if you are happy without APEX this way can be also a good one.

Byte64 said...

You can't switch off shared servers if you are relying on EPG, you need to set up either the Apex Listener or Oracle HTTP server (the latter doesn't work with XE as far as I know).
I must say that since I moved to Oracle XE11g, I didn't encounter this problem anymore, so thumbs up for Oracle XE11g + EPG + Apache as proxy.

Anonymous said...

This issue is still present on Ubuntu (16.10): Oracle XE11g + EPG + Apache as proxy.
It seems oracle did patch this - but only for windows.

Byte64 said...

I don't remember it happening even once in the last 3 years; it's been a long time since the last occurrence. My instances are running on top of Amazon Linux, so I'm asking myself whether the problem lies somewhere in Ubuntu, which was the Linux flavor my Oracle XE databases were running on.
