23 is the new 42

A bizarre one this week....

We upgraded one of our critical production apps from 11.2.0.3 to 12.1.0.2 (+ latest PSU). Everything went well during the upgrade and seemingly afterwards too, until the business reported that some data transfers from another system into this one were often being delayed.

After a bit of investigation we found the data was being pulled over by a scheduler job that was running very frequently - basically it ran then resubmitted itself almost straight away - almost like being in an infinite loop.

Anyway it seemed that sometimes this job was not starting when it should have done (i.e. straight away) and this was the issue.

My first thought was that job_queue_processes was too small and the slaves were all taken up by other tasks - but this wasn't the case - it was set to 10 and there were not that many jobs defined. Perhaps there is something in the alert log to help us?

Sadly not...

So what to do?

Let's first see if there is any pattern to when this is happening to see if that gives us a clue.....

A quick query of DBA_SCHEDULER_JOB_RUN_DETAILS with some very basic date maths shows us this

Now the upgrade was done about 11:30 on the 23rd and we can see the problem started happening after that but there is seemingly no pattern to when it happens - but there is a pattern of how long the delay is....

It seems to randomly pause for 23 minutes before kicking in - an unusual symptom perhaps, but it smells like a bug.
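For anyone wanting to do the same kind of date maths outside the database, here is a minimal sketch in Python. It assumes you have already pulled each run's start time and duration out of DBA_SCHEDULER_JOB_RUN_DETAILS. The function name, the one-minute threshold and the sample rows are all made up for illustration and are not the real output from our system.

```python
from datetime import datetime, timedelta


def find_start_delays(runs, threshold=timedelta(minutes=1)):
    """Return (start_time, delay) for each run that started late.

    runs: list of (actual_start, run_duration) tuples, oldest first.
    threshold: gaps at or below this are treated as normal (an assumption).
    """
    delays = []
    for (prev_start, prev_duration), (next_start, _) in zip(runs, runs[1:]):
        # Gap between the previous run finishing and the next one starting.
        gap = next_start - (prev_start + prev_duration)
        if gap > threshold:
            delays.append((next_start, gap))
    return delays


# Made-up sample rows, NOT real data: two back-to-back runs, then a late one.
sample_runs = [
    (datetime(2000, 1, 1, 12, 0, 0), timedelta(seconds=20)),
    (datetime(2000, 1, 1, 12, 0, 21), timedelta(seconds=20)),
    (datetime(2000, 1, 1, 12, 23, 41), timedelta(seconds=20)),
]

for start, delay in find_start_delays(sample_runs):
    print(start, "started late by", delay)
```

If the delays that come back all cluster around the same value, as ours did, that points at something systematic rather than plain resource contention.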

So I type the search into Metalink, not really expecting to find anything easily, but I get this hit

That's probably the best match for any bug search I ever had...... :-)

So I download the patch for 12.1.0.2 (getting slightly confused about which one to download) and copy it over to our test system to just check it applies OK.

I shut down the problem db and then try to apply the patch and get this

Verifying environment and performing prerequisite checks...
Prerequisite check "CheckActiveFilesAndExecutables" failed.
The details are:

Following executables are active :
/oracle/12.1.0.2.160119/bin/oracle
UtilSession failed: Prerequisite check "CheckActiveFilesAndExecutables" failed.

Which reveals that there is another database running out of this home - not sure if this is a new check by OPatch or if I'm just out of practice, but I don't remember seeing it do that before.
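One way to avoid being surprised like this is to list every database registered against the home before patching. Below is a rough sketch that reads oratab-style text (the usual SID:ORACLE_HOME:Y|N layout) and returns the SIDs that share a home. The function name, the SIDs and the 11.2.0.3 path are hypothetical; only the 12.1.0.2 home path comes from the OPatch output above. Note that oratab only shows what is registered, not what is actually running, so treat it as a reminder rather than proof.

```python
def sids_for_home(oratab_text, oracle_home):
    """Return the SIDs that the oratab text maps to the given ORACLE_HOME."""
    wanted = oracle_home.rstrip("/")
    sids = []
    for line in oratab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 2 and fields[1].rstrip("/") == wanted:
            sids.append(fields[0])
    return sids


# Hypothetical oratab contents for illustration only.
sample_oratab = """\
# made-up entries
PROBLEMDB:/oracle/12.1.0.2.160119:N
OTHERDB:/oracle/12.1.0.2.160119:N
OLDDB:/oracle/11.2.0.3:N
"""

print(sids_for_home(sample_oratab, "/oracle/12.1.0.2.160119"))
```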

The patch applied fine after shutting down the other database - now we just need to do some basic checking before rolling out to live, where I'm 99% sure this will fix the issue.

In the short term, until we get the patch applied, we have switched to an old-style DBMS_JOB, which we are assuming does not have this problem.

So it seems 42 has a rival for being the answer to everything......
