Latch, mutex and beyond

March 19, 2011

11.2.0.2 is the right patchset for mutexes

Filed under: 11.2,Mutex,Patches — andreynikolaev @ 10:05 am

Many people have asked me about the second part of my blog title, the mutex. This is the first post about it. The mutex is another Oracle spinlock, which first appeared in version 10.2.0.2. Although mutexes have been known since 2005, their internals are still terra incognita.

This post was inspired by several recent escalations caused by mutex contention. It turns out that the 11.2.0.2 patchset contains an extraordinary number of mutex-related changes. Some enhancements, such as 10411618, exist only for 11.2.0.2. The following patches even changed the mutex architecture:

  • Patch 9499302 IMPROVEMENTS TO KGX MUTEXES (USING VOLATILES & END WAIT BEFORE MUTEX GET). With this patch the Oracle mutex becomes a TTS (test-and-test-and-set) spinlock. Mutex contention no longer saturates the bus and “poisons” overall system performance.
  • Patches 9239863 and 9282521: “LIBRARY CACHE: MUTEX X” FOR OBJECTS HIGHLY CONTENDED FOR. These enhancements introduce new ways to diagnose and tune mutex contention. Contention can now be spread across multiple copies of an object in the library cache. The patches are tuned by the new parameters “_kgl_hot_object_copies” and “_kgl_debug”.
  • Patch 10411618 ADD DIFFERENT WAIT SCHEMES FOR MUTEX WAITS. This patch completely changed the mutex waits and introduced exponential backoff. With this enhancement some of the mutex waits work like Oracle 8i latch waits. The patch tunables are the “_mutex_spin_count”, “_mutex_wait_time” and “_mutex_wait_scheme” parameters.
  • Patch 6904068 High CPU usage when there are “cursor: pin S” waits. It introduced “_first_spare_parameter” to tune the duration of “cursor: pin S” waits.
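
The two ideas behind patches 9499302 and 10411618, test-and-test-and-set spinning and exponential backoff, can be illustrated with a toy lock. The following is a minimal Python sketch, not Oracle's KGX code: the class TTSLock and its parameters spin_count, base_sleep and max_sleep are hypothetical names chosen for illustration, and they are not the “_mutex_*” parameters listed above. A threading.Lock stands in for the mutex word; its locked() call plays the role of the cheap read, and acquire(blocking=False) plays the role of the atomic test-and-set.

```python
import threading
import time


class TTSLock:
    """Toy test-and-test-and-set lock with exponential backoff.

    Illustration only: this is not Oracle's implementation, and the
    names spin_count, base_sleep and max_sleep are made up for the sketch.
    """

    def __init__(self, spin_count=255, base_sleep=0.0001, max_sleep=0.01):
        self._flag = threading.Lock()  # stands in for the mutex word
        self.spin_count = spin_count
        self.base_sleep = base_sleep
        self.max_sleep = max_sleep

    def acquire(self):
        sleep = self.base_sleep
        while True:
            for _ in range(self.spin_count):
                # "Test": a plain read. On real hardware this spins on a
                # locally cached copy and generates no bus traffic.
                if self._flag.locked():
                    continue
                # "Test-and-set": the atomic attempt, made only when the
                # lock looked free a moment ago.
                if self._flag.acquire(blocking=False):
                    return
            # Spin budget exhausted: sleep, doubling the pause each time
            # up to max_sleep.
            time.sleep(sleep)
            sleep = min(sleep * 2, self.max_sleep)

    def release(self):
        self._flag.release()


def demo(threads=4, iterations=1000):
    """Increment a shared counter under the lock from several threads."""
    lock = TTSLock()
    counter = [0]

    def worker():
        for _ in range(iterations):
            lock.acquire()
            counter[0] += 1
            lock.release()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter[0]
```

Calling demo() runs four threads that each increment the counter 1000 times under the lock. Python's interpreter lock hides the cache-coherency traffic that makes TTS matter on real hardware, so the sketch shows only the control flow: read first, attempt the atomic operation only when the lock looks free, and sleep for progressively longer intervals once the spin budget is exhausted.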

Other notable mutex-related fixes in 11.2.0.2 are:

  • Patch 10145558 Selects on library cache V$/X$ views cause “library cache: mutex X” waits. This particular patch decreases the mutex contention induced by AWR snapshot capture.
  • Patch 8793492 Mutex Waits with Resource Manager. The patch resolved mutex contention spikes due to CPU caging.
  • Patch 7441165 Prevent preemption while holding a mutex (Solaris)
  • Patch 9881963 Linux: Improve chance to release CPU during mutex get spins

The complete list can be found in the interesting new MOS note WAITEVENT: “library cache: mutex X” [ID 727400.1].

It seems that 11.2.0.2 (plus the latest PSU) is the number one recommendation for systems suffering from mutex contention. Oracle provides us with new tools to diagnose and resolve it. In the next posts I will discuss how the mutex works and why these patches are so crucial for mutexes.

Addition as of 25-MAY-2011:
On 20-MAY-2011 My Oracle Support issued a recommendation to install
Patch 12431716 Mutex waits may cause higher CPU usage in 11.2.0.2.2 PSU / GI PSU
on all 11.2.0.2.2 installations. For previous releases Oracle recommends reinstalling the interim patch 10411618 if you have installed it.

16 Comments »

  1. A great article!

    As you said : 11.2.0.2 contains extraordinary number of mutex related changes.

    I don’t think that this patchset release is a stable one: too many fixes included, too many new things.

    Probably in 11.2.0.3 or 11.2.0.4 things will go better and they will have time to fix this new feature.

    Comment by AlexG — March 25, 2011 @ 8:56 am | Reply

    • Thank you.
      In fact, many DBAs share your concerns. My post describes the current state as of March 2011. The 11.2.0.2 patchset has been tested for half a year. It already has its first PSU patch, and we are waiting for the April PSU. As the latest patchset, it has the highest level of Oracle Support attention. This looks pretty stable.

      Comment by andreynikolaev — March 25, 2011 @ 9:45 am | Reply

  2. […] summary: Oracle 11.2.0.2 is the right patchset for mutexes. In this post I discussed how we can use new 11.2.0.2 features to “divide and conquer” […]

    Pingback by Divide and conquer the “true” mutex contention « Latch, mutex and beyond — May 1, 2011 @ 7:17 pm | Reply

  3. Hi Andrey,

    I have applied the PSU Patch 12311357 – 11.2.0.2.2 GI Patch Set Update (Includes Database PSU 11.2.0.2.2).
    I have this entry in the registry$history table:
    SQL> select VERSION, COMMENTS from registry$history;

    VERSION    COMMENTS
    ---------- ------------------------------
    11.2.0.2   Patchset 11.2.0.2.0
    11.2.0.2   PSU 11.2.0.2.2

    SQL>

    After that, I applied the mutex patch 12431716 to the Grid Infrastructure home only. Is it OK to apply it ONLY to the Grid Infrastructure home, or do I also have to apply it to the Oracle database homes?

    Thank you

    Comment by Wissem — June 9, 2011 @ 11:02 am | Reply

    • According to Note 1321817.1, you should also apply patch 12431716 to the database homes.

      Comment by andreynikolaev — June 9, 2011 @ 1:51 pm | Reply

  4. Thank you

    Comment by Wissem — June 9, 2011 @ 2:37 pm | Reply

  5. Great post. With regard to “This looks pretty stable”… I’m a touch more cynical.

    (On AIX), we applied 12311357, only to be told to apply 10370797, which seemed a backward step to us. And sure enough, attempting to apply it reported conflicts with 12311357. When we informed Support of this, they then provided *two more* versions of 10370797; the final one, when applied, says that it is a superset of 12311357.

    (On AIX at least), I’m finding 11.2.0.2 very good in terms of db function, and pretty flakey in terms of CRS. There’s even a metalink note which has a long list of problems with running 11.1 db’s under 11.2 crs, which means if you’re going to 11.2 crs, you are pretty much committed to upgrade the databases very quickly thereafter.

    Comment by Connor McDonald — July 19, 2011 @ 12:52 am | Reply

    • Thank you for the interesting comment!
      I completely agree that 11.2 Grid Infrastructure still has many problems and bugs.
      It has been almost completely rewritten since 11.1.

      Your comment drew my attention to the interesting Patch 10370797.
      This patch, about 1 GB in size, fixes a number of bugs.
      I think the patch should be evaluated for any 11.2.0.2.2 GI running on AIX.

      Comment by andreynikolaev — July 19, 2011 @ 7:35 pm | Reply

    • Hi Connor,

      Have you had issues with 11.2.0.2 databases against 11.2.0.2 GI?

      Thanks,

      Steve

      Comment by appcrawler — August 4, 2011 @ 2:07 pm | Reply

      • Replying to myself, we just applied 11.2.0.2 PSU 2 with 12431716 to a four node SLES 10 64 bit RAC, and the mutexes are giving us a lot of heartache.

        Comment by appcrawler — September 20, 2011 @ 1:07 am

      • Hello!
        Interesting.
        Is this “library cache: mutex X” contention? Or something else?
        Do you have any diagnostics to discriminate between:

        1. Large version counts. 11.2.0.2.2 has several bugs that result in thousands of cursor versions.
        2. Hot objects.
        3. CPU starvation.

        Comment by andreynikolaev — September 20, 2011 @ 7:54 am

  6. Thanks for the article, Andrey.

    I have a question here. I run a 4-node Oracle 11.2.0.2 RAC (on RHEL 5.5 boxes) and I’m trying a CPU saturation test with 1 node up (the other 3 nodes are down). This is to observe our product’s performance with 1 vs 2 vs 3 vs 4 nodes (thereby finding the RAC overhead as well).

    I see that when I saturate my CPU to 100%, my top wait event is “library cache: mutex X”.
    Seeing your last comment above, can this be mainly due to CPU starvation? I see that my run queue length from vmstat is also very high.
    I don’t see any performance issues in my product, but I wanted to confirm this wait event.

    Thanks,
    Aswin

    Comment by Aswin — November 8, 2011 @ 11:19 am | Reply

    • Hello!
      Yes, if “library cache: mutex X” pops up only when you saturate the CPU, it is likely a consequence of CPU starvation.

      However, other possible causes are “true contention” for hot objects, excessive cursor versions, or some Oracle/OS bug.
      The diagnostics depend on the platform, the Oracle version and the patch level.

      Comment by andreynikolaev — November 18, 2011 @ 6:42 am | Reply

  7. […] It seems that this scheme was accidentally enabled by default in original PSU 11.2.0.2.2. Almost immediately Oracle changed the default wait scheme to 2 in Patch 12431716 Mutex waits may cause higher CPU usage in 11.2.0.2.2 PSU / GI PSU. […]

    Pingback by Mutex waits. Part III. Contemporary Oracle wait schemes diversity. « Latch, mutex and beyond — July 30, 2012 @ 12:22 pm | Reply

  8. 11.2.0.2 was the most unstable patchset in recent memory. We had to get to 11.2.0.3 to resolve most of our issues.

    Comment by rajesh — January 27, 2014 @ 7:40 pm | Reply

    • Thank you for commenting.
      This post was written almost 3 years ago, when 11.2.0.2 was at the cutting edge.
      11.2.0.2 was the crucial patchset in the history of Oracle mutexes.
      It solved most of the problems with mutexes and introduced the contemporary mutex wait schemes.

      Comment by andreynikolaev — January 29, 2014 @ 9:14 am | Reply

