Oracle KGX mutexes appeared more than 7 years ago. However, mutex waits are still obscure. Oracle Documentation provided only brief description of mutex wait events without any information about wait durations and timeouts.
Look at the following timeline:
- In 2005 mutexes were introduced with Oracle 10.2.0.2 patchset. Documentation states that “cursor: pin S” wait event should rarely be seen.”
- In June 2008, Oracle released Patch 6904068: High CPU usage when there are “cursor: pin S” waits. Patch used “_first_spare_parameter” to tune the waits duration.
- In December 2010, Oracle created another Patch 10411618: Add different wait schemes for mutex waits for 11.2. This patch completely changed the mutex waits and introduced exponential backoff. With this enhancement, mutex waits work like Oracle 8i latch waits. The patch tunables are “_mutex_spin_count”, “_mutex_wait_time” and “_mutex_wait_scheme”. This patch was later was included in 184.108.40.206.2 PSU
- In May 2011, new MOS note Waitevent: “cursor: pin S” Reference Note [ID 1310764.1] shed some light on “cursor: pin S” waits. I will refer to this note during the post.
- On 14 May 2011 Oracle released new version of Patch 10411618 and recommended to reapply the patch. Note 10411618.8 now states that Patch 6904068 is superseded by Patch 10411618
- On 17 May 2011 Oracle urgently recommended to apply Patch: 12431716 Unexpected Change in Mutex Wait Behavior on top of the 220.127.116.11.2 PSU.
The above seems arcane. Why wait could cause CPU consumption? What is the meaning of mutex waits? What all these patches did? How we should interpret mutex waits? How we should tune them?
This series of posts will investigate these questions. The first post will explore the “cursor: pin S” waits in most commonly used now Oracle 10g Release 2 and 11g Release 1.
I. The Experimental Environment
I would like to explore the “cursor: pin S” waits in the way similar to my investigation of latch waits. In previous posts, I acquired latches artificially using oradebug call kslgetl. Unfortunately, KGX mutex is more complex to acquire then latch. I encountered two major difficulties:
We do not know the mutex address.
“Cursor Pin” mutex is dynamic structure created within library cache object. The only place where Oracle externalizes the mutex address is the x$mutex_sleep_history fixed table. We need to have contention for mutex to obtain its address. Fortunately, I already have corresponding testcase.
The testcase uses ‘select 1 from dual where 1=2′ SQL operator with hash value 3222383532. All my experiments in this post will be done using Oracle for SPARC 64bit platform. At first I will experiment with fresh 10.2.0.4:
sqlplus /nolog @library_cache_mutex_contention 2 ... SQL> SELECT MUTEX_ADDR,mutex_type,MAX (GETS),MAX (SLEEPS) FROM x$mutex_sleep_history WHERE MUTEX_IDENTIFIER = 3222383532 AND MUTEX_TYPE='Cursor Pin' GROUP BY mutex_addr, mutex_type MUTEX_ADDRESS MUTEX_TYPE MAX(GETS) MAX(SLEEPS) ---------------- -------------------------------- ---------- ----------- 00000003A9371338 Cursor Pin 3988556 4253
Now we have the mutex address. The another problem was:
There is no single Oracle routine to acquire mutex.
According to note 1310764.1:
“A session waits for “cursor: pin S” when it wants a specific mutex in S (share) mode on a specific cursor and there is no concurrent X holder but it could not acquire that mutex immediately. … The reason for the wait is that in order to acquire the mutex in S mode (or release it) the session has to increment (or decrement) the mutex reference count and this requires an exclusive atomic update to the mutex structure itself… ”
In other words, session waits for “cursor: pin S” event when the mutex is in some transient state neither X, nor S. Systemstate dumps and DTrace also confirm that, unlike the latch, busy mutex can be held in three different modes:
- “Shared” (“SHRD” in systemstate dumps) mode allows mutex to be held several sessions simultaneously. It allows read (execute) access to structure protected by mutex. In shared mode the lower 4(2 on 32bit) bytes of mutex value represent the number of sessions holding the mutex. Upper bytes are all zero.
- “eXclusive” (“EXCL”) mode is incompatible with all other modes. Only one session can hold the mutex in exclusive mode. It allows session exclusively access the structure protected by mutex. In X mode upper bytes of mutex value are equal to holder SID. Lower bytes are zero
- “Examine” (“SHRD_EXAM” in dumps) mode indicates that mutex or its protected structure is in transition. In E mode upper bytes of mutex value are equal to holder SID. Lower bytes represent the number of sessions simultaneously holding the mutex in S mode. Session can acquire mutex in E mode or upgrade it to E mode even if other sessions are holding mutex in S mode. No other session can change mutex at that time.
Session that acquires mutex in S mode waits on “cursor: pin S” event when another session is holding mutex in E mode.
Oracle session changes the mutex state through another dynamic structure called Atomic Operation Log (AOL). The AOL is an analog of latch recovery area. To operate on mutex, session first creates AOL, fills it with mutex data, and calls one of mutex acquisition routines. I will discuss mutex modes and routines in one of my next posts.
It is clear that artificial mutex acquisition is too complex to do using oradebug call. In future, I will use DTrace “destructive actions” for this purpose.
However, in this post, I will only simulate busy mutex in platform independent way using oradebug poke. This sqlplus command allows me to change any location in Oracle SGA. To simulate mutex held in E mode I will change the mutex 8 bit value to 0×100000001 (upper 8 bytes are 1 and the lower bytes are 1):
SQL>oradebug setmypid SQL>oradebug poke 8 0x100000001 BEFORE: [3A9371338, 3A9371340) = 00000000 00000000 AFTER: [3A9371338, 3A9371340) = 00000001 00000001
This look exactly like session with SID 1 is holding the mutex in E mode. At this moment any concurrent ‘select 1 from dual where 1=2? hangs waiting for “cursor: pin S” wait:
SQL> select 1 from dual where 1=2; ...
Oracle indeed thinks that wait is "blocked" by session "1":
$sqlplus "/ as sysdba" @mutex_waits SID SQL_ID EVENT IDN BLKS RFC LOC MUTEX_OBJECT ----- ------------- ------------------- ---------- ----- --- --- --------------- 159 47z9gc7013axc cursor: pin S 3222383532 1 1 5 select 1 from dual
To "release" the mutex, I will change its value back to zero. I wrote script mutex_e_mode.sql that "holds" mutex in E mode for 50 seconds.
I combined all previous SQL commands into single script cursor_pin_s_waits.sql. I will use it here and in the next posts to investigate "cursor: pin S" waits in different Oracle versions.
II. "Cursor: Pin S" wait event in Oracle 10.2 - 11.1
When the ‘select 1 from dual where 1=2' hangs, we caged the beast that had never been seen before. This is the session statically waiting for “cursor: pin S” event during 49 (50-1) seconds. Now we can explore this phenomenon. Surprisingly all in this definition occurred to be incorrect:
- Did we observe the static wait? Obviously no. One of my CPUs gets crazy. Its utilization jumps to 100%.
mpstat 1 ... CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl ... 16 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100 17 0 0 55 180 138 134 48 16 0 0 31969 89 11 0 0 18 0 0 0 2 0 2 0 1 3 0 1 0 0 0 100 ...
This is the busy wait. The waiting process consume ones of the CPUs completely. Frequent syscalls cause high sys context CPU time consumption.
- Was this the single wait? No. Look at corresponding AWR report:
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % Total Call Time Wait Class CPU time 59 93.5 control file parallel write 384 11 28 17.0 System I/O db file parallel write 436 6 13 9.0 System I/O log file parallel write 161 4 24 6.2 System I/O cursor: pin S 2,865,013 3 0 4.7 Other
It shows 2.8 million mutex waits. The 10064 trace contains many waits of microseconds durations:
... WAIT #2: nam='cursor: pin S' ela= 1 idn=3222383532 value=4294967297 where|sleeps=21474885649 obj#=57980 tim=2121493718879 WAIT #2: nam='cursor: pin S' ela= 4 idn=3222383532 value=4294967297 where|sleeps=21474885649 obj#=57980 tim=2121493718951 WAIT #2: nam='cursor: pin S' ela= 3 idn=3222383532 value=4294967297 where|sleeps=21474885649 obj#=57980 tim=2121493719001 ...
- As observer, we know that the session really waited for 49 seconds. However, Oracle Wait Interface registered only 3 seconds of mutex wait time out of actual 49 seconds wait. 94% of "wait" time was accounted as CPU. It was not considered “wait” in pre-11.2 Oracle. Background AWR housekeeping almost hide the mutex problem by standard waits.
Only one row of x$mutex_sleep_history continuously increased during wait.
SQL> select to_char(SLEEP_TIMESTAMP,'hh:mi:ss.ff') SLEEP_TIMESTAMP,MUTEX_ADDR,MUTEX_IDENTIFIER,MUTEX_TYPE,GETS,SLEEPS, 2 REQUESTING_SESSION,BLOCKING_SESSION,LOCATION,MUTEX_VALUE from x$mutex_sleep_history where mutex_identifier=3222383532;
|12:05:18.326511||3A9371338||3222383532||Cursor Pin||4000000||2871014||159||1||kkslce [KKSCHLPIN2]||100000001|
This confirmed my previous conclusion that x$mutex_sleep_history is an array hashed by MUTEX_ADDR and BLOCKING_SESSION.
Obviously the next step is to trace operating system calls made by the process waiting for “cursor: pin S” using Solaris truss utility. The results are interesting:
truss -p 15750 ... /1: yield() = 0 /1: yield() = 0 /1: yield() = 0 /1: yield() = 0 /1: yield() = 0 ...
Here we observe only the yield() syscalls in the truss output for waiting shadow process. There were no one real "wait" syscalls like pollsys(), semop() or nanosleep(). The man yield page states:
"The yield() function causes the current lightweight process to yield its execution in favor of another lightweight process with the same or greater priority."
Craig Shallahamer first described this behaviour in post "How mutexes sleep". MOS note 1310764.1 confirms this:
"Typically each wait is just a yield of the CPU in releases up to (and including) 18.104.22.168. A session will wait over and over on the same wait event until it acquires the mutex in S mode. If the wait is just a yield of the CPU then this can result in quite a high amount CPU, especially if there is very heavy concurrency for a particular cursor..."
Let me discuss the meaning of such "waits." After unsuccessful spin, the Oracle session registered wait begin and yields the CPU. If the process was descheduled from CPU, the session had to wait until the process run again. At that time session registered the end of wait and computed time waited.
The session waiting for "cursor: pin S" just allows other processes to run. If system has free CPU power, the session will not be descheduled, and Oracle thinks it is not waiting at all. My experiments demonstrated that Oracle versions 10.2 to 11.1 used this wait pattern.
Therefore, in Oracle 10.2-11.1 the "cursor: pin S" wait time is the pure wait for CPU.
If you observe long "cursor: pin S" waits in these Oracle versions, this only mean contention for the CPU, not for mutex.
On the opposite, until you have free CPU resources, 10.2-11.1 mutex contention is almost invisible to Oracle Wait Interface and AWR/ADDM.
The diagnostics signature of 10.2-11.1 "cursor: pin S" mutex contention is the large number of waits for mutex, not the wait time. Your system may suffer from huge mutex contention without any sign of it in "Top 5 Events."
Therefore, unlike latch, 10.2-11.1 mutex is "classic" spinlock without sleeps. If the mutex holding time is always small, this algorithm minimizes the elapsed time to acquire mutex. Spinning session acquires mutex immediately after its release. All waiting time is the "cpu time".
Such spinlock are vulnerable to variability of holding time. If some sessions hold "Cursor Pin" mutex in E mode for long, pure spinning wastes CPU. Spinning sessions can aggressively consume all the CPUs and affect the performance by priority inversion and CPU starvation. Tanel Poder described this phenomenon in detail.
This "waits" are not eco-friendly. We have registered abnormally high electric power consumption and overheating of server room during my experiments.
On the other hand, when sessions do not hold mutexes for a long time and you have CPU power, this algorithm achieves the best possible performance.
If you continuously observe long “cursor: pin S” waits," then you system do not have enough spare CPU for busy waiting. For such a case, Oracle provides the possibility to convert "busy" mutex wait into "standard" wait. This is the legendary "Patch 6904068: High CPU usage when there are “cursor: pin S” waits".
On Solaris SPARC 10.2.0.4 this patch is tuned by “_second_spare_parameter”. On other platforms and versions, the patch uses “_first_spare_parameter”.
I installed the patch, set “_second_spare_parameter”=1 and rerun the testcase. High CPU usage was magically diappeared:
Top 5 Timed Foreground Events
|Event||Waits||Time(s)||Avg wait (ms)||% Total Call Time||Wait Class|
|cursor: pin S||977||48||10||84.3||Other|
|db file sequential read||386||1||2||1.3||User I/O|
|log file parallel write||24||0||21||.9||System I/O|
|control file parallel write||19||0||16||.5||System I/O|
... /1: semtimedop(99, 0xFFFFFFFF7FFF7F64, 1, 0xFFFFFFFF7FFF7F50) Err#11 EAGAIN /1: semnum=22 semop=-1 semflg=0 /1: timeout: 0.010000000 sec /1: semtimedop(99, 0xFFFFFFFF7FFF7F64, 1, 0xFFFFFFFF7FFF7F50) Err#11 EAGAIN /1: semnum=22 semop=-1 semflg=0 /1: timeout: 0.050000000 sec /1: semtimedop(99, 0xFFFFFFFF7FFF7F64, 1, 0xFFFFFFFF7FFF7F50) Err#11 EAGAIN /1: semnum=22 semop=-1 semflg=0 /1: timeout: 0.010000000 sec
The parameter value defines the wait timeout in centiseconds.
The semtimedop() is "normal" OS wait. The waiting process can be awakened by post. However, I did not found any case when Oracle uses post/wait mechanism for mutexes.
The patch significantly decreases CPU consumption by spinning. Its drawback is larger elapsed time to obtain mutex. Ten milliseconds is long wait in Oracle timescale. Usually average mutex holding time is around 1 us and most sessions are waiting for nothing.
MOS note Database Hang due to Mutex Contention in 11g [ID 1292249.1] recommends “_first_spare_parameter”=10. This results in huge 0.1 sec timeout between spins. However, for all workloads I saw, the value 1 performs better. I will discuss the effectiveness of timeout choice in separate post. Look also on this results.
The “_first_spare_parameter” is dynamic. If you omit it or set it to 0, the instance behave exactly like without the patch. Therefore, I recommend to install the patch 6904068 in OLTP for the possibility to react dynamically on "cursor: pin S" contention without downtime.
I discussed how "cursor: pin S" mutex wait works in popular Oracle versions 10.2 to 11.1.
This is CPU consuming busy wait, almost invisible to Oracle Wait Interface and standard diagnostics tools. Patch 6904068 converts this wait into "normal" wait.
Stay tuned on this blog to see how revolutionary the mutex waits changed in Oracle 11g Release 2