I like testcases. One testcase results in more understanding than ten page article or weeks of data collection. This is why we need reproducible testcases if we want to explore mutex contention. Testcases will also give me a possibility to demonstrate how to use mutex contention diagnostics tools embedded in Oracle. I will use Oracle 126.96.36.199 for Linux X86 32bit on my Dual-Core laptop in this posts. Your numbers for other Oracle versions and platforms may vary.
I. “Cursor: pin S” contention testcase:
Each time the session execute SQL operator, it needs to ‘pin’ the cursor in library cache using mutex. True mutex contention arises when the same SQL operator executes concurrently at high frequency. Therefore the simplest testcase for “Cursor: pin S” contention should look like:
begin for i in 1..1000000 loop execute immediate 'select 1 from dual where 1=2'; end loop; end; /
The script uses PL/SQL loop to “execute immediate” simple SQL one million times. I will name such scripts as “one_thread.tmp”.
In Oracle 11g ‘select … from dual’ actually do not use the DUAL table at all. This allows me to create pure testcase without additional latch contention. Hash value of SQL operator ‘select 1 from dual where 1=2’ is 3222383532. It is worth to note that session_cached_cursors parameter value must be nonzero to avoid soft parses. Otherwise, we will see contention for “Library Cache” and “hash table” mutexes also. This script is essentially the same as lotssparses.sql script by Tanel Poder used in no soft parses environment.
In order to induce mutex contention I will execute this script by several simultaneous concurrent sessions. To achieve this I will use the following sqlplus trick:
... set define % spool many_threads.tmp select 'host sqlplus "/ as sysdba" @one_thread.tmp &' from dba_objects where rownum <= %1; spool off @many_threads.tmp ...
I combined both scripts in single “cursor_pin_s_contention.sql” script. In addition, the script creates snapshots and generates AWR report. At first I will submit only 2 concurrent sessions to simplify the analysis.
sqlplus /nolog @library_cache_mutex_contention 2 ...
II. Mutex contention in Oracle Wait Interface:
The following AWR report section confirms that testcase really induces mutex contention:
Top 5 Timed Foreground Events
|Event||Waits||Time(s)||Avg wait (ms)||% DB time||Wait Class|
|cursor: pin S||567||16||27||16.70||Concurrency|
|cursor: pin S wait on X||3||0||35||0.11||Concurrency|
|latch: shared pool||30||0||2||0.06||Concurrency|
|db file sequential read||376||0||0||0.05||User I/O|
You can see that AWR (and v$system_event) registered 567 waits for cursor: pin S in total. Let me look at sample of these waits recorded by ASH:
SQL> set markup HTML on SQL>select session_id,sql_id,event,p1text,p1,p2text,p2,p3text,p3,time_waited from v$active_session_history where session_state='WAITING' and (event like 'library%' or event like 'cursor%') ;
|15||47z9gc7013axc||cursor: pin S||idn||3222383532||value||8912896||where||196608||20773|
|136||47z9gc7013axc||cursor: pin S||idn||3222383532||value||983041||where||196608||18754|
|136||47z9gc7013axc||cursor: pin S||idn||3222383532||value||983041||where||196608||18695|
- P1 = “idn” = Unique Mutex Identifier. Hash value of library cache object protected by mutex or hash bucket number.
- P2 = “value” = “Blocking SID | Shared refs” = Top 2 (4 on 64bit) bytes contain SID of blocker. This session is currently holding the mutex exclusively or modifying it. Lower bytes represent the number of shared references when the mutex is in-flux
- P3 = “where” = “Location ID | Sleeps” = Top 2(4) bytes contain location in code (internal identifier) where mutex is being waited for. Lower bytes contain the number of sleeps for this mutex. These bytes not populated on some platforms, including Linux
This allows me to write query to decompose P2 and P3 as:
SQL> set markup HTML on SQL>SELECT event,p1,FLOOR (p2/POWER(2,4*ws)) blocking_sid,MOD (p2,POWER(2,4*ws)) shared_refcount, FLOOR (p3/POWER (2,4*ws)) location_id,MOD (p3,POWER(2,4*ws)) sleeps, blocking_session FROM (SELECT DECODE (INSTR (banner, '64bit'), 0, '4', '8') ws FROM v$version WHERE ROWNUM = 1) wordsize, v$active_session_history WHERE event = 'cursor: pin S';
|cursor: pin S||3222383532||00880000||136||0||00030000||3||0|
|cursor: pin S||3222383532||000f0001||15||1||00030000||3||0||15|
|cursor: pin S||3222383532||000f0001||15||1||00030000||3||0|
From this example I can conclude that BLOCKING_SESSION column of v$active_session_history (and v$session) not always populated for mutex sleeps. One should not rely on this column when diagnosing mutex performance problems.
Anticipating discussion of mutex waits I would like to note that signature of mutex contention is not the wait time, but the frequency of waits.
To simplify diagnostics of mutex waits in sqlplus I wrote the scripts:
- mutex_waits.sql – list sessions currently waiting for mutexes
- mutex_last_waits.sql – list up to last 10 session waits for mutexes from v$session_wait_history
- mutex_ash_waits.sql – list waits for mutexes from v$active_session_history
III. Library Cache counters:
In the above output P1 column corresponds to hash value of library cache object. We can find corresponding object querying kglnahsh column of x$kglob fixed table. MOS note 9239863.8 provide us with another mutex contention diagnostics tool hidden in x$kglob. The note states that high values of kglobt23 and kglobt24 columns allow us to identify objects being accessed heavily. According to 11.2 v$sql fixed view definition these columns mean LOCKED_TOTAL and PINNED_TOTAL correspondingly. These are cumulative numbers of locks and pins for library cache object. Such counters were introduced in Oracle 188.8.131.52. Look at x$kglob in my instance:
SQL> select case when (kglhdadr = kglhdpar) then 'Parent' else ' Child '||kglobt09 end cursor, kglhdadr ADDRESS,kglnaobj name, kglnahsh hash_value,kglobtyd type,kglobt23 LOCKED_TOTAL,kglobt24 PINNED_TOTAL from x$kglob where kglnahsh=3222383532;
|Parent||2D58FE64||select 1 from dual where 1=2||3222383532||CURSOR||4||1|
|Child 0||2D58FB9C||select 1 from dual where 1=2||3222383532||CURSOR||4||1999974|
|Child 1||2B202480||select 1 from dual where 1=2||3222383532||CURSOR||1||2|
The counters reveal that sessions were contended for mutex of child cursor 0 only. This object was pinned 2 million times during test load.
IV. Mutex specific fixed tables:
The mutex sleep location_id corresponds to latch where parameter. Unlike the latches, Oracle did not externalize all possible location_id values in separate table. However, we can see location_id meaning in x$mutex_sleep fixed table:
SQL> select mutex_type,location_id,location,sleeps,wait_time from x$mutex_sleep;
|Cursor Pin||9||kksLockDelete [KKSCHLPIN6]||392||4534082|
|Cursor Pin||5||kkslce [KKSCHLPIN2]||4||75129|
|Cursor Pin||3||kksfbc [KKSCHLFSP2]||1168||12359336|
SLEEPS column counts sleeps at this LOCATION by all sessions for all mutexes of this type. WAIT_TIME is cumulative time (in microseconds) slept at this LOCATION. Sum of the SLEEPS over all LOCATIONS is equal to 1564. This is the total number of waits for “Cursor Pin”. Comparing the this number with only 567 waits registered by Oracle Wait Interface, I can conclude that not every mutex sleep is visible to Oracle Web Interface.
In my testcase most sleeps were at kksfbc [KKSCHLFSP2] and kksLockDelete [KKSCHLPIN6] locations. Function kksfbc means “KKS: Find Bound Cursor” and was called during parsing of SQL. Function kksLockDelete was called just after SQL execution when the session releases runtime locks. MOS note 1292249.1 attributed these locations to patch 6904068 High CPU usage when there are “cursor: pin S” waits. This patch is really enhancement, that still is not in mainline. I will explore it in separate post.
Look now at x$mutex_sleep_history table:
SQL> select to_char(SLEEP_TIMESTAMP,'hh:mi:ss.ff') SLEEP_TIMESTAMP,MUTEX_ADDR,MUTEX_IDENTIFIER,MUTEX_TYPE,GETS,SLEEPS, REQUESTING_SESSION,BLOCKING_SESSION,LOCATION,MUTEX_VALUE from x$mutex_sleep_history order by sleep_timestamp;
|06:19:23.465130||2CDA1E80||3222383532||Cursor Pin||2973608||1256||136||0||kksfbc [KKSCHLFSP2]||00|
|06:19:25.145753||2CDA1E80||3222383532||Cursor Pin||3728317||1552||136||15||kksLockDelete [KKSCHLPIN6]||000F0001|
|06:19:25.200443||2CDA1E80||3222383532||Cursor Pin||3742424||1559||15||136||kksLockDelete [KKSCHLPIN6]||00880002|
The above output consists of only 3 rows for 1564 sleeps! It is clear that, contrary to common believe x$mutex_sleep_history fixed table can not be circular buffer for mutex sleeps. If I spawn 4 sessions, this view will contain 5 rows, and so on.
My guess is that the x$mutex_sleep_history is actually hash array in SGA. Experiments show that hash key of this array is likely to depend on MUTEX_ADDR and BLOCKING_SESSION. Row for each next sleep for the same mutex and blocking session will replace the row jf previous sleep. This also explains why each of SLEEPS values in the above output are close to total number of 1564 sleeps.
I can interpret this output as following:
- The last sleep for mutex 0x2CDA1E80 occurred at 06:19:25.200443. It was blocked by SID 136. At this time, the mutex was requested 3742424 and slept for 1559 times. These counters are analogs of latch GETS and SLEEPS statistics. Last 1559-1552=7 sleeps for this mutex were blocked by this SID. Each time the next row replaced the previous.
- The 1552th sleep for this mutex was blocked by session with SID 15 at 06:19:25.145753. Since the blocking session was different, sleep was recorded in another row of x$mutex_sleep_history. The mutex was requested 3728317 times before this happened
- The 1256th sleep for this mutex was blocked by unknown session at 06:19:23.465130. Actually the mutex was in-flux. All sleeps between 1257 and 1559 were blocked by sessions 15 and 136.
- Since my simplest testcase used only one mutex and 2 sessions, there are no other combinations of mutex and blocking SID. All the mutex sleeps had been hidden behind these 3 rows.
This interpretation also mean that we can not not sum up SLEEPS or GETS values.
According to Oracle documentation and my experimens columns of x$mutex_sleep_history mean:
- MUTEX_ADDR – Address of mutex in memory. To my knowledge, this is the only place where Oracle externalizes “Cursor Pin” mutex address. Due to dynamic nature of mutexes, Oracle do not have any fixed table like v$latch to list all mutexes
- MUTEX_IDENTIFIER – Mutex get identifier. It is equal to library cache hash value or bucket number to protect. Therefore it may be different for the same mutex depending on get context.
- SLEEP_TIMESTAMP – The last timestamp this mutex identifying by MUTEX_ADDR was slept because mutex being held by BLOCKING_SESSION.
- MUTEX_TYPE_ID – Internal ID of mutex type
- MUTEX_TYPE – Meaning of MUTEX_TYPE_ID. Known values are “Cursor Pin” “Cursor Parent”, “Library Cache” and “hash table”
- GETS – Cumulative number of times the mutex was requested at all locations before this sleep. This is the only place where Oracle externalizes this statistics.
- SLEEPS – Cumulative number of times the mutex was slept at all locations before this sleep
- REQUESTING_SESSION – The SID of a session requesting the mutex
- BLOCKING_SESSION – The SID of a session holding the mutex
- LOCATION_ID – The ID of code location where the waiter slept for the mutex
- LOCATION – Meaning of LOCATION_ID
- MUTEX_VALUE – Mutex value while slept. the same as P2 in the above discussion of mutex wait event
In the next posts I will discuss more complex testcases and parameters to tune mutex contentions