Latch, mutex and beyond

May 1, 2011

Divide and conquer the “true” mutex contention

Filed under: 11.2,Contention,Mutex,Patches — andreynikolaev @ 7:16 pm

Oracle contains enhancements 9282521 and 9239863 named “Library cache: mutex X” for objects highly contended for. Part I and II. These enhancements introduce new interesting possibilities to tune some types of the mutex contention.

Contention for heavily accessed objects can now be divided between multiple copies of object in the library cache. According to notes 9282521.8 and 9239863.8 describing the patches, the enhancements should be used:
When there is true contention on a specific library cache object….
Let me investigate this deeper. I will use Oracle for Solaris SPARC (64-bit) on 8Cores/32Threads Sun Fire T2000. I chose this platform in order to emphasize how the enhancements work.

Standard disclaimer: Production workloads are much more complex than testcases. Not yet officially supported technique described here may be or may be not applicable to your system. You must thoroughly investigate your workload and receive explicit approval of Oracle Support Services before implementing this in production system.

I. The “Library cache: mutex X” contention testcase.

First of all I need the reproducible testcase that will allow me to explore the “Library cache: mutex X” contention.

As I wrote in previous post about “Cursor: pin S” testcase the true contention arises when the same operator executes concurrently at high frequency. This differs from the contention induced by many versions of cursor. The testcase for “Library cache: mutex X” contention looks like:

i number;
for i in 1..1000000
   execute immediate 'begin dbms_application_info.set_client_info(''mutex'');end;';
end loop;

The script uses PL/SQL loop to “execute immediate” the following anonymous PL/SQL block one million times:

begin dbms_application_info.set_client_info('mutex');end;

Hash value of this block is 2253951259. The dbms_application_info.set_client_info() procedure is fast and does not need latches or other resources. This allows me to create pure testcase without unneeded contention. In addition, it is surprisingly common to see mutex contention for dbms_application_info in contemporary over-instrumented applications.

When executed alone, the script takes 3:20 min on average in T2000. Similarly to my previous post I will execute this script by several concurrent sessions. Each time the session ‘execute immediate’ the PL/SQL block, it needs to acquire corresponding “Cursor Pin” mutex to pin the block itself and “Library Cache” mutex to pin the dbms_application_info package in the Library Cache. This will induce the mutex contention.

Look at “library_cache_mutex_x_contention.sql” script. It spawns several sqlplus sessions to execute the above PL/SQL block concurrently. In addition, it takes AWR snapshots before and after the script execution. I will submit 30 concurrent sessions to thrash my 32 CMT T2000 server:

sqlplus /nolog @library_cache_mutex_x_contention 30

The following AWR report section confirms that testcase induced heavy “library cache: mutex X” contention:

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
DB CPU   21,485   83.17  
library cache: mutex X 90,680,016 14,489 0 56.09 Concurrency
cursor: pin S 10,357 2 0 0.01 Concurrency
log file sync 34 1 17 0.00 Commit
direct path sync 1 0 247 0.00 User I/O

Host CPU (CPUs: 32 Cores: 8 Sockets: 1)

As expected the server runqueue raised to 30. Due to concurrency, the average time to execute the script increased 4 times from 3:20 min to 14:27 min.

For the discussion of mutex wait event parameters, diagnostics tools and scripts see my previous post. It is interesting that “library cache: mutex X” wait event is still absent in 11.2 documentation. Up to now the only official information source is MOS Note WAITEVENT: “library cache: mutex X” [ID 727400.1]. Look at mutex waits during contention peak:

SQL> set markup HTML on
SQL> @mutex_ash_waits
10:05:40 11 d8dxvkk7hxvq0 cursor: pin S 2253951259 161 8 begin dbms_application_info.set_client_i
10:05:41 155 d8dxvkk7hxvq0 cursor: pin S 2253951259 144 106 begin dbms_application_info.set_client_i
10:05:44 50 11c4s0a35j48v library cache: mutex X 791245243 169 0 DBMS_APPLICATION_INFO
10:05:46 203 11c4s0a35j48v library cache: mutex X 3225908833 26 0 DBMS_APPLICATION_INFO
10:05:47 11 11c4s0a35j48v library cache: mutex X 3225908833 114 0 DBMS_APPLICATION_INFO
10:05:49 155 11c4s0a35j48v cursor: pin S 2253951259 112 3 begin dbms_application_info.set_client_i

Most mutex waits occurred for heavily accessed PL/SQL anonymous block, package DBMS_APPLICATION_INFO and its body:

SQL> set markup HTML on
SQL> @library_cache_objects_heavily_accessed.sql
Child 0 000000038B975DC8 begin 2253951259 CURSOR 30 56602935 27766476 0
Parent 000000038D8DE8E8 DBMS_APPLICATION_INFO 791245243 PACKAGE 444 30000414 0 1
Parent 000000038D1DD3A8 DBMS_APPLICATION_INFO 3225908833 PACKAGE BODY 234 30000204 30000846 2

According to x$mutex_sleeps most mutex waits were occurred during library cache pin/unpin activity inside kglpin (“KGL: PIN heaps and load data pieces of an object”), kglpndl (“KGL PiN DeLete”) and kglpnal1 (“KGL PiN ALlOcate”) routines:

SQL> set markup html on
SQL> select mutex_type,location_id,location,sleeps,wait_time from x$mutex_sleep order by sleeps desc;
Library Cache 90 kglpnal1 90 218814816 155840921
Library Cache 95 kglpndl1 95 217545345 248695706
Library Cache 4 kglpin1 4 209842796 155358935
Cursor Pin 9 kksLockDelete [KKSCHLPIN6] 26255 368250
Cursor Pin 8 kksxsccmp [KKSCHLPIN5] 17171 187894
Cursor Pin 3 kksfbc [KKSCHLFSP2] 9150 112699

II. Cloning the library cache objects

The root cause of this contention is the concurrent access to the same “hot” library cache objects. If each session would have its own copy of object there would be no contention! Before this requires artificial tricks.

Enhancements 9282521 and 9239863 allow creation of multiple copies of specified library cache objects. Parameter “_kgl_hot_object_copies” controls the maximum number of copies. Complementary parameter _kgl_debug marks hot library cache objects as a candidate for cloning. Syntax of this parameter can be found in MOS descriptions of bugs 9684368, 11775293 and others. One form of such marking is:

“_kgl_debug”=”name=’schema=’ namespace= debug=33554432″

Both parameters are dynamic. However, my experiments showed that after dynamic change of parameters Oracle unevenly distributes mutex requests between hot object copies. This is why I will use these parameters in my demonstrations as static.

Let me make an experiment. In the above testcase the “library cache: mutex X” contention occurred for package DBMS_APPLICATION_INFO and its body. Corresponding library cache namespaces are 1 and 2. I can mark both the package and the body as “hot” using:

SQL>alter system set “_kgl_debug”=”name=’DBMS_APPLICATION_INFO’ schema=’SYS’ namespace=1 debug=33554432″, “name=’DBMS_APPLICATION_INFO’ schema=’SYS’ namespace=2 debug=33554432″ scope=spfile;

If I allow only two copies of hot objects, the mutex contention will be halved. The average time to execute script will decrease from 14:27 min to 7:50 min. In order to demonstrate full-scale improvement I will go further and allow each concurrent session to have its copy of DBMS_APPLICATION_INFO:

SQL>alter system set “_kgl_hot_object_copies”= 255 scope=spfile;
SQL>@library_cache_mutex_x_contention 30

ORACLE instance started.

Now, the script takes 5:30 min to execute on average. The “library cache: mutex X” contention is magically disappeared:

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
DB CPU   10,148   99.99  
cursor: pin S 43,713 6 0 0.06 Concurrency
log file sync 35 0 13 0.00 Commit
direct path sync 1 0 294 0.00 User I/O
latch free 882 0 0 0.00 Other

Now concurrently executed PL/SQL blocks do not interfere with each other. The library_cache_objects_heavily_accessed.sql script shows that library cache contains 30 DBMS_APPLICATION_INFO packages and 30 bodies. Each object was pinned one million times:

Child 0 000000038BB05640 begin dbms_applicati 2253951259 CURSOR 30 22740992 10459264 0
Parent 000000038BAD6B08 DBMS_APPLICATION_INF 3759149138 PACKAGE 2 1000001 0 1
Parent 000000038BAD6D28 DBMS_APPLICATION_INF 778396835 PACKAGE 2 1000001 0 1
Parent 000000038BA0A0B0 DBMS_APPLICATION_INF 1977222849 PACKAGE BODY 1 1000000 1000000 2
Parent 000000038B9824E8 DBMS_APPLICATION_INF 2894603672 PACKAGE 1 1000000 0 1

Sessions execute independently after tuning. We have unleashed the full power of Oracle TSeries Chip Multithreading parallelism!

III. “Cursor: pin S” contention

Being inspired by successful resolution of “Library cache: mutex X” contention, I will use the same approach for “Cursor: pin S” waits. In my previous post, I published corresponding testcase. In T2000 environment the testcase load profile is:

sqlplus /nolog @cursor_pin_s_contention.sql 35 

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
DB CPU   6,680   90.91  
cursor: pin S 1,056,416 425 0 5.79 Concurrency
resmgr:cpu quantum 6,940 373 54 5.08 Scheduler
log file sync 8 0 27 0.00 Commit
direct path sync 1 0 208 0.00 User I/O

Now I need another form of _kgl_debug syntax to mark SQL cursor “select 1 from dual where 1=2″ as ‘hot’:
“_kgl_debug”=”hash=’full hash value‘ debug=33554432″

The ordinary hash value of this SQL is 3222383532. However, we need a complete 128 bit hash value for library cache object here. You can find it in kglhahsv column of x$kglob:

SQL>select kglnahsv,kglnaobj from x$kglob where kglnahsh=3222383532 and kglhdadr =kglhdpar;

KGLNAHSV                          KGLNAOBJ
--------------------------------- ----------------------------------------
4d1ef2753f3bb11043fd2f61c011abac  select 1 from dual where 1=2
SQL>alter system set "_kgl_debug" = "hash='4d1ef2753f3bb11043fd2f61c011abac' debug=33554432" scope=spfile;
SQL>@cursor_pin_s_contention.sql 35
ORACLE instance started.

Again the mutex waits disappeared from AWR report after tuning:
Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
DB CPU   7,047   91.34  
latch: shared pool 1,488 67 45 0.87 Concurrency
resmgr:cpu quantum 38 6 161 0.08 Scheduler
db file sequential read 1,268 1 1 0.01 User I/O
direct path sync 1 0 206 0.00 User I/O

The enhancements allow us to resolve the “Cursor: pin S” mutex contention also!

IV. No more underscores. The dbms_shared_pool.markhot() procedure.

I believe that supported way to use this functionality is not yet documented dbms_shared_pool.markhot() procedure. This new in procedure takes two forms:

schema IN VARCHAR2,
objname IN VARCHAR2,
namespace IN NUMBER DEFAULT 1, — library cache namespace to search
global IN BOOLEAN DEFAULT TRUE); — If TRUE mark hot on all RAC instances

hash IN VARCHAR2, — 16-byte hash value for the object
namespace IN NUMBER DEFAULT 1,

corresponding to two _kgl_debug syntaxes described above. There is also dbms_shared_pool.unmarkhot() procedure to unmark the hot object.

I repeated the experiments now marking objects as ‘hot’ using:

exec dbms_shared_pool.markhot(‘SYS’,’DBMS_APPLICATION_INFO’,1);
exec dbms_shared_pool.markhot(‘SYS’,’DBMS_APPLICATION_INFO’,2);
exec dbms_shared_pool.markhot(hash=>3222383532,NAMESPACE=>0);

However, the results were not consistent enough in Oracle created multiple copies of marked objects in library cache, but frequently uses only one of them. May be this depend upon history of object usage. To achieve more uniform distribution you should mark objects as hot immediately after the instance restart.

By default, Oracle created approximately “number of CPU Cores” hot objects copies. This observation should be further investigated. You can adjust this number by _kgl_hot_object_copies parameter.

Therefore if you prefer not to use hidden parameters, the dbms_shared_pool.markhot() can help you to resolve mutex contention.

In summary: Oracle is the right patchset for mutexes. In this post I discussed how we can use new features to “divide and conquer” the mutex contention which was caused by heavy concurrent access to library cache object.

About these ads


  1. Amazing post. Thanks for sharing!

    Comment by Thiago Maciel — May 2, 2011 @ 3:07 am | Reply

  2. Very interesting Andrey.

    Comment by Dom Brooks — May 2, 2011 @ 7:28 am | Reply

  3. Very interesting and detailed post. Thanks for sharing.

    Comment by Gokhan Atil — May 30, 2011 @ 6:30 pm | Reply

  4. Thanks for sharing this, did provide much insight for me on the matter.
    But still, a question remains: “Why do we need to acquire a “mutex X” for executing a stored procedure?”
    In my opinion, we only want to prevent someone else from changing/dropping the procedure, so a “mutex S” lock should be sufficient?

    kr, Bernhard

    Comment by Bernhard Wesely — July 25, 2011 @ 2:28 pm | Reply

    • Hello!
      Interesting question.
      I suppose that Oracle acquires “library cache” mutex in X mode here during examination of user rights for objects referenced inside anonymous PL/SQL block.

      Comment by andreynikolaev — November 18, 2011 @ 6:32 am | Reply

  5. Andrey,
    extermle interersting and elaborating post .
    I have a couple of questions:
    1) What is the actual difference between “markhot” and “keep” procedures of dbms_shared_pool package ?
    (apart from the fact that “markhot” procedure is according dbmspool.sql RAC aware …).

    Should I use only the new “markhot” instead of “keep” procedure ?

    procedure keep(hash varchar2, namespace number, heaps number);
    procedure markhot(hash varchar2, namespace number DEFAULT 1,
    global boolean DEFAULT true);

    2) When using dbms_shared_pool.markhot how do I know that I should specify NAMESPACE=>0 ,
    Where does the number 0 come from ? Could you elaborate please ?

    I have tried to “markhot” a with NAMESPACE= 1,2,3,4,5..64 and it worked.
    With NAMESPACE=100 I hava got ORA-03113 (Oracle on 64 bit OEL5.5).

    3) When I “keep” a cursor and then I try to “purge” the cursor (heap0 and/or heap6), I am getting an exception (expected behaviour).
    But when I use ‘markhot’ and then “purge”, there is no problem. Is this your experience too ?

    Best Regrads. Milen

    Comment by Milen — August 31, 2011 @ 2:35 pm | Reply

  6. Andrey,

    Thank you for such a wonderful post about latch mutexes! Your blog is quite excellent.
    One quick question, recently we ran a 600 user load test using Quest Benchmark Factory and see a ton of library cache:mutex X contention from the AWR reports. How would I tune these to resolve the concurrency performance issue with Oracle

    Ben Prusinski

    Comment by Ben Prusinski — November 17, 2011 @ 6:54 pm | Reply

    • Hello!
      Resolution of “library cache: mutex X” contention depends on its root cause.
      It may be “True contention” discussed in this post, CPU starvation, excessive cursor versions or some Oracle/OS bug.
      It depends on platform and Oracle version.

      What is your platform?
      Did you installed any patches on top of
      Did you saw huge runqueues during load test?
      What mutex wait “locations” you observed?

      Comment by andreynikolaev — November 18, 2011 @ 6:19 am | Reply

  7. [...] me ( may not be the good solution for production like instances ), I was lucky to came across with    Andrey S. Nikolaev has a great post sharing his experience to handle mutex issues. I was [...]

    Pingback by Library cache: mutex X « Jagjeet's Oracle Blog! — December 12, 2011 @ 11:54 am | Reply

  8. Чудесно!
    Есть небольшое дополнение: 33554432 это 0×2000000 в 16-теричной или просто установка в 1 26-го (считать с единицы) бинарного флага.

    Comment by Andrey Shindarev — December 16, 2011 @ 5:57 am | Reply

    • Thank you!
      However, this number should be decimal only otherwise _kgl_debug will not mark object as hot.

      Comment by andreynikolaev — March 15, 2012 @ 2:09 pm | Reply

  9. Hi,

    Good to read your very investigative post.

    Let me share my scenario with you. First of all, some background info –

    We have a n-tier J2EE web application and we use spring jdbc templates. We use connection pool, however we don’t use statement caching. We have a statement like this which runs about 10 to 12 times per second (and commits as well) during peak hours.

    INSERT INTO lb_stat (id, lb_id, login_id, read_date, lb_ufid, lb_created) VALUES (lb_seq.NEXTVAL, :1, :2, :3, :4, :5);

    The table where we insert, is a nested partitioned one (range-> hash). We have one unique local index on that table. Currently the table contains around 51 million records, anyways.

    Most of the times, I see some concurrency waits associated with this insert statement when I look at OEM (Top Activity). The waits are on “library cache: mutex X”. Sometimes this problem goes very severe and database load spikes and connection pool exhausts. I did some investigation to make sure we are not doing anything wrong which allure Oracle to behave like this. I have checked, as we are using bind variables properly, the statement is not producing many child cursors, and not invalidating those child. People have been saying, there might have mutex problems while there are many child cursors – which is not our case.

    Here is the “Mutex Sleep Summary” last time the problem went severe. I don’t have much mutex experience, not very sure about the meanings of mutex location.

    Mutex Type Location Sleeps Wait Time (ms)
    Library Cache kglhdgn2 106 50,520,523 -2,519,544
    Library Cache kglget1 1 30,510,854 75,750
    Library Cache kgldtin1 42 3,226,449 874,969
    Library Cache kgllkdl1 85 1,459,037 403,792

    Here are some AWR 30 min snap info for the problem window –

    Load Profile
    Parses: 913.4 (per second)
    Hard parses: 1.9 (per second)
    Logons: 0.3 (per second)
    Executes: 965.6 (per second)
    Transactions: 31.6 (per second)

    Instance Efficiency
    Execute to Parse %: 5.41
    Parse CPU to Parse Elapsd %: 11.75

    Top Wait Event

    Library cache: mutex X; Waits: 2,339; Times: 11,746; Avg (ms): 5022; % DB Time: 45.65; Class: Concurrency

    System Info –
    Oracle version:
    OS: Solaris 10 (x86-64), CPU: 22 Cores
    SGA Size: 16 GB
    Shared Pool: 5 GB
    Buffer Cache: 9 GB

    Would be nice to have your thoughts on this.


    Comment by Shawon — May 29, 2012 @ 9:30 am | Reply

    • Hello!
      1. Most probably you hit the bug.
      I do not know the exact patch level. Can you post the results of “opatch lsinventory”.

      The known bug candidates include:
      12633340 Heavy “library cache lock” and “library cache: mutex X” contention for a “$BUILD$.xx” lock
      and other bugs

      2. Can you post or send to me the results of
      mutex_ash_waits.sql script
      and full AWR report.
      Best Regards

      Comment by andreynikolaev — May 29, 2012 @ 10:36 am | Reply

  10. Thanks so much Andrey. I will collect those and share with you.


    Comment by Shawon — May 30, 2012 @ 9:14 am | Reply

    • Hi Andrey,

      I have read the bug notes you mentioned. Here are my understandings on those –

      10632113: They say, this only applies when OLS is being used. We don’t use VPD or OLS. Seems not applicable in our case.
      12633340: They say, if reload failures frequently occur while parsing, then it could be this bug. From the Shared Cursor Statistics below, I am not very sure if the reloads fails or not for the statement, but seems like everything normal on that. It shows for 30 min window –

      Total Parses 23,453
      Hard Parses 0
      Child Cursors 3
      Loaded Plans 2
      Invalidations 0

      May be some other bugs – there are tons of bugs discovered on mutex!


      Comment by Shawon — May 30, 2012 @ 12:16 pm | Reply

      • Hello!

        1. Yes, there are many bugs related to mutex.
        This is why I asked about exact list of patches already aplied.
        Which of PSU do you use?

        2. 5 seconds average wait for library cache mutex is enormous.
        Is your server overloaded? How large is the CPU loadverage?

        3. Use my scripts mutex_ash_wait.sql and mutex_wait.sql to diagnose whether the contention is for single mutex or widespread.
        If the contention is for single mutex, what Library Cache object is protected by it?

        Best Regards

        Comment by andreynikolaev — May 31, 2012 @ 6:56 pm

  11. [...] parameter. Look at the graph comparing the throughputs (number of PL/SQL calls per second) of my “library cache: mutex X” testcase with maximum sleep times set to 1/100s and [...]

    Pingback by Mutex waits. Part III. Contemporary Oracle wait schemes diversity. « Latch, mutex and beyond — July 30, 2012 @ 12:22 pm | Reply

  12. Very inspiring post… I have tried to also pin an object by its hash_value but this doesn’t seem to work :(:

    ADDRESS NAME HASH_VALUE TYPE Locked Pinned Executed Nsp
    —————- —————————— ———- ————— —— ——— ——— —-
    000000053884ADE8 TEST 1505372733 FUNCTION 5508 7995711 7995637 1

    SQL> exec dbms_shared_pool.markhot(hash=>1505372733,NAMESPACE=>0);
    BEGIN dbms_shared_pool.markhot(hash=>1505372733,NAMESPACE=>0); END;

    ERROR at line 1:
    ORA-04043: object does not exist
    ORA-06512: at “SYS.DBMS_SHARED_POOL”, line 138
    ORA-06512: at line 1

    Why am I not able to pin the function “TEST” by its hash_value?

    DB = with latest PSU


    Comment by Pawel — October 12, 2012 @ 2:16 pm | Reply

    • Hello!
      You should use ’full hash value‘ (KGLNAHSV) and NAMESPACE=>1
      to mark PL/SQL function or procedure

      Best Regards
      Andrey Nikolaev

      Comment by andreynikolaev — October 17, 2012 @ 7:05 pm | Reply

  13. [...] PL/SQL packages and procedures, use dbms_shared_pool.markhot() to mark them as hot. As suggested by Andrey Nikolaev, Oracle creates multiple copies of marked objects in the library cache, but frequently uses only [...]

    Pingback by Reducing “library cache: mutex X” concurrency with dbms_shared_pool.markhot() « Julian Dontcheff's Database Blog — February 12, 2013 @ 2:42 pm | Reply

  14. Andrey,

    Great article!

    Further questions, once a coursor/plsql block is markhot’ed, how do we verify how many copies of library cache object this particular cursor has?
    How many executions are spreaded across these muptiple copies?


    Comment by Eric — July 17, 2013 @ 5:33 am | Reply

  15. Good post. Does this also applicable to row cache? Can we use mark hot for row cache? Do you have the script to identify row cache contention?

    I am currently using this
    select p1text,p1,p2text,p2,p3text,p3 from v$session where event = ‘row cache lock’

    Comment by Des — July 24, 2013 @ 4:24 am | Reply

  16. Thanks Great article .in our current environment we execute the library_cache_objects_heavily_accessed.sql we see the select statement in the name column , so do we need to mark the table name in the select clause to hot . please advice .

    Comment by Hariharan — October 4, 2013 @ 7:15 pm | Reply

  17. Hi Andrey!

    Could you please advice if it’s possible to mark an anonymos PL/SQL block as hot?

    Comment by Artyom — December 18, 2013 @ 6:21 am | Reply

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at


Get every new post delivered to your Inbox.

Join 63 other followers

%d bloggers like this: