Latch, mutex and beyond

January 6, 2011

Spin tales: Part 1. Exclusive latches in Oracle 9.2-11g

Filed under: DTrace,Latch,Spinlock — andreynikolaev @ 9:59 pm

How does an Oracle process spin for a latch? How many times does it check the latch before going to sleep? Everyone knows: it is _spin_count=2000, two thousand cycles by default. Oracle established this value long ago, in version 6 at the beginning of the 1990s. However, let me count.

My previous investigation showed that latch wait changed radically in Oracle 9.2. At that time, the exponential backoff disappeared. The latches have been using a completely new wait posting since 2002. We may expect that the latch spin has changed too. The contradictory results of _spin_count tuning in Oracle 9.2 also point to this. In this series of posts, I will explore how contemporary Oracle latches spin. The first post is about exclusive latches, which form the majority of Oracle latches. For example, 460 out of 551 latches are exclusive in Oracle 11.2.0.2.

I will demonstrate that exclusive latches spin 10 times more than we expected. The _spin_count turns out to be effectively static for exclusive latches, and there is a big difference between not setting _spin_count and setting it to 2000.

Oracle Solaris has the unique DTrace facility, which allows us to trace function flow. I will experiment with Oracle 10.2.0.4 for Solaris X86. My plan is to count the number of spins using DTrace. The main idea is the same as in my investigation of latch wait. Again, I will use oradebug call to acquire the latch concurrently from two sessions. My tool will be latch_spin_trace.sql. This script differs from the previously used latch_wait_trace.sql in its timeout values. In addition, it spawns the spin.d DTrace script instead of truss. Throughout this post, I will introduce several DTrace scripts to be used as spin.d.

At first, we do not know which Oracle function is responsible for the spin. To find it, I used the following DTrace script:

#pragma D option quiet
pid$target::kslgetl:entry
{
 in_latch= 1;
}
pid$target:::entry
/ in_latch /
{
 printf("%s\n",probefunc);
}

pid$target::kslfre:entry
{
 exit(0);
}

This script traces all function calls inside the latch acquisition routine kslgetl(). The first DTrace probe, pid$target::kslgetl:entry, turns on the in_latch flag at the entry to kslgetl(). The second probe, pid$target:::entry, fires at the entry to any Oracle function, but only inside the latch acquisition because of the / in_latch / predicate, and prints the function name. The third probe ends the trace at the entry to kslfre(). I will use the (usually) unused “first_spare_latch” for demonstrations. The following output allows me to guess the meaning of some functions:

sqlplus /nolog @latch_spin_trace.sql 45
...
LATCH_FUNC ADDR     LNAME
---------- -------- --------------------------------------------------
kslgetl    500063D0 first_spare_latch
...
kslgetl          -  KSL  GET exclusive Latch
sskgslgf        -  immediate latch get
kslges           - wait latch get
skgslsgts
sskgslspin       - spin for the latch
sskgslspin
sskgslspin
sskgslspin
sskgslspin
...

This is the latch spin! The function sskgslspin() is specific to the Solaris X86 platform. The Oracle process calls sskgslspin() each time just before it checks the latch value. This function executes the Intel X86 pause instruction, which is intended for spinlock loops.

The SPARC platform does not have such an instruction. On Solaris SPARC the spinning function is named skgsltst(). It implements the test of the latch. Note that these functions are optimized out in Oracle 11.2. However, the spin is still the same.
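The bounded spin described above can be modeled with a short Python sketch. This is only a toy model under stated assumptions, not Oracle's code: the names spin_for_latch and is_free are hypothetical, and each loop iteration stands for one call to the spinning function followed by one check of the latch.

```python
def spin_for_latch(is_free, spin_count):
    """Toy model of a bounded latch spin (hypothetical, not Oracle code).

    is_free    -- callable returning True when the latch can be taken
    spin_count -- maximum number of checks before giving up and sleeping
    Returns (acquired, checks), where checks is the actual spin count.
    """
    checks = 0
    for _ in range(spin_count):
        checks += 1          # one spinning-function call, then one latch check
        if is_free():
            return True, checks
    return False, checks     # spin exhausted: the process would go to sleep


# A latch that stays busy forces the full spin.
acquired, checks = spin_for_latch(lambda: False, 20000)
print(acquired, checks)
```

Counting the iterations of such a loop from the outside is what the DTrace scripts in this post do with the real spinning function.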

Now I can write a DTrace script to count the number of spinning function calls during one latch acquisition. This will be the actual spin count.

#pragma D option quiet
pid$target::kslgetl:entry
{
 in_latch= 1;
 actual_spin_count=0;
}
pid$target::sskgslspin:entry,
pid$target::skgsltst:entry
/ in_latch /
{
 actual_spin_count++;
}

pid$target::kslfre:entry
{
 printf("\nActual spin count was %d \n",actual_spin_count);
 exit(0);
}

The DTrace probe pid$target::sskgslspin:entry fires on each entry to sskgslspin() (or skgsltst() on SPARC). The script counts the number of spinning function calls. The probe at kslfre() reports this count when the latch becomes free.

sqlplus /nolog @latch_spin_trace.sql 45
...
Actual spin count was 20000

This is surprising. We observe twenty thousand cycles, ten times more than the default _spin_count value:

SQL> SELECT  ksppinm,ksppstvl,ksppdesc
  FROM x$ksppi  JOIN x$ksppcv USING (indx,inst_id)
  WHERE ksppinm = '_spin_count';
KSPPINM              KSPPSTVL   KSPPDESC
-------------------- ---------- -----------------------------------
_spin_count          2000       Amount to spin waiting for a latch

Let me dynamically change the _spin_count parameter in memory:

SQL> alter system set "_spin_count"=100 scope=memory;
$sqlplus /nolog @latch_spin_trace.sql 45
...
Actual spin count was 20000

Again, we observed twenty thousand cycles. The actual spin count did not change. I repeated the experiment for all exclusive latches with the same result. It turns out that the exclusive latch spin is not dynamically influenced by _spin_count. This explains why I never saw any change in contention for Oracle 10g library cache and shared pool latches after _spin_count adjustment.

If you are wondering where this 20000 number comes from, it is the x$ksllclass table, which describes eight latch classes:

SQL>  select indx,spin from x$ksllclass;
      INDX       SPIN
---------- ----------
         0      20000
         1      20000
...

By default, all the latches except the process allocation latch belong to class 0. All such exclusive latches spin 20000 times. This value can be adjusted directly with the _latch_class_0 parameter, which is a static parameter:

SQL> alter system set "_latch_class_0"="100" scope=spfile;
System altered.
SQL> startup force;
ORACLE instance started.
...
SQL> select indx,spin from x$ksllclass;

      INDX       SPIN
---------- ----------
         0        100
         1      20000
         2      20000
...
SQL> exit
$sqlplus /nolog @latch_spin_trace.sql 45
...
Actual spin count was  100 times

Indeed, the actual spin count for latch class 0 changed. The _latch_class_[0-7] parameters allow fine-tuning of spin counts and wait timeouts for latch classes.

Surprisingly, another way to change the actual spin count for exclusive latches is the _spin_count parameter. If _spin_count is specified at instance startup, Oracle fills the x$ksllclass.spin column with the _spin_count value. As you may expect, the _latch_class_[0-7] parameters take precedence over _spin_count for the corresponding latch class:

SQL> alter system reset "_latch_class_0" scope=spfile sid='*';
System altered.
SQL> alter system set "_spin_count"=300 scope=spfile;
System altered.
SQL> startup force
ORACLE instance started.
...
SQL> select indx,spin from x$ksllclass;

      INDX       SPIN
---------- ----------
         0        300
         1        300
         2        300
         3        300
         4        300
...
$ sqlplus /nolog @latch_spin_trace.sql 45
SQL*Plus: Release 10.2.0.4.0 - Production on Thu Dec 2 03:34:54 2010
...
Actual spin count was  300 times

Note that the instance restart with the _spin_count parameter changed the x$ksllclass.spin values for all the latch classes. This can induce unexpected exclusive latch contention if someone sets _spin_count to its “default” value of 2000 and restarts the instance.
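The startup rules observed in these experiments can be summarized as a small Python sketch. It is a model of the observations only, under stated assumptions: the function startup_spin_table and its arguments are hypothetical, and each _latch_class_N parameter is reduced to a single spin value, ignoring its other fields.

```python
DEFAULT_CLASS_SPIN = 20000   # value seen in x$ksllclass.spin with nothing set
NUM_LATCH_CLASSES = 8        # classes 0..7


def startup_spin_table(spin_count=None, latch_class=None):
    """Model the x$ksllclass.spin column after instance startup.

    spin_count  -- _spin_count from the parameter file, or None if not set
    latch_class -- dict {class index: spin} modeling _latch_class_N settings
    (hypothetical helper; it only reproduces the precedence seen in the tests)
    """
    latch_class = latch_class or {}
    table = []
    for idx in range(NUM_LATCH_CLASSES):
        if idx in latch_class:            # _latch_class_N wins for its class
            table.append(latch_class[idx])
        elif spin_count is not None:      # otherwise an explicit _spin_count
            table.append(spin_count)
        else:                             # otherwise the built-in class default
            table.append(DEFAULT_CLASS_SPIN)
    return table


print(startup_spin_table())                       # nothing set
print(startup_spin_table(latch_class={0: 100}))   # _latch_class_0 only
print(startup_spin_table(spin_count=2000))        # the "default" set explicitly
```

The last call illustrates the trap: writing the documented default of 2000 into the parameter file cuts the modeled spin of every class to a tenth of what an unset parameter gives.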

In summary, the above experiments show that:

  • Oracle spins 20000 times for an exclusive latch by default.
  • The _spin_count is effectively a static parameter for exclusive latches. There is no point in tuning _spin_count dynamically when we observe contention for an exclusive latch.
  • There is a big difference between setting _spin_count=2000 and not setting _spin_count in the parameter file.


