Latch, mutex and beyond

January 14, 2011

Spin tales: Part 2. Shared latches in Oracle 9.2-11g

Filed under: DTrace,Latch,shared latch,_spin_count — andreynikolaev @ 6:23 pm

My previous experiments demonstrated that, contrary to common belief, the spin count for exclusive latches in Oracle 9.2-11g cannot be tuned dynamically. The _spin_count parameter is effectively static for exclusive latches. This seems to disagree with the well-known study “Using _spin_count to reduce latch contention in 11g” by Guy Harrison, which explored how dynamic tuning of _spin_count influenced latch waits, CPU consumption and throughput. I think there is no contradiction. Guy Harrison’s experiments were probably performed under cache buffers chains latch contention, and that is a shared latch.

We already know that an exclusive latch in Oracle 9.2-11g uses a static spin value from the x$ksllclass fixed table. This spin can be adjusted with the _latch_class_0 parameter. By default, an exclusive latch spins up to twenty thousand cycles.

This post will show that a shared latch in Oracle 9.2-11g is governed by the _spin_count value and spins up to four thousand cycles by default.

I will use the latch_spin_trace.sql script from my previous post. The script already checks whether the latch is shared and uses the kslgetsl() function to acquire the latch concurrently from two sessions. The first acquisition is in S mode. The second, incompatible X mode acquisition has to wait. Throughout this post, I will introduce several DTrace scripts to explore how an Oracle process spins for a shared latch. My platform is the soon-to-be-deprecated Oracle 10.2.0.4 32-bit for Solaris x86, but the results are generic. In addition, I have checked the results for all shared latches.

First, I need to know which Oracle functions are responsible for shared latch operations. To find them, I used this simple DTrace script:

#!/usr/sbin/dtrace -s
#pragma D option quiet
BEGIN {
 printf("func  pid: %d\n", $target);
}
pid$target::kslgetsl:entry
{
 in_latch= 1;
 count_ =0;
}
pid$target:::entry
/ in_latch && (count_< 50) /
{
 printf("%s\n",probefunc);
 count_++;
}
pid$target::kslfre:entry
{
 in_latch= 0;
 exit(0);
}

The script traces function execution inside the latch acquisition routine kslgetsl(). Only the first 50 function calls are displayed. I will experiment here with the gcs partitioned table hash shared latch (latch#=102), which my single-instance Oracle does not use.

sqlplus /nolog @latch_spin_trace.sql 102
...
LATCH_FUNC ADDR     LNAME
---------- -------- --------------------------------------------------
kslgetsl    50009BAC gcs_partitioned_table_hash
...
pid: 7270
kslgetsl       - KSL GET Shared Latch
kslgess        - wait latch get
kslskgs        - shared latch spin get
sskgslspin    - spinning function
sskgslspin 
sskgslspin 
sskgslspin 
sskgslspin 
...

The script output allowed me to guess the meaning of some functions. As in my previous post, the second DTrace script counts the actual number of spin cycles:

#pragma D option quiet
pid$target::kslgetsl:entry
{
 in_latch= 1;
 actual_spin_count=0;
}
pid$target::sskgslspin:entry,
pid$target::skgsltst:entry
/ in_latch /
{
 actual_spin_count++;
}
pid$target::kslfre:entry
{
 printf("\nActual shared latch spin count was %d \n",actual_spin_count);
 exit(0);
}

sqlplus /nolog @latch_spin_trace.sql 102
...
Actual shared latch spin count was 4000

This is intriguing. The default value of the _spin_count parameter is only 2000. Let me change it:

SQL> alter system set "_spin_count"=100 scope=memory;
$ sqlplus /nolog @latch_spin_trace.sql 102
...
Actual shared latch spin count was 200

We have just observed that the shared latch spin can be tuned dynamically with _spin_count. The actual spin count for a shared latch is 2*_spin_count. Since we do not know the spin implementation, the value 4000 has no meaning by itself. It can only be compared to the 20000 cycles of the exclusive latch spin. It seems that Oracle developers assumed that shared latches like cache buffers chains are busy for shorter periods than exclusive latches.

To shed light on why _spin_count was multiplied by 2, I wrote a slightly more complex DTrace script. It traces the order of function execution:

#!/usr/sbin/dtrace -ZCs
#pragma D option quiet

/* latch# offset inside latch structure */
#ifdef  _LP64
#define KSLLT_KSLLTNUM 12
#else
#define KSLLT_KSLLTNUM 8
#endif

BEGIN {
 printf("\nShared Latch spin trace for pid: %d\n", $target);
 spins=0;
}

pid$target::kslgetsl:entry	/* KSL GET Shared Latch */
{
 in_latch= 1;
 depth=0;
 this->lnum= * (unsigned short int *) copyin((uintptr_t) (arg0+KSLLT_KSLLTNUM),2);
 printf("%s(0x%X,%d,%d,%d,%d)\t\t - KSL GET Shared Latch# %d\n",probefunc,arg0,arg1,arg2,arg3,arg4, this->lnum);
}

pid$target::sskgslspin:entry	/* X86 spinning function */
/ in_latch /
{
 actual_spin_count++;
}

pid$target::kslgess:entry	/* shared latch wait get */
/ in_latch /
{
  depth+=2;
  printf("%*s%s(0x%X, ...)\t\t - shared latch wait get\n",depth,"",probefunc,arg0);
}
pid$target::kslgess:return
/ in_latch /
{
  depth-=2;
}

pid$target::kslskgs:entry 	/* spin for shared latch */
/ in_latch /
{
  actual_spin_count=0;
  laddress=arg0;
  depth+=2;
  printf("%*s%s(0x%X, ...)\t\t - spin for shared latch\n",depth,"",probefunc,arg0);
}
pid$target::kslskgs:return
/ in_latch /
{
  printf("%*s  sskgslspin(0x%X)\n",depth,"",laddress);
  printf("%*s    ... previous call repeated %d times. \n",depth,"",actual_spin_count);
  depth-= 2;
  spins+=actual_spin_count;
  actual_spin_count=0;
}

pid$target::kslwlmod:entry	/* KSL latch Wait List MODification */
/ in_latch /
{
  in_latch=0;
  in_wlmod=1;
  depth+=2;
  printf("%*s%s(...)\t\t\t - KSL Wait List MODification\n",depth,"",probefunc);
}

pid$target::kslwlmod:return
/ in_wlmod /
{
  in_latch=1;
  in_wlmod=0;
  depth-=2;
}

pid$target::skgpwwait:entry	/* Oracle Wait Interface event*/
/ in_latch /
{
  depth+=2;
  printf("%*s%s(...)\t\t\t - latch sleep until posted\n",depth,"",probefunc);
}
pid$target::skgpwwait:return
/ in_latch /
{
  depth -=2;
}

syscall::semsys:entry		/* semop syscall -  wait until posted */
/(pid == $target) && in_latch&& ((int)arg0 == 2)/
{
  depth+=2;
  printf("%*s  semop(%d,...)\n",depth,"",(int) arg1);
  exit(0);
}

END {
 printf("Disconnected\n");
}

The script enables DTrace probes (triggers) on entry to and return from all the previously mentioned functions related to the shared latch. The probes fire only inside the kslgetsl() call because of the predicate /in_latch/. Each probe adjusts the function execution depth variable and prints the name and description of the function in tree-like order. The descriptions beginning with “KSL” were found using MOS and Google. For the spinning function sskgslspin(), the script counts how many times it is called, that is, the number of spin cycles.

Note that the script demonstrates a useful technique. It reports the latch number (v$latch.latch#), which is stored inside the latch structure at some offset. This offset differs between the 32-bit and 64-bit Oracle ports. I denote it KSLLT_KSLLTNUM. The probe at pid$target::kslgetsl:entry uses the DTrace copyin() function to copy the latch number from userland memory into a DTrace kernel buffer. This technique allows tracing of any latch structure member, provided one knows its offset.
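The same offset-based read can be tried outside DTrace. Below is a minimal Python sketch, not part of the original scripts, that extracts an unsigned 16-bit field from a raw byte buffer. The offsets 8 and 12 are the KSLLT_KSLLTNUM values from the script above. The buffer contents, the function name read_latch_number and the little-endian byte order (which matches x86) are assumptions made for the illustration.

```python
import struct

# Offsets of the latch number inside the latch structure, taken from the
# KSLLT_KSLLTNUM definitions in the DTrace script (32-bit and 64-bit ports).
KSLLT_KSLLTNUM = {32: 8, 64: 12}


def read_latch_number(latch_bytes: bytes, word_size: int = 32) -> int:
    """Return the unsigned 16-bit latch number stored at the port's offset.

    Mirrors `*(unsigned short int *) copyin(arg0 + KSLLT_KSLLTNUM, 2)`.
    Little-endian byte order is an assumption that matches x86.
    """
    offset = KSLLT_KSLLTNUM[word_size]
    (latch_number,) = struct.unpack_from("<H", latch_bytes, offset)
    return latch_number


# Hypothetical 16-byte snapshot of a latch structure. Everything except the
# two bytes at offset 8 is filler invented for this illustration.
fake_latch = bytes(8) + struct.pack("<H", 102) + bytes(6)
print(read_latch_number(fake_latch, word_size=32))  # prints 102
```

Reading another structure member only requires a different offset and format code, which is the point of the technique.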

For those who are not familiar with DTrace, the probe syscall::semsys:entry fires at every semsys() system call on my Solaris box. The predicate /(pid == $target)/ selects only the calls originating from my process.

$ sqlplus /nolog @latch_spin_trace.sql 102
...
Shared Latch spin trace for pid: 7949
kslgetsl(0x50009BAC,1,2,3,16)            - KSL GET Shared Latch# 102
  kslgess(0x50009BAC, ...)               - shared latch wait get
    kslskgs(0x50009BAC, ...)             - spin for shared latch
      sskgslspin(0x50009BAC)
        ... previous call repeated 2000 times.
    kslwlmod(...)                        - KSL Wait List MODification
    kslskgs(0x50009BAC, ...)             - spin for shared latch
      sskgslspin(0x50009BAC)
        ... previous call repeated 2000 times.
    skgpwwait(...)                       - latch sleep until posted
        semop(27,...)
Disconnected

The results demonstrate that the Oracle process spins up to _spin_count cycles twice. After the first unsuccessful spin, the process puts itself on the latch wait list and spins again. If the second spin does not succeed either, the process sleeps in the semop() syscall until posted. Compare this with the single spin for an exclusive latch:

sqlplus /nolog @latch_spin_trace.sql 45
...
LATCH_FUNC ADDR     LNAME
---------- -------- --------------------------------------------------
kslgetl    500063D0 first_spare_latch
...
Latch spin trace for pid: 8101
kslgetl(0x500063D0,1,2,3)                - KSL GET exclusive Latch# 45
  sskgslgf(0x500063D0)                   - immediate latch get
  kslges(0x500063D0, ...)                - wait get of exclusive latch
    skgslsgts(...,0x500063D0, ...)       - spin get of exclusive latch
      sskgslspin(0x500063D0)
        ... previous call repeated 20000 times.
    kslwlmod(...)                        - KSL Wait List MODification
    sskgslgf(0x500063D0)                 - immediate latch get
    skgpwwait(...)                       - latch sleep until posted
        semop(31,...)
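The two traces can be summarized in a small model. The Python sketch below only reproduces the observed order of steps for a latch that stays busy. It is not Oracle's implementation, which remains unknown. All function and event names are hypothetical, and the behaviour when a get succeeds part-way through is an assumption.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, int]  # (step name, spin cycles used in that step)


def spin(try_get: Callable[[], bool], limit: int) -> Tuple[bool, int]:
    """Call try_get() up to `limit` times; return (acquired, cycles used)."""
    for cycle in range(1, limit + 1):
        if try_get():
            return True, cycle
    return False, limit


def shared_latch_x_get(try_get: Callable[[], bool],
                       spin_count: int = 2000) -> List[Event]:
    """Observed order for an X mode get of a shared latch."""
    events: List[Event] = []
    acquired, cycles = spin(try_get, spin_count)   # first kslskgs spin
    events.append(("spin", cycles))
    if acquired:
        return events
    events.append(("wait_list", 0))                # kslwlmod
    acquired, cycles = spin(try_get, spin_count)   # second kslskgs spin
    events.append(("spin", cycles))
    if not acquired:
        events.append(("sleep", 0))                # skgpwwait, then semop
    return events


def exclusive_latch_get(try_get: Callable[[], bool],
                        class_spin: int = 20000) -> List[Event]:
    """Observed order for an exclusive latch get."""
    events: List[Event] = [("immediate_get", 0)]   # sskgslgf
    if try_get():
        return events
    acquired, cycles = spin(try_get, class_spin)   # skgslsgts spin
    events.append(("spin", cycles))
    if acquired:
        return events
    events.append(("wait_list", 0))                # kslwlmod
    events.append(("immediate_get", 0))            # sskgslgf, one more try
    if not try_get():
        events.append(("sleep", 0))                # skgpwwait, then semop
    return events


def total_spins(events: List[Event]) -> int:
    return sum(cycles for name, cycles in events if name == "spin")


def always_busy() -> bool:
    return False  # the latch is never released during the get


print(total_spins(shared_latch_x_get(always_busy)))        # prints 4000
print(total_spins(shared_latch_x_get(always_busy, 100)))   # prints 200
print(total_spins(exclusive_latch_get(always_busy)))       # prints 20000
```

With a latch that never frees, the model gives the same totals as the DTrace counters: 2*_spin_count for the shared latch and a single class-defined spin for the exclusive one.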

Another interesting thing is that the spin for a shared latch depends on the get mode. I have discussed shared latch modes before. A shared latch may be acquired in S (shared) and X (exclusive) modes. At the time of acquisition, the latch itself may be held in S mode, in X mode, or in a special blocking mode. It turns out that the Oracle process spins differently in these cases. I repeated the above latch spin tracing experiments for all possible combinations:

                  S mode get    X mode get
Held in S mode    Compatible    2*_spin_count
Held in X mode    0             2*_spin_count
Blocking mode     0             2*_spin_count

The matrix shows that S mode latch gets do not spin at all. Possibly Oracle developers decided that there is no sense in spinning if the shared latch is already held in an incompatible mode. Of course, it is also possible that an S mode latch get spins in some way that my scripts do not detect.
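For quick reference, the matrix can be written as a lookup table. This is a minimal Python sketch of the observed values only. The names SPIN_MULTIPLIER and expected_spins are hypothetical, and None is used here to mark the compatible case where no wait occurs.

```python
from typing import Optional

# (mode the latch is held in, mode of the new get) -> multiplier of _spin_count.
# None marks the compatible combination, where the get succeeds without waiting.
SPIN_MULTIPLIER = {
    ("S", "S"): None,
    ("X", "S"): 0,
    ("blocking", "S"): 0,
    ("S", "X"): 2,
    ("X", "X"): 2,
    ("blocking", "X"): 2,
}


def expected_spins(held: str, get: str, spin_count: int = 2000) -> Optional[int]:
    """Maximum observed spin cycles for a get that cannot succeed immediately."""
    multiplier = SPIN_MULTIPLIER[(held, get)]
    if multiplier is None:
        return None
    return multiplier * spin_count


print(expected_spins("S", "X"))              # prints 4000
print(expected_spins("X", "S"))              # prints 0
print(expected_spins("blocking", "X", 100))  # prints 200
```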

In summary, this post demonstrated that the shared latch spin in Oracle 9.2-11g is directly influenced by the _spin_count value, at least for X mode gets. It also showed that an Oracle process spins for a latch in a very non-trivial way. We must take this into account when tuning latch contention.

6 Comments »

  1. […] to the standard class 0. In my previous posts, I discussed how the standard class exclusive and shared latches spin and wait. Now, it is time to explore the non-standard class latch behaviors. Parameter […]

    Pingback by Spin tales: Part 3. Non-standard latch classes in Oracle 9.2-11g « Latch, mutex and beyond — January 18, 2011 @ 8:20 pm | Reply

  2. Oracle’s default spin_count value is 2000.
    why the spin_count value in x$ksllclass as 20,0000?

    Thanks

    Comment by orapsdba — November 26, 2011 @ 10:49 am | Reply

    • Please read my previous post

      Comment by andreynikolaev — November 26, 2011 @ 11:08 am | Reply

  3. sorry! typo! please read 20,0000 as 20,000.

    thanks.

    Comment by orapsdba — November 26, 2011 @ 10:50 am | Reply

  4. […]   Max spin time for shared latch (~2*2000*5 ticks) […]

    Pingback by Latch Timescales « Latch, mutex and beyond — December 22, 2011 @ 12:44 pm | Reply

  5. […] shared latch, number of times process spins is _spin_count * 2. This has been proved by Andrey Nikolaev. Also, _spin_count parameter is applicable to shared latches by default. So since default value of […]

    Pingback by Latches: What do we know ? | Persistent Storage Solutions — November 26, 2015 @ 2:10 pm | Reply

