Latch, mutex and beyond

November 17, 2010

Shared latch behaves like enqueue

Filed under: DTrace,Latch,shared latch — andreynikolaev @ 10:16 pm

We know a lot about exclusive latches. They are the Oracle implementation of TTS spinlocks. Only one process at a time can hold an exclusive latch. Other processes compete for the exclusive latch by spinning. If a process cannot get the latch by spinning, it waits until the latch becomes free.

But since version 8.0 Oracle has had another spinlock: the shared latch. This is an implementation of “Read-Write” spinlocks. Such spinlocks can be held by several “reader” processes simultaneously in SHARED mode. But if a process needs to write to the protected structure, it must acquire the RW spinlock in EXCLUSIVE mode. This mode prevents any concurrent access to the latch.
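
To make the RW spinlock idea concrete, here is a minimal C sketch of the general technique. This is purely illustrative code under my assumptions, not Oracle's latch implementation; all names in it are invented.

#include <stdatomic.h>

/* Illustrative RW spinlock: low bits count the readers, one high bit
   marks a writer. This models the general technique only, not the
   actual Oracle latch code. */
#define WRITER 0x80000000u

atomic_uint lock;

void lock_shared(void) {
    for (;;) {
        unsigned v = atomic_load(&lock);
        /* readers may enter only while no writer holds the lock */
        if (!(v & WRITER) &&
            atomic_compare_exchange_weak(&lock, &v, v + 1))
            return;                       /* one more reader */
        /* otherwise spin and retry */
    }
}

void lock_exclusive(void) {
    unsigned expected = 0;
    /* a writer needs the lock completely free: no readers, no writer */
    while (!atomic_compare_exchange_weak(&lock, &expected, WRITER))
        expected = 0;                     /* spin until free */
}

void unlock_shared(void)    { atomic_fetch_sub(&lock, 1); }
void unlock_exclusive(void) { atomic_fetch_and(&lock, ~WRITER); }

Note that a reader-held lock blocks writers and vice versa; this S/X incompatibility is exactly what the experiments below demonstrate for Oracle shared latches.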

From version to version the number of shared latches has increased. Several widely known latches, such as “session idle bit”, “In memory undo latch”, “Result Cache: RC Latch” and “resmgr group change latch”, are shared.

The famous “cache buffers chains” latch also became shared in Oracle 9.2. We usually react to “cache buffers chains” latch contention by looking for inefficient SQL plans and “hot blocks”. Recently Kyle Hailey posted an excellent graphical explanation of the Oracle mechanics behind “latch: cache buffers chains” contention. But it was always a mystery to me why sessions have to wait for a SHARED latch during READ operations like searching the hash chains. Other busy shared latches like “session idle bit” do not experience such contention. This is why I would like to dive deeper into shared latch internals.

Let us explore shared latch acquisition. Fortunately, Oracle gives us the possibility to call any of its kernel functions using the oradebug utility. Oracle uses the kslgetl(laddr, wait, why, where) kernel function to acquire an exclusive latch. Tanel Poder was the first to use this function to artificially acquire a latch for demonstrations. Recently I blogged about this function and its parameters.
For shared latches, Oracle 10g uses the kslgetsl(laddr, wait, why, where, mode) function. Oracle 11g has a kslgetsl_w() function with the same interface, but internally uses ksl_get_shared_latch(). As in my previous post, I guess the meaning of the kslgetsl() arguments as follows (a guessed C prototype appears after the list):

  • laddress – address of the latch in SGA
  • wait – flag; if not 0, this is a willing-to-wait latch get
  • why – context: why the latch is acquired at this “where”
  • where – location from which the latch is acquired (x$ksllw.indx)

And the last argument, new for shared latches, is:

  • mode – Exclusive or shared mode
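
Putting these guesses together, the interface looks roughly like the following C prototype. This is my reconstruction from oradebug experiments, not an official Oracle declaration; even the argument types are assumptions.

/* Guessed prototype, reconstructed from oradebug experiments.
   Not an official declaration; the argument types are assumptions. */
int kslgetsl(void *laddress, /* address of the latch in SGA            */
             int   wait,     /* if not 0, willing-to-wait get          */
             int   why,      /* context: why the latch is acquired     */
             int   where,    /* location id (x$ksllw.indx)             */
             int   mode);    /* SHARED or EXCLUSIVE mode, traced below */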

On Solaris 10 it is easy to trace the argument values using the following DTrace one-liner:

dtrace -n 'pid$target::kslgetsl:entry{printf("laddress=%p mode=%d\n",arg0,arg4)}' -p <pid>
 …
  0  45881                   kslgetsl:entry laddress=0x5cb40888 mode=8
  0  45881                   kslgetsl:entry laddress=0x5cb40888 mode=16
  1  45881                   kslgetsl:entry laddress=0x5cb40888 mode=8
  1  45881                   kslgetsl:entry laddress=0x5cb40888 mode=16
…

This one-liner creates a “trigger” (a probe, in DTrace terminology) on entry to the kslgetsl() function inside the process identified by its Solaris pid. On every call of kslgetsl() it prints out the first and fifth arguments: the latch address and the mode. For more about DTrace see my previous post.
The latch at address 0x5cb40888 is the “session idle bit” latch in my test Oracle 10.2.0.4 Solaris x86 32-bit instance. It is clear from the trace that the mode argument took only two values (restated as C constants after the list):

  • 8 – “SHARED”
  • 16 – “EXCLUSIVE”
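
In C terms these observations amount to two flag values. The constant names below are mine; only the numeric values were actually traced.

/* Observed mode values for kslgetsl(); the names are invented */
#define KSL_LATCH_SHARED    0x08   /*  8 - SHARED get    */
#define KSL_LATCH_EXCLUSIVE 0x10   /* 16 - EXCLUSIVE get */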

Using this, I can artificially acquire the latch through oradebug. But it is dangerous to experiment with the heavily used “session idle bit” latch; any wrong action with it can be fatal for the instance. Instead, I will choose the ‘gcs partitioned table hash’ shared latch, which is completely unused in my single-instance database.

Also, I have to use 32-bit Oracle ports, because the 64-bit oradebug call utility has had a bug (it corrupts call arguments) since 10.2.0.4.

SQL> select addr, name from v$latch_parent where name='gcs partitioned table hash';
ADDR     NAME
-------- --------------------------------------------------
50009BAC gcs partitioned table hash
SQL> oradebug setmypid
SQL> oradebug peek 0x50009BAC 1
[50009BAC, 50009BB0) = 00000000  -- Latch is free
/*   artificially get latch in S mode */
SQL> oradebug call kslgetsl 0x50009BAC 1 2 3 8
Function returned 1
SQL> oradebug peek 0x50009BAC 1
[50009BAC, 50009BB0) = 00000001 -- Latch held in S mode by one process
SQL> oradebug call kslfre 0x50009BAC
Function returned 0

If several processes hold the shared latch simultaneously, its value is incremented to represent the number of holders. To simplify further demonstrations I wrote the scripts shared_latch_S.sql and shared_latch_X.sql. These scripts acquire the ‘gcs partitioned table hash’ latch in S and X mode respectively and hold it for 100 seconds. For example:

$ sqlplus / as sysdba
SQL> set define %
SQL>                /* spawn 2 processes to acquire the latch simultaneously  */
SQL> host sqlplus /nolog @shared_latch_S.sql &
SQL> host sqlplus /nolog @shared_latch_S.sql &
...
SQL> host sleep 5
...
SQL> select ksuprpid pid, ksuprsid sid, ksuprlat laddr, ksuprlnm name,
            ksuprlmd, ksulawhy, ksulawhr, ksulagts gets
       from x$ksuprlat;
 PID  SID LADDR    NAME                       KSUPRLMD  KSULAWHY KSULAWHR GETS
---- ---- -------- -------------------------- --------- -------- -------- ----
  18  145 50009BAC gcs partitioned table hash SHARED           4        5    3
  21  148 50009BAC gcs partitioned table hash SHARED           4        5    3
SQL> oradebug setmypid
SQL> oradebug peek 0x50009BAC 1
[50009BAC, 50009BB0) = 00000002  -- latch held by 2 processes in S mode
...

Let us try the eXclusive latch mode:

SQL> oradebug call kslgetsl 0x50009BAC 1 2 3 16
Function returned 1
SQL> oradebug peek 0x50009BAC 1
[50009BAC, 50009BB0) = 2000000E -- Latch held in X mode by pid=0xE

The 0x20000000 bit in the latch value is a flag for X mode, and the lower bits are the Oracle PID of the holding process from v$process.
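
In C terms, the interpretation observed so far would look like the following sketch. The flag name and the width of the pid field are my assumptions; only the 0x20000000 bit and the pid in the low bits were actually observed.

#include <stdio.h>

#define KSL_X_MODE 0x20000000u  /* invented name for the observed X-mode flag */

/* Decode a latch value such as 0x2000000E from the peek above.
   The 16-bit pid field width is a guess. */
void decode_latch(unsigned val) {
    if (val & KSL_X_MODE)
        printf("held EXCLUSIVE by Oracle pid %u\n", val & 0xFFFFu);
    else
        printf("held SHARED by %u processes\n", val);
}

For the value above, decode_latch(0x2000000E) reports Oracle pid 14 (0xE), which matches v$process.
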
In order to demonstrate latch holders and waiters, I wrote another script, latch_tree.sql. This script prints out the trees of processes currently holding and waiting for latches. It uses the latch get instrumentation from x$ksuprlat (v$latchholder) and x$ksupr (v$process). The two processes holding the shared latch simultaneously are shown by this script as:

SQL> @latch_tree
Process 18
 holding: 50009BAC "gcs partitioned table hash" SHARED lvl=6 whr=5/4 SID=145
Process 21
 holding: 50009BAC "gcs partitioned table hash" SHARED lvl=6 whr=5/4 SID=148

Predictably, as with enqueues, the S and X shared latch modes are incompatible:

SQL> host sqlplus /nolog @shared_latch_X.sql &
SQL> host sleep 1
SQL>            /* this second latch acquisition will block */
SQL> host sqlplus /nolog @shared_latch_S.sql &
SQL> host sleep 1
SQL> @latch_tree
Process 18
 holding: 50009BAC "gcs partitioned table hash" EXCLUSIVE lvl=6 whr=5/4 SID=148  - BLOCKER
  Process 21, waiting for: 50009BAC whr=5/4    - WAITER 

The waiting process is in a "latch free" wait. Vice versa, S mode also blocks X mode latch acquisition:

SQL> host sqlplus /nolog @shared_latch_S.sql &
SQL> host sleep 1
SQL> host sqlplus /nolog @shared_latch_X.sql &
SQL> host sleep 1
SQL> @latch_tree
Process 18
 holding: 50009BAC "gcs partitioned table hash" SHARED lvl=6 whr=5/4 SID=145 - BLOCKER
  Process 21, waiting for: 50009BAC whr=5/4  - WAITER 

This blocking is achieved by a special 0x40000000 bit in the latch value:

SQL> oradebug peek 0x50009BAC 1
[50009BAC, 50009BB0) = 40000001

This bit is a flag indicating that some incompatible latch acquisition is in progress. The lower bits of the latch value still represent the number of processes currently holding the latch in SHARED mode. All further latch gets will be blocked and will form a queue. This is another enqueue-like property of the shared latch.
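
Based only on the values observed in these experiments, an S-mode get might behave like the speculative C sketch below. The flag names, the queue helper and every detail of the logic are my assumptions, not disassembled Oracle code.

#include <stdatomic.h>

#define KSL_X_MODE   0x20000000u  /* held in EXCLUSIVE mode (observed)       */
#define KSL_BLOCKING 0x40000000u  /* incompatible get in progress (observed) */

extern void join_post_wait_queue(atomic_uint *latch);  /* hypothetical helper */

/* Speculative model of an S-mode get, inferred from observed values only */
void get_shared(atomic_uint *latch) {
    for (;;) {
        unsigned v = atomic_load(latch);
        if (v & (KSL_X_MODE | KSL_BLOCKING)) {
            /* incompatible hold or pending X get: sleep in the queue */
            join_post_wait_queue(latch);
            continue;
        }
        /* otherwise bump the shared-holder count in the low bits */
        if (atomic_compare_exchange_weak(latch, &v, v + 1))
            return;
    }
}

This model reproduces the queue formation seen above: once the 0x40000000 bit is set, every later get, S or X, goes to sleep.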

SQL> host sqlplus /nolog @shared_latch_X.sql &
SQL> host sqlplus /nolog @shared_latch_S.sql &
SQL> host sqlplus /nolog @shared_latch_S.sql &
SQL> host sqlplus /nolog @shared_latch_S.sql &
SQL> @latch_tree
Process 20
 holding: 50009BAC "gcs partitioned table hash" EXCLUSIVE lvl=6 whr=5/4 SID=143
  Process 21, waiting for: 50009BAC whr=5/4    --  WAITERS 
  Process 22, waiting for: 50009BAC whr=5/4
  Process 23, waiting for: 50009BAC whr=5/4

But wait! After 100 seconds my script automatically releases the exclusively held shared latch, and the first S waiter acquires it:

...
oradebug call kslfre 0x50009BAC
...
SQL> @latch_tree
Process 21
 holding: 50009BAC "gcs partitioned table hash" SHARED lvl=6 whr=5/4 SID=149  - SHARED BLOCKER!
  Process 22, waiting for: 50009BAC whr=5/4
  Process 23, waiting for: 50009BAC whr=5/4
SQL> oradebug peek 0x50009BAC 1
[50009BAC, 50009BB0) = 00000001

Despite the fact that the latch is now held in S mode, the last two processes still wait, even though their S mode requests are compatible! These processes wait in the post-wait latch queue. This can be clearly seen in the systemstate dump:

SQL> oradebug dump systemstate 1
...
  (latch info) wait_event=0 bits=40
    holding    (efd=4) 50009bac Parent gcs partitioned table hash level=6
        Location from where latch is held: ksliwat:add:nowait: pwq#
        Context saved from call: 4
        state=busy(shared) (val=0x1)
        waiters [orapid (seconds since: put on list, posted, alive check)]:    - Latch wait list
         22 (135, 1288683848, 135)                                                      -Waiters
         23 (135, 1288683848, 135)
         waiter count=2
...

These unfortunate processes will be woken up one by one, and only when the latch is freed. Each kslfre() call wakes up one sleeping process from the queue. Moreover, incoming S requests will “cut the line” and acquire the shared latch while earlier requests still stay in the queue! I will need to write a more mathematical post about this interesting queueing discipline some day.
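
The release side can be modeled just as speculatively. The sketch below captures only what the experiments show: a release posts at most one sleeper, while a fresh S request bypasses the queue entirely because it examines only the latch value. The helper name is hypothetical.

#include <stdatomic.h>

extern void post_first_waiter(atomic_uint *latch);  /* hypothetical helper */

/* Speculative model of an S-mode release: drop the holder count and,
   when the latch becomes free, post one process from the queue. */
void free_shared(atomic_uint *latch) {
    unsigned v = atomic_fetch_sub(latch, 1) - 1;  /* value after the drop */
    if (v == 0)
        post_first_waiter(latch);  /* wake exactly one sleeper */
}

Because get_shared() above never looks at the queue when the latch value is compatible, a newly arriving S request acquires the latch immediately — exactly the line-cutting behavior observed.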

In summary: the above experiments show that X mode latch gets effectively serialize the shared latch. It is the X gets that form the queue and induce long latch waits, even for S mode operations. If we want to reduce “latch free” waits for a shared latch, we need to reduce the frequency of X requests. In one of the next posts I will explore how different row source operations use eXclusive “cache buffers chains” latch gets.


