Latch, mutex and beyond

December 16, 2010

Hotsos Symposium 2011

Filed under: Uncategorized — andreynikolaev @ 9:33 pm

I will be speaking at Hotsos Symposium 2011, which will be held on March 6-10, 2011, in Dallas.

Of course, I will speak about “Contemporary Latch Internals“. I would like to post my abstract here:

“Latches in Oracle Database were radically changed during the last 10 years to achieve higher levels of concurrency and performance. Using oradebug and DTrace, this presentation explores how the latch works. Contemporary latches do not use exponential backoff, most of them spin 10 times more than expected, and they may wait for an infinite time in a queue.

This presentation will show how Oracle instruments the latch operations, discuss parameters and statistics related to latch performance diagnostics, and fine grain tuning. It will also explore the long-standing question: when and how to tune the “_spin_count” and “_latch_classes“. DTrace provides additional capabilities to measure effectiveness and statistical properties of latch in production.”

Some parts of this abstract have appeared in my blog and were discussed in seminars. There will be many new topics!
The Hotsos Symposium is a unique conference dedicated to Oracle performance. I hope to see you there!

Hidden latch wait revolution that we missed

Filed under: History,Latch,shared latch — andreynikolaev @ 8:25 pm

The way an Oracle process waits for a latch seems trivial. The Oracle 11.2 documentation explicitly states about the “latch free” wait event:
… The wait time increases exponentially and does not include spinning on the latch (active waiting). The maximum wait time also depends on the number of latches that the process is holding. There is an incremental wait of up to 2 seconds…

In this post, I would like to show that this exponential backoff became obsolete in Oracle 9.2. Since then, all Oracle versions have used a completely new latch wait posting mechanism and FIFO queuing for all latches except one.
Oracle no longer uses the “repeatedly spin and sleep” approach. The process spins and waits only once. The pseudocode for contemporary latch acquisition should look like:

  Immediate latch get 
    Spin latch get 
       Add the process to the queue of latch waiters
          Sleep until posted
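
The steps above can be sketched in Python. This is only an illustration under stated assumptions, not Oracle's code: the class name QueuedLatch, the spin_count parameter, and the direct hand-off of the latch to the first waiter are hypothetical, and threading.Lock merely stands in for the atomic latch word.

```python
import threading
from collections import deque


class QueuedLatch:
    """Hypothetical sketch: immediate get, one spin phase, then FIFO sleep."""

    def __init__(self, spin_count=2000):
        self._held = threading.Lock()    # stands in for the atomic latch word
        self._qlock = threading.Lock()   # protects the queue of waiters
        self._waiters = deque()          # FIFO queue of sleeping processes
        self.spin_count = spin_count

    def _try(self):
        # Non-blocking attempt to take the latch.
        return self._held.acquire(blocking=False)

    def get(self):
        # 1. Immediate latch get.
        if self._try():
            return "immediate"
        # 2. Spin latch get: retry a bounded number of times.
        for _ in range(self.spin_count):
            if self._try():
                return "spin"
        # 3. Add the process to the queue of latch waiters.
        posted = threading.Event()
        with self._qlock:
            # Re-check under the queue lock so that a free() issued just
            # before we enqueue cannot be missed (no lost wakeup).
            if self._try():
                return "spin"
            self._waiters.append(posted)
        # 4. Sleep until posted. No timeout and no repeated spin.
        posted.wait()
        return "sleep"  # the latch was handed over to us by free()

    def free(self):
        with self._qlock:
            if self._waiters:
                # Assumption: hand the still-held latch to the first waiter.
                self._waiters.popleft().set()
            else:
                self._held.release()
```

The string returned by get() only reports which stage obtained the latch. Whether the real latch is handed over directly or the posted process competes for it again is not stated here; direct hand-off was chosen because it keeps the sketch short and free of lost wakeups.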


November 23, 2010

Shared latches by Oracle version

Filed under: 11.2,Latch,shared latch,Summary tables — andreynikolaev @ 6:13 pm

As I described in my previous post, shared and exclusive Oracle latches differ significantly. A shared latch behaves like an enqueue. It has incompatible S and X modes. Moreover, X mode serializes the shared latch. Contention for shared and exclusive latches follows different patterns, which calls for different tuning methods.

But we do not know which latches are shared. Oracle never published the list of shared latches. Every time we looked at an AWR or Statspack report, we had to guess the type of the contended latch. We only know that the “cache buffers chains” latches became shared in 9.2.

The Oracle executable internally determines that a latch is shared using a flag hidden somewhere in the x$kslld (v$latchname) structure. A Google search shows that KSLLD means [K]ernel [S]ervice [L]ock [L]atch [D]escriptor. Unfortunately, this shared flag was not externalized to SQL. It is possible to check the flag manually using oradebug peek or DTrace, but the flag offset is version and platform dependent. We need a more systematic way to determine the latch type.

November 17, 2010

Shared latch behaves like enqueue

Filed under: DTrace,Latch,shared latch — andreynikolaev @ 10:16 pm

We know a lot about exclusive latches. They are Oracle's implementation of TTS spinlocks. Only one process at a time can hold an exclusive latch. Other processes compete for it by spinning. If a process cannot get the latch by spinning, it waits until the latch becomes free.
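
A minimal sketch of such a spinlock in Python, assuming that TTS stands for test-and-test-and-set. The names TTSLatch, try_get and spin_count are hypothetical, and a threading.Lock emulates the atomicity that a hardware test-and-set instruction would provide.

```python
import threading


class TTSLatch:
    """Hypothetical test-and-test-and-set spinlock, for illustration only."""

    def __init__(self):
        self._busy = False
        self._atomic = threading.Lock()  # emulates an atomic instruction

    def _test_and_set(self):
        # Atomically set the latch and return its previous value.
        with self._atomic:
            old = self._busy
            self._busy = True
            return old

    def try_get(self, spin_count):
        for _ in range(spin_count):
            # First "test": a cheap plain read of the latch value.
            # Only if it looks free do we try the atomic test-and-set.
            if not self._busy and not self._test_and_set():
                return True
        return False  # spinning failed; a real latch would now wait

    def free(self):
        # Releasing is a plain store of the "free" value.
        self._busy = False
```

For example, after latch = TTSLatch(), the first latch.try_get(spin_count=1) returns True, a second call returns False until latch.free() is called.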

But since version 8.0, Oracle has had another spinlock: the shared latch. It is an implementation of “Read-Write” spinlocks. Such spinlocks can be held by several “reader” processes simultaneously in SHARED mode. But if a process needs to write to the protected structure, it must acquire the RW spinlock in EXCLUSIVE mode. This mode prevents any concurrent access to the latch.
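
The mode compatibility described above can be sketched as follows. This is a hedged illustration of a generic read-write lock state, not the layout of the Oracle shared latch: the class SharedLatch, its reader counter and the "S"/"X" mode strings are assumptions made for the example.

```python
import threading


class SharedLatch:
    """Hypothetical read-write latch state with S and X modes."""

    def __init__(self):
        self._guard = threading.Lock()  # makes each state change atomic
        self._readers = 0               # number of current S holders
        self._exclusive = False         # True while an X holder exists

    def try_get(self, mode):
        with self._guard:
            if mode == "S":
                # S is compatible with other S holders, but not with X.
                if self._exclusive:
                    return False
                self._readers += 1
                return True
            if mode == "X":
                # X is incompatible with every other holder.
                if self._exclusive or self._readers:
                    return False
                self._exclusive = True
                return True
            raise ValueError("mode must be 'S' or 'X'")

    def free(self, mode):
        with self._guard:
            if mode == "S":
                self._readers -= 1
            elif mode == "X":
                self._exclusive = False
            else:
                raise ValueError("mode must be 'S' or 'X'")
```

With this sketch, two try_get("S") calls both succeed, while try_get("X") fails until both readers have called free("S").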

From version to version, the number of shared latches has increased. Several widely known latches, like “session idle bit”, “In memory undo latch”, “Result Cache: RC Latch” and “resmgr group change latch”, are shared.

The famous “cache buffers chains” latch also became shared in Oracle 9.2. We usually react to “cache buffers chains” latch contention by looking for inefficient SQL plans and “hot blocks”. Recently Kyle Hailey posted an excellent graphical explanation of the Oracle mechanics related to “latch: cache buffers chains” contention. But it was always a mystery to me why sessions have to wait for a SHARED latch during READ operations like searching the hash chains. Other busy shared latches, like “session idle bit”, do not experience such contention. This is why I would like to dive deeper into shared latch internals.

October 28, 2010

Appetizer for DTrace

Filed under: DTrace,Latch — andreynikolaev @ 3:33 pm

To discover how the Oracle latch works, we need a tool. The Oracle Wait Interface allows us to explore only the waits. Oracle X$/V$ tables instrument the latch acquisition and give us performance counters. To see how a latch works over time and to observe short-duration events, we need something like a stroboscope in physics. Luckily, such a tool exists in Oracle Solaris: DTrace, the Solaris 10 Dynamic Tracing framework!

Here I would like to give a brief, DBA-oriented introduction to some DTrace topics. Tanel Poder, James Morle and Doug Burns have used DTrace for performance diagnostics for years. But it is still not as popular as it should be in our DBA community. One of the problems is its different “language”. The best DTrace presentations talk about “probes”, “actions”, unfamiliar Solaris kernel structures, etc. Begging the pardon of the DTrace inventors, I will use more database-like terminology here.

August 24, 2010

Exclusive latches in memory. Oracle versions 7-11.2

Filed under: 11.2,Latch,Uncategorized — andreynikolaev @ 8:54 am

Contrary to popular belief, Oracle latches have evolved significantly over the last decade. Not only did additional statistics appear (and disappear) and a new (shared) latch type get introduced, but the latch itself changed.

It is interesting to see how the latch was organized in past and contemporary versions.

Seeing a latch in memory seems hard, since latches are typically held for a very short time. Fortunately, Oracle lets us call any of its kernel functions using the oradebug call utility. We only need to know that Oracle itself uses the kslgetl(laddr, wait, why, where) function to acquire an exclusive latch. Recently I blogged about this function and its parameters.

This function can also be used to artificially acquire any latch for demonstration:

SQL> oradebug call kslgetl <latch address> <wait> <why> <where>


July 29, 2010

Strange “db file async I/O submit” wait event

Filed under: 11.2,Uncategorized — andreynikolaev @ 5:05 pm

This post is not directly related to the blog theme. I would like to discuss the “db file async I/O submit” wait event. This new event was introduced in Oracle 11.2. So far it has not been described in the Oracle documentation or on Metalink.

At the beginning of this story, this event became the topmost background wait for one production instance under HP-UX:

                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit        151,159     0     35,625     236      0.7   96.3
log file parallel write         427,728     0        308       1      2.0     .8

This looks mysterious. HP-UX does not support AIO for filesystems at all!

July 11, 2010

Latch get and spin instrumentation. The unknown knowns. V2

Filed under: Instrumentation,Latch — andreynikolaev @ 12:11 am
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total     Waits Time (s)
Event                                               Waits    Time (s) Ela Time   per sec  per sec
-------------------------------------------- ------------ ----------- -------- --------- --------
enqueue                                         1,801,215   3,281,392    59.82     499.9   910.74
buffer busy waits                               1,984,703   1,235,865    22.53     550.8   343.01
latch free                                      6,425,043     847,386    15.45   1,783.2   235.19
SQL*Net break/reset to client                      50,394      35,937      .66      14.0     9.97
CPU time                                                       23,828      .43               6.61

This is the Statspack report for an instance that suffered from heavy latch contention.
Every time I saw such a CPU-bound Oracle instance with latch contention, I asked myself: which part of this CPU power is being burned on useless latch spin attempts? How many processes are spinning for the latch? How can we estimate this?

Unfortunately, I still do not have a contemporary answer. But in this post I would like to show that we had such estimates before 11g.

We all know that latch waits are well instrumented in the Oracle Wait Interface. Oracle 11.2 has 32 specific latch wait events and one general ‘latch free’ event. But all these events cover only latch sleeps. The Oracle Wait Interface does not know anything about latch gets and spins.

It turns out that Oracle had instrumented the latch acquisition as well, and even documented it. I do not know why this is not better known. The instrumentation resides in the process array v$process. The fixed table behind the v$process view is x$ksupr.

Of course, my post is about v$process.latchwait and v$process.latchspin. (more…)

April 12, 2010

Latch get “where” and “why”.

Filed under: Uncategorized — andreynikolaev @ 7:48 pm

It is widely known that the Oracle server uses the kslgetl (Kernel Service Lock Management Get Latch) function to acquire a latch. In 2006 Tanel Poder first demonstrated that oradebug call kslgetl/kslfre can be used to acquire a latch manually. This is very useful for simulating latch-related hangs and contention.

For several years it was commonly supposed that kslgetl() has two parameters: latch address and wait. But on AIX 5L we have the unique procstack tool, which shows the actual number of parameters of a function. It turns out that kslgetl has four parameters:

sskgpwwait(??, ??, ??, ??, ??) + 0x38
skgpwwait(??, ??, ??, ??, ??) + 0xbc
kslges(??, ??, ??, ??, ??) + 0x54c
kslgetl(??, ??, ??, ??) + 0x33c
ksfglt(??, ??, ??, ??, ??) + 0x198

We know the meaning of the first two parameters. What about the others? (more…)

October 4, 2009

Oracle latches and general Spinlocks

Filed under: Latch,Spinlock,Theory — andreynikolaev @ 7:45 am

First of all, we need to describe the common background. According to Oracle® Database Concepts 11g Release 2 (11.2), a latch is: “A simple, low-level serialization mechanism to protect shared data structures in the System Global Area.”

Latches and mutexes are Oracle's proprietary implementations of the general spinlock concept. Later I will show that the Oracle latch is one of the simplest and most “obsolete” spinlocks: TTS plus wait. Here I will describe where (and why) the latch fits in general spinlock theory. (more…)
