Latch, mutex and beyond

October 28, 2010

Appetizer for DTrace

Filed under: DTrace,Latch — andreynikolaev @ 3:33 pm

To discover how the Oracle latch works, we need the tool. Oracle Wait Interface allows us to explore the waits only. Oracle X$/V$ tables instrument the latch acquisition and give us performance counters. To see how latch works through time and to observe short duration events, we need something like stroboscope in physics. Likely such tool exists in Oracle Solaris. The DTrace, Solaris 10 Dynamic Tracing framework!

Here I would like to give brief, Oracle DBA inclined into to some of DTrace topics. Tanel Poder, James Morle , Dough Burns were used the DTrace for performance diagnostics for years. But it is still not popular as should be in our DBA community. One of the problems is another “language”. The best DTrace presentations talk about “probes”, “actions”, unfamiliar Solaris kernel structures, etc… Begging pardon to the DTrace inventors, I will use more database-like terminology here.

I will not even try to touch the most popular DTrace usage. Anyone who’s interested in this revolutionary technology, should read DTrace introduction, DTrace user guide,the upcoming DTrace book and DTrace community materials.

DTrace is event-driven instrumentation of Solaris kernel and user applications. This is the key point. No application change is needed to use DTrace. This is very similar to triggers in Oracle database. You define the probe (trigger ) to fire on event, and write the action (body) to execute. I will use this analogy in this post.

A probe is a point of instrumentation made available by a provider. A provider has analogy in Oracle trigger type (system/user triggers, DML, DDL, etc…). Officially, the provider represents a methodology for instrumenting the system.

Popular Solaris providers are pid,syscall, sysinfo, … As of now (11.2.0.2), we are still waiting for Oracle specific providers to integrate Oracle Server and Oracle DTrace. But we can do a lot with a generic pid provider, which allows to set triggers on any function call in user application. My goal is to see the latch operations on-the-fly.

The DTrace describes the triggering probe in a four field format: provider:module:function:name. If one need to set trigger inside the oracle process with Solaris spid 16444, to fire on entry to function kslgetl (get exclusive latch), the probe description will be pid16444:oracle:kslgetl:entry

Surprisingly this is enough to start use the DTrace with Oracle. Suppose, I would like to see latch acquisitions by MMON process . In my database the MMON process currently has spid 16444. Ask your Solaris SysAdmin for dtrace_user privilege and type:

$/usr/sbin/dtrace  -n 'pid16444:oracle:kslgetl:entry'
dtrace: description 'pid16444:oracle:kslgetl:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
 1  67480                    kslgetl:entry
 1  67480                    kslgetl:entry
 0  67480                    kslgetl:entry
…
<ctrl-C>

This simple DTrace one-liner traces calls of kslgetl() function and shows how the process migrates between CPUs.

How it works? Unlike standard tracing tools, DTrace works in Solaris kernel. When I activated this probe, dtrace set trigger at the entry to kslgetl function. When oracle process entered this function, the execution went to Solaris kernel and the DTrace filled buffers with the data. The dtrace program printed out these buffers.

Let us compare the DTrace and obsoleted Oracle Trace. Both were event driven. Otrace tried to catch all the evens in instance, the DTrace catch only what you asked for. Otrace allowed to set filters, in the DTrace you write the program. Otrace was fully userland, DTrace works in the OS kernel.

Kernel based tracing is much more stable and have less overhead then userland. DTrace sees all the system activity and can take into account the ‘unaccounted for’ time associated with kernel calls, scheduling, etc.

Actions (trigger bodies!) are what happen when a probe is hit. Actions are fully programmable using D language, which will be familiar to anybody who ever used C and awk. Action code enclosed in curly brackets {} and could use arguments of function call as arg0, arg1, etc….

Naturally, the next step in our DTrace latch tracing is to see the latch function arguments. It is easy to write such a script (ksl_args.d). Remember that Oracle acquires exclusive latches using kslgetl(laddr, wait, why, where), and shared latches using kslgetsl(laddr, wait, why, where, rs) (ksl_get_shared_latch() in 11g):

#!/usr/sbin/dtrace -Zs
#pragma D option quiet
pid$target::kslgetl:entry
{
printf("%s(0x%X,wait=%d,why=0x%X,whr=%d)\n",probefunc,arg0,arg1,arg2,arg3);
}

pid$target::kslgetsl:entry,
pid$target::ksl_get_shared_latch:entry
{
printf("%s(0x%X,wait=%d,why=0x%X,whr=%d,rs=%d)\n",probefunc,arg0, arg1,arg2,arg3,arg4);
}
pid$target::kslfre:entry
{
printf(" %s(0x%X)\n",probefunc,arg0);
}

The script probes (triggers) will fire on each entry to latch acquisition functions. Printf()’s inside the trigger bodies (actions) will print out arguments of these functions to the dtrace kernel buffers. $target macro will be replaced at runtime by spid number from -p script oprion. And this is the output:


$ ./ksl_args.d  -p 16444
kslgetl(0x38000CC98,wait=1,why=0x0,whr=175)
kslfre(0x38000CC98)
ksl_get_shared_latch(0x38DE6DD00,wait=1,why=0x38DE6DCB8,whr=290,rs=16)
kslfre(0x38DE6DD00)
kslgetl(0x38BE4C328,wait=1,why=0x0,whr=3487)
kslfre(0x38BE4C328)
kslgetl(0x38BE4C328,wait=1,why=0x0,whr=3510)
...

One can focus on the particular latch using predicate (WHEN clause !). Predicate takes the form of / … / just before action code. The probe will fire only when the predicate evaluates to true.
Look at my test Oracle instance suffered from “transaction allocation” latch contention.

select addr,latch#,name,gets,misses,sleeps,spin_gets,wait_time  from v$latch_parent
where name='transaction allocation' 
ADDR LATCH# NAME GETS MISSES SLEEPS SPIN_GETS WAIT_TIME
50010AEC 180 transaction allocation 520710921 387182198 74543 387117330 380364770

To see what happens with this latch I used the script:

#!/usr/sbin/dtrace -Zs
#pragma D option quiet
pid$target::kslgetl:entry
/ arg0 == 0x50010AEC /
{
printf("%s(0x%X,wait=%d,why=0x%X,whr=%d)\n",probefunc,arg0,arg1,arg2,arg3);
}

pid$target::kslfre:entry
/ arg0 == 0x50010AEC /
{
printf(" %s(0x%X)\n",probefunc,arg0);
}
...
kslgetl(0x50010AEC,wait=1,why=0x0,whr=2098)
 kslfre(0x50010AEC)
...

Latch “where“=2098 mean “ktcxbr: parent” in x$ksllw. Oracle acquire parent “transaction allocation” latch at this “where” during select from v$transaction. Concurrent select from this fixed view is indeed the root cause of latch contention.

And finally I would like to show the latch in memory itself. No problem. On entry to kslgetl the arg0 is the latch address. I will check this location on entry and return from latch functions. The only thing is to remember that DTrace probe acts in kernel address space. You need to copy the latch value from user address space into kernel buffer using copyin(user_address,size) DTrace function:

#!/usr/sbin/dtrace -Zs
#pragma D option quiet

pid$target::kslgetl:entry,
pid$target::kslfre:entry
/ arg0 == 0x50010AEC /
{
laddress = arg0;                           /* save laddress */
latch= *(uint32_t *)copyin(laddress, 4);   /* copy latch value from user space*/
printf("%s(0x%X...) \tlatch=0x%X (entry),",probefunc,arg0,latch);
}

pid$target::kslgetl:return,
pid$target::kslfre:return
/ laddress /
{
latch= *(uint32_t *)copyin(laddress, 4);   /* copy latch value from user space*/
printf(" 0x%X (return)\n",latch);
laddress =0;
}
...
kslgetl(0x50010AEC...)  latch=0x00 (entry),  0xFF (return)
 kslfre(0x50010AEC...)  latch=0xFF (entry),  0x00 (return)
...
About these ads

7 Comments »

  1. [...] first and fifth arguments – latch address, and mode. For more about the DTrace see my previous post. Latch with address 0x5cb40888 is “session idle bit” latch in my test Oracle 10.2.0.4 Solaris [...]

    Pingback by Shared latch behaves like enqueue « Latch, mutex and beyond — November 17, 2010 @ 10:16 pm | Reply

  2. [...] a big difference between not setting _spin_count and setting it to 2000. Oracle Solaris has unique DTrace facility, which allows us to trace function flow. I will experiment with Oracle 10.2.0.4 for [...]

    Pingback by Spin tales: Part 1. Exclusive latches in Oracle 9.2-11g « Latch, mutex and beyond — January 6, 2011 @ 9:59 pm | Reply

  3. Hi Andrey,

    have you done similar test in AIX using procstack?

    Thanks

    Comment by antony — October 17, 2012 @ 5:49 pm | Reply

    • I think you are asking about AIX Probevue?
      Yes, the simple scripts from this post can be rewritten for Probevue.

      Comment by andreynikolaev — October 17, 2012 @ 6:25 pm | Reply

  4. […] information about the amount of time an IO spends in operating system components. DTrace is the Solaris dynamic tracing tool which allows to observe all OS […]

    Pingback by V$KERNEL_IO_OUTLIER | Latch, mutex and beyond — September 20, 2013 @ 9:11 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 68 other followers

%d bloggers like this: