Latch, mutex and beyond

December 22, 2011

Latch Timescales

Filed under: Latch,Theory,_spin_count — andreynikolaev @ 12:44 pm

To compare the old and new latch mechanisms, I found the following illustration useful. Since it is hard for us humans to visualize milliseconds and microseconds, imagine a “time microscope” that magnifies timed events one million times.

Alternatively, just imagine contemporary Oracle software running on 1950s-style hardware.

Such a microscope will magnify a microsecond to a second. One real second will turn into one million seconds, which is more than 11 days. Light will travel at the speed of sound. A lunar rocket will crawl like a snail.

However, our CPUs will still be fast. A typical tick will take half a millisecond.

Under the “time microscope”, a typical “Cache buffers chains” latch holding time will be around 1 second. Not surprisingly, the maximum spin time for a shared latch will be of the same order. Look at the following table:

  Reality                                             Zoomed one million times
  1 us                                                1 sec
  1 ms                                                17 min
  1 cs                                                2 hours 46 min
  1 sec                                               11.6 days
  Light speed (300000 km/sec)                         Sonic speed (300 m/sec)
  Lunar rocket (11 km/sec)                            Garden snail (11 mm/sec)
  2 GHz CPU tick (0.5 ns)                             0.0005 sec
  Avg “Cache buffers chains” latch holding time       ~1-2 sec
  Max spin time for shared latch (~2*2000*5 ticks)    ~10 sec
  Avg “Library Cache” latch holding time (10-20 us)   ~10-20 sec
  Max spin time for exclusive latch (~20000*5 ticks)  ~50 sec
  OS context switch time (10 us-1 ms)                 10 sec-17 min
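
The arithmetic behind the table can be reproduced with a few lines of Python. This is only a sketch: the names `ZOOM`, `zoomed_seconds` and `humanize` are invented for illustration, and the tick counts are the approximate figures quoted in the table.

```python
ZOOM = 1_000_000  # magnification factor of the "time microscope"
TICK = 1 / 2e9    # one tick of a 2 GHz CPU, in real seconds


def zoomed_seconds(real_seconds):
    """Return a real duration as seen through the microscope, in seconds."""
    return real_seconds * ZOOM


def humanize(seconds):
    """Format a duration in the largest convenient unit."""
    if seconds < 60:
        return f"{seconds:g} sec"
    if seconds < 3600:
        return f"{seconds / 60:.0f} min"
    if seconds < 86400:
        hours, rest = divmod(round(seconds), 3600)
        return f"{hours} h {rest // 60} min"
    return f"{seconds / 86400:.1f} days"


rows = {
    "1 us": 1e-6,
    "1 ms": 1e-3,
    "1 cs": 1e-2,
    "1 sec": 1.0,
    "2 GHz CPU tick": TICK,
    "max shared latch spin": 2 * 2000 * 5 * TICK,   # ~2*2000*5 ticks
    "max exclusive latch spin": 20000 * 5 * TICK,   # ~20000*5 ticks
}

for name, real in rows.items():
    print(f"{name:26s} {humanize(zoomed_seconds(real))}")
```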

The old Oracle 8i latching algorithm repeatedly spins for 2000 cycles and then waits using exponential backoff. The timeout increased on every second sleep, up to 2 seconds (_max_exponential_sleep, in centiseconds):

0.01-0.01-0.01-0.03-0.03-0.07-0.07-0.15-0.23-0.39-0.39-0.71-0.71-1.35-1.35-2.0-2.0-2.0-2.0… sec

This time series approximately corresponds to the following exponential formula (timeout in centiseconds):

\text{timeout}=2^{\left[ \left( N_{wait}+1 \right) / 2 \right]}-1
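
As a quick check, the formula can be evaluated directly. The sketch below assumes that N_wait counts sleeps starting from 1, that the square brackets denote the integer part, and that the result is capped at 200 centiseconds (the 2 second limit mentioned above). The function name `sleep_timeout_cs` is invented for illustration. Note that this is the idealized formula: the measured series above deviates from it in some terms (for example 0.23 and 0.39 where the formula gives 0.31).

```python
MAX_EXPONENTIAL_SLEEP_CS = 200  # assumed cap: 2 seconds, in centiseconds


def sleep_timeout_cs(n_wait):
    """Timeout in centiseconds for the n_wait-th sleep (counted from 1)."""
    exponent = (n_wait + 1) // 2  # integer part of (N_wait + 1) / 2
    return min(2 ** exponent - 1, MAX_EXPONENTIAL_SLEEP_CS)


timeouts = [sleep_timeout_cs(n) for n in range(1, 18)]
# Print in seconds, in the same dash-separated style as the series above.
print("-".join(f"{t / 100:.2f}" for t in timeouts))
```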

Under the “time microscope”, the old 8i latch algorithm looks like this:
 Spin for the latch for 5 seconds (2000 * 5 ticks / 2 kHz).
  If the spin does not succeed, go to sleep for 2 hours 46 min (1 cs!) in the hope that the congestion will dissolve.
 Spin again for 5 seconds.
  Sleep again for 2 hours 46 min.
 …
 Spin again for 5 seconds.
  Sleep again for 23 days (2 sec).

Obviously, such exponential backoff was inefficient and poorly balanced. It resulted in huge elapsed times for latch acquisition.
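
To put numbers on this imbalance, the sketch below adds up spin time and sleep time under the microscope. It assumes the worst case, in which every sleep runs to its full timeout, and takes the first sleep timeouts from the series quoted earlier. The names `elapsed_breakdown`, `SPIN_REAL_SEC` and `SLEEPS_REAL_SEC` are invented for illustration.

```python
ZOOM = 1_000_000
SPIN_REAL_SEC = 2000 * 5 / 2e9  # 2000 spins * ~5 ticks at 2 GHz = 5 us

# First sleep timeouts in real seconds, copied from the series in the text.
SLEEPS_REAL_SEC = [0.01, 0.01, 0.01, 0.03, 0.03, 0.07, 0.07, 0.15]


def elapsed_breakdown(n_sleeps):
    """Return (spin, sleep) in zoomed seconds after n_sleeps full timeouts."""
    # One spin precedes each sleep, plus the final spin that gets the latch.
    spin = (n_sleeps + 1) * SPIN_REAL_SEC * ZOOM
    sleep = sum(SLEEPS_REAL_SEC[:n_sleeps]) * ZOOM
    return spin, sleep


for n in (1, 3, 8):
    spin, sleep = elapsed_breakdown(n)
    print(f"{n} sleeps: spinning {spin:.0f} s, sleeping {sleep / 3600:.1f} h")
```

Even after a single sleep, the process has spent seconds spinning and hours sleeping on the zoomed scale.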

Now look at the contemporary algorithm:
 Spin for the shared latch for 10 seconds.
  If the spin does not succeed, go to sleep until posted by another process.

Since the “going to sleep” phase itself takes several minutes, in most cases the process will be awakened even before it relinquishes the CPU. According to v$event_histogram, the majority of latch waits now take less than 17 min (1 ms). This is much more effective than waiting until a timeout expires.

Another advantage of the contemporary algorithm is that if one process holds a latch abnormally long, other processes do not waste CPU in attempts to acquire it. Oracle 8i burned 100% of server CPU if a latch hung. Oracle 9.2 simply hangs under the same conditions.
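
The “spin, then sleep until posted” idea can be illustrated with a toy model built on Python threads. This is emphatically not Oracle's implementation: the class `ToyLatch`, its methods and the `spin_count` parameter are invented for illustration, and it models only an exclusive latch. It does show the key property: a waiter that exhausts its spins blocks without a timeout and consumes no CPU until the holder posts it.

```python
import threading


class ToyLatch:
    """Toy model: spin a bounded number of times, then sleep until posted."""

    def __init__(self, spin_count=2000):
        self._spin_count = spin_count
        self._held = False
        self._cond = threading.Condition()

    def _try_get(self):
        """One non-waiting attempt to take the latch."""
        with self._cond:
            if not self._held:
                self._held = True
                return True
            return False

    def get(self):
        """Acquire the latch; report whether it was obtained by spin or post."""
        for _ in range(self._spin_count):
            if self._try_get():
                return "spin"
        with self._cond:
            while self._held:
                self._cond.wait()  # no timeout: wake up only when posted
            self._held = True
            return "post"

    def free(self):
        """Release the latch and post one sleeping waiter, if any."""
        with self._cond:
            self._held = False
            self._cond.notify()
```

A waiter calls `get()` and the holder calls `free()`. If the holder never frees the latch, the waiter stays asleep instead of burning CPU, which mirrors the hang behaviour described above.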

Merry Christmas and Happy New Year!

  1. Andrey, hi.
    “sleep until post” – what does it mean?

    Comment by blacksaifer — December 23, 2011 @ 12:16 am | Reply

    • Hello!
      Sorry for the typo.
      Since Oracle 9.2, latch waiters “sleep until posted by another process”.
      I corrected this.

      Comment by andreynikolaev — December 23, 2011 @ 6:02 am | Reply
