To compare the old and new latch mechanisms, I found the following illustration useful. Since it is hard for us humans to visualize milliseconds and microseconds, imagine a “time microscope” that magnifies timed events one million times.
Alternatively, just imagine contemporary Oracle software running on 1950s-style hardware.
Such a microscope magnifies a microsecond to a second. One real second becomes one million seconds, which is more than 11 days. Light travels at the speed of sound. A lunar rocket crawls like a snail.
However, our CPUs are still fast: a typical tick takes half a millisecond.
Under the “time microscope”, a typical “Cache buffers chains” latch holding time is around 1 second. Not surprisingly, the maximum spin time for a shared latch is of the same order. Look at the following table:
| Reality | Zoomed one million times |
|---|---|
| 1 us | 1 sec |
| 1 ms | 17 min |
| 1 cs | 2 hours 46 min |
| 1 sec | 11.5 days |
| Light speed (300000 km/sec) | Sonic speed (300 m/sec) |
| Lunar rocket (11 km/sec) | Garden snail (11 mm/sec) |
| 2 GHz CPU tick | 0.0005 sec |
| Avg “Cache buffers chains” latch holding time | ~1-2 sec |
| Max spin time for shared latch (~2*2000*5 ticks) | ~10 sec |
| Avg “Library Cache” latch holding time (10-20 us) | ~10-20 sec |
| Max spin time for exclusive latch (~20000*5 ticks) | ~50 sec |
| OS context switch time (10 us - 1 ms) | 10 sec - 17 min |
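The arithmetic behind the table is easy to check. Here is a minimal Python sketch, assuming only the figures quoted above (a 2 GHz CPU and the tick counts from the table); the names `zoom` and `humanize` are my own helpers for illustration, not anything from Oracle.

```python
ZOOM = 1_000_000          # magnification of the "time microscope"
CPU_HZ = 2e9              # the 2 GHz CPU from the table

def zoom(real_seconds):
    """Duration of a real interval as seen under the microscope, in seconds."""
    return real_seconds * ZOOM

def humanize(seconds):
    """Render a duration approximately, in the largest unit that fits."""
    if seconds >= 86_400:
        return f"{seconds / 86_400:.1f} days"
    if seconds >= 3_600:
        # Whole hours plus whole minutes; leftover seconds are dropped.
        hours, rest = divmod(int(seconds), 3_600)
        return f"{hours} h {rest // 60} min"
    if seconds >= 60:
        return f"{seconds / 60:.0f} min"
    return f"{seconds:g} sec"

tick = 1 / CPU_HZ         # one real CPU tick, in seconds

rows = [
    ("1 us", 1e-6),
    ("1 ms", 1e-3),
    ("1 cs", 1e-2),
    ("1 sec", 1.0),
    ("2 GHz CPU tick", tick),
    ("Max shared latch spin (2*2000*5 ticks)", 2 * 2000 * 5 * tick),
    ("Max exclusive latch spin (20000*5 ticks)", 20000 * 5 * tick),
]

for label, real in rows:
    print(f"{label:42s} {humanize(zoom(real))}")
```

The spin rows are simply tick counts divided by the clock rate, which is why they land in the same range as the latch holding times.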
The old Oracle 8i latching algorithm repeatedly spun for 2000 cycles and then slept, using exponential backoff. The sleep timeout increased on every second sleep, up to 2 seconds (_max_exponential_sleep centiseconds):
This time series corresponds to the following exponential formula:
Under the “time microscope”, the old 8i latch algorithm looks like this:
Spin for the latch for 5 seconds (2000 * 5 ticks / 2 kHz).
If the spin does not succeed, go to sleep for 2 hours 46 min (1 cs!) in the hope that the congestion will dissolve.
Spin again for 5 seconds.
Sleep again for 2 hours 46 min.
Spin again for 5 seconds.
Sleep again, eventually for as long as 23 days (2 sec).
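To see how lopsided this is, here is a rough Python sketch that replays the steps listed above and reports how much of the elapsed time is actually spent spinning. It assumes the figures from the text (2000 spins of about 5 ticks each on a 2 GHz CPU) and uses only the sleeps named in the list: 1 cs, 1 cs, and the 2 sec cap. The intermediate backoff steps are left out, and the function names are mine.

```python
ZOOM = 1_000_000
CPU_HZ = 2e9
SPIN_TICKS = 2000 * 5     # 2000 spin iterations, ~5 ticks each (from the text)

def timeline(sleeps_seconds):
    """Yield (phase, real_seconds) pairs: one spin before every sleep."""
    spin = SPIN_TICKS / CPU_HZ
    for sleep in sleeps_seconds:
        yield ("spin", spin)
        yield ("sleep", sleep)

def spin_fraction(sleeps_seconds):
    """Share of total elapsed time spent spinning rather than sleeping."""
    events = list(timeline(sleeps_seconds))
    spinning = sum(t for phase, t in events if phase == "spin")
    total = sum(t for _, t in events)
    return spinning / total

# Only the sleeps mentioned in the list above; not the full backoff series.
steps = [0.01, 0.01, 2.0]

for phase, real in timeline(steps):
    print(f"{phase:5s} {real * ZOOM:>12,.0f} zoomed sec")
print(f"time spent spinning: {spin_fraction(steps):.6%}")
```

Even with only three sleeps, the process spends practically all of its elapsed time asleep, whether or not the latch became free in the meantime.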
Obviously, such exponential backoff was inefficient and poorly balanced. It resulted in huge latch acquisition elapsed times.
Now look at the contemporary algorithm:
Spin for the shared latch for 10 seconds.
If the spin does not succeed, go to sleep until posted by another process.
Since the “going to sleep” phase itself takes several minutes, in most cases the process is awakened even before it relinquishes the CPU. According to v$event_histogram, the majority of latch waits now take less than 17 min (1 ms). This is much more effective than waiting until a timeout expires.
Another advantage of the contemporary algorithm is that if one process holds a latch abnormally long, other processes do not waste CPU in attempts to acquire it. Oracle 8i burned 100% of server CPU if a latch hung. Oracle 9.2 simply hangs under the same conditions.
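The difference between the two schemes can be put into numbers with a toy model. The Python sketch below is only an illustration under my own simplifying assumptions: a single waiter, a latch that becomes free at a known moment `release_at`, a fixed 1 cs timeout with no backoff for the old scheme, and a hypothetical 100 us wakeup cost for the posted scheme (chosen from the context switch range in the table). It is not how Oracle implements either algorithm.

```python
def acquire_with_timeouts(release_at, spin=5e-6, sleep=0.01):
    """Old style: spin, sleep for a fixed timeout, spin again, and so on.

    Returns the real time (seconds) at which the waiter gets the latch.
    """
    now = 0.0
    while True:
        if release_at <= now + spin:
            return max(now, release_at)   # latch freed before this spin ended
        now += spin + sleep               # missed it: sleep out the timeout

def acquire_when_posted(release_at, spin=10e-6, wakeup=100e-6):
    """New style: spin once, then sleep until the holder posts the waiter."""
    if release_at <= spin:
        return release_at                 # acquired during the initial spin
    return release_at + wakeup            # posted on release, then woken up

release_at = 1e-3                         # latch is freed after 1 ms
old = acquire_with_timeouts(release_at)
new = acquire_when_posted(release_at)
print(f"timeouts: {old * 1e3:.3f} ms, posted: {new * 1e3:.3f} ms")
```

In this model the timeout-based waiter sleeps through the release and only notices it when its 1 cs timeout expires, while the posted waiter pays just the wakeup cost.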
Merry Christmas and Happy New Year!