Several months have passed since my previous “mutex wait” post. I was so busy with work and conference presentations. Thanks to all my listeners at UKOUG2011, Hotsos2012 and Medias2012 conferences and several seminars for inspiring questions and conversations.
I. Unexpected change.
Now it is time to discuss how contemporary Oracle waits for mutexes. My previous posts described evolution of “invisible and aggressive” 10.2-11.1 mutex waits into fully accounted and less aggressive 11gR2 mutexes. Surprisingly Oracle 126.96.36.199.2 (or 188.8.131.52 PSU2) appeared in April 2011 demonstrated almost negligible CPU consumption during mutex waits. (more…)
A week ago I was back home from MEDIAS-2012 conference. It was held in Limassol (Cyprus), near the spectacular ruins of ancient city Amathus. This was unique style general Computer Science conference with speakers including legendary Soviet cosmonaut Alexandr Serebrov and the inventor of Mean Value Analysis Professor Martin Reiser.
In my experience the Reiser’s Law stating that “Software is getting slower more rapidly than hardware becomes faster” has been repeatedly illustrated by numerous performance problems I observe.
RDTeX presentations covered Oracle topics ranged from Data Warehousing by Mikhail Kozyr to Oracle Coherence by Alexei Zolotarev.
This conference gave me unique opportunity to discuss mathematics related to Oracle mutexes. If you are interested, you can download my presentation here.
And, of course, Cyprus is a great place to visit!
Thanks to Professor S.V. Klimenko for kindly inviting me to MEDIAS 2012 conference.
Thanks to RDTEX for financial support.
Thanks to RDTEX Technical Support Centre Director S.P. Misiura for years of encouragement and support of my investigations.
I would like to describe how Oracle versions 184.108.40.206-220.127.116.11.1 waited for mutexes. This algorithm also appears to be used in post-18.104.22.168.2 PSUs and new 22.214.171.124 patchset as _mutex_wait_scheme=0.
My previous post demonstrated that before version 11.2:
- “Cursor: pin S” was pure wait for CPU. Long “cursor: pin S” waits indicated CPU starvation.
- Mutex contention was almost invisible to Oracle Wait Interface
- Spin time to acquire mutex was accounted as CPU time. It was service time, not waiting time.
Things changed. Mutex waits in Oracle 11.2 significantly differ from previous versions. Contemporary mutex waits are not CPU aggressive anymore, completely visible to Oracle Wait Interface and highly tunable.
Oracle 126.96.36.199 contains enhancements 9282521 and 9239863 named “Library cache: mutex X” for objects highly contended for. Part I and II. These enhancements introduce new interesting possibilities to tune some types of the mutex contention.
Contention for heavily accessed objects can now be divided between multiple copies of object in the library cache. According to notes 9282521.8 and 9239863.8 describing the patches, the enhancements should be used:
When there is true contention on a specific library cache object….
Let me investigate this deeper. I will use Oracle 188.8.131.52.2 for Solaris SPARC (64-bit) on 8Cores/32Threads Sun Fire T2000. I chose this platform in order to emphasize how the enhancements work. (more…)
I like testcases. One testcase results in more understanding than ten page article or weeks of data collection. This is why we need reproducible testcases if we want to explore mutex contention. Testcases will also give me a possibility to demonstrate how to use mutex contention diagnostics tools embedded in Oracle. I will use Oracle 184.108.40.206 for Linux X86 32bit on my Dual-Core laptop in this posts. Your numbers for other Oracle versions and platforms may vary.
I. “Cursor: pin S” contention testcase:
Each time the session execute SQL operator, it needs to ‘pin’ the cursor in library cache using mutex. True mutex contention arises when the same SQL operator executes concurrently at high frequency. Therefore the simplest testcase for “Cursor: pin S” contention should look like:
for i in 1..1000000
execute immediate 'select 1 from dual where 1=2';
Many people asked me about the second part of my blog title – the mutex. This is the first post about it. Mutexes is another Oracle spinlock, which was appeared in version 10.2.0.2. Despite being known since 2005, Oracle mutex internals is still Terra incognita.
This post is inspired by several recent escalations due to mutex contention. It occurs that 220.127.116.11 patchset contains extraordinary number of mutex related changes. Some enhancements like 10411618 exist only for 18.104.22.168. The following patches even changed the mutex architecture: (more…)
As I described in my previous post, Shared and Exclusive Oracle latches differ significantly. Shared latch behaves like enqueue. It has S and X incompatible modes. Moreover X mode serializes the shared latch. The contention for shared and exclusive latches has different patterns. This leads to different methods to tune such contentions.
But we do not know which latches are shared. Oracle never published the list of shared latches. Every time looking in the AWR or Statspack report we had to guess the type of contending latch. We only know that “cache buffer chains” latches became shared since in 9.2.
Oracle executable internally determines that latch is shared using flag hidden somewhere in x$kslld (v$latchname) structure. Google search shows that KSLLD means [K]ernel [S]ervice [L]ock [L]atch [D]escriptor. Unfortunately this shared flag was not externalized to SQL. It is possible to check the flag manually using oradebug peek or DTrace. But the flag offset is version and platform dependent. We need more systematic way to determine the latch type.
Contrary to popular believe Oracle latches were significantly evolved through the last decade. Not only additional statistics appeared (and disappeared) and new (shared) latch type was introduced, the latch itself was changed
It is interesting to see how the latch was organized in the past and contemporary versions.
To see the latch in-memory seems hard, since latches typically held for a small amount of time. Hopefully Oracle gives us a possibility to call any its kernel function using oradebug call utility. We only need to know that Oracle itself uses kslgetl(laddr, wait, why, where) function to acquire the exclusive latch. Recently I blogged about this function and its parameters.
This function can be also used to artificially acquire any latch for demonstration by:
SQL> oradebug call kslgetl <latch address> <wait> <why> >where>
This post will not be directly related to the blog theme. I would like to discuss “db file async I/O submit” wait event. This new event was introduced in Oracle 11.2. So far it have not been described in Oracle documentation and Metalink.
At the beginning of this story, this event became the topmost background wait for one production instance under HP-UX:
%Time Total Wait wait Waits % bg
Event Waits -outs Time (s) (ms) /txn time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file async I/O submit 151,159 0 35,625 236 0.7 96.3
log file parallel write 427,728 0 308 1 2.0 .8
This looks mystique. HP-UX not supports AIO for filesystem at all!