In these pre-OOW 2013 days, everyone is excited about new features of Oracle 12c. This post describes particularly interesting one – IO outliers profiling in V$KERNEL_IO_OUTLIER.
According to Oracle Documentation and Glenn Fawcett blog we know that V$KERNEL_IO_OUTLIER uses the DTrace to collect information about the amount of time an IO spends in operating system components. DTrace is the Solaris dynamic tracing tool which allows to observe all OS activity.
V$KERNEL_IO_OUTLIER should provide an operating system-based latency breakdown of IO taking an excessive amount of time to complete. This will be essential diagnostics feature. But, many people complained that V$KERNEL_IO_OUTLIER is always empty. This was enough for me to start investigating it.
I will discuss the view, when and how it works, the underlying DTrace script and how it can be used for pre-12c Oracle.
I. Oracle and Solaris configuration.
Oracle documentation explicitly states that the view is only populated on the Solaris platform. Of course, oracle user needs the OS privileges to run DTrace:
$ su - root Password: Oracle Corporation SunOS 5.10 Generic Patch January 2005 # usermod -K defaultpriv=basic,dtrace_kernel,dtrace_proc,dtrace_user oracle #
To encourage outliers’ detection, I also lowered the parameter _io_outlier_threshold to 100ms in my Oracle 12.1 test database on VMware Solaris 10 sandbox. However, this was not enough. V$IO_OUTLIER and V$LGWRIO_OUTLIER were full of rows, but V$KERNEL_IO_OUTLIER still was empty.
It is worth to look closely at the documentation description of V$KERNEL_IO_OUTLIER columns:
Column | Description |
---|---|
TIMESTAMP |
Epoch Time |
IO_SIZE |
Size of the I/O, in KB. |
IO_OFFSET |
Offset into the device of the I/O |
DEVICE_NAME |
Name of the device to which the I/O was targeted |
PROCESS_NAME |
Name of the process that issued the I/O |
TOTAL_LATENCY |
Total time in microseconds the I/O spent in the kernel |
SETUP_LATENCY |
Time in microseconds spent during initial I/O setup before sending to SCSI target device driver |
QUEUE_TO_HBA_LATENCY |
Time in microseconds spent in the SCSI target device driver before being sent to the Host Bus Adaptor |
TRANSFER_LATENCY |
Time in microseconds spent in the Host Bus Adaptor and physically transferring the I/O to the storage device |
CLEANUP_LATENCY |
Time in microseconds spent freeing resources used by the completed I/O |
PID |
Process ID that issued the I/O |
From the above table, I can conclude that:
- The columns do not contain any information about filesystem latencies. Therefore, V$KERNEL_IO_OUTLIER was designed to work for ‘raw devices’ or ASM disks only.
- Such set of latencies describes the synchronous IO. No AIO related info is present. Therefore, V$KERNEL_IO_OUTLIER was designed to work with disk_asynch_io=false only.
These are the reasons why most people did not observed any data in this view. Combination of raw devices and synchronous IO is very unusual for Solaris. I suppose it should be used only for diagnostics.
Indeed, when I have placed redo logs on raw devices and restarted Oracle instance with disk_asynch_io=false the V$KERNEL_IO_OUTLIER starts populating:
SQL> select * from v$kernel_io_outlier; TIMESTAMP IO_SIZE IO_OFFSET DEVICE_NAME PROCESS_NAME TOTAL_LATENCY SETUP_LATENCY QUEUE_TO_HBA_LATENCY TRANSFER_LATENCY CLEANUP_LATENCY PID CON_ID ---------- ---------- ---------- --------------- --------------- ------------- ------------- -------------------- ---------------- --------------- ---------- ---------- 1379336049 156 22510592 cmdk@1,0:t,raw ora_lg00_O12 122298 43 1263 120436 555 9803 0 1379336812 0 23491584 cmdk@1,0:t,raw ora_lgwr_O12 137050 62 550 136407 30 9797 0 1379337914 806 26398720 cmdk@1,0:t,raw ora_lgwr_O12 535984 487226 489 48237 29 9797 0 1379337923 806 28894208 cmdk@1,0:t,raw ora_lgwr_O12 181463 177612 657 3167 24 9797 0 1379337926 806 29720064 cmdk@1,0:t,raw ora_lgwr_O12 849048 707903 560 140553 31 9797 0 ...
In my case, the majority of IO time has been spent inside SETUP and TRANSFER stages.
II. How it works.
First of all, we need to find the process responsible for DTrace profiling. Fixed table X$KSBTABACT lists all actions performed by Oracle background processes. We are interesting in the following:
SQL>select PROCESS_NAME_KSBTABACT,ACTION_DESCRIPTION_KSBTABACT from X$KSBTABACT; PROCESS_NAME_KSBTABACT ACTION_DESCRIPTION_KSBTABACT ---------------------- --------------------------------------------- ... GEN0 DTrace based Kernel IO Outlier Processing ... 3210 rows selected.
Therefore, this is the GEN0 process. Now we can use the DTrace to trace DTrace function calls during GEN0 operation:
$ pgrep -lf gen0 2860 ora_gen0_O12 $ /usr/sbin/dtrace -n 'pid$target::dtrace*:entry ' -p 2860 dtrace: description 'pid$target::dtrace*:entry ' matched 94 probes CPU ID FUNCTION:NAME 0 59303 dtrace_status:entry 0 59223 dtrace_aggregate_snap:entry 0 59224 dtrace_aggregate_walk:entry 0 59236 dtrace_aggregate_clear:entry 0 59303 dtrace_status:entry 0 59223 dtrace_aggregate_snap:entry 0 59224 dtrace_aggregate_walk:entry 0 59236 dtrace_aggregate_clear:entry ...
The process repeatedly reads data from some DTrace aggregate (array). The interval between reads is 3 seconds. It seems that Oracle populates the V$KERNEL_IO_OUTLIER with the results of some DTrace script running in the background.
In order to see underlying script itself, one could examine arguments of dtrace_program_strcompile() function. The GEN0 process called this function and run the compiled DTrace code immediately after its startup.
For the purpose to catch this compilation, I wrote script gen0.d:
#!/usr/sbin/dtrace -ZCws #pragma D option quiet BEGIN { printf("Waiting for GEN0 process to start\n"); init=1; } syscall::exece:return / init && execname=="oracle" && (strstr(curpsinfo->pr_psargs, "gen0")!= NULL) / { printf(" %s,\t %d\n DTrace script is:\n",curpsinfo->pr_psargs,pid); stop(); system("/usr/sbin/dtrace -Zq -x strsize=2048 -n 'pid$target::dtrace_program_strcompile:entry{self->s = arg1;}' -n 'pid$target::dtrace_program_strcompile:return{trace(copyinstr(self->s));}' -p %d || prun %d ", pid,pid); init=0; }
It will start DTracing of GEN0 process just after its creation. Let us run the script and restart the Oracle instance:
$ ./gen0.d Waiting for GEN0 process to start ora_gen0_O12, 3617 DTrace script is: syscall::pwrite:entry/execname == "oracle" && substr(strtok(curpsinfo->pr_psargs, " "), -3) == "O12" && fds[arg0].fi_fs == "specfs" && strstr(fds[arg0].fi_path ...
Now, we can investigate how V$KERNEL_IO_OUTLIER works and even run it standalone.
Briefly speaking, Oracle script measures time differences between consequentially firing syscall::pwrite:entry, io:::start, io:::wait-start, io:::wait-done, and syscall::pwrite:return DTrace probes. If the overall duration of pwrite() syscall to raw device exceeds the threshold (500 ms by default), the scripts adds data to DTrace aggregate. Oracle GEN0 process reads the aggregate and fills the V$KERNEL_IO_OUTLIER.
But DTrace can do the same itself. Script kernel_io_oulier.d consist of original V$KERNEL_IO_OUTLIER script plus additional printing probes. It takes ORACLE_SID as argument and reports IO outliers in the same format as the view:
$ ./kernel_io_outlier.d O12 TIMESTAMP IO_SIZE IO_OFFSET DEVICE_NAME PROCESS_NAME TOTAL_LATENCY SETUP_LATENCY QUEUE_TO_HBA_LATENCY TRANSFER_LATENCY CLEANUP_LATENCY PID ---------- ---------- ---------- --------------- --------------- ------------- ------------- -------------------- ---------------- --------------- -------- 1379698569 807 6185984 cmdk@1,0:t,raw ora_lg00_O12 287553 283701 492 2973 385 4348 1379698571 806 7838208 cmdk@1,0:t,raw ora_lg00_O12 601281 482628 30 118478 144 4348 1379698571 807 8664064 cmdk@1,0:t,raw ora_lg01_O12 927317 580352 397 346413 153 4352 ...
The script can be used to “extend” V$KERNEL_IO_OUTLIER diagnostic functionality to pre-12c Oracle databases.
Summary.
Currently as of Oracle 12.1 the V$KERNEL_IO_OUTLIER works only on Solaris platform, for raw devices and synchronous IO. But I hope that this is only the beginning, and in future Oracle database and DTrace will become tightly integrated.
Hi Andrey,
thank you very much for the re-engineering of view V$KERNEL_IO_OUTLIER. I think it is very obvious now, why the view does not contain any data in most cases.
However i interpreted the following column description (from the official documentation) “Time in microseconds spent during initial I/O setup before sending to SCSI target device driver” that this column can include the time from VFS to block devices (next step would be SCSI call), which makes more sense and could be realized by probes on zfs_read, zfs_write, zio_wait, zio_vdev_io_done, vdev_disk_io_done, io:::start, scsi, etc.
Hopefully Oracle will extend this view / feature in the future with such probes as we all know how ZFS can become slow in various cases :-))
Thanks once again.
Regards
Stefan
Comment by Stefan Koehler — September 21, 2013 @ 8:04 am |
Hi,
I adopted the script for ZFS, see http://blog.zach.st/2013/10/detect-slow-sync-writes-on-ZFS.html
Maybe it’s useful for you.
Regards,
Manuel
Comment by Manuel Zach (@mzachh) — October 4, 2013 @ 7:14 pm |
Thats just awesome! I am very impressed.
Is the script modified? Or 1:1 read from the GEN0 process. I am just surprised why it uses only the last 3 chars of the SID (“substr(strtok(curpsinfo->pr_psargs, ” “), -3)”) and not the whole SID.
Comment by Manuel Zach (@mzachh) — September 27, 2013 @ 6:34 pm |
The part of the script before /*————*/ line is almost original.
I only added the outlier flag variable for printing, replaced literal for ORACLE_SID by $$1 argument and rescaled the units to microseconds and kilobytes.
I suppose that Oracle uses only the last 3 chars of the SID for the performance reasons.This was wrong.
Just discovered that Oracle generates here literal number equal to length of ORACLE_SID.
I corrected the script.
Thank you for this observation.
Comment by andreynikolaev — September 28, 2013 @ 12:32 pm |
[…] v$kernel_io_outlier table. And this is were i gave up.. until Google found this amazing post from Andrey Nikolaev He had already solved the parts i didn’t manage to resolve in a very good manner He also […]
Pingback by The v$KERNEL_IO_OUTLIER | The Oracle DB inn — May 21, 2014 @ 1:38 pm |
[…] we can see this in action using V$KERNEL_IO_OUTLIER view.Like explained by Andrey Nikolaev in https://andreynikolaev.wordpress.com/2013/09/20/vkernel_io_outlier/ this view is based on a DTrace script running in the background.He also managed to rewrite the […]
Pingback by Simulating V$KERNEL_IO_OUTLIER on LINUX using Systemtap | Hatem Mahmoud Oracle's blog — February 25, 2015 @ 12:39 pm |