Latch, mutex and beyond

March 25, 2012

Database under fire

Filed under: Corruption,Patches — andreynikolaev @ 6:58 pm

This post does not discuss mutexes or latches. I wrote it under the impression of a recent escalation caused by database corruption.

This was an Oracle 10.2.0.4 database on HP-UX. Its datafiles suffered recurrent corruption. Most commonly, blocks were corrupted in the system and undo tablespaces, which resulted in corrupted dictionary tables and indexes. Neither db_block_checking/db_block_checksum nor installing the latest PSU and OS patches helped. SCNs showed that these dictionary blocks had not been changed for years. It looked like some program other than DBWR was overwriting the blocks. The corruptions caused downtime while the datafiles were restored and recovered.

The corruption always started with the same byte pattern: “00 0B 00 00 0C 00 00 00 01 00 01”. This did not resemble any Oracle database structure known to me. Many thanks to the customer's DBA, who found the same bytes in a discussion of network sniffer output. The bytes were identified as an NSPTMK packet. This is definitely related to Oracle SQL*Net, so what is it doing in the middle of a data block?
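A search for this signature can be scripted against a restored copy of a suspect file. The following is only a minimal sketch: the function name `find_pattern`, the 8192-byte block size, and the chunked scan are my assumptions, and the reported block index is simply the file offset divided by the block size, not an Oracle block number.

```python
# Byte pattern quoted in the post (11 bytes).
PATTERN = bytes.fromhex("00 0B 00 00 0C 00 00 00 01 00 01")


def find_pattern(path, pattern=PATTERN, block_size=8192, chunk_size=1 << 20):
    """Return (file_offset, block_index, offset_in_block) for every match.

    block_size is an assumption; use the real block size of the file.
    """
    hits = []
    overlap = len(pattern) - 1
    with open(path, "rb") as f:
        base = 0   # file offset of buf[0]
        buf = b""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            start = 0
            while True:
                i = buf.find(pattern, start)
                if i < 0:
                    break
                off = base + i
                hits.append((off, off // block_size, off % block_size))
                start = i + 1
            # Keep a short tail so a match spanning two chunks is not missed.
            keep = buf[-overlap:] if overlap else b""
            base += len(buf) - len(keep)
            buf = keep
    return hits
```

A hit does not prove corruption on its own, since the same bytes could occur in legitimate data. It only narrows down which blocks to inspect with the usual tools.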

A MOS search found an interesting note: Bug 8943287 ORA-1578 corrupt block with SQL*Net AUTH strings [ID 976852.1]. The note states that SQL*Net may overwrite database files with SQL*Net packets when sqlnet.inbound_connect_timeout is greater than 0 in the server sqlnet.ora file (the default value is 60).

This means that under some (rare) circumstances, the Oracle shadow process may mix up its file descriptors and write() the next SQL*Net packet … into a file instead of a socket. Such a sniper shot will corrupt one of the previously opened datafiles, starting from the position of the last read. The victim may also be the controlfile.
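Why the damage starts "from the position of the last read" follows from ordinary POSIX file offset semantics: a plain write() on a descriptor continues where the previous read() on it stopped. The sketch below only illustrates that behaviour on a temporary file. It is not Oracle code, and the function name and sizes are made up for the example.

```python
import os
import tempfile

# Byte pattern quoted in the post, standing in for a SQL*Net packet.
PACKET = bytes.fromhex("00 0B 00 00 0C 00 00 00 01 00 01")


def simulate_misdirected_write(block_size=8192, blocks=4, blocks_read=2):
    """Read some 'blocks' from a file, then write a packet to the same fd.

    Returns (offset_after_read, offset_where_packet_landed).
    """
    fd, path = tempfile.mkstemp()
    try:
        # Build a fake "datafile" filled with 0xAA bytes.
        os.write(fd, b"\xAA" * (block_size * blocks))
        os.lseek(fd, 0, os.SEEK_SET)

        # A legitimate read advances the descriptor's file offset.
        os.read(fd, block_size * blocks_read)
        pos_after_read = os.lseek(fd, 0, os.SEEK_CUR)

        # The write that was meant for a socket goes to the file descriptor.
        os.write(fd, PACKET)

        with open(path, "rb") as f:
            landed_at = f.read().find(PACKET)
        return pos_after_read, landed_at
    finally:
        os.close(fd)
        os.remove(path)
```

With the default arguments both returned offsets should be the same, which is the point: the packet overwrites the bytes immediately after the last block that was read.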

If you are running a pre-11.2.0.2 Oracle database on any *nix except Solaris, look at your server sqlnet.ora file. If it is absent or does not contain sqlnet.inbound_connect_timeout=0, then your database may be vulnerable to this problem. It is worth applying patch 8943287 or setting sqlnet.inbound_connect_timeout=0 as a workaround. Such a workaround may require increasing the processes parameter.
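The sqlnet.ora check can be automated across many servers. This is a rough sketch under simplifying assumptions: the helper names are hypothetical, it only understands a plain integer value on a single line, and it treats everything after `#` as a comment. It says nothing about the database version, platform, or whether patch 8943287 is already applied.

```python
import re

# Matches e.g. "SQLNET.INBOUND_CONNECT_TIMEOUT = 0" (case-insensitive).
_PARAM = re.compile(
    r"^\s*sqlnet\.inbound_connect_timeout\s*=\s*(\d+)\s*$", re.IGNORECASE
)


def inbound_connect_timeout(sqlnet_ora_text, default=60):
    """Return the configured timeout, or the default of 60 cited in the post."""
    value = default
    for line in sqlnet_ora_text.splitlines():
        line = line.split("#", 1)[0]   # drop comments
        match = _PARAM.match(line)
        if match:
            value = int(match.group(1))
    return value


def may_be_vulnerable(sqlnet_ora_text):
    """True when the workaround (timeout = 0) is not in place.

    Pass None or "" when the server has no sqlnet.ora at all.
    """
    return inbound_connect_timeout(sqlnet_ora_text or "") != 0
```

For example, `may_be_vulnerable(open(path).read())` could be run against each server's sqlnet.ora, where `path` is wherever that file lives on the host in question.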

This problem has been named the “most up-to-date high impact and urgent issue” in the note “Information Center: Oracle Net Services” [ID 1381244.2].

3 Comments »

  1. Oracle support alerts deserve attention. Recently we hit a block corruption that caused a database crash and downtime. Oracle had already discovered the problem and published it in the alert “Bug 7662491 – Array Update can corrupt a row. ORA-600 [kghstack_free1] ORA-600 [kddummy_blkchk][6110/6129] [ID 861965.1]”.

    The complicated and alarming part of our case was that we had hit the problem half a year earlier, but the database did not crash at that time. We opened an SR, but it went nowhere. We were left with a block on which dbv and dbms_repair would fail, and support was able neither to provide a solution nor to identify that we were hitting Bug 7662491.

    The moral of the story: Oracle issues alerts for only a few really critical bugs each year, which is not many given the ever-growing size and complexity of the software. Nowadays it is better to patch than not.

    Comment by laimisnd — March 26, 2012 @ 9:05 am | Reply

  2. Hi,

    it’s quite shocking that such a thing is even possible… thanks for sharing this!

    Best regards,
    Nikolay

    Comment by savvinov — March 26, 2012 @ 12:58 pm | Reply

  3. Hi, Andrey

    I’ve seen a similar corruption on an IBM DS4800.
    One datafile was interlaced in 64K steps: 64K from this datafile, then the next 64K from another datafile.
    This issue appeared after storage maintenance.
    So the storage software interleaved every 64K, chessboard-style.

    Comment by Yury — March 29, 2012 @ 12:28 pm | Reply

