corrupt block in ASM disk

lsllcm

Apr 28, 2011, 4:10:05 AM

Hi All,

I have run into a corrupt block on an ASM disk. The steps to reproduce are below:

1. create tablespace
create tablespace aa_data
datafile
'+DATA/dbs11g/aa_data01.dbf' size 20M
EXTENT MANAGEMENT LOCAL AUTOALLOCATE
SEGMENT SPACE MANAGEMENT AUTO
/

2. The statement fails with:
ORA-01119: error in creating database file '+DATA/dbs11g/aa_data01.dbf'
ORA-17502: ksfdcre:4 Failed to create file +DATA/dbs11g/aa_data01.dbf
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATAVOL1" may result in a data loss

3. check alert.log
WARNING: IO Failed. group:1 disk(number.incarnation):0.0xe96892e8 disk_path:ORCL:DATAVOL1
AU:2 disk_offset(bytes):2097152 io_size:4096 operation:Read type:synchronous
result:I/O error process_id:11679
WARNING: cache failed reading from group=DATA fn=1 blk=0 count=1 from disk= 0 DATAVOL1 kfkist=0x20 status=0x02 file=kfc.c line=10225
ERROR: cache failed to read group=DATA fn=1 blk=0 from disk(s): 0 DATAVOL1
ORA-15080: synchronous I/O operation to a disk failed
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_ora_11679.trc
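
The offset in the alert.log entry above can be cross-checked against the AU number with a little arithmetic. A minimal Python sketch, assuming a 1 MiB allocation unit (the usual default, and consistent with AU:2 at byte offset 2097152); `au_of_offset` is a hypothetical helper name, not an Oracle tool:

```python
AU_SIZE = 1024 * 1024  # assumption: 1 MiB allocation unit


def au_of_offset(disk_offset, au_size=AU_SIZE):
    """Split a byte offset on the disk into (AU number, byte offset inside that AU)."""
    return divmod(disk_offset, au_size)


if __name__ == "__main__":
    # disk_offset(bytes):2097152 from the alert.log line above
    au, within = au_of_offset(2097152)
    print(f"AU {au}, offset {within} bytes into the AU")
```

With these inputs the helper returns AU 2 at offset 0, so the failed 4096-byte read was the first block of AU 2, which agrees with the AU:2 field in the log.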

4. check amdu log
/u01/app/grid/diag/asm/+asm/+ASM/trace/amdu_2011_04_26_17_13_28
---------------------------- SCANNING DISK N0002 -----------------------------
Disk N0002: 'ORCL:DATAVOL1'
AMDU-00407: asmlib error!! function = [asm_close], error = [0], mesg = [I/O Error]
AMDU-00200: Unable to read [262144] bytes from Disk N0002 at offset [2097152]
AMDU-00201: Disk N0002: 'ORCL:DATAVOL1'
Allocated AU's: 3
Free AU's: 0
AU's read for dump: 2
Block images saved: 512
Map lines written: 2
Heartbeats seen: 0
Corrupt metadata blocks: 0
Corrupt AT blocks: 0

5. check dmesg
dmesg|more

Info fld=0x1fa81d1, Current sda: sense key Medium Error
Additional sense: Data synchronization mark error
end_request: I/O error, dev sda, sector 33194449
scsi6: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 01 fa 81 d1
00 02 00 0
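
The hex values in the dmesg output above can be decoded to confirm that they all refer to the same sector. A rough Python sketch, assuming 512-byte sectors and assuming the bytes printed after "Read (10)" are the rest of the CDB without the opcode; the function names are made up for illustration:

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def sector_from_info_field(info_fld):
    """Convert the hex 'Info fld' value from the sense data to a decimal sector."""
    return int(info_fld, 16)


def decode_read10_tail(tail_hex):
    """Decode the bytes printed after 'Read (10)'.

    Assumes the opcode itself is not printed, so the layout is:
    flags (1 byte), LBA (4 bytes, big-endian), group (1 byte),
    transfer length in blocks (2 bytes, big-endian).
    """
    raw = bytes.fromhex(tail_hex)
    lba = int.from_bytes(raw[1:5], "big")
    blocks = int.from_bytes(raw[6:8], "big")
    return lba, blocks


if __name__ == "__main__":
    sector = sector_from_info_field("0x1fa81d1")
    # Only the first eight bytes are used; the last byte is cut off in the post.
    lba, blocks = decode_read10_tail("00 01 fa 81 d1 00 02 00")
    print(sector, lba, blocks)
    print("byte offset on sda:", sector * SECTOR_SIZE)
```

Both values decode to sector 33194449, the sector named in the end_request line, so all three dmesg lines point at one failing read on sda.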

6. I used amdu to dump the ASM disk
amdu -dump 'DATA'

---------------------------- SCANNING DISK N0002 -----------------------------
Disk N0002: 'ORCL:DATAVOL1'
AMDU-00209: Corrupt block found: Disk N0002 AU [84926] block [0] type [0]
AMDU-00201: Disk N0002: 'ORCL:DATAVOL1'
AMDU-00204: Disk N0002 is in currently mounted diskgroup DATA
AMDU-00201: Disk N0002: 'ORCL:DATAVOL1'
** HEARTBEAT DETECTED **
Allocated AU's: 84927
Free AU's: 12733
AU's read for dump: 82
Block images saved: 3774
Map lines written: 82
Heartbeats seen: 1
Corrupt metadata blocks: 1
Corrupt AT blocks: 0

I tried the remap command, but the issue still exists:

remap DATA DATAVOL1 173928448-173928448
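
The block number in the remap command above appears to be the corrupt AU from the amdu report expressed in 512-byte units. A small Python sketch of that arithmetic, under three assumptions that the thread does not confirm: a 1 MiB allocation unit, a 4096-byte ASM metadata block (consistent with io_size:4096 in alert.log), and remap counting 512-byte physical blocks. `phys_block_range` is a hypothetical helper:

```python
AU_SIZE = 1024 * 1024    # assumption: 1 MiB allocation unit
ASM_BLOCK_SIZE = 4096    # assumption: metadata block size, per io_size:4096
PHYS_BLOCK_SIZE = 512    # assumption: remap counts 512-byte physical blocks


def phys_block_range(au, block_in_au=0):
    """Return (first, last) 512-byte block numbers covering one ASM block."""
    start_byte = au * AU_SIZE + block_in_au * ASM_BLOCK_SIZE
    first = start_byte // PHYS_BLOCK_SIZE
    last = first + ASM_BLOCK_SIZE // PHYS_BLOCK_SIZE - 1
    return first, last


if __name__ == "__main__":
    # AMDU-00209: Corrupt block found: Disk N0002 AU [84926] block [0]
    first, last = phys_block_range(84926, 0)
    print(f"{first}-{last}")
```

The first number matches the 173928448 used in the remap command. If the assumptions hold, the range as posted covers only the first 512 bytes of that block; whether a wider range would help is not something this thread confirms.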

Can anyone help?

Thanks

John Hurley

Apr 28, 2011, 9:19:55 AM

Got a good rman backup?

How many databases share this disk group?

One way to approach it is to get the disk fixed at the storage
level ... recreate the ASM disk group with force ... restore the
database. If approaching it like that you may need to startup nomount
with a pfile copy and then restore a controlfile backup then mount
then do an rman restore.

I for one do not store my rman disk backups in ASM disk groups.

onedbguru

Apr 28, 2011, 9:50:23 PM

I would echo John's question. Do you have a good backup?

What version ASM?
RAC? Version?
What type of storage (direct-connect RAID? SCSI? SAN?)
How are the underlying devices partitioned? or are they?
What is your REDUNDANCY level? If you are using EXTERNAL with
individual direct-attached SCSI disks, you should be taken out and
shot.

I typically will partition the device such that:
p1 = first block block 1 to block 1
p2 = rest of the device (block 2 to the end)

and the partition used by ASM is p2 only.

What happens when you use the following syntax for creating the
tablespace? If you are going to use ASM, it is time to get out of
the "I gotta know what datafile my data is in..." DBA mentality. I
have used this on ELDB (V V VLDB??) environments with no performance
degradation. ASM is supposed to help make your life easier and if you
understand ASM, it will. Or you can continue to do things the hard
way.

make sure that
alter system set db_create_file_dest='+DATA';
or
alter system set db_create_file_dest='+DATA/sub-dir/sub-dir'; -- if you really need to find your datafile.

and then
create tablespace abc;

These are the defaults when using ASM, so there is no need to specify them:

lsllcm

Apr 30, 2011, 7:33:33 AM

Yes, I have a backup.

I used dd to clean the disk and recreate the disk group, and used amdu
to extract the pfile and control file.

I just wanted to know whether there is a better or quicker way to fix the issue.

Thanks for your suggestion about tablespace creation.

I use SCSI disks.

I am interested in why you partition as below:

<!-----

I typically will partition the device such that:
p1 = first block block 1 to block 1
p2 = rest of the device (block 2 to the end)

----->

Thanks

lsllcm

May 1, 2011, 11:05:36 AM

Hi Onedbguru,

Why do you partition the device like that?

onedbguru

May 2, 2011, 10:35:42 PM

Some OSes use the first block to store the VTOC (the Solaris Volume Table
of Contents, for example). If you overwrite this with ASM
information, you may no longer be able to access the device. So
I make it a point to ensure that the OS won't do something silly
with my devices by reserving that first block.

When using ASM in a Solaris environment without reserving that
first block, we would test by doing "dd if=/dev/zero of=/dev/...
bs=8192 count=10". The first time you do it, it works. Subsequent
attempts fail with I/O errors. Next, you have the SA re-enable the
device by reformatting it. So the bottom line is to use a standard
procedure that works on all platforms.