SCSI error: return code = 0x060e0000 and error (1011) state (3)


Gopesh Sharma

Jan 31, 2012, 2:47:45 PM
to open-...@googlegroups.com
I am running an Oracle RAC server. For two weeks I have not been able to connect it
to the SAN. It used to work fine, and then it started dropping SAN connections.
The 10GigE switch was found to be at fault and was replaced. Since then,
connectivity has been sporadic: the server does connect to the SAN, but not for long
enough or stably enough for Oracle ASM to come up, or even for a filesystem to be
mounted.
The OS is Red Hat Enterprise Linux Server release 5.6 (kernel 2.6.18-194.el5).
The SAN is a Dell EqualLogic PS6510.
The switch is 10GigE; I do not know the model.
The NIC cards on the server are BCM57711 10-Gigabit PCIe.
BIOS, NIC drivers, firmware, etc. are all up to date.

I have already set these parameters:
sysctl -w net.ipv4.tcp_window_scaling=0

/etc/iscsi/iscsid.conf
node.session.timeo.replacement_timeout = 86400
node.conn[0].timeo.noop_out_interval = 0
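
Values in /etc/iscsi/iscsid.conf only take effect if the lines are actually uncommented, and the node.* defaults there are copied into node records when targets are discovered, so records that already exist keep their old values. A minimal bash sketch (the helper name conf_value is hypothetical) that prints every uncommented assignment of a key, so a commented-out or duplicated setting is easy to spot:

```shell
#!/bin/bash
# Print every uncommented "key = value" assignment of KEY found in FILE.
# Commented lines (leading '#') are skipped; duplicates are all printed.
# conf_value is a hypothetical helper name, not part of open-iscsi.
conf_value() {
    local key="$1" file="$2"
    awk -v k="$key" '
        /^[ \t]*#/ { next }              # skip commented-out lines
        {
            n = index($0, "=")
            if (n == 0) next             # not an assignment
            name = substr($0, 1, n - 1); val = substr($0, n + 1)
            gsub(/^[ \t]+/, "", name); gsub(/[ \t]+$/, "", name)
            gsub(/^[ \t]+/, "", val);  gsub(/[ \t]+$/, "", val)
            if (name == k) print val     # exact string match, no regex
        }
    ' "$file"
}
```

For example, conf_value node.session.timeo.replacement_timeout /etc/iscsi/iscsid.conf should print 86400. To change a node record that already exists, iscsiadm -m node -o update -n node.session.timeo.replacement_timeout -v 86400 is one option; the session then has to be logged out and back in to pick up the new value.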

I do not see the partitions in /proc/partitions, and the multipath daemon does not
create any devices (nothing shows up in ls -lrt /dev/mapper/*).
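
To narrow down whether the iSCSI sessions attached disks at all, or attached them but the kernel never read a partition table, the disks listed by iscsiadm -m session -P 3 can be compared against /proc/partitions. A minimal bash sketch; the function name is hypothetical, and matching lines of the form "Attached scsi disk sdX ... State: running" is an assumption about open-iscsi's print-level-3 output:

```shell
#!/bin/bash
# Read "iscsiadm -m session -P 3" output on stdin, pull out each
# "Attached scsi disk sdX" name, and report whether that disk also appears
# in a /proc/partitions-style file (default: /proc/partitions).
# The matched line format is an assumption; adjust it if your version differs.
iscsi_disks_vs_partitions() {
    local partitions="${1:-/proc/partitions}"
    awk '/Attached scsi disk/ { for (i = 1; i <= NF; i++) if ($i == "disk") print $(i + 1) }' |
    while read -r disk; do
        # /proc/partitions: header line, blank line, then "major minor #blocks name"
        if awk -v d="$disk" 'NR > 2 && $4 == d { found = 1 } END { exit !found }' "$partitions"; then
            echo "$disk present"
        else
            echo "$disk missing"
        fi
    done
}
```

Usage: iscsiadm -m session -P 3 | iscsi_disks_vs_partitions. A disk reported as missing was attached by the session but never registered with the block layer, which fits the I/O errors on sector 0 in the dmesg output.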

dmesg says


sd 22:0:0:0: SCSI error: return code = 0x060e0000
end_request: I/O error, dev sdm, sector 0
connection3:0: detected conn error (1011)
session3: target reset succeeded
connection12:0: detected conn error (1011)
session12: target reset succeeded
connection3:0: detected conn error (1011)
session3: target reset succeeded
connection12:0: detected conn error (1011)
session12: target reset succeeded
sd 13:0:0:0: timing out command, waited 360s
sd 13:0:0:0: SCSI error: return code = 0x060e0000
end_request: I/O error, dev sdd, sector 63
printk: 11 messages suppressed.
Buffer I/O error on device sdd1, logical block 0
Buffer I/O error on device sdd1, logical block 1
Buffer I/O error on device sdd1, logical block 2
Buffer I/O error on device sdd1, logical block 3
Buffer I/O error on device sdd1, logical block 4
Buffer I/O error on device sdd1, logical block 5
Buffer I/O error on device sdd1, logical block 6
Buffer I/O error on device sdd1, logical block 7
Buffer I/O error on device sdd1, logical block 8
Buffer I/O error on device sdd1, logical block 9
sd 22:0:0:0: timing out command, waited 360s
sd 22:0:0:0: SCSI error: return code = 0x060e0000
end_request: I/O error, dev sdm, sector 63
printk: 22 messages suppressed.
Buffer I/O error on device sdm1, logical block 0
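
With many sessions involved, it helps to see which block devices the errors hit. A minimal bash sketch (hypothetical helper name) that tallies "end_request: I/O error" and "Buffer I/O error" lines per device from saved dmesg output. Because of lines like "printk: 11 messages suppressed.", the counts are lower bounds:

```shell
#!/bin/bash
# Count I/O error lines per device in dmesg-style text.
# Reads FILE, or stdin when no argument is given.
# Output lines: "<dev> end_request=<n>" and "<dev> buffer_io=<n>", sorted.
io_error_summary() {
    awk '
        /end_request: I\/O error, dev / {
            for (i = 1; i <= NF; i++)
                if ($i == "dev") { d = $(i + 1); sub(/,$/, "", d); req[d]++ }
        }
        /Buffer I\/O error on device / {
            for (i = 1; i <= NF; i++)
                if ($i == "device") { d = $(i + 1); sub(/,$/, "", d); buf[d]++ }
        }
        END {
            for (d in req) printf "%s end_request=%d\n", d, req[d]
            for (d in buf) printf "%s buffer_io=%d\n", d, buf[d]
        }
    ' "${1:--}" | LC_ALL=C sort
}
```

Usage: dmesg | io_error_summary, or io_error_summary saved-dmesg.txt.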

Relevant portions of /var/log/messages are:


Jan 31 14:11:44 tptrac1 iscsid: Kernel reported iSCSI connection 6:0 error (1011) state (3)
Jan 31 14:11:44 tptrac1 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Jan 31 14:11:45 tptrac1 iscsid: Kernel reported iSCSI connection 7:0 error (1011) state (3)
Jan 31 14:11:45 tptrac1 iscsid: Kernel reported iSCSI connection 9:0 error (1011) state (3)
Jan 31 14:11:45 tptrac1 iscsid: Kernel reported iSCSI connection 8:0 error (1011) state (3)
Jan 31 14:11:47 tptrac1 kernel: connection4:0: detected conn error (1011)
Jan 31 14:11:47 tptrac1 kernel: connection2:0: detected conn error (1011)
Jan 31 14:11:47 tptrac1 kernel: connection3:0: detected conn error (1011)
Jan 31 14:11:47 tptrac1 kernel: session1: target reset succeeded
Jan 31 14:11:47 tptrac1 kernel: session6: target reset succeeded
Jan 31 14:11:47 tptrac1 kernel: connection5:0: detected conn error (1011)
Jan 31 14:11:47 tptrac1 iscsid: connection6:0 is operational after recovery (1 attempts)
Jan 31 14:11:47 tptrac1 iscsid: connection1:0 is operational after recovery (1 attempts)
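
Error 1011 corresponds to ISCSI_ERR_CONN_FAILED (ISCSI_ERR_BASE 1000 + 11) in the kernel's iSCSI error enum. To see which connections keep failing and which ones come back, a minimal bash sketch (hypothetical helper name) can tally the three message forms above per connection ID. The kernel and iscsid lines may describe the same event, so they are counted in separate columns rather than added together:

```shell
#!/bin/bash
# Per-connection tally of iSCSI connection errors and recoveries from
# /var/log/messages-style text (FILE argument or stdin).
# Output: "conn <id> kernel=<n> iscsid=<n> recovered=<n>", sorted.
iscsi_conn_summary() {
    awk '
        # Kernel form: "connection4:0: detected conn error (1011)"
        /detected conn error/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^connection[0-9]+:[0-9]+:$/) {
                    c = $i; sub(/^connection/, "", c); sub(/:$/, "", c)
                    kern[c]++; seen[c] = 1
                }
        }
        # iscsid form: "Kernel reported iSCSI connection 6:0 error (1011) state (3)"
        /Kernel reported iSCSI connection/ {
            for (i = 1; i < NF; i++)
                if ($i == "connection") { c = $(i + 1); isd[c]++; seen[c] = 1 }
        }
        # Recovery: "connection6:0 is operational after recovery (1 attempts)"
        /is operational after recovery/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^connection[0-9]+:[0-9]+$/) {
                    c = $i; sub(/^connection/, "", c)
                    rec[c]++; seen[c] = 1
                }
        }
        END {
            for (c in seen)
                printf "conn %s kernel=%d iscsid=%d recovered=%d\n", c, kern[c], isd[c], rec[c]
        }
    ' "${1:--}" | LC_ALL=C sort
}
```

Usage: grep -E 'iscsid|connection' /var/log/messages | iscsi_conn_summary. Connections with errors but no recoveries are the ones to correlate with the switch ports they use.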


Mike Christie

Jan 31, 2012, 5:11:53 PM
to open-...@googlegroups.com, Gopesh Sharma
On 01/31/2012 01:47 PM, Gopesh Sharma wrote:
> sd 22:0:0:0: SCSI error: return code = 0x060e0000
> end_request: I/O error, dev sdm, sector 0
> connection3:0: detected conn error (1011)
> session3: target reset succeeded
> connection12:0: detected conn error (1011)
> session12: target reset succeeded
> connection3:0: detected conn error (1011)
> session3: target reset succeeded
> connection12:0: detected conn error (1011)
> session12: target reset succeeded
> sd 13:0:0:0: timing out command, waited 360s
> sd 13:0:0:0: SCSI error: return code = 0x060e0000
> end_request: I/O error, dev sdd, sector 63

Something might be wrong with your storage. 0x060e0000 means the SCSI
command is timing out. We had to drop the connection and log back in to
try to fix the problem. We tried to execute the IO for 360 seconds (SCSI
command timeout * (SCSI allowed retries + 1)), but it did not complete,
so the SCSI layer ended up failing it.
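
The result word in "SCSI error: return code = 0x060e0000" packs four bytes. A small bash sketch (hypothetical function names) that splits it using the 2.6-era layout driver<<24 | host<<16 | msg<<8 | status and the upstream names from include/scsi/scsi.h: driver byte 0x06 is DRIVER_TIMEOUT and host byte 0x0e is DID_TRANSPORT_DISRUPTED. Only codes relevant to this thread are named, and the RHEL 5 kernel headers may differ in detail. The second helper reproduces the 360 s in the log, assuming a 60 s per-command timeout (the current value is in /sys/block/sdX/device/timeout) and 5 allowed retries:

```shell
#!/bin/bash
# Split a SCSI midlayer result word (e.g. 0x060e0000) into its four bytes.
# Layout assumed: driver<<24 | host<<16 | msg<<8 | status (2.6-era kernels).
decode_scsi_result() {
    local r=$(( $1 ))                      # bash arithmetic accepts 0x... hex
    local driver=$(( (r >> 24) & 0xff )) host=$(( (r >> 16) & 0xff ))
    local msg=$(( (r >> 8) & 0xff )) status=$(( r & 0xff ))
    local dname hname
    case $driver in
        0) dname=DRIVER_OK ;;
        6) dname=DRIVER_TIMEOUT ;;
        *) dname=$(printf '0x%02x' "$driver") ;;
    esac
    case $host in
        0)  hname=DID_OK ;;
        3)  hname=DID_TIME_OUT ;;
        14) hname=DID_TRANSPORT_DISRUPTED ;;   # 0x0e
        *)  hname=$(printf '0x%02x' "$host") ;;
    esac
    printf 'driver=%s host=%s msg=0x%02x status=0x%02x\n' \
        "$dname" "$hname" "$msg" "$status"
}

# Seconds before the midlayer gives up: timeout * (allowed retries + 1).
scsi_give_up_seconds() {
    echo $(( $1 * ($2 + 1) ))
}
```

With these definitions, decode_scsi_result 0x060e0000 prints driver=DRIVER_TIMEOUT host=DID_TRANSPORT_DISRUPTED msg=0x00 status=0x00, and scsi_give_up_seconds 60 5 prints 360.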

Mike Christie

Feb 1, 2012, 5:05:11 PM
to Gopesh Sharma, open-...@googlegroups.com
On 02/01/2012 10:05 AM, Gopesh Sharma wrote:
> The SAN admin page does not show any errors except one failed disk. I
> restarted the SAN, and it came up without any issues.
> I have gone through the configuration a couple of times.
> I suspect the 10GigE switch that sits between the SAN and the server.
> Right now Dell support is asking me to change the MTU on the switch to 9216.
> Let's see if that helps.

What version of open-iscsi are you using? Could you turn on debugging?

It might be slightly different on your kernel but something like this:

# enable every debug parameter each module exposes (names vary by kernel)
for f in /sys/module/libiscsi/parameters/*debug* \
         /sys/module/libiscsi_tcp/parameters/*debug* \
         /sys/module/iscsi_tcp/parameters/*debug*; do
    [ -e "$f" ] && echo 1 > "$f"
done

Can you also send a tcpdump/Wireshark trace? That way we can see whether
the IO is making some progress, just very slowly, or whether we do not see
any IO on the initiator side at all.
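
A minimal bash sketch for capturing only the iSCSI traffic, assuming the target listens on the standard iSCSI port 3260; the function name, the interface name eth2 and the output path are placeholders. -s 0 captures whole packets, since older tcpdump versions default to a short snapshot length that would cut off the iSCSI PDUs:

```shell
#!/bin/bash
# Capture iSCSI traffic to a pcap file that Wireshark can open.
# Args: IFACE OUTFILE [PORT]; PORT defaults to 3260.
# Set DRY_RUN=1 to print the command instead of running it.
capture_iscsi() {
    local iface="$1" outfile="$2" port="${3:-3260}"
    local cmd=(tcpdump -i "$iface" -s 0 -w "$outfile" tcp port "$port")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"                     # needs root; stop with Ctrl-C
    fi
}
```

Usage: capture_iscsi eth2 /tmp/iscsi.pcap while reproducing the failure, then send the resulting file.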



Gopesh Sharma

Feb 1, 2012, 11:05:34 AM
to Mike Christie, open-...@googlegroups.com
The SAN admin page does not show any errors except one failed disk. I restarted the SAN, and it came up without any issues.
I have gone through the configuration a couple of times.
I suspect the 10GigE switch that sits between the SAN and the server.
Right now Dell support is asking me to change the MTU on the switch to 9216.
Let's see if that helps.


Regards,

On Tue, Jan 31, 2012 at 5:11 PM, Mike Christie <mich...@cs.wisc.edu> wrote:

Ulrich Windl

Feb 2, 2012, 2:18:16 AM
to open-iscsi
Hi!

In my (very limited) experience with huge packets, I think even that value is rather big. We are running with 9000 here. About 20 years ago we had a printing problem when some packet buffer was a few bytes too small: small print jobs would work, but larger ones wouldn't. Maybe you are seeing something similar with iSCSI. Also: have a _matching_ MTU along the whole path.
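
Whether large frames really make it along the whole path can be checked from the initiator by sending pings that are not allowed to fragment. A minimal bash sketch, assuming Linux iputils ping (-M do sets the Don't Fragment bit) and a plain IPv4 path without IP options; the helper names are hypothetical and the target address is whatever your SAN portal uses. The largest unfragmented payload is the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header:

```shell
#!/bin/bash
# Largest ICMP echo payload for one unfragmented IPv4 packet:
# MTU - 20 (IPv4 header, no options) - 8 (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Ping TARGET with Don't Fragment set, so any hop with a smaller MTU makes
# the ping fail instead of the packet being fragmented. MTU defaults to 9000.
check_path_mtu() {
    local target="$1" mtu="${2:-9000}"
    ping -c 3 -M do -s "$(icmp_payload_for_mtu "$mtu")" "$target"
}
```

If check_path_mtu <SAN portal IP> 9000 fails while check_path_mtu <SAN portal IP> 1500 succeeds, some device between the server and the SAN is not passing 9000-byte frames.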

Regards,
Ulrich

>>> Gopesh Sharma <gopesh.s...@gmail.com> wrote on 01.02.2012 at 17:05 in
message <CAHOahhyEvOptX5=ExHM_OYa_4uv24h_d...@mail.gmail.com>:
