SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

566 views
Skip to first unread message

avora

unread,
Dec 7, 2009, 6:07:33 AM12/7/09
to open-iscsi
With SLES10 SP3 x86_64,
as soon as I start the second iscsi session2, I am very frequently
getting the connection errors/
I do not see this with SLES10 SP2 x86_64 on the same setup.

Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error
(1011)
Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
recovery (1 attempts)
Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
2:0 error (1011) state (3)
Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error
(1011)

I have tried changing noop_out_interval and noop_out_timeout to
120/120 and 0/0 but did not help.
The iscsiadm settings are same on both SP2 and SP3.
Is there anything else that can be tried ?

# iscsiadm --mode node --targetname <target>
...

# rpm -qa | grep iscsi
iscsitarget-0.4.17-3.4.25
open-iscsi-2.0.868-0.6.11
yast2-iscsi-client-2.14.47-0.4.9
yast2-iscsi-server-2.13.26-0.3

Mike Christie

unread,
Dec 7, 2009, 12:31:19 PM12/7/09
to open-...@googlegroups.com
avora wrote:
> With SLES10 SP3 x86_64,
> as soon as I start the second iscsi session2, I am very frequently
> getting the connection errors/
> I do not see this with SLES10 SP2 x86_64 on the same setup.
>
> Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error
> (1011)
> Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
> recovery (1 attempts)
> Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
> 2:0 error (1011) state (3)
> Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error
> (1011)
>
> I have tried changing noop_out_interval and noop_out_timeout to
> 120/120 and 0/0 but did not help.

Did you see a ping/nop timeout message in the logs or just what you
included above with the conn error 1011? The ping/nop message would be a
little before the con error 1011.

What target is this with and are you doing any IO tests when this
happens or are you just logging into the second session and then you
start to get these errors?

avora

unread,
Dec 7, 2009, 2:09:32 PM12/7/09
to open-iscsi
I do not see ping/nop timeout message in the logs
(probably that's why changing the noop timeouts did not work).
Simply starting the session does not cause these errors.
On starting the second session, I start a daemon
that does SCSI commands like INQUIRY on all the paths.
After that I see these messages, and the daemon gets stuck
for a very long time waiting for SCSI commands to finish.

At the backend I have EMC CLARiiON.

# iscsiadm -m node -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a2
Portal: 192.168.10.1:3260,1
Iface Name: iface0
Target: iqn.1992-04.com.emc:cx.ckm00091100683.b2
Portal: 192.168.12.1:3260,3
Iface Name: iface1

# iscsiadm --mode node --targetname iqn.
1992-04.com.emc:cx.ckm00091100683.a2
node.name = iqn.1992-04.com.emc:cx.ckm00091100683.a2
node.tpgt = 1
node.startup = automatic
iface.hwaddress = 00:15:17:A8:A9:0A
iface.iscsi_ifacename = iface0
iface.net_ifacename = eth4
iface.transport_name = tcp
node.discovery_address = 192.168.10.1
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.auth.authmethod = None
node.session.auth.username = <empty>
node.session.auth.password = <empty>
node.session.auth.username_in = <empty>
node.session.auth.password_in = <empty>
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 192.168.10.1
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
node.conn[0].iscsi.HeaderDigest = None,CRC32C
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No

On Dec 7, 10:31 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
> avora wrote:
> > With SLES10 SP3 x86_64,
> > as soon as I start the second iscsi session2, I am very frequently
> > getting the connection errors/
> > I do not see this with SLES10 SP2 x86_64 on the same setup.
>
> > Dec  7 18:42:05 cdc-r710s1 kernel:  connection2:0:detectedconnerror
> > (1011)
> > Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
> > recovery (1 attempts)
> > Dec  7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
> > 2:0error(1011) state (3)
> > Dec  7 18:42:08 cdc-r710s1 kernel:  connection2:0:detectedconnerror
> > (1011)
>
> > I have tried changing noop_out_interval and noop_out_timeout to
> > 120/120 and 0/0 but did not help.
>
> Did you see a ping/nop timeout message in the logs or just what you
> included above with theconnerror1011? The ping/nop message would be a

avora

unread,
Dec 8, 2009, 8:12:27 AM12/8/09
to open-iscsi
I got a similar issue while browsing
http://groups.google.com/group/open-iscsi/browse_thread/thread/3c9c37903e40cd6f

I wanted to enable logging as mentioned in above link.
----
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
---

But on my machine I only see.

# ls /sys/module/libiscsi/
refcnt sections srcversion

# ls /sys/module/iscsi_tcp/
parameters refcnt sections srcversion

# ls /sys/module/iscsi_tcp/parameters/max_lun
/sys/module/iscsi_tcp/parameters/max_lun


# iscsiadm -m session -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3
........
iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FAILED
Internal iscsid Session State: REPOEN





On Dec 7, 10:31 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
> avora wrote:
> > With SLES10 SP3 x86_64,
> > as soon as I start the second iscsi session2, I am very frequently
> > getting the connection errors/
> > I do not see this with SLES10 SP2 x86_64 on the same setup.
>
> > Dec  7 18:42:05 cdc-r710s1 kernel:  connection2:0:detectedconnerror
> > (1011)
> > Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
> > recovery (1 attempts)
> > Dec  7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
> > 2:0error(1011) state (3)
> > Dec  7 18:42:08 cdc-r710s1 kernel:  connection2:0:detectedconnerror
> > (1011)
>
> > I have tried changing noop_out_interval and noop_out_timeout to
> > 120/120 and 0/0 but did not help.
>
> Did you see a ping/nop timeout message in the logs or just what you
> included above with theconnerror1011? The ping/nop message would be a

Mike Christie

unread,
Dec 9, 2009, 9:54:17 PM12/9/09
to open-...@googlegroups.com
avora wrote:
> I do not see ping/nop timeout message in the logs
> (probably that's why changing the noop timeouts did not work).
> Simply starting the session does not cause these errors.
> On starting the second session, I start a daemon
> that does SCSI commands like INQUIRY on all the paths.
> After that I see these messages, and the daemon gets stuck
> for a very long time waiting for SCSI commands to finish.
>
> At the backend I have EMC CLARiiON.
>
> # iscsiadm -m node -P 1
> Target: iqn.1992-04.com.emc:cx.ckm00091100683.a2
> Portal: 192.168.10.1:3260,1
> Iface Name: iface0
> Target: iqn.1992-04.com.emc:cx.ckm00091100683.b2
> Portal: 192.168.12.1:3260,3
> Iface Name: iface1


Does the same path always fail?

If you log into one can you use it, then if you logout and log into the
other does that other one then work?

Is there any info the clarrion logs?

Mike Christie

unread,
Dec 9, 2009, 10:04:26 PM12/9/09
to open-...@googlegroups.com
avora wrote:
> I got a similar issue while browsing
> http://groups.google.com/group/open-iscsi/browse_thread/thread/3c9c37903e40cd6f
>
> I wanted to enable logging as mentioned in above link.
> ----
> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
> echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
> echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
> ---
>
> But on my machine I only see.
>
> # ls /sys/module/libiscsi/
> refcnt sections srcversion
>
> # ls /sys/module/iscsi_tcp/
> parameters refcnt sections srcversion
>
> # ls /sys/module/iscsi_tcp/parameters/max_lun
> /sys/module/iscsi_tcp/parameters/max_lun
>

Your open-iscsi version is older and does not have those settings.

>
> # iscsiadm -m session -P 1
> Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3
> ........
> iSCSI Connection State: TRANSPORT WAIT
> iSCSI Session State: FAILED
> Internal iscsid Session State: REPOEN
>
>

You might be seeing something else.

>>> Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
>>> recovery (1 attempts)

After the conn error message do you see one of these right away?

Mike Christie

unread,
Dec 9, 2009, 10:05:07 PM12/9/09
to open-...@googlegroups.com
If this is easy to reproduce could you get a wireshark/ethereal trace
and send it?

berthiau...@emc.com

unread,
Dec 10, 2009, 10:29:19 AM12/10/09
to open-...@googlegroups.com
Is CHAP configured on the array?

-----Original Message-----
From: open-...@googlegroups.com [mailto:open-...@googlegroups.com]
On Behalf Of Mike Christie
Sent: Wednesday, December 09, 2009 9:54 PM
To: open-...@googlegroups.com
Subject: Re: SLES10 SP3 x86_64 - connection2:0: detected conn error
(1011)

--

You received this message because you are subscribed to the Google
Groups "open-iscsi" group.
To post to this group, send email to open-...@googlegroups.com.
To unsubscribe from this group, send email to
open-iscsi+...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/open-iscsi?hl=en.



avora

unread,
Dec 9, 2009, 11:32:13 PM12/9/09
to open-iscsi
Yes Mike, the recovery message is seen right away.

Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
recovery (1 attempts)

'conn error' and 'recovery' are seen one after the other, continuosly.

On Dec 10, 8:04 am, Mike Christie <micha...@cs.wisc.edu> wrote:
> avora wrote:
> > I got a similar issue while browsing
> >http://groups.google.com/group/open-iscsi/browse_thread/thread/3c9c37...
>
> > I wanted to enable logging as mentioned in above link.
> > ----
> > echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
> > echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
> > echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
> > echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
> > echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
> > ---
>
> > But on my machine I only see.
>
> > #  ls /sys/module/libiscsi/
> > refcnt  sections  srcversion
>
> > # ls /sys/module/iscsi_tcp/
> > parameters  refcnt  sections  srcversion
>
> > # ls /sys/module/iscsi_tcp/parameters/max_lun
> > /sys/module/iscsi_tcp/parameters/max_lun
>
> Your open-iscsi version is older and does not have those settings.
>
>
>
> > # iscsiadm -m session -P 1
> > Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3
> > ........
> >                 iSCSI Connection State: TRANSPORT WAIT
> >                 iSCSI Session State: FAILED
> >                 Internal iscsid Session State: REPOEN
>
> You might be seeing something else.
I did not get what exactly you meant
>
> >>> Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
> >>> recovery (1 attempts)
>
> After theconnerrormessage do you see one of these right away?

Anuarg Vora

unread,
Dec 11, 2009, 2:06:40 AM12/11/09
to open-...@googlegroups.com
There is no CHAP configured on the array.


--- On Thu, 12/10/09, berthiau...@emc.com <berthiau...@emc.com> wrote:

Mike Christie

unread,
Dec 11, 2009, 2:22:02 AM12/11/09
to open-...@googlegroups.com
avora wrote:
> Yes Mike, the recovery message is seen right away.
>
> Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
> recovery (1 attempts)
>
> 'conn error' and 'recovery' are seen one after the other, continuosly.
>

Do you have other initiators connected to the target?

Can you get me a wireshark trace?

Anuarg Vora

unread,
Dec 11, 2009, 2:35:24 AM12/11/09
to open-...@googlegroups.com
I did sent the ethereal trace yesterday.
I am not sure why it didn't reach, is there any place I can upload it ?

There is only 1 initiator.

# iscsiadm -m session -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3

Current Portal: 192.168.11.1:3260,2
Persistent Portal: 192.168.11.1:3260,2
**********
Interface:
**********
Iface Name: iface0
Iface Transport: tcp
Iface Initiatorname: iqn.1996-04.de.suse:02:9914ca52960
Iface IPaddress: 192.168.11.11
Iface HWaddress: 00:15:17:A8:A9:1E
Iface Netdev: eth0
SID: 10


iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FAILED
Internal iscsid Session State: REPOEN

Target: iqn.1992-04.com.emc:cx.ckm00091100683.b3
Current Portal: 192.168.13.1:3260,4
Persistent Portal: 192.168.13.1:3260,4
**********
Interface:
**********
Iface Name: iface1
Iface Transport: tcp
Iface Initiatorname: iqn.1996-04.de.suse:02:9914ca52960
Iface IPaddress: 192.168.13.11
Iface HWaddress: 00:15:17:A8:A9:1F
Iface Netdev: eth1
SID: 11


iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FAILED
Internal iscsid Session State: REPOEN


--- On Thu, 12/10/09, Mike Christie <mich...@cs.wisc.edu> wrote:

> From: Mike Christie <mich...@cs.wisc.edu>
> Subject: Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
> To: open-...@googlegroups.com

Mike Christie

unread,
Dec 11, 2009, 2:39:15 AM12/11/09
to open-...@googlegroups.com
Anuarg Vora wrote:
> I did sent the ethereal trace yesterday.
> I am not sure why it didn't reach, is there any place I can upload it ?
>

http://groups-beta.google.com/group/open-iscsi/files

Anuarg Vora

unread,
Dec 11, 2009, 3:22:10 AM12/11/09
to open-...@googlegroups.com
Sorry, I do not see an upload option for me even after (signing-in).
How to upload ?

--- On Thu, 12/10/09, Mike Christie <mich...@cs.wisc.edu> wrote:

> From: Mike Christie <mich...@cs.wisc.edu>
> Subject: Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
> To: open-...@googlegroups.com

Anuarg Vora

unread,
Dec 14, 2009, 7:15:43 AM12/14/09
to open-...@googlegroups.com
I have got a reproducible test case for this.
It seems that SCSI layer returns DID_BUS_BUSY many times when 'conn error (1011)' is seen.

for p in `ls /dev/sd*`
do
dd if=$p of=/dev/zero count=10000 &
done
wait

# ./io-script
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 0.177076 seconds, 28.9 MB/s
....
dd: reading `/dev/sdaa8': Input/output error
2976+0 records in
2976+0 records out

Dec 14 11:15:12 cdc-r710s3 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Dec 14 11:15:13 cdc-r710s3 kernel: connection2:0: detected conn error (1011)
Dec 14 11:15:13 cdc-r710s3 iscsid: connection2:0 is operational after recovery (1 attempts)
Dec 14 11:15:13 cdc-r710s3 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Dec 14 11:15:14 cdc-r710s3 kernel: connection1:0: detected conn error (1011)
...
Dec 14 11:15:14 cdc-r710s3 kernel: sd 9:0:0:13: SCSI error: return code = 0x00020000 <== DID_BUS_BUSY
Dec 14 11:15:14 cdc-r710s3 kernel: end_request: I/O error, dev sdaa, sector 2976

I am unable to upload ethereal on http://groups-beta.google.com/group/open-iscsi/files

Regards,
Anurag

--- On Fri, 12/11/09, Anuarg Vora <anurag...@yahoo.com> wrote:

Mike Christie

unread,
Dec 15, 2009, 3:36:53 AM12/15/09
to open-...@googlegroups.com
Just email the trace to me in private.

Anuarg Vora wrote:
> I have got a reproducible test case for this.
> It seems that SCSI layer returns DID_BUS_BUSY many times when 'conn error (1011)' is seen.
>

DID_BUS_BUSY when getting a 1011 is sort of expected. If you are not
using dm-multipath then the scsi layer will retry the error value up to
5 times.

If you are using dm-mutlipath then the scsi layer will fail the IO to
the multipath layer, where it will retry a new path right away.

Hannes Reinecke

unread,
Jan 13, 2010, 9:24:02 AM1/13/10
to open-...@googlegroups.com
Please try with the latest update kernel. I made quite some
fixes which should help here.

cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
ha...@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N�rnberg
GF: Markus Rex, HRB 16746 (AG N�rnberg)

sam.rawlins

unread,
Jan 22, 2010, 10:26:20 AM1/22/10
to open-iscsi
Hi Hannes,

I am seeing similar problems.
What kernel do you mean, that has fixes?

> h...@suse.de                          +49 911 74053 688

Reply all
Reply to author
Forward
0 new messages