detected conn error (1011)

Goncalo Gomes

Aug 4, 2010, 5:12:30 PM
to open-...@googlegroups.com, mich...@cs.wisc.edu

I'm running a setup composed of: Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.

 

Whenever the equallogic rebalances the LUNs between the controllers/ports, it requests the initiator to logout and login again to the new port/ip. If the guests are idle, the following messages show up in the logs:

 

Aug  3 17:55:08 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:09 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:10 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)

 

However, if one of the RHEL guests is busy performing IO, we end up having a few failed requests as well:

 

Aug  3 17:55:26 goncalog140 kernel:  connection1:0: dropping R2T itt 55 in recovery.

Aug  3 17:55:26 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK

Aug  3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533399

Aug  3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK

Aug  3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533751

Aug  3 17:55:27 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:29 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)

 

And as a side effect, the guest filesystem goes read-only. Googling around, I've found the following thread on this list which covers the same error I’m seeing in the logs:

 

http://groups.google.com/group/open-iscsi/browse_thread/thread/3a7a5db6e5020423/8e95febb6cf79f64?lnk=gst&q=conn+error#8e95febb6cf79f64

 

I've also compiled the drivers iscsi_tcp/libiscsi with the patch from Mike Christie taken from that thread which can be found in the link below:

 

http://groups.google.com/group/open-iscsi/attach/db552832995daaa7/trace-conn-error.patch?part=2&view=1

 

Is this a known issue? Is there anything else from a troubleshooting perspective that I could do?

 

I've uploaded the following files, in case someone would like to take a look:

 

Tcpdumps collected a couple of days ago during another reproduction/analysis of the same bug (apologies, but I didn't get around to collecting new tcpdumps with today's reproduction):

 

0tcpdump0947.pcap       162K  - 09:47 (GMT+1) nothing occurred.

1tcpdump0952.pcap       4.8M  - 09:52 (GMT+2) problem occurred

 

Logs from today’s reproduction of the issue with the patched drivers for additional backtracing:

 

vm-boot.txt                        2.7K After VM creation

vm-lun-rebalance-no-effect.txt     3.1K VM is idling, FS does not become read-only.

vm-lun-rebalance-fs-readonly.txt   3.3K VM is dd'ing /dev/zero to iscsi based disk, FS becomes read-only.

guest-dmesg.txt                    14K  RHEL 5.3 with 2.6.18-194.8.1.el5xen (RHEL 5.5 kernel)

 

All these files can be found in the following link:

 

http://promisc.org/iscsi/

 

Any help would be greatly appreciated!

 

Cheers,

 -Goncalo.

 

 

Mike Christie

Aug 4, 2010, 10:51:30 PM
to Goncalo Gomes, open-...@googlegroups.com
On 08/04/2010 04:12 PM, Goncalo Gomes wrote:
> I'm running a setup composed of: Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.
>
>
>
> Whenever the equallogic rebalances the LUNs between the controllers/ports, it requests the initiator to logout and login again to the new port/ip. If the guests are idle, the following messages show up in the logs:
>
>
>
> Aug 3 17:55:08 goncalog140 kernel: connection1:0: detected conn error (1011)
>
> Aug 3 17:55:09 goncalog140 kernel: connection1:0: detected conn error (1011)
>
> Aug 3 17:55:10 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)
>
>
>
> However, if one of the RHEL guests is busy performing IO, we end up having a few failed requests as well:
>
>
>
> Aug 3 17:55:26 goncalog140 kernel: connection1:0: dropping R2T itt 55 in recovery.
>
> Aug 3 17:55:26 goncalog140 kernel: connection1:0: detected conn error (1011)
>
> Aug 3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
>
> Aug 3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533399
>
> Aug 3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
>
> Aug 3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533751
>
> Aug 3 17:55:27 goncalog140 kernel: connection1:0: detected conn error (1011)
>
> Aug 3 17:55:29 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)
>
>
>
> And as a side effect, the guest filesystem goes read-only. Googling around, I've found the following thread on this list which covers the same error I'm seeing in the logs:
>
>
>
> http://groups.google.com/group/open-iscsi/browse_thread/thread/3a7a5db6e5020423/8e95febb6cf79f64?lnk=gst&q=conn+error#8e95febb6cf79f64
>

conn error 1011 is generic. If this is occurring when the eql box is
rebalancing luns, it is a little different than above. With the above
problem we did not know why we got the error. With your situation we
sort of expect this. We should not be getting disk IO errors though.

When we get the logout request from the target, we send the logout
request, then basically handle the cleanup like if we got a connection
error. That is why you would see the conn error msg in this path. This
also means if this happened to the same IO 5 times, then you would see
the disk IO errors (scsi layer only lets us retry disk IO 5 times). But
if it just happened once, then the IO should be retried when we log into
the new portal and execute like normal.
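
To make the retry budget described above concrete, here is a toy model in Python. It is only a sketch of the reasoning, not the SCSI layer's actual code: the function name `io_outcome` is made up for illustration, and the exact off-by-one behaviour at the limit is an assumption taken from the wording "happened to the same IO 5 times".

```python
MAX_RETRIES = 5  # retry budget mentioned in the explanation above


def io_outcome(disruptions, max_retries=MAX_RETRIES):
    """Model one command that is bounced back `disruptions` times.

    Each bounce uses up one retry; once the budget is spent the
    command is failed upward as a disk I/O error (assumed semantics).
    """
    retries_used = 0
    for _ in range(disruptions):
        retries_used += 1
        if retries_used >= max_retries:
            return "io_error"
    return "completed"


if __name__ == "__main__":
    for n in (0, 1, 4, 5):
        print(n, io_outcome(n))
```

A single logout/relogin cycle corresponds to `io_outcome(1)`, which completes; only a command that is bounced repeatedly runs out of budget.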

Or are you using dm-multipath over iscsi? In that case you do not get
any retries, so we would expect to see that end_request: I/O error
message, but dm-multipath should just be retrying a new path or
internally queueing for whatever timeout value you had it use in
multipath.conf.

>
>
> I've also compiled the drivers iscsi_tcp/libiscsi with the patch from Mike Christie taken from that thread which can be found in the link below:
>
>
>
> http://groups.google.com/group/open-iscsi/attach/db552832995daaa7/trace-conn-error.patch?part=2&view=1
>
>

Could you send me the libiscsi.c file you patched?

Could you also send more of the log for either case? I want to see the
iscsid log info and any more of the kernel iscsi log info that you have.
I am looking for session recovery timed out messages and/or target
requested logout messages.
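
For anyone collecting the same information, a small Python filter along these lines can pull the relevant lines out of a messages file. It is a sketch under assumptions: the first three patterns are strings that appear in logs in this thread, while the loose `logout` match is a guess at how target-requested logouts are worded, and `interesting_lines` is a hypothetical helper name.

```python
import re

# Patterns 1-3 appear verbatim in logs posted to this thread.
# The last one is an assumed, deliberately loose match for
# "target requested logout"-style messages.
PATTERNS = [
    re.compile(r"detected conn error", re.I),
    re.compile(r"session recovery timed out", re.I),
    re.compile(r"is operational after recovery", re.I),
    re.compile(r"logout", re.I),
]


def interesting_lines(lines):
    """Return the log lines that match any of the patterns above."""
    return [ln.rstrip("\n") for ln in lines
            if any(p.search(ln) for p in PATTERNS)]


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for line in interesting_lines(fh):
                print(line)
```

Run it with the path to the messages file as the only argument; it prints matching lines in their original order.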

Ulrich Windl

Aug 5, 2010, 2:50:00 AM
to open-...@googlegroups.com
>>> Goncalo Gomes <Goncal...@eu.citrix.com> wrote on 04.08.2010 at 23:12 in
message
<FFDB98DC9661D3418B9EB...@LONPMAILBOX01.citrite.net>:

> I'm running a setup composed of: Linux 2.6.27 x86 based on SLES + Xen 3.4 (as
> dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
> is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.

I guess that's SLES11 already. I just read an announcement that there is an open-iscsi update for SLES11 SP1 available. Unfortunately Novell does not give any details in the announcements:

4. Recommended update for open-iscsi

SUSE Linux Enterprise Desktop 11 SP1 for x86-64
http://download.novell.com/Download?buildid=MAugs_l2FJY~

SUSE Linux Enterprise Desktop 11 SP1 for x86
http://download.novell.com/Download?buildid=U2OyI_9oJ5g~

SUSE Linux Enterprise Server 11 SP1 for x86-64
http://download.novell.com/Download?buildid=1Z1WASv0lfE~

SUSE Linux Enterprise Server 11 SP1 for x86
http://download.novell.com/Download?buildid=EzU17PIvOTc~

SUSE Linux Enterprise Server 11 SP1 for s390x
http://download.novell.com/Download?buildid=xqwCozVDBjM~

SUSE Linux Enterprise Server 11 SP1 for ppc
http://download.novell.com/Download?buildid=fMD_W5XKEtI~

SUSE Linux Enterprise Server 11 SP1 for ia64
http://download.novell.com/Download?buildid=_QVtGS0824o~


Maybe you should try the latest (and greatest?) version... ;-)

Regards,
Ulrich


TSFH

Aug 5, 2010, 3:37:18 PM
to open-...@googlegroups.com
On Thu, 2010-08-05 at 08:50 +0200, Ulrich Windl wrote:
> I guess that's SLES11 already. I just read an announcement that there is an open-iscsi update for SLES11 SP1 available. Unfortunately Novell does not give any details in the announcements:
>
> 4. Recommended update for open-iscsi
>
> SUSE Linux Enterprise Desktop 11 SP1 for x86-64
> http://download.novell.com/Download?buildid=MAugs_l2FJY~
>
> SUSE Linux Enterprise Desktop 11 SP1 for x86
> http://download.novell.com/Download?buildid=U2OyI_9oJ5g~
>
> SUSE Linux Enterprise Server 11 SP1 for x86-64
> http://download.novell.com/Download?buildid=1Z1WASv0lfE~
>
> SUSE Linux Enterprise Server 11 SP1 for x86
> http://download.novell.com/Download?buildid=EzU17PIvOTc~
>
> SUSE Linux Enterprise Server 11 SP1 for s390x
> http://download.novell.com/Download?buildid=xqwCozVDBjM~
>
> SUSE Linux Enterprise Server 11 SP1 for ppc
> http://download.novell.com/Download?buildid=fMD_W5XKEtI~
>
> SUSE Linux Enterprise Server 11 SP1 for ia64
> http://download.novell.com/Download?buildid=_QVtGS0824o~
>
>
> Maybe you should try the latest (and greatest?) version... ;-)

From the description:

* Occasionally, not all iSCSI multipath mapping are being created
after boot up
* Stopping of the open-iscsi service fails even if no iSCSI device
is mounted.
* When configuring iBFT, the iscsiadm program does not display the
details of a session.

Do you think any of the fixes above may help in the issue I described
before? I'm presently not making use of multipath nor booting from SAN.

Although these fixes are worth having, I'm mostly concerned about
understanding the nature/reason of the issue I described before at this
stage.

Thanks,
-Goncalo.


> Regards,
> Ulrich
>
>

Goncalo Gomes

Aug 5, 2010, 3:21:08 PM
to open-...@googlegroups.com
On Wed, 2010-08-04 at 21:51 -0500, Mike Christie wrote:
> conn error 1011 is generic. If this is occurring when the eql box is
> rebalancing luns, it is a little different than above. With the above
> problem we did not know why we got the error. With your situation we
> sort of expect this. We should not be getting disk IO errors though.
>
> When we get the logout request from the target, we send the logout
> request, then basically handle the cleanup like if we got a connection
> error. That is why you would see the conn error msg in this path. This
> also means if this happened to the same IO 5 times, then you would see
> the disk IO errors (scsi layer only lets us retry disk IO 5 times). But
> if it just happened once, then the IO should be retried when we log into
> the new portal and execute like normal.

What would be the best way to identify how many retries have elapsed?

> Or are you using dm-multipath over iscsi? In that case you do not get
> any retries, so we would expect to see that end_request: I/O error
> message, but dm-multipath should just be retrying a new path or
> internally queueing for whatever timeout value you had it use in
> multipath.conf.

Multipath is not enabled at all. The equallogic array is active/passive
and we only have a view into one controller at any time, so we don't
make use of multipath at present.

> Could you send me the libiscsi.c file you patched?
>
> Could you also send more of the log for either case? I want to see the
> iscsid log info and any more of the kernel iscsi log info that you have.
> I am looking for session recovery timed out messages and/or target
> requested logout messages.

I've copied both the messages file from the host goncalog140 and the
patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
files in the link below:

http://promisc.org/iscsi/

N.B.: the messages file contains spew from other instrumentation tests
(e.g. a dump_stack() call in scsi_transport_iscsi.c::iscsi_conn_error()).
The last set of tests, which I made available yesterday, instrument only
libiscsi.c and, IIRC, iscsi_tcp.c; their output can be found around the
17:50 timeframe.

If required I can spin a new set of tests with different instrumentation
and/or collect different information, logs or tcpdumps, if that helps in
any way.

Thanks,
-Goncalo.

Mike Christie

Aug 5, 2010, 5:08:45 PM
to open-...@googlegroups.com, Goncalo Gomes, Hannes Reinecke
ccing Hannes from suse, because this looks like a SLES only bug.

Hey Hannes,

The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0)
running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.


On 08/05/2010 02:21 PM, Goncalo Gomes wrote:
> I've copied both the messages file from the host goncalog140 and the
> patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
> files in the link below:
>
> http://promisc.org/iscsi/
>

It looks like this chunk from libiscsi.c:iscsi_queuecommand:

	case ISCSI_STATE_FAILED:
		reason = FAILURE_SESSION_FAILED;
		sc->result = DID_TRANSPORT_DISRUPTED << 16;
		break;

is causing IO errors.

You want to use something like DID_IMM_RETRY because it can be a long
time between the time the kernel marks the state as ISCSI_STATE_FAILED
until we start recovery and properly get all the device queues blocked,
so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.
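
As an aside for readers matching the `<< 16` in the chunk above to the `hostbyte=` field in the logs: the host byte sits in bits 16 to 23 of the result word. The Python sketch below decodes it. The numeric values are the ones I recall from include/scsi/scsi.h of that era and should be checked against your own tree; the helper names are made up for illustration.

```python
# Host byte codes as recalled from include/scsi/scsi.h (verify locally).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x0C: "DID_IMM_RETRY",
    0x0E: "DID_TRANSPORT_DISRUPTED",
    0x0F: "DID_TRANSPORT_FAILFAST",
}


def host_byte(result):
    """Extract bits 16-23 of a SCSI result word."""
    return (result >> 16) & 0xFF


def host_byte_name(result):
    """Map a result word to a host byte name, or hex if not in the table."""
    code = host_byte(result)
    return HOST_BYTE_NAMES.get(code, hex(code))


if __name__ == "__main__":
    print(host_byte_name(0x0E << 16))
    print(host_byte_name(0x0C << 16))
```

Only the codes discussed in this thread are listed; anything else is printed as a hex number rather than guessed at.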

Hannes Reinecke

Aug 6, 2010, 10:57:54 AM
to Mike Christie, open-...@googlegroups.com, Goncalo Gomes
Yeah, I noticed.
But the problem is that multipathing will stall during this time,
ie no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED
will circumvent this and we can failover immediately.

Sadly I got additional bugreports about this so I think I'll have
to revert it.

I have put some test kernels at

http://beta.suse.com/private/hare/sles11/iscsi

Can you test with them and check if this issue is solved?

Thanks.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
ha...@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

Goncalo Gomes

Aug 6, 2010, 11:28:45 AM
to Hannes Reinecke, Mike Christie, open-...@googlegroups.com, Julian Chesterfield
Hi Hannes,

Would you be able to send me a unified patch containing the changes included in the test kernels so I can rebuild the drivers with them and update you today?

For completeness, we are not running SLES, but rather the Citrix XenServer 5.6 release which is based off of the Linux 2.6.27 tree of SLES. Also, for this specific controller we don't enable MPIO, but in most other arrays we do.

Thanks,
-Goncalo.

Mike Christie

Aug 6, 2010, 12:38:10 PM
to open-...@googlegroups.com, Hannes Reinecke, Goncalo Gomes

It should stall. It works like FC and the fast io fail tmo. Users need
to set the iscsi replacement/recovery timeout like they would FC's fast
io fail tmo. They should set it to 3 or 5 secs or lower if they want
really fast failovers.
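
To check what a host is currently configured with, something like the Python sketch below can read the value out of the config text. It assumes the stock open-iscsi `key = value` layout and the setting name `node.session.timeo.replacement_timeout`; the function name and the sample text are hypothetical, and the file location on your system may differ.

```python
KEY = "node.session.timeo.replacement_timeout"


def replacement_timeout(conf_text):
    """Return the configured replacement timeout in seconds, or None."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == KEY:
            return int(value)
    return None


if __name__ == "__main__":
    sample = (
        "# illustrative fragment, not a full iscsid.conf\n"
        "node.session.timeo.replacement_timeout = 5\n"
    )
    print(replacement_timeout(sample))
```

Note that sessions which are already logged in may keep the value they were created with, so treat the file as the starting point rather than proof of what a running session uses.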

Mike Christie

Aug 6, 2010, 11:04:37 PM
to open-...@googlegroups.com, Hannes Reinecke, Goncalo Gomes
On 08/06/2010 11:38 AM, Mike Christie wrote:
>
> It should stall, It works like FC and the fast io fail tmo. Users need
> to set the iscsi replacement/recovery timeout like they would FC's fast
> io fail tmo. They should set it to 3 or 5 secs or lower if they want
> really fast failovers.
>

Oh yeah, Qlogic recently did this patch:

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=fe4f0bdeea788a8ac049c097895cb2e4044f18b1;hp=caf19d38607108304cd8cc67ed21378017f69e8a

so we can have multipath tools set the recovery_tmo like it does for
fast io fail tmo.

Goncalo Gomes

Aug 24, 2010, 8:43:13 AM
to Hannes Reinecke, Mike Christie, open-...@googlegroups.com, Shantanu Mehendale
Hi,

I applied and tested the changes Mike Christie suggests. After the LUN
is rebalanced within the array I no longer see the IO errors and it
appears the setup is now resilient to the equallogic LUN failover
process.

I'm attaching the log from the dmesg merely for sanity check purposes,
if anyone cares to take a look?

> I have put some test kernels at
>
> http://beta.suse.com/private/hare/sles11/iscsi

Do the test kernels in the url above contain the change of
DID_TRANSPORT_DISRUPTED to DID_IMM_RETRY or is there more to it than
simply changing the result code? If the latter, would you be able to
upload the source rpms or a unified patch containing the changes you
are staging? I'm looking for a more palatable way to test them, given I
have no SLES box lying around, but will install one if need be.

Thanks,
-Goncalo.

dmesg

Hannes Reinecke

Aug 30, 2010, 11:12:01 AM
to Goncal...@eu.citrix.com, Mike Christie, open-...@googlegroups.com, Shantanu Mehendale
Got me confused. How would you test the patch if not on a SLES box?
Presumably you would have to install the new kernel on the instance
you are planning to run the test on. Which for any sane setup would
have to be a SLES box. In which case you can just use the provided
kernel directly and save you the compilation step.

Am I missing something?

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
ha...@suse.de +49 911 74053 688

SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

Goncalo Gomes

Aug 31, 2010, 10:17:33 AM
to Hannes Reinecke, Mike Christie, open-...@googlegroups.com, Shantanu Mehendale
Hi Hannes,

Thanks. The Citrix XenServer 5.6 distribution kernel is based on the 2.6.27 tree of SLES 11. We add a few extra patches specific to Xen, dom0 integration and some backports from upstream. To the best of my knowledge these additions don't touch the iscsi layer, so from the iscsi drivers' point of view, I believe they are as pristine as the ones in the SuSE kernel. That's why we need the patch: the binaries will probably mismatch the gcc version and/or the versioning that we use, e.g. 2.6.27.42-0.1.1.xs5.6.0.44.111158xen. I do definitely appreciate your 'forward thinking' with regards to the issue, though!

Thanks,
-Goncalo.



-----Original Message-----
From: Hannes Reinecke [mailto:ha...@suse.de]
Sent: 30 August 2010 15:12
To: Goncalo Gomes
Cc: Mike Christie; open-...@googlegroups.com; Shantanu Mehendale
Subject: Re: detected conn error (1011)

Hannes Reinecke

Aug 31, 2010, 10:42:57 AM
to Goncalo Gomes, Mike Christie, open-...@googlegroups.com, Shantanu Mehendale
Goncalo Gomes wrote:
> Hi Hannes,
>
> Thanks. The Citrix XenServer 5.6 distribution kernel is based on the 2.6.27 tree of SLES 11.
> We add a few extra patches specific to Xen, dom0 integration and some backports from upstream.
> To the best of my knowledge these additions don't touch the iscsi layer, so from the iscsi
> drivers point of view, I believe they are as pristine as the ones in the SuSE kernel and that's
> why we need the patch as the binaries probably will mismatch gcc version and/or the versioning
> that we use e.g 2.6.27.42-0.1.1.xs5.6.0.44.111158xen. I do definitely appreciate your
> 'forward thinking' with regards to the issue, though!
>
I just checked, and the resulting patch is indeed like you proposed:

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 32b30f1..441ca8b 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1336,9 +1336,6 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 	 */
 	switch (session->state) {
 	case ISCSI_STATE_FAILED:
-		reason = FAILURE_SESSION_FAILED;
-		sc->result = DID_TRANSPORT_DISRUPTED << 16;
-		break;
 	case ISCSI_STATE_IN_RECOVERY:
 		reason = FAILURE_SESSION_IN_RECOVERY;
 		sc->result = DID_IMM_RETRY << 16;
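
For readers skimming the thread, the effect of the hunk above can be paraphrased as a small Python model: before the patch a failed session returns DID_TRANSPORT_DISRUPTED, afterwards it falls through to the in-recovery case and returns DID_IMM_RETRY. This is only an illustration of the diff, not kernel code; `queuecommand_result` is a made-up name, the `reason` bookkeeping is not modelled, and the numeric codes are as I recall them from include/scsi/scsi.h.

```python
DID_IMM_RETRY = 0x0C            # recalled from include/scsi/scsi.h
DID_TRANSPORT_DISRUPTED = 0x0E  # recalled from include/scsi/scsi.h


def queuecommand_result(state, patched):
    """Return the result word this model assigns for a session state."""
    if state == "ISCSI_STATE_FAILED" and not patched:
        # Old behaviour: the command is bounced with a code that
        # counts against the limited retry budget.
        return DID_TRANSPORT_DISRUPTED << 16
    if state in ("ISCSI_STATE_FAILED", "ISCSI_STATE_IN_RECOVERY"):
        # Patched behaviour: FAILED falls through to IN_RECOVERY.
        return DID_IMM_RETRY << 16
    raise ValueError("state not covered by this sketch: %s" % state)


if __name__ == "__main__":
    for patched in (False, True):
        print(patched, hex(queuecommand_result("ISCSI_STATE_FAILED", patched)))
```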

HTH,

Goncalo Gomes

Aug 31, 2010, 11:09:38 AM
to Hannes Reinecke, Mike Christie, open-...@googlegroups.com, Shantanu Mehendale
Thanks Hannes and Mike,

Your help has been highly appreciated!

Cheers,
-Goncalo.

-----Original Message-----
From: Hannes Reinecke [mailto:ha...@suse.de]
Sent: 31 August 2010 14:43
To: Goncalo Gomes
Cc: Mike Christie; open-...@googlegroups.com; Shantanu Mehendale
Subject: Re: detected conn error (1011)

Hannes Reinecke

Sep 3, 2010, 1:53:58 AM
to Shantanu Mehendale, Hannes Reinecke, Mike Christie, open-...@googlegroups.com
On Thu, Sep 02, 2010 at 03:15:31PM -0700, Shantanu Mehendale wrote:
> Hi Hannes/Mike,
>
> I am also dealing with another issue on iSCSI transport where I am
> seeing "DID_TRANSPORT_FAILFAST" hostbyte errors reaching the application
> which is sending I/O on a device-mapper node. Reading the code a little
> I thought that after the iscsi replacement_timeout timer fires, the io
> stuck in the io queues will be sent up to the device-mapper, which
> would send the io to the new path. Is there a possibility that
> dm-multipath is not able to handle all the errors so some of them end
> up going to the application? Basically this is a cable pull kind of
> experiment where we would expect the path failover to work and io to
> continue properly.
Yes, in general it should. And yes, multipath should handle these cases.
But I did quite a few patches to iSCSI in SLES11, so you should make
sure you're using the latest maintenance release.

> Since we already saw one problem with "DID_TRANSPORT_DISRUPTED", I was
> wondering if "DID_TRANSPORT_FAILFAST" also has some similar issues with
> limited retries and such.
>
No, that's actually okay. The I/O error will be reported in either case;
it's just that it'll never reach the upper layers.

In your case it looks as if the 'tapdisk' thing runs on the raw disks,
not the multipathed device. So of course it'll register the error.
Maybe it's an idea to have the 'tapdisk' run on the multipath device-mapper
device ...

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
ha...@suse.de +49 911 74053 688

SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

Shantanu Mehendale

Sep 2, 2010, 6:15:31 PM
to Hannes Reinecke, Mike Christie, open-...@googlegroups.com
Hi Hannes/Mike,

I am also dealing with another issue on iSCSI transport where I am seeing "DID_TRANSPORT_FAILFAST" hostbyte errors reaching the application which is sending I/O on a device-mapper node. Reading the code a little I thought that after the iscsi replacement_timeout timer fires, the io stuck in the io queues will be sent up to the device-mapper, which would send the io to the new path. Is there a possibility that dm-multipath is not able to handle all the errors so some of them end up going to the application? Basically this is a cable pull kind of experiment where we would expect the path failover to work and io to continue properly.
Since we already saw one problem with "DID_TRANSPORT_DISRUPTED", I was wondering if "DID_TRANSPORT_FAILFAST" also has some similar issues with limited retries and such.

The test host is running Citrix XenServer 5.6 distribution kernel which is based on the 2.6.27 tree of SLES 11.

Following are the messages that show up in the log after the cable is pulled on the storage array to bring one path down:

Aug 27 15:35:50 cb-xen-srv16 kernel: connection3:0: ping timeout of 15 secs expired, recv timeout 10, last rx 371574, last ping 372574, now 374074
Aug 27 15:35:50 cb-xen-srv16 kernel: connection3:0: detected conn error (1011)
Aug 27 15:35:51 cb-xen-srv16 iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
Aug 27 15:36:06 cb-xen-srv16 multipathd: sdm: tur checker reports path is down
Aug 27 15:36:06 cb-xen-srv16 multipathd: checker failed path 8:192 in map 3600c0ff000daf1d967ab6d4c01000000
Aug 27 15:36:06 cb-xen-srv16 multipathd: Path event for 3600c0ff000daf1d967ab6d4c01000000, calling mpathcount
Aug 27 15:36:06 cb-xen-srv16 multipathd: 3600c0ff000daf1d967ab6d4c01000000: remaining active paths: 3
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13127]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-1ea0d8c3-fb10-1120-fa46-2c7041b61667/VHD-54285520-e28a-46e0-8e84-7e1161594bca: op: 2, lsec: 384, secs: 8, nbytes: 4096, blk: 0, blk_offset: 8455
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13127]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 0: write 0x0008 secs to 0x00000180
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[12972]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-d3bac722-2250-953e-732f-24f9f1b8750e/VHD-55619d95-e97b-4ed0-8f22-af3f250aaeff: op: 2, lsec: 5309966, secs: 84, nbytes: 43008, blk: 1296, blk_offset: 111055
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[12972]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 5: write 0x0054 secs to 0x0051060e
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[12972]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-d3bac722-2250-953e-732f-24f9f1b8750e/VHD-55619d95-e97b-4ed0-8f22-af3f250aaeff: op: 2, lsec: 5310050, secs: 42, nbytes: 21504, blk: 1296, blk_offset: 111055
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[12972]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 6: write 0x002a secs to 0x00510662
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-edfe98bc-2104-9ba0-e32e-defda94af954/VHD-ad048ea9-1247-4b26-a271-d90d279cf33a: op: 2, lsec: 5340496, secs: 8, nbytes: 4096, blk: 1303, blk_offset: 1912711
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 18: write 0x0008 secs to 0x00517d50
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-edfe98bc-2104-9ba0-e32e-defda94af954/VHD-ad048ea9-1247-4b26-a271-d90d279cf33a: op: 2, lsec: 5340480, secs: 8, nbytes: 4096, blk: 1303, blk_offset: 1912711
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 17: write 0x0008 secs to 0x00517d40
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-edfe98bc-2104-9ba0-e32e-defda94af954/VHD-ad048ea9-1247-4b26-a271-d90d279cf33a: op: 2, lsec: 5340472, secs: 8, nbytes: 4096, blk: 1303, blk_offset: 1912711
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 16: write 0x0008 secs to 0x00517d38
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-edfe98bc-2104-9ba0-e32e-defda94af954/VHD-ad048ea9-1247-4b26-a271-d90d279cf33a: op: 2, lsec: 5340464, secs: 8, nbytes: 4096, blk: 1303, blk_offset: 1912711
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 15: write 0x0008 secs to 0x00517d30
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-edfe98bc-2104-9ba0-e32e-defda94af954/VHD-ad048ea9-1247-4b26-a271-d90d279cf33a: op: 2, lsec: 5340456, secs: 8, nbytes: 4096, blk: 1303, blk_offset: 1912711
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 14: write 0x0008 secs to 0x00517d28
Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13006]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-edfe98bc-2104-9ba0-e32e-defda94af954/VHD-ad048ea9-1247-4b26-a271-d90d279cf33a: op: 2, lsec: 5340448, secs: 8, nbytes: 4096, blk: 1303, blk_offset: 1912711
Aug 27 15:36:06 cb-xen-srv16 kernel: session3: session recovery timed out after 15 secs
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113075102
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113084930
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113084848
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113080182
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113072780
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 52895157
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548824
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548808
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548800
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548792
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548784
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548776
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548144
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548128
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548120
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548112
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134617640
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134617632
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134615168
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134615160
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134613800
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134613792
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534112
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534104
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534096
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534088
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534080
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN
SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134
534072
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN
SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134
534064
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN
SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134
549960
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN
SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134
548840
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN
SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134
548832
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN
SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134
613088
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN
SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134
613072
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN
SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134
605232
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN
SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134
596240
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:7: [sdn] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdn, sector 113092702
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:15: [sdo] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdo, sector 132641544
Aug 27 15:36:06 cb-xen-srv16 kernel: device-mapper: multipath: Failing path 8:192.
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:15: [sdo] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdo, sector 132641544
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:7: [sdn] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdn, sector 113092702
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548824
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548808
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548800
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548792
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548784
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548776
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548144
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548128
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548120
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548112
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134617640
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134617632
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134615168
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134615160
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134613800
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134613792
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534112
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534104
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534096
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534088
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534080
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534072
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134534064
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134549960
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548840
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134548832
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134613088
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134613072
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134605232
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134596240
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113075102
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113084930
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113084848
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113080182
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113072780
Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 52895157
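
For anyone trying to read a dump like the one above: a rough sketch (not part of open-iscsi or multipath-tools) that glues mail-wrapped syslog lines back together and tallies the end_request I/O errors per device. It assumes the mail client broke lines at a fixed column without inserting a space, as seems to be the case here; the function names `rejoin` and `errors_per_device` are made up for illustration.

```python
import re
from collections import Counter

# A syslog record starts with a timestamp such as "Aug 27 15:36:06 ".
RECORD_START = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d\d:\d\d:\d\d ")
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")

def rejoin(lines):
    """Append continuation lines (no leading timestamp) to the previous record.

    Assumption: the wrap happened mid-token, so no space is re-inserted.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += line
    return records

def errors_per_device(lines):
    """Count end_request I/O error records per block device name."""
    counts = Counter()
    for record in rejoin(lines):
        match = IO_ERROR.search(record)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRAN",
    "SPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK",
    "Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdm, sector 134",
    "548800",
    "Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113",
    "075102",
]

for device, count in sorted(errors_per_device(sample).items()):
    print(device, count)
```

Feeding the whole excerpt through `errors_per_device` should make it easier to see which LUNs took the failures, without scrolling through every sector.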

Any help on this issue will be highly appreciated.

Thanks,
-shantanu

-----Original Message-----
From: Hannes Reinecke [mailto:ha...@suse.de]
Sent: Tuesday, August 31, 2010 7:43 AM
To: Goncalo Gomes
Cc: Mike Christie; open-...@googlegroups.com; Shantanu Mehendale
Subject: Re: detected conn error (1011)

Ulrich Windl

Sep 8, 2010, 7:51:22 AM
to open-...@googlegroups.com
Hello,

yesterday a controller on our SAN storage failed, and open-iscsi on SLES10
SP3 (open-iscsi-2.0.868-0.6.11) produced a _very_ large number of error
messages in syslog (419869 entries within a few hours). Does Novell have any
plans to throttle these? Example:
Sep 7 16:08:23 hostname kernel: connection19:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection31:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection20:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection32:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection23:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection7:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection24:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection8:0: iscsi: detected conn error (1011)
[...]
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
[...]
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname kernel: session19: iscsi: session recovery timed out after 30 secs
Sep 7 16:08:53 hostname kernel: session31: iscsi: session recovery timed out after 30 secs
Sep 7 16:08:53 hostname kernel: session20: iscsi: session recovery timed out after 30 secs
[...]
Sep 7 16:08:54 hostname kernel: device-mapper: multipath: Failing path 8:192.
Sep 7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 14
Sep 7 16:08:54 hostname multipathd: sdac: tur checker reports path is down
Sep 7 16:08:54 hostname multipathd: checker failed path 65:192 in map L116_hostas03
Sep 7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 13
Sep 7 16:08:54 hostname multipathd: sdv: tur checker reports path is down
Sep 7 16:08:54 hostname multipathd: checker failed path 65:80 in map L116_hostas03
Sep 7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 12
Sep 7 16:08:54 hostname multipathd: sdx: tur checker reports path is down
Sep 7 16:08:54 hostname multipathd: checker failed path 65:112 in map L116_hostas03
Sep 7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 11
Sep 7 16:08:54 hostname multipathd: sdab: tur checker reports path is down
Sep 7 16:08:54 hostname multipathd: checker failed path 65:176 in map L116_hostas03
[...]
Sep 7 16:08:54 hostname multipathd: L111_hostsod: remaining active paths: 7
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:54 hostname iscsid: conn 0 login rejected: target error (03/01)
[...]
Sep 7 20:47:59 hostname multipathd: sdag: tur checker reports path is up
Sep 7 20:47:59 hostname multipathd: 66:0: reinstated
Sep 7 20:47:59 hostname multipathd: L116_hostas03: remaining active paths: 16
Sep 7 20:47:59 hostname multipathd: sdai: tur checker reports path is up
Sep 7 20:47:59 hostname multipathd: 66:32: reinstated
Sep 7 20:47:59 hostname multipathd: L117_hostas04: remaining active paths: 14
Sep 7 20:47:59 hostname multipathd: sdaj: tur checker reports path is up
Sep 7 20:47:59 hostname multipathd: 66:48: reinstated
Sep 7 20:47:59 hostname multipathd: L117_hostas04: remaining active paths: 15
Sep 7 20:48:00 hostname multipathd: sdak: tur checker reports path is up
Sep 7 20:48:00 hostname multipathd: 66:64: reinstated
Sep 7 20:48:00 hostname multipathd: L117_hostas04: remaining active paths: 16
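
Until something throttles these messages at the source, a log like this can at least be condensed when reading it. The following is only a sketch of that idea, done as post-processing in Python; it is not an existing open-iscsi, syslog, or Novell feature, and the function name `collapse` is hypothetical. It folds runs of consecutive records that differ only in their timestamp into one line plus a repeat count.

```python
import re
from itertools import groupby

# Leading syslog timestamp, e.g. "Sep 7 16:08:24 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d\d:\d\d:\d\d ")

def without_timestamp(record):
    """Return the record with its leading timestamp removed."""
    return TIMESTAMP.sub("", record, count=1)

def collapse(records):
    """Keep the first record of each run of identical messages and
    follow it with a note saying how many repeats were dropped."""
    out = []
    for _, run in groupby(records, key=without_timestamp):
        run = list(run)
        out.append(run[0])
        if len(run) > 1:
            out.append(f"  (previous message repeated {len(run) - 1} more times)")
    return out

sample = [
    "Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)",
    "Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)",
    "Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)",
    "Sep 7 16:08:53 hostname kernel: session19: iscsi: session recovery timed out after 30 secs",
]

for line in collapse(sample):
    print(line)
```

Only consecutive duplicates are folded, so interleaved messages from different connections stay visible in their original order.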

Kind regards,
Ulrich Windl

Ulrich Windl

Sep 8, 2010, 9:11:37 AM
to open-...@googlegroups.com
Sorry,

this message was intended for someone else. E-Mail program tricked me. -- Ulrich

>>> "Ulrich Windl" <Ulrich...@rz.uni-regensburg.de> wrote on 08.09.2010 at
13:51 in message <4C8794DA0200...@gwsmtp1.uni-regensburg.de>:

