I started the dd at 07:13:51
Cable pulled at:
Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of
5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304,
now 4894304
iSCSI errors at:
Mar 1 07:14:28 bentCluster-1 iscsid: Kernel reported iSCSI connection
4:0 error (1011) state (3)
SCSI error and multipath failures at:
Mar 1 07:15:35 bentCluster-1 kernel: session2: session recovery
timed out after 15 secs
Mar 1 07:15:35 bentCluster-1 kernel: sd 3:0:0:1: SCSI error: return
code = 0x000f0000
Mar 1 07:15:35 bentCluster-1 kernel: end_request: I/O error, dev sdf,
sector 3164079
Mar 1 07:15:35 bentCluster-1 kernel: device-mapper: multipath:
Failing path 8:80.
And then I/O starts again on the device I am sending I/O down.
Finally the other devices fail:
Mar 1 07:15:48 bentCluster-1 kernel: device-mapper: multipath:
Failing path 8:112.
The entire dd took 138 seconds. It looks like the delay is in the
iSCSI layer. It took from 07:14:28 to 07:15:35 for the iSCSI session
to fail.
I am using the timeouts:
● node.session.timeo.replacement_timeout = 15
● node.conn[0].timeo.noop_out_timeout = 5
● node.conn[0].timeo.noop_out_interval = 5
http://kbase.redhat.com/faq/docs/DOC-2877
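For anyone following along, these timeouts live in /etc/iscsi/iscsid.conf for newly discovered nodes; a minimal fragment matching the values above (sketch only):

```
# /etc/iscsi/iscsid.conf -- timeout values used in this thread
node.session.timeo.replacement_timeout = 15
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].timeo.noop_out_interval = 5
```

An existing node record can also be updated with something like
iscsiadm -m node -T <target> -o update -n node.session.timeo.replacement_timeout -v 15
(the <target> is a placeholder, and you need to log out and back in for it to take effect).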
So I guess I have two questions:
1. Based on my timeouts I would think that my session would time out
after 15 seconds. Anyone have an idea why it is taking 67 seconds?
Am I missing any other timeout values?
2. In a perfect world what is the best case scenario for the failure
of my iSCSI session?
Thanks in advance.
-Ben
Yes. It should timeout about 15 secs after you see
> Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of
> 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304,
> now 4894304
You might be hitting a bug where the network layer gets stuck trying to
send data. I attached a patch that should fix the problem.
If you do not know how to build a RHEL kernel let me know the arch you
are using and I can build a kernel here (it takes about a day).
> after 15 seconds. Anyone have an idea why it is taking 67 seconds?
> Am I missing any other timeout values?
No. The ones you have set are it.
>
> 2. In a perfect world what is the best case scenario for the failure
> of my iSCSI session?
>
It should work like in that doc.
Wouldn't the abort timeout also have an effect here? Or will iSCSI
immediately fail the abort that the mid-layer sends when it gets an
error sending a SCSI command?
--guy
Doing some multipath testing with iscsi/tcp I didn't hit this bug; any
hint as to what it takes for this to come into play? I did failover on a
silent line (other than nops) and during I/O.
Mike, is this patch production ready? If yes, are you pushing it upstream?
Or.
In the case in this thread abort timeout does not come into play,
because when the iscsi layer's nop times out it will fail the session
right away and that will prevent the scsi layer's eh (aborts, device
reset, target reset, etc) from starting up.
The problem I am referring to is the one we are discussing in the
"[PATCH] decrease sndtmo" thread. The problem is that if there is a
problem at the same time the write space window is closed we get caught
in a wait for at least one sndtmo period (sometimes more if we were in
the middle of a send).
> (other then nops) and during I/O.
>
> Mike, is this patch production ready? if yes, are you pushing it upstream?
>
It is in scsi-misc for the next feature window. I think James has sent
it already.
Mar 1 12:32:37 bentCluster-1 kernel: tg3: eth0: Link is down.
Mar 1 12:33:03 bentCluster-1 multipathd: checker failed path 8:224 in
map mpath0
Mar 1 12:33:03 bentCluster-1 kernel: end_request: I/O error, dev sdo,
sector 1249431
Mar 1 12:33:03 bentCluster-1 multipathd: mpath0: remaining active
paths: 1
I ended up setting:
[root@bentCluster-1 ~]# echo noop > /sys/block/sdn/queue/scheduler
[root@bentCluster-1 ~]# echo noop > /sys/block/sdo/queue/scheduler
[root@bentCluster-1 ~]# echo 64 > /sys/block/sdn/queue/max_sectors_kb
[root@bentCluster-1 ~]# echo 64 > /sys/block/sdo/queue/max_sectors_kb
[root@bentCluster-1 ~]# echo "5" > /sys/block/sdn/device/timeout
[root@bentCluster-1 ~]# echo "5" > /sys/block/sdo/device/timeout
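A small helper along these lines avoids repeating the echoes per device. This is a sketch only: the function name and the SYSFS override (there purely so the loop can be exercised outside a live box) are my own, and against the real /sys it needs root.

```shell
# Hypothetical helper: apply the same tunables to each path device.
# SYSFS is overridable only for illustration; defaults to the real /sys.
set_path_tunables() {
    sysfs=${SYSFS:-/sys}
    for dev in "$@"; do
        echo noop > "$sysfs/block/$dev/queue/scheduler"      # no elevator reordering
        echo 64   > "$sysfs/block/$dev/queue/max_sectors_kb" # cap per-request size
        echo 5    > "$sysfs/block/$dev/device/timeout"       # scsi cmd timeout, seconds
    done
}
```

Usage would be e.g. `set_path_tunables sdn sdo` as root; note these settings do not survive a reboot.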
I couldn't get it under 90 seconds without setting
/sys/block/sdn/device/timeout, and in my best test I hit 26 seconds.
I have a couple of questions:
1. Do I need the scsi timeout to be turned down or could I be hitting
the bug Mike mentioned?
2. The patch that Mike attached to this thread, is there a Red Hat BZ
associated with it so I can track its progress? If not should I open
a BZ?
3. In a best-case scenario, what kind of failover time can I expect
with multipath and iSCSI? I see about 25-30 seconds; is this
accurate? I saw 3-second failover using bonded NICs instead of
dm-multipath; is there any specific reason to use multipathd instead
of channel bonding?
Thanks for all the help everyone!
-Ben
It looks like we have two bugs.
1. We can get stuck in the network code.
2. There is a race where the session->state can get reset due to the
xmit thread throwing an error after we have set the session->state but
before we have set the stop_stage.
The attached patch for RHEL 5.5 should fix them all.
Hello,
Will this patch be in the next RHEL 5.5 beta kernel? It's easier to test
if there's no need to build a custom kernel :)
-- Pasi
I am not sure if it will be in the next 5.5 beta. It should be in 5.5
though. Do you have a bugzilla account? I made this bugzilla
https://bugzilla.redhat.com/show_bug.cgi?id=570681
You can add yourself to it and when the patch is merged you will get a
notification and a link to a test kernel.
If you do not have a bugzilla account, just let me know and I will ping
you when it is available in a test kernel.
I just added myself to the bug. Thanks!
-- Pasi
Hi Mike,
The bugzilla ticket requests a merge of two git commits, but neither of those
contain the libiscsi.c change that addresses bug #2. Was this a mistake, or did
you deliberately omit that part of your speed-up-conn-fail-take3.patch when you
raised the ticket?
TIA,
Alex
Hey,
It was laziness. I did not update the bugzilla. When I made it, I
thought we were only hitting #1 (this was the first patch I sent in this
thread). But when I was testing those 2 patches with RHEL 5, I finally
hit the problem that Ben was hitting. When I figured out that we were
hitting #2, I made the second patch in this thread. I then just did not
update the bugzilla with the new patch. For RHEL I ended up sending the
second patch though.
Thanks for the clarification. Is the fix for #2 being upstreamed? If so, is
there a git commit I can reference? (This will make it easier for us to drop
the patch when we pull a kernel which has the fix in it.)
Thanks in advance,
Alex
I sent it to linux-scsi/James a couple days after I sent the patch in
this thread. It is not merged yet.
> is there a git commit I can reference? (This will make it easier for us
> to drop the patch when we pull a kernel which has the fix in it.)
Do you want me to cc you on all future iscsi patches that go upstream?
When James merges it and sends it to Linus, I get an automated
message from him. If I cc you, you can get one too.
>
> Thanks in advance,
>
> Alex
failover time = nop timeout + nop interval + replacement_timeout
seconds + scsi block device timeout (/sys/block/sdX/device/timeout)
Is there anything else that I am missing?
-b
/sys/block/sdX/device/timeout is the scsi cmd timeout. It only comes
into play if you have nops off or have their timers set higher than the
scsi cmd timeout (you do not want to do this). When nops are in use and
one times out, then if the scsi cmd timer fires, the iscsi code will
basically tell the scsi layer that it is handling the problem, so the
scsi error handler does not run.
So it is:
failover time = nop timeout + nop interval + replacement_timeout
or
failover time = /sys/block/sdX/device/timeout + replacement_timeout +
min(abort timeout, lun reset timeout, target reset timeout).
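Plugging in the settings from earlier in this thread, a quick sanity check of the first formula (the second depends on the error-handler timeouts, which vary):

```shell
# Expected failover time with nops enabled, using the values in this thread.
noop_interval=5; noop_timeout=5; replacement_timeout=15
failover=$((noop_interval + noop_timeout + replacement_timeout))
echo "expected failover: ${failover}s"   # 25s, in line with the 25-30s Ben measured
```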