
[PATCH 1/1] ib_srp: Infiniband srp fast failover patch.

Karandeep Chahal

May 29, 2012, 5:20:02 PM
Subject: [PATCH] Infiniband srp fast failover patch.

Currently ib_srp does nothing on receiving a DREQ from the target beyond
sending a response back, and it does not monitor port (down) events either.
I have patched srp to remove the SCSI devices when a port-down event is
received or when the target sends a DREQ. Today, even though the target
notifies the initiator of its intention to go away, the initiator ignores
that information; later it gets upset when the devices "suddenly" disappear,
and srp initiates an error recovery process that takes a long time. This
causes high failover latencies compared to Fibre Channel. In my experiments
with RHEL 6.0 and 6.2 I saw failover times exceeding 2 minutes and 20
seconds (despite tweaking /etc/multipath.conf and /sys/block/<>/timeout).
With this patch the failover takes 30 seconds. I have tested the patch both
with and without a switch.
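
For illustration only, the DREQ half of the idea might look roughly like the
sketch below inside the driver's connection-manager callback. This is a
minimal sketch, not the attached patch; srp_schedule_target_removal() is an
assumed helper that tears the SCSI host down from process context.

#include <rdma/ib_cm.h>
#include <scsi/scsi_host.h>

static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
{
	struct srp_target_port *target = cm_id->context;

	switch (event->event) {
	case IB_CM_DREQ_RECEIVED:
		/* Keep acknowledging the disconnect as before ... */
		if (ib_send_cm_drep(cm_id, NULL, 0))
			shost_printk(KERN_ERR, target->scsi_host,
				     "sending CM DREP failed\n");
		/* ... but also start tearing down the SCSI devices now,
		 * instead of waiting for error recovery to time out.
		 * (Assumed helper; removal must run in process context.) */
		srp_schedule_target_removal(target);
		break;
	default:
		/* Other CM events handled as in the existing driver. */
		break;
	}
	return 0;
}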

Yours, etc.
Karan

0001-Infiniband-srp-fast-failover-patch.-Currently-ib_srp.patch

Michael Reed

May 29, 2012, 6:00:01 PM
Did you subsequently reconnect the target and confirm appropriate behavior?

Karandeep Chahal

May 29, 2012, 6:30:02 PM
Hi Michael,

Yes, I tried reconnecting the targets and removing and reinserting ib_srp.

Thanks
Karan

Michael Reed

May 29, 2012, 7:00:02 PM
Thank you for clarifying!

David Dillow

May 30, 2012, 2:00:03 AM
On Tue, 2012-05-29 at 17:07 -0400, Karandeep Chahal wrote:
> Subject: [PATCH] Infiniband srp fast failover patch.

This conflicts with Bart's patches to improve failover; it will be much
better to use his approach to block the target rather than remove it
wholesale -- we could have lost connectivity as a transient and may get
it back quickly if someone grabbed the wrong cable, etc.
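
The contrast being drawn might look roughly like this -- a sketch of the
block-then-decide idea only, not Bart's actual series; it uses the generic
scsi_block_requests()/scsi_unblock_requests() calls and assumes reconnection
is attempted with something like the driver's existing srp_reconnect_target():

/* Sketch: on a connection loss, hold new I/O and try to reconnect
 * before giving up, instead of removing the SCSI host outright. */
static void srp_handle_connection_loss(struct srp_target_port *target)
{
	scsi_block_requests(target->scsi_host);

	if (srp_reconnect_target(target) == 0) {
		/* Transient outage (wrong cable pulled, etc.): resume I/O. */
		scsi_unblock_requests(target->scsi_host);
		return;
	}

	/* Reconnect failed: tear the target down as a last resort. */
	scsi_unblock_requests(target->scsi_host);
	srp_remove_host(target->scsi_host);
	scsi_remove_host(target->scsi_host);
}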

Also, we should only kill the one target on DREQ, and we already have a
pointer to it from the CM context -- no need to search.

It is a good idea to hook into the event mechanism; this is something
I've long wanted to incorporate (as Vu did in OFED). I'm looking at
getting Bart's series to a point I can merge it, and I'll pull in your
ideas -- with credit -- there.
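
A sketch of what hooking the async event mechanism could look like (the
helper name and the per-device event_handler field are assumptions, not the
merged code):

static void srp_event_handler(struct ib_event_handler *handler,
			      struct ib_event *event)
{
	if (event->event == IB_EVENT_PORT_ERR)
		/* Assumed helper: handle every target bound to this port. */
		srp_handle_port_down(event->device, event->element.port_num);
}

/* Registered once per device, e.g. from the ib_client add callback: */
INIT_IB_EVENT_HANDLER(&srp_dev->event_handler, device, srp_event_handler);
ib_register_event_handler(&srp_dev->event_handler);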

Thanks,
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

Karandeep Chahal

May 30, 2012, 10:50:02 AM
Hi Dave,

As long as we get faster failover I am happy with Bart's patch.

Currently, when I run I/O to several LUNs over multipath and the preferred
path goes down, the system hangs until the I/O fails over. Even ssh'ing
into the system takes 20-30 seconds. I *suspect* that is because I/O is
being queued up somewhere, which brings the whole system to its knees.

Thank you for looking at the patch.

Thanks
Karan
