[Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA


Dardo D Kleiner - CONTRACTOR

Nov 13, 2009, 3:34:14 PM
to Lustre discuss
Mellanox ConnectX MT25418 with two ports, each connected to a separate
IB fabric: ib0 and ib1 are on distinct IP subnets, and each is connected
to a separate Lustre router.

ibstat:
CA 'mlx4_0'
    CA type: MT25418
    Number of ports: 2
    Firmware version: 2.7.0
    Hardware version: a0
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 302
        LMC: 0
        SM lid: 2
        Capability mask: 0x02510868
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 5
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868

ip ad ls:
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 4096
    inet xxx.xxx.182.130/26 brd xxx.xxx.182.191 scope global ib0
5: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 4096
    inet xxx.xxx.182.194/26 brd xxx.xxx.182.255 scope global ib1
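As a quick cross-check of the addressing, the /26 prefix on each port fixes the range of the final octet, which is what the ip2nets patterns in the modprobe configuration need to cover. This is a minimal Python sketch using the standard `ipaddress` module. The `10.0` prefix is a hypothetical stand-in for the masked `xxx.xxx` octets, and `octet_range` is an illustrative helper, not a Lustre tool.

```python
import ipaddress

# The post masks the first two octets as "xxx.xxx"; "10.0" is a hypothetical
# stand-in used only so the addresses parse.
INTERFACES = {
    "ib0": "10.0.182.130/26",
    "ib1": "10.0.182.194/26",
}

def octet_range(cidr):
    """Return (first, last) values of the final octet covered by the subnet.

    Assumes a prefix of /24 or longer, so only the last octet varies.
    """
    net = ipaddress.ip_interface(cidr).network
    first = int(net.network_address) & 0xFF
    last = int(net.broadcast_address) & 0xFF
    return first, last

for name, cidr in INTERFACES.items():
    first, last = octet_range(cidr)
    print(f"{name}: last octet [{first}-{last}]")
```

This yields [128-191] for ib0 and [192-255] for ib1, matching the brd addresses in the `ip` output and the two o2ib ranges in ip2nets.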

/etc/modprobe.d/lustre:
options lnet \
ip2nets=" \
o2ib1 xxx.xxx.[176-177].[0-255];
o2ib3(ib0) xxx.xxx.182.[128-191];
o2ib4(ib1) xxx.xxx.182.[192-255]"
routes=" \
o2ib1 xxx.xxx.182.129@o2ib3,xxx.xxx.182.193@o2ib4"
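To make the ip2nets selection concrete, here is a simplified Python model of how the three entries map onto this node's two addresses. It is a sketch under stated assumptions, not LNet's parser: it handles only literal octets, `*`, and `[a-b]` ranges (the real syntax accepts more), it does not model LNet's precedence rules between entries, and `10.0` again stands in for the masked octets. The helper names are hypothetical.

```python
import re

# Simplified copy of the ip2nets entries; "10.0" replaces the masked octets.
IP2NETS = [
    ("o2ib1", None, "10.0.[176-177].[0-255]"),
    ("o2ib3", "ib0", "10.0.182.[128-191]"),
    ("o2ib4", "ib1", "10.0.182.[192-255]"),
]

def octet_matches(spec, value):
    """Match one octet against a literal, '*', or an inclusive '[a-b]' range."""
    if spec == "*":
        return True
    m = re.fullmatch(r"\[(\d+)-(\d+)\]", spec)
    if m:
        return int(m.group(1)) <= value <= int(m.group(2))
    return int(spec) == value

def pattern_matches(pattern, ip):
    """True if every octet of the dotted-quad IP matches the pattern."""
    specs = pattern.split(".")
    octets = [int(o) for o in ip.split(".")]
    return len(specs) == 4 and all(octet_matches(s, o) for s, o in zip(specs, octets))

def select_networks(local_ips):
    """Return every entry whose pattern matches at least one local address."""
    selected = []
    for net, iface, pattern in IP2NETS:
        if any(pattern_matches(pattern, ip) for ip in local_ips):
            selected.append(f"{net}({iface})" if iface else net)
    return selected

print(select_networks(["10.0.182.130", "10.0.182.194"]))
```

Under this model the node's two addresses select both `o2ib3(ib0)` and `o2ib4(ib1)`, which is consistent with expecting two NIDs rather than one.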

dmesg:
.
.
Lustre: Listener bound to ib0:xxx.xxx.182.130:987:mlx4_0
.
.


Why don't I also get "Listener bound to ib1:xxx.xxx.182.194:987:mlx4_0"?

- Dardo
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Isaac Huang

Nov 14, 2009, 8:13:55 PM
to Dardo D Kleiner - CONTRACTOR, Lustre discuss
On Fri, Nov 13, 2009 at 03:34:14PM -0500, Dardo D Kleiner - CONTRACTOR wrote:
> Mellanox ConnectX MT25418, two ports, each connected to a separate
> IB fabric - ib0 and ib1 have distinct IP subnets, each connected
> to a separate Lustre router.
> ......

> ip ad ls:
> 4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 4096
> inet xxx.xxx.182.130/26 brd xxx.xxx.182.191 scope global ib0
> 5: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 4096
> inet xxx.xxx.182.194/26 brd xxx.xxx.182.255 scope global ib1
>
> /etc/modprobe.d/lustre:
> options lnet \
> ip2nets=" \
> o2ib1 xxx.xxx.[176-177].[0-255];
> o2ib3(ib0) xxx.xxx.182.[128-191];
> o2ib4(ib1) xxx.xxx.182.[192-255]"
> routes=" \
> o2ib1 xxx.xxx.182.129@o2ib3,xxx.xxx.182.193@o2ib4"
>
> dmesg:
> .
> .
> Lustre: Listener bound to ib0:xxx.xxx.182.130:987:mlx4_0
> .
> .
>
>
> Why don't I also get "Listener bound to ib1:xxx.xxx.182.194:987:mlx4_0"?

What did 'lctl list_nids' show? It looks like only one NI (network
interface) was initialized.

Isaac

Dardo D Kleiner - CONTRACTOR

Nov 14, 2009, 11:03:12 PM
to Lustre discuss

Only the one o2ib3 NID was listed; I did check that. So it's your belief that
I should have two distinct NIDs here? Should I be able to route over multiple
LNets? On systems that have two HCAs I certainly do see multiple NIDs; this
is the first system I've configured with one HCA that has two ports...

The filesystem wouldn't mount with this configuration, obviously. One other
piece of information: it also wouldn't work if I specified only o2ib4(ib1),
without the o2ib3(ib0) line (though I now realize I didn't try to set the
ko2iblnd ipif_name to ib1 in that test). It does work if I have only the
o2ib3 LNet definition.

- Dardo

Dardo D Kleiner - CONTRACTOR

Nov 16, 2009, 4:38:03 PM
to Lustre discuss
Stand down. Don't know what was wrong with my configuration at first,
but it does instantiate the two NIDs on the host with multiple ports
on a single HCA. Unfortunately,

LustreError: 17771:0:(router.c:464:lnet_check_routes()) Routes to o2ib1 via xxx.xxx.182.193@o2ib4 and xxx.xxx.182.129@o2ib3 not supported

So I couldn't have done what I wanted to anyway, the answer to my
question below "Should I be able to route over multiple lnets?" is
clearly no...

- Dardo

Isaac Huang

Nov 16, 2009, 7:03:17 PM
to Dardo D Kleiner - CONTRACTOR, Lustre discuss
On Mon, Nov 16, 2009 at 04:38:03PM -0500, Dardo D Kleiner - CONTRACTOR wrote:
> Stand down. Don't know what was wrong with my configuration at first,
> but it does instantiate the two NIDs on the host with multiple ports
> on a single HCA. Unfortunately,
>
> LustreError: 17771:0:(router.c:464:lnet_check_routes()) Routes to o2ib1 via xxx.xxx.182.193@o2ib4 and xxx.xxx.182.129@o2ib3 not supported

In fact, this limitation could be lifted. It exists because upper layers
rely on the source NID in LNet messages to identify clients; that is, they
assume that all messages from the same client carry the same source NID in
their LNet message headers.

It's becoming an annoyance as multi-rail configurations grow more
popular.

> So I couldn't have done what I wanted to anyway, the answer to my
> question below "Should I be able to route over multiple lnets?" is
> clearly no...
>
> - Dardo
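The restriction Isaac describes can be modeled in a few lines. The Python sketch below is an illustration written from the log message and his explanation, not the logic in router.c: it groups configured routes by remote network and flags any remote network whose gateways sit on more than one local network, since traffic to it could then leave with different source NIDs. Several gateways on the same local network are accepted, as in an ordinary routed setup. `check_routes` and `nid_net` are hypothetical names, and `10.0` stands in for the masked octets.

```python
from collections import defaultdict

def nid_net(nid):
    """Return the LNet network part of a NID such as '10.0.182.129@o2ib3'."""
    return nid.split("@", 1)[1]

def check_routes(routes):
    """Illustrative model only (not router.c).

    routes: list of (remote_net, gateway_nid) pairs.
    Returns error strings; an empty list means the table is accepted.
    """
    local_nets = defaultdict(set)   # remote net -> local nets of its gateways
    gateways = defaultdict(list)    # remote net -> gateway NIDs, in order given
    for remote_net, gw in routes:
        local_nets[remote_net].add(nid_net(gw))
        gateways[remote_net].append(gw)
    errors = []
    for remote_net, nets in local_nets.items():
        # Gateways on different local nets imply different source NIDs.
        if len(nets) > 1:
            via = " and ".join(gateways[remote_net])
            errors.append(f"Routes to {remote_net} via {via} not supported")
    return errors

routes = [("o2ib1", "10.0.182.129@o2ib3"), ("o2ib1", "10.0.182.193@o2ib4")]
print(check_routes(routes))
```

With the routes from this thread the model reports one error for o2ib1; removing either gateway, or putting both gateways on the same local network, clears it.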

Dardo D Kleiner - CONTRACTOR

Nov 16, 2009, 8:01:12 PM
to Dardo D Kleiner - CONTRACTOR, Lustre discuss
So are you suggesting I could just comment out the check in router.c?

Isaac Huang

Nov 16, 2009, 8:06:07 PM
to Dardo D Kleiner - CONTRACTOR, Lustre discuss
On Mon, Nov 16, 2009 at 08:01:12PM -0500, Dardo D Kleiner - CONTRACTOR wrote:
> So are you suggesting I could just comment out the check in router.c?

That's enough for LNet, but changes must also be made in Lustre itself.

Isaac

Dardo D Kleiner - CONTRACTOR

Nov 16, 2009, 8:10:17 PM
to Dardo D Kleiner - CONTRACTOR, Lustre discuss
In the next hour - before SC'09 opens the doors? ;)