I'm getting my feet wet in the infiniband lake and of course I run into
some problems.
It would seem I got the compilation part of sles11 kernel 2.6.27 +
Lustre 1.8.3 + ofed 1.4.2 right, because it allows me to see and use the
infiniband fabric, and because ko2iblnd loads without any complaints.
In /etc/modprobe.d/lustre (this is a Debian system, hence this subdir of
modprobe-configs), I have
> options ip2nets="o2ib0 192.168.0.[1-5]"
I load lnet and do 'lctl network up', but then 'lctl list_nids' will
invariably give me only
> 192.168.0.1@tcp
no matter how I twist the modprobe-config (ip2nets="o2ib",
network="o2ib", network="o2ib(ib0), etc.)
This is true as long as I have ib0 configured with the IP 192.168.0.1
Once I unconfigure it, I get, quite expectedly,
LNET configure error 100: Network is down
So I can either configure ipoib and bring up the network, but using tcp,
or I don't configure ib0 and then cannot start the network -? ;-{} I
think I'm rather missing something here.
Any clues?
Cheers,
Thomas
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Are you trying to connect to Lustre via IB and ethernet? If so your
modprobe config should look like this.
options lnet networks="o2ib0(ib0),tcp0(eth0)"
If you're IB only, use:
options lnet networks="o2ib0(ib0)"
If your MDS and OSS servers are on separate networks, you'll need to
do something different.
Let's say the MDS and OSSs are on o2ib0/tcp0 and the clients are on
o2ib1/tcp1. You'll need a router server with separate addresses on
o2ib0 and o2ib1.
Also, it's important to note that o2ib0 and o2ib1 should be different IP
address spaces.
On the clients:
# I live on o2ib1
options lnet networks="o2ib1(ib0),tcp1(eth0)"
# To get to o2ib0 go through IP.ADD.OF.ROUTER@o2ib1
options lnet routes="o2ib0 IP.ADD.OF.ROUTER@o2ib1"
On the servers:
# I live on o2ib0
options lnet networks="o2ib0(ib0),tcp0(eth0)"
# To get to o2ib1 go through IP.ADD.OF.ROUTER@o2ib0
options lnet routes="o2ib1 IP.ADD.OF.ROUTER@o2ib0"
IP.ADD.OF.ROUTER@o2ib0 and IP.ADD.OF.ROUTER@o2ib1 are different IPs
on distinct networks.
lctl list_nids will show you the lustre nids of the node you're logged
into only.
lctl route_list will show you the lustre routers and the networks that
they bridge.
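As a quick check of the list_nids output, here is a minimal shell sketch. The function name has_o2ib_nid is made up for illustration, and it assumes NIDs are printed one per line in the usual address@network form:

```shell
# Succeeds if any NID read from stdin is on an o2ib network.
# Assumption: one NID per line, formatted as <address>@<network>,
# e.g. 192.168.0.1@o2ib or 192.168.0.1@o2ib0.
has_o2ib_nid() {
    grep -Eq '@o2ib[0-9]*$'
}

# Intended use on a node with lnet loaded:
#   lctl list_nids | has_o2ib_nid || echo "no o2ib NID, lnet is not on IB"
```

If this reports no o2ib NID while ib0 is up, the networks option is most likely not reaching the lnet module.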
I hope this was helpful.
Erik
Thanks for your advice, esp. on routing - I'll study that carefully once
I get that far.
For now, I was just trying the minimal first steps to get lnet via IB:
- It's all happening on the MGS/MDS, but neither mgs nor mdt yet
mounted, just 'modprobe lnet; lctl network up; lctl list_nids'
- I tried to use IB exclusively.
- options lnet networks="o2ib0(ib0)" doesn't work either (nor
variations thereof)
Regards,
Thomas
--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de
Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528
Geschäftsführung: Professor Dr. Dr. h.c. Horst Stöcker,
Christiane Neumann, Dr. Hartmut Eickhoff
Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
If you see an ib0 device and it has a valid IP, lnet should pick it up with
options lnet networks="o2ib0(ib0)"
What errors are you seeing?
Erik
Here's one thing to check (if you're trying to replace a tcp network
with an IB one on an existing lustre filesystem):
With the lustre mounts unmounted, run:
tunefs.lustre --dryrun <DEV_PATH> | grep Parameters
Check that parameters like 'mgsnode=IP' end in @o2ib and not in
@tcp. If any still end in @tcp, erase and rewrite them.
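The check can be scripted roughly like this. It is only a sketch: the function name mgsnode_on_tcp is invented here, and it assumes the parameters appear space-separated on the "Parameters:" line:

```shell
# Print every mgsnode= parameter that still names a tcp NID.
# Reads the "Parameters:" line(s) of tunefs.lustre --dryrun on stdin.
# Assumption: parameters are separated by spaces on that line.
mgsnode_on_tcp() {
    tr ' ' '\n' | grep -E '^mgsnode=.*@tcp[0-9]*(,|$)'
}

# Intended use, with the lustre mounts unmounted:
#   tunefs.lustre --dryrun <DEV_PATH> | grep Parameters | mgsnode_on_tcp
# Rewriting would then be along the lines of
#   tunefs.lustre --erase-params --mgsnode=<NID>@o2ib <DEV_PATH>
# but note that --erase-params drops all stored parameters, so any others
# have to be given again on the same command line.
```

No output means no mgsnode entry points at tcp.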
Cheers,
Adam
--
Adam Munro
System Administrator | SHARCNET | http://www.sharcnet.ca
Compute Canada | http://www.computecanada.org
519-888-4567 x36453
I have varied that already: "--mgsnode=IB" or "--mgsnode=IB
--failnode=tcp" etc. in the config of the MDT.
But I don't go as far as mounting either MGS or MDT.
I'm just loading lnet and then use 'lctl' to start the network:
"lctl network up"
"lctl list_nids"
Whatever I put in modprobe.conf, I get the answer
"192.168.0.1@tcp"
Regards,
Thomas
On Tue, Jun 22, 2010 at 04:19:08PM +0200, Thomas Roth wrote:
> I'm getting my feet wet in the infiniband lake and of course I run into
> some problems.
> It would seem I got the compilation part of sles11 kernel 2.6.27 +
> Lustre 1.8.3 + ofed 1.4.2 right, because it allows me to see and use the
> infiniband fabric, and because ko2iblnd loads without any complaints.
>
> In /etc/modprobe.d/lustre (this is a Debian system, hence this subdir of
> modprobe-configs), I have
> > options ip2nets="o2ib0 192.168.0.[1-5]"
If this is a verbatim copy from the config file, then you're lacking the name
of the module, ie. 'options lnet ip2nets=...'. Maybe also double-check with
'modprobe -c' that options get passed on as intended.
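A small sketch of that check, in case it helps. The function name find_options_without_module is made up, and the heuristic is simply that the second field of an options line should be a module name, not a key=value pair:

```shell
# Flag 'options' lines whose second field already looks like key=value,
# i.e. lines where the module name (such as lnet) was left out.
# Reads modprobe config text on stdin, prints "<line number>: <line>".
find_options_without_module() {
    awk '$1 == "options" && $2 ~ /=/ { print NR ": " $0 }'
}

# Intended use against the file mentioned above:
#   find_options_without_module < /etc/modprobe.d/lustre
# And to see what modprobe actually picked up for lnet:
#   modprobe -c | grep -w lnet
```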
> I load lnet and do 'lctl network up', but then 'lctl list_nids' will
> invariably give me only
> > 192.168.0.1@tcp
> no matter how I twist the modprobe-config (ip2nets="o2ib",
> network="o2ib", network="o2ib(ib0), etc.)
>
> This is true as long as I have ib0 configured with the IP 192.168.0.1
> Once I unconfigure it, I get, quite expectedly,
> LNET configure error 100: Network is down
So ib0 is the only network interface in the system? In this case, I could
imagine that ksocklnd gets loaded unconditionally, always grabs the first
interface it can get hold of, and just doesn't leave any IB interface for
ko2iblnd when it eventually gets loaded. This is just a shot in the dark, but
you could check by manually loading modules via insmod.
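To see which of the two drivers actually ended up loaded, something like the following could be used. It is a sketch only, with an invented function name (loaded_lnds), and it just filters lsmod-style output:

```shell
# Print the LNET network drivers found in lsmod-style input on stdin.
# Only the two drivers discussed in this thread are looked for.
loaded_lnds() {
    awk '$1 == "ksocklnd" || $1 == "ko2iblnd" { print $1 }'
}

# Intended use after 'modprobe lnet; lctl network up':
#   lsmod | loaded_lnds
```

Seeing ksocklnd but not ko2iblnd would fit the picture of lnet falling back to tcp.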
Regards,
Daniel.
I did get my infiniband lnet up and working - using the modprobe line
>> options lnet networks=o2ib0(ib0) routes="tcp1 192.168.0.3@o2ib0"
The only thing I did was to throw away and rewrite the lustre
modprobe.d file with this line, several times. Finally it worked.
Cheers,
Thomas