[Lustre-discuss] How do you make an MGS/OSS listen on 2 NICs?

Lundgren, Andrew

Jan 15, 2008, 12:28:33 PM
to Lustre-...@clusterfs.com
I am running the CentOS 5 distribution without any updates from CentOS. I am using the lustre 1.6.4.1 kernel and software.

I have two NICs that run through different switches.

I have set the lustre options in my modprobe.conf to look like this:

options lnet networks=tcp0(eth1,eth0)

However, my MGS seems to be listening only on the first interface.

When I try to ping the first interface (eth1), it works; when I try the second (eth0), it does not.
 
failed to ping 192.168.135.80@tcp: Input/output error
 
The following is in /var/log/messages

Jan 15 17:18:15 dint0001 kernel: LustreError: 120-3: Refusing connection from 192.168.135.80 for 192.168.135.80@tcp: No matching NI
Jan 15 17:18:15 dint0001 kernel: LustreError: 3251:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.135.80
Jan 15 17:18:15 dint0001 kernel: LustreError: 11b-b: Connection to 192.168.135.80@tcp at host 192.168.135.80 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.135.80@tcp one of its NIDs?

Where/How do I configure it to listen on both devices?
 
Thank you!
 
--
Andrew Lundgren
 


Klaus Steden

Jan 15, 2008, 1:49:30 PM
to Lundgren, Andrew, Lustre-...@clusterfs.com

Try putting quotes around the argument to the ‘networks=’ statement. If you’ve got only eth0 and eth1 in your system, you don’t need to specify them explicitly either; Lustre will automatically bind all available interfaces.

Here’s what my config looks like:

-- cut --
[root@mds-0-0 ~]# lctl list_nids
172.16.129.252@tcp
172.16.128.252@tcp1
[root@mds-0-0 ~]# lctl ping 172.16.128.250@tcp1
12345-0@lo
12345-172.16.129.250@tcp
12345-172.16.128.250@tcp1
[root@mds-0-0 ~]# lctl ping 172.16.128.250@tcp0
^C
[root@mds-0-0 ~]# lctl ping 172.16.129.250@tcp0
12345-0@lo
12345-172.16.129.250@tcp
12345-172.16.128.250@tcp1
[root@mds-0-0 ~]# lctl ping 172.16.129.250@tcp1
^C
[root@mds-0-0 ~]# grep lnet /etc/modprobe.conf
options lnet networks="tcp0(eth0),tcp1(bond0)"
-- cut --

hth,
Klaus
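
Each `lctl ping` reply above lists every NID of the remote node, one per line in the form `12345-<nid>` (the leading number appears to be the LNET process ID). A minimal Python sketch, assuming exactly that output format, that pulls the NIDs out of a reply so you can check whether the address you pinged is really one of the node's NIDs. `nids_from_ping` is a hypothetical helper, not part of lctl.

```python
def nids_from_ping(output: str, include_loopback: bool = False) -> list[str]:
    """Extract NIDs from `lctl ping` output lines shaped like '12345-172.16.129.250@tcp'."""
    nids = []
    for line in output.splitlines():
        pid, sep, nid = line.strip().partition("-")
        # Skip anything that is not '<digits>-<addr>@<net>' (e.g. 'failed to ping ...').
        if not sep or not pid.isdigit() or "@" not in nid:
            continue
        if nid.endswith("@lo") and not include_loopback:
            continue  # 0@lo is the loopback NID every node reports
        nids.append(nid)
    return nids


# Reply from Klaus's 'lctl ping 172.16.128.250@tcp1' above.
reply = """12345-0@lo
12345-172.16.129.250@tcp
12345-172.16.128.250@tcp1"""

print(nids_from_ping(reply))
# ['172.16.129.250@tcp', '172.16.128.250@tcp1']
print("172.16.128.250@tcp" in nids_from_ping(reply))  # False: that address is a NID on tcp1 only
```

This matches what the transcript shows: the ping to 172.16.128.250@tcp0 hangs because that address/network pair is not in the node's NID list.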

On 1/15/08 9:28 AM, "Lundgren, Andrew" <Andrew....@Level3.com> did etch on stone tablets:

> When I try and ping the 1st interface (eth1), it works when I go for the second (eth0) it does not.
>
> # lctl ping 192.168.135.81@tcp
> 12345-0@lo
> 12345-192.168.135.81@tcp
> # lctl ping 192.168.135.80@tcp
> failed to ping 192.168.135.80@tcp: Input/output error

_______________________________________________
Lustre-discuss mailing list
Lustre-...@clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss

Lundgren, Andrew

Jan 15, 2008, 3:35:52 PM
to Klaus Steden, Lustre-...@clusterfs.com
We actually set lustre up to run on some non-routable aliased IP addresses, so we wanted to limit it to two specific NICs. I modified my line to look like yours, specifying tcp1 for the second interface, and now it is pingable there, but I am not positive it is correct. When I didn't have any lnet options, it just grabbed the first NIC it saw and ignored the rest.
 
--
Andrew


Isaac Huang

Jan 16, 2008, 8:34:19 AM
to Lundgren, Andrew, Lustre-...@clusterfs.com
On Tue, Jan 15, 2008 at 10:28:33AM -0700, Lundgren, Andrew wrote:
> I am running on CentOS 5 distribution without adding any updates from
> CentOS. I am using the lustre 1.6.4.1 kernel and software.
>
> I have two NICs that run through different switches.
>
> I have the lustre options in my modprobe.conf to look like this:
>
> options lnet networks=tcp0(eth1,eth0)
>

This way of interface bonding is now a deprecated lnet feature. Please
refer to:
http://manual.lustre.org/manual/LustreManual16_HTML/DynamicHTML-13-1.html

Isaac

Lundgren, Andrew

Jan 16, 2008, 11:03:59 AM
to He.H...@sun.com, Lustre-...@clusterfs.com
So the only way to use two NICs at once is to bond? I am more interested in redundancy than in increased throughput.

Herb Wartens

Jan 16, 2008, 2:23:30 PM
to Lustre Discuss
Andrew,
I have not used lustre-1.6.4.X yet, but in previous versions (and most likely the version you are using)
Lustre actually listens on all interfaces no matter what you specify in modprobe.conf. You can verify this
by looking at the netstat output for port 988 and checking which addresses it is listening on. We here at LLNL
regularly use multiple interfaces.
I believe the issue you are referring to is a bug in the lctl ping code, where the ping only responds
over the first network device specified for a particular LND. As long as you have properly configured your
host routes so that you can ping both interfaces from the other node, you should be fine. IMHO this should
just be fixed in lnet so that you can do an lctl ping from any endpoint to any other endpoint.

# ilc6 /root > cat /etc/modprobe.conf
options lnet networks="tcp0(eth2,eth3)"

# ilc6 /root > netstat -a -t -n | grep 988 | grep LIST
tcp 0 0 0.0.0.0:988 0.0.0.0:* LISTEN

# ilc6 /root > cat /etc/hosts | grep ilc7
172.16.101.7 ilc7-lnet0 ilc7-eth2
172.16.102.7 ilc7-lnet1 ilc7-eth3

# ilc6 /root > lctl ping 172.16.101.7@tcp0
12345-0@lo
12345-172.16.101.7@tcp
# ilc6 /root > lctl ping 172.16.102.7@tcp0
failed to ping 172.16.102.7@tcp: Input/output error

# ilc6 /root > ping -c 1 172.16.101.7
PING 172.16.101.7 (172.16.101.7) 56(84) bytes of data.
64 bytes from 172.16.101.7: icmp_seq=1 ttl=64 time=0.143 ms

--- 172.16.101.7 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.143/0.143/0.143/0.000 ms
# ilc6 /root > ping -c 1 172.16.102.7
PING 172.16.102.7 (172.16.102.7) 56(84) bytes of data.
64 bytes from 172.16.102.7: icmp_seq=1 ttl=64 time=0.094 ms

--- 172.16.102.7 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.094/0.094/0.094/0.000 ms
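
The netstat check above can also be done by reading `/proc/net/tcp` directly. A minimal Python sketch, assuming the usual Linux layout of that file (hex `address:port` columns, state `0A` meaning LISTEN) and a little-endian host such as x86; `listeners_on_port` is a hypothetical helper. Note that a `0.0.0.0` listener only shows that the socket accepts TCP connections on every IPv4 address; as the "No matching NI" log in the original post shows, LNET can still refuse a connection whose target is not one of its NIDs.

```python
import os
import socket

TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp


def decode_ipv4(hex_addr: str) -> str:
    """Decode an address such as '0100007F' (host byte order; assumes a little-endian host)."""
    return socket.inet_ntoa(bytes.fromhex(hex_addr)[::-1])


def listeners_on_port(proc_net_tcp: str, port: int) -> list[str]:
    """Return every local IPv4 address with a socket in LISTEN state on `port`."""
    addrs = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        hex_ip, hex_port = fields[1].split(":")  # fields[1] is local_address
        if fields[3] == TCP_LISTEN and int(hex_port, 16) == port:
            addrs.append(decode_ipv4(hex_ip))
    return addrs


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        for addr in listeners_on_port(f.read(), 988):  # 988 is the Lustre port seen in this thread
            scope = "all IPv4 addresses" if addr == "0.0.0.0" else "this address only"
            print(f"port 988 listening on {addr} ({scope})")
```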

Cliff White

Jan 16, 2008, 6:11:31 PM
to Lundgren, Andrew, Lustre-...@clusterfs.com
Lundgren, Andrew wrote:
> So the only way to use two nics at once is to bond? I am more for redundancy rather than increased throughput.

Yes. When we first did this in Lustre, bonding was not as good; bonding
works fine now, so we saw no need to keep duplicating that
function.

cliffw

Isaac Huang

Jan 16, 2008, 11:01:32 PM
to Herb Wartens, Lustre Discuss
On Wed, Jan 16, 2008 at 11:23:30AM -0800, Herb Wartens wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> Andrew,
> I have not used lustre-1.6.4.X yet, but in previous versions (and most likely the version you are using)
> Lustre actually listens on all interfaces no matter what you specify in the modprobe.conf. You can verify this
> by looking at the netstat output for port 988 and look for what ports you are listening on. We here at LLNL
> regularly use multiple interfaces.
> I believe that the issue you are referring to is a bug in the lctl ping code where the ping only responds
> over the first network device specified for a particular lnd. As long as you have properly configured your
> host routes so that you can ping both interfaces from the other node you should be fine. IMHO this should
> just be fixed in lnet so you can do an lctl ping from any endpoint to any other endpoint.

I don't think it's an lctl ping bug.

>
> # ilc6 /root > cat /etc/modprobe.conf
> options lnet networks="tcp0(eth2,eth3)"

This config gives the node only one NID: ip_of_eth2@tcp0. You can
verify it by 'lctl list_nids' on the node.

>
> # ilc6 /root > netstat -a -t -n | grep 988 | grep LIST
> tcp 0 0 0.0.0.0:988 0.0.0.0:* LISTEN
>
> # ilc6 /root > cat /etc/hosts | grep ilc7
> 172.16.101.7 ilc7-lnet0 ilc7-eth2
> 172.16.102.7 ilc7-lnet1 ilc7-eth3
>
> # ilc6 /root > lctl ping 172.16.101.7@tcp0
> 12345-0@lo
> 12345-172.16.101.7@tcp

When you lctl ping a node at any one of its NIDs, the ping reply
contains a list of all NIDs of the node. As can be seen from the reply
above, the node at 172.16.101.7@tcp0 has two NIDs: 0@lo and 172.16.101.7@tcp.

So when you tried 'lctl ping 172.16.102.7@tcp0', the ping request
could reach 172.16.102.7, but it was rejected since 172.16.102.7@tcp0
was not one of the node's NIDs.

The socklnd does interface bonding transparently from lnet's
perspective. It exchanges a list of IPs of all NICs under an lnet NID
with peers, and creates connections to all IPs of a peer and thus
aggregates bandwidth. Lnet has no knowledge of this - all it sees is
just one NID, i.e. ip_of_1st_nic@tcp.

Isaac
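
Isaac's point can be modelled in a few lines: each LNET network in the `networks=` option yields one NID, built from the first interface listed for it, and a ping addressed to anything else is refused. A minimal Python sketch under those assumptions only (TCP networks, interface-to-IP map supplied by hand); `local_nids` and `accepts` are hypothetical helpers, not Lustre code. The addresses come from this thread, with eth1 inferred to be 192.168.135.81 on Andrew's server because that is the address that answers.

```python
import re


def net_name(net: str) -> str:
    """Print a network name the way lctl does: 'tcp0' -> 'tcp', 'tcp1' stays 'tcp1'."""
    m = re.fullmatch(r"([a-z]+)(\d*)", net)
    if m is None:
        return net
    return m.group(1) if m.group(2) in ("", "0") else net


def parse_networks(spec: str) -> list[tuple[str, list[str]]]:
    """Split a value like 'tcp0(eth0),tcp1(bond0)' into (network, [interfaces]) pairs."""
    spec = spec.strip().strip('"')
    return [(net, [i for i in ifaces.split(",") if i])
            for net, ifaces in re.findall(r"(\w+)(?:\(([^)]*)\))?", spec)]


def local_nids(spec: str, iface_ip: dict[str, str]) -> list[str]:
    """Model: one NID per network, using the IP of the first interface listed for it."""
    nids = []
    for net, ifaces in parse_networks(spec):
        if not ifaces:
            raise ValueError(f"{net}: this sketch needs explicit interfaces")
        nids.append(f"{iface_ip[ifaces[0]]}@{net_name(net)}")
    return nids


def accepts(target_nid: str, spec: str, iface_ip: dict[str, str]) -> bool:
    """Model: a ping is accepted only if the target is one of the node's NIDs."""
    addr, net = target_nid.rsplit("@", 1)
    return f"{addr}@{net_name(net)}" in local_nids(spec, iface_ip)


andrew = {"eth1": "192.168.135.81", "eth0": "192.168.135.80"}
print(local_nids("tcp0(eth1,eth0)", andrew))                     # ['192.168.135.81@tcp']
print(accepts("192.168.135.80@tcp", "tcp0(eth1,eth0)", andrew))  # False ("No matching NI")

klaus = {"eth0": "172.16.129.252", "bond0": "172.16.128.252"}
print(local_nids('"tcp0(eth0),tcp1(bond0)"', klaus))  # ['172.16.129.252@tcp', '172.16.128.252@tcp1']
```

The Klaus case reproduces his `lctl list_nids` output: two networks, so two NIDs.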

D. Marc Stearman

Jan 17, 2008, 11:19:21 AM
to Lustre-...@clusterfs.com
If his NICs are connected to two different switches, bonding will not
work. If that is incorrect, please enlighten me.

-Marc

----
D. Marc Stearman
LC Lustre Systems Administrator
ma...@llnl.gov
925.423.9670
Pager: 1.888.203.0641

Lundgren, Andrew

Jan 17, 2008, 12:06:02 PM
to D. Marc Stearman, Lustre-...@clusterfs.com
My NICs are connected to different switches... I am still pondering this one.

D. Marc Stearman

Jan 17, 2008, 12:21:17 PM
to Lustre Discuss
Isaac is correct that you will have only one NID, and it will be the
only one pingable; however, you should be able to use both interfaces
with your configuration. First of all, can you ping the IP address (not
using LNET pings, just an ordinary tcp/ip ping)?

As you state, your server has two NICS on two different switches.
Where is the client connected? Does it have two NICS, and are they
on the same switches? Same subnets? A bit more information about
how your network is laid out would be useful.

-Marc

Lundgren, Andrew

Jan 17, 2008, 1:32:29 PM
to D. Marc Stearman, Lustre Discuss
The clients and servers both have two NICs connected to different switches on different subnets. The clients and servers both have their eth0 devices connected to the same switch on one subnet and their eth1 devices on a second switch using a second subnet.

They can talk to each other over the same switch or via routing between the switches.

client eth0 <-same switch-> server eth0
client eth1 <-same switch-> server eth1
client eth0 <-via routing between switches-> server eth1
client eth1 <-via routing between switches-> server eth0

--
Andrew

D. Marc Stearman

Jan 17, 2008, 4:40:48 PM
to Lustre Discuss
That sounds like a proper setup. It's very similar to what we have
here. I believe the socklnd will make connections using both NICs on
both client and server. I saw in another message that you had
iptables on. Turning that off or allowing the port Andreas suggested
should fix your issue. Are you still seeing connection problems with
the second NIC?

-Marc

Herb Wartens

Jan 17, 2008, 4:59:46 PM
to Lustre Discuss
Isaac,
My mistake. I was thinking this issue was similar to an lctl issue that I have been seeing for
quite a while now, but as you say this is not the case since the node only has a single NID. I
just oversimplified the problem. The case that I am referring to is what I think is a bug in the
lctl ping code.

Here is an example below of what I was referring to:

Node1:
ilc6 is a lustre server that has two separate ethernet devices, eth2 and eth3

# ilc6 /root > cat /etc/modprobe.conf

options lnet networks="tcp0(eth2,eth3)" \
routes="elan0 172.16.3.[4-6]@tcp0"

# ilc6 /root > lctl list_nids
172.16.101.6@tcp

Node2:
adev4 is a lustre router that has two separate ethernet devices and an elan device

# adev4 /root > cat /etc/modprobe.conf
options lnet networks="tcp0(eth0,eth1),elan0" \
forwarding="enabled"

# adev4 /root > lctl list_nids
172.16.3.4@tcp
4@elan

Node3:
adev3 is a lustre client with only an elan device

# adev3 /root > lctl list_nids
3@elan


Now the actual problem here is that
1) ilc6 can only successfully issue an lctl ping to the tcp nid even though it knows
how to get to the elan0 network.
2) adev3 can only successfully issue an lctl ping to the elan nid even though it knows
how to get to the tcp0 network.

FROM Node1:
# ilc6 /root > lctl ping 172.16.3.4@tcp0
12345-0@lo
12345-172.16.3.4@tcp
12345-4@elan

# ilc6 /root > lctl ping 3@elan
12345-0@lo
12345-3@elan

ERROR:
# ilc6 /root > lctl ping 4@elan
failed to ping 4@elan: Input/output error

FROM Node3:
# adev3 /root > lctl ping 4@elan
12345-0@lo
12345-172.16.3.4@tcp
12345-4@elan

# adev3 /root > lctl ping 172.16.101.6@tcp
12345-0@lo
12345-172.16.101.6@tcp

ERROR:
# adev3 /root > lctl ping 172.16.3.4@tcp
failed to ping 172.16.3.4@tcp: Input/output error

This is the error I was mistakenly trying to describe yesterday.

Isaac Huang

Jan 18, 2008, 10:23:27 AM
to Herb Wartens, Lustre Discuss
On Thu, Jan 17, 2008 at 01:59:46PM -0800, Herb Wartens wrote:
> ......

The router rejected the ping request message because it believed that
ilc6 could reach him via another NID (172.16.3.4@tcp0) which was
closer to ilc6 than its elan NID.

You should see a message in the router's dmesg that reads:
172.16.101.6@tcp, src 172.16.101.6@tcp: Bad dest nid 4@elan ......

I don't think it's an lnet bug.

Isaac
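
Isaac's explanation of Herb's router case can be written as a rule: if the sender shares a network with the target node, the node expects to be addressed by its NID on that shared network and refuses its other NIDs ("Bad dest nid"). A minimal Python sketch of that rule only, as a model of the behaviour described in this thread rather than LNET's actual code; `router_accepts` is a hypothetical helper, and networks are compared by the text after `@`.

```python
def net_of(nid: str) -> str:
    """Network part of a NID, e.g. '172.16.3.4@tcp' -> 'tcp'."""
    return nid.rsplit("@", 1)[1]


def router_accepts(dest_nid: str, node_nids: list[str], sender_nets: set[str]) -> bool:
    """Model: accept only the node's own NIDs, and only the 'closest' one if a network is shared."""
    if dest_nid not in node_nids:
        return False  # not one of the node's NIDs at all
    shared = [n for n in node_nids if net_of(n) in sender_nets]
    # No shared network: the message must arrive through a router, so any own NID is fine.
    return not shared or dest_nid in shared


adev4 = ["172.16.3.4@tcp", "4@elan"]
print(router_accepts("4@elan", adev4, {"tcp"}))           # False (ilc6 -> 4@elan)
print(router_accepts("172.16.3.4@tcp", adev4, {"tcp"}))   # True  (ilc6 -> tcp NID)
print(router_accepts("172.16.3.4@tcp", adev4, {"elan"}))  # False (adev3 -> tcp NID)
print(router_accepts("3@elan", ["3@elan"], {"tcp"}))      # True  (ilc6 -> adev3 via the router)
```

All four results line up with the successes and failures in Herb's transcript.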
_______________________________________________
Lustre-discuss mailing list

Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Herb Wartens

Jan 18, 2008, 11:54:22 AM
to Lustre Discuss

Correct, that is how it works now; however, I believe we have different ideas
of what lctl ping should mean. In my opinion it should not matter whether you can get
to a node via another NID that is "closer." I feel that lctl ping should
show lnet connectivity regardless of whether there is a "closer" network.
Maybe it's just me...=)

-Herb

Lundgren, Andrew

Jan 21, 2008, 5:24:10 PM
to D. Marc Stearman, Lustre Discuss
I am still unclear on how this should be configured.

All of my clients, OSS and MGS servers have two nics, on different subnets, connected to different switches. This is done for network redundancy. We are also using aliased IP addresses for lustre in the 192.168 address space. We are not bonding interfaces.

In my /etc/modprobe.conf file on all of the machines I have the following line:

options lnet networks=tcp0(eth1:0),tcp1(eth0:0)

When I format my OSTs, I have used:

mkfs.lustre --fsname stage --ost --mgsnode=192.168.135.999@tcp0 --param="failover.mode=failout" /dev/md6

Where 999 is the IP on the aliased nic. (Should I use both tcp0 and tcp1 with a comma?)

When I mount the clients in my /etc/fstab, I have done this:

192.168.136.81@tcp0,192.168.135.80@tcp1:/stage /stage lustre defaults,_netdev 0 0

My intent is that if one path to the MGS fails, the second one will be used.

Am I doing this correctly, or am I off base here?

Thanks!

--
Andrew Lundgren

D. Marc Stearman

Jan 22, 2008, 12:03:21 PM
to Lundgren, Andrew, Lustre Discuss
As far as I know, LNET will use the shortest path on the network, so
if you have two equivalent tcp networks, tcp0 and tcp1, LNET will
just use the first one. If it fails, it should use the second one.
If both NICs are in the same tcp network, LNET should use both.
Whether you decide on one or two LNET networks is up to you.
Regardless, your fstab entry is not correct. You should only list
one server as the host:

> 192.168.136.81@tcp0:/stage /stage lustre defaults,_netdev 0 0

or

> 192.168.135.80@tcp1:/stage /stage lustre defaults,_netdev 0 0

If one NIC fails while a client is not mounted, you would have to
change the fstab to remount. If Lustre is already mounted, it should
just use the other LNET network.

-Marc
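
Marc's description of how a client chooses between two equivalent LNET networks (fewest hops wins, a tie goes to the first network, and the other network takes over if the chosen one fails) can be sketched as follows. This is a hedged model of his description, not LNET's routing code; the `Path` type and the hop counts are hypothetical, and the NIDs are the two from Andrew's fstab line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Path:
    nid: str        # server NID reached over this LNET network
    hops: int       # hypothetical hop count from client to server
    up: bool = True


def choose_path(paths: list[Path]) -> Optional[Path]:
    """Fewest hops wins; on a tie the earlier-configured network wins; down paths are skipped."""
    live = [(p.hops, order, p) for order, p in enumerate(paths) if p.up]
    return min(live)[2] if live else None


paths = [Path("192.168.136.81@tcp0", 1), Path("192.168.135.80@tcp1", 1)]
print(choose_path(paths).nid)  # 192.168.136.81@tcp0 (tie, so the first network)
paths[0].up = False
print(choose_path(paths).nid)  # 192.168.135.80@tcp1 (failover to the second network)
```

The `order` index in each tuple both breaks ties and keeps `min` from ever comparing two `Path` objects directly.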

Lundgren, Andrew

Jan 22, 2008, 1:12:03 PM
to D. Marc Stearman, lustre-...@clusterfs.com
>
> As far as I know, LNET will use the shortest path on the
> network, so if you have two equivalent tcp networks, tcp0 and
> tcp1, LNET will just use the first one. If it fails, it
> should use the second one.
> If both NICs are in the same tcp network, LNET should use both.
> Whether you decide on one or two LNET networks is up to you.

So setting up tcp0(eth1:0,eth0:0) and tcp0(eth1:0),tcp1(eth0:0) are functionally equivalent for what I am doing?

> Regardless, your fstab entry is not correct. You should only
> list one server as the host:
>
> > 192.168.136.81@tcp0:/stage /stage lustre
> > defaults,_netdev 0 0
>
> or
>
> > 192.168.135.80@tcp1:/stage /stage lustre
> > defaults,_netdev 0 0
>
> If one NIC fails, while a client is not mounted, you would
> have to change the fstab to remount. If lustre is already
> mounted, it should just use the other LNET network.

Then as long as the box does not reboot while the network is down, the mount should still function, just over the secondary path?

Thank you for the clarification.


--
Andrew

D. Marc Stearman

Jan 22, 2008, 1:19:10 PM
to Lundgren, Andrew, lustre-...@clusterfs.com
Functionally, it will work the same, but not performance-wise.

tcp0(eth1:0),tcp1(eth0:0) will create two LNET networks, and it will
use the shorter of the two. If they are the same in terms of network
hops from client to server, it will use the first one, and only the
first one. This setup would create two NIDs on the servers, so you
could use either fstab entry discussed before.

tcp0(eth1:0,eth0:0) will create one LNET network, and use all
interfaces between clients and servers. You would have double the
bandwidth. This setup would create only one NID on the servers, and
you would use the NID associated with eth1:0 in your fstab entries.
If the NIC (or network) for that NID failed on the mgs/mds you would
not be able to mount new clients, but your filesystem should still
work, as it will mark that route down and use the other interface.

-Marc
