Network configuration questions


代栋

Oct 11, 2017, 6:11:47 PM
to cloudlab-users
Hi, I am running an experiment on the Utah APT cluster (experiment ID: daidong-QV29290). Each node should use multiple network ports, but ifconfig shows that only one IP address is assigned on every machine:

em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 128.110.96.88  netmask 255.255.252.0  broadcast 128.110.99.255
        inet6 fe80::622b:5407:c1e2:a3a9  prefixlen 64  scopeid 0x20<link>
        ether f0:1f:af:e2:ce:54  txqueuelen 1000  (Ethernet)
        RX packets 16782095  bytes 17025464398 (15.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 514559  bytes 217905091 (207.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  
em2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether f0:1f:af:e2:ce:55  txqueuelen 1000  (Ethernet)
        RX packets 190876  bytes 32244168 (30.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 33493  bytes 7370700 (7.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 17  
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2044
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
        infiniband 80:00:02:40:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 12  bytes 980 (980.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 980 (980.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
p2p2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether f4:52:14:15:67:b2  txqueuelen 1000  (Ethernet)
        RX packets 37842  bytes 12930675 (12.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6739  bytes 1194593 (1.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
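
A quick way to spot the problem in output like the above is to list the interfaces that are up but carry no IPv4 address. The following is only a minimal sketch: the function names are hypothetical, the regular expressions assume the net-tools ifconfig layout shown in this post, and the sample is an abbreviated copy of that output.

```python
import re

# Stanza header, e.g. "em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500"
HEADER = re.compile(r"^\s*(\S+): flags=")
# IPv4 line inside a stanza, e.g. "        inet 128.110.96.88  netmask ..."
INET = re.compile(r"^\s+inet (\d+\.\d+\.\d+\.\d+)\b")


def ipv4_by_interface(ifconfig_text):
    """Map each interface name to its IPv4 address, or None if it has none."""
    addresses = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            addresses[current] = None
            continue
        inet = INET.match(line)
        if inet and current is not None:
            addresses[current] = inet.group(1)
    return addresses


def interfaces_without_ipv4(ifconfig_text):
    """List interfaces that appear in the output but carry no IPv4 address."""
    return [name for name, addr in ipv4_by_interface(ifconfig_text).items()
            if addr is None]


# Abbreviated sample taken from the ifconfig output in this post.
SAMPLE = """\
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 128.110.96.88  netmask 255.255.252.0  broadcast 128.110.99.255
        ether f0:1f:af:e2:ce:54  txqueuelen 1000  (Ethernet)
em2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether f0:1f:af:e2:ce:55  txqueuelen 1000  (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
p2p2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether f4:52:14:15:67:b2  txqueuelen 1000  (Ethernet)
"""

print(interfaces_without_ipv4(SAMPLE))  # prints ['em2', 'p2p2']
```

On a live node the same functions could be fed the text returned by running ifconfig; here p2p2 shows up as unconfigured, which matches the symptom described.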


At the same time, the /etc/hosts file still lists the internal 10.x.x.x IP addresses, as follows. However, neither these IP addresses nor the hostnames are reachable.


127.0.0.1 localhost loghost
10.10.1.7 lustre-client-1-link-1 lustre-client-1-0 lustre-client-1
10.10.1.10 lustre-client-3-link-1 lustre-client-3-0 lustre-client-3
10.10.1.4 lustre-client-2-link-1 lustre-client-2-0 lustre-client-2
10.10.1.12 lustre-server-2-link-1 lustre-server-2-0 lustre-server-2
10.10.1.1 lustre-client-9-link-1 lustre-client-9-0 lustre-client-9
10.10.1.5 lustre-client-10-link-1 lustre-client-10-0 lustre-client-10
10.10.1.3 lustre-client-7-link-1 lustre-client-7-0 lustre-client-7
10.10.1.2 lustre-client-4-link-1 lustre-client-4-0 lustre-client-4
10.10.1.15 lustre-server-3-link-1 lustre-server-3-0 lustre-server-3
10.10.1.8 lustre-client-6-link-1 lustre-client-6-0 lustre-client-6
10.10.1.14 lustre-server-1-link-1 lustre-server-1-0 lustre-server-1
10.10.1.9 lustre-client-11-link-1 lustre-client-11-0 lustre-client-11
10.10.1.11 lustre-client-8-link-1 lustre-client-8-0 lustre-client-8
10.10.1.6 lustre-client-5-link-1 lustre-client-5-0 lustre-client-5
10.10.1.13 lustre-server-5-link-1 lustre-server-5-0 lustre-server-5
10.10.1.16 lustre-server-4-link-1 lustre-server-4-0 lustre-server-4
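
To check which of these entries actually respond, one option is to parse the file and ping each address once. This is a hedged sketch rather than a CloudLab tool: `parse_hosts` and `is_reachable` are hypothetical helper names, the sample holds three lines copied from the listing above, and `is_reachable` assumes the Linux `ping` utility with its `-c` (count) and `-W` (timeout in seconds) options.

```python
import subprocess


def parse_hosts(hosts_text):
    """Map every hostname and alias in /etc/hosts-style text to its IP address."""
    names = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *aliases = line.split()
        for alias in aliases:
            names[alias] = ip
    return names


def is_reachable(ip, timeout_s=1):
    """Send one ICMP echo request; True if the ping command exits with 0."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Three lines copied from the /etc/hosts listing in this post.
SAMPLE = """\
127.0.0.1 localhost loghost
10.10.1.7 lustre-client-1-link-1 lustre-client-1-0 lustre-client-1
10.10.1.14 lustre-server-1-link-1 lustre-server-1-0 lustre-server-1
"""

hosts = parse_hosts(SAMPLE)
print(hosts["lustre-server-1"])  # prints 10.10.1.14
```

On a node, reading /etc/hosts into `parse_hosts` and calling `is_reachable` on each distinct 10.x.x.x address would show which peers answer on the experiment network.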


Could you let me know whether I did something wrong in the network configuration, and how to enable the internal IP addresses and hostnames?


Thanks for your help and advice!

- Dong

Leigh Stoller

Oct 11, 2017, 6:36:23 PM
to 代栋, cloudlab-users
> Hi, I am running an experiment in Utah APT (experiment Id: daidong-QV29290). It should use multiple network ports. But ifconfig shows only one ip address is assigned for every machine.

Hi. I notice you created the experiment last night. Was the network
broken when you started?

Also, has this image (profile) worked on other clusters prior to this?
Just trying to figure out whether something has changed since it last worked.

Leigh




Mike Hibler

Oct 11, 2017, 6:36:52 PM
to 代栋, cloudlab-users
It appears the experiment has been swapped in for 18 hours. Have you done
something to the network in that time?

It looks like the interface setup script is correct and when I run it
manually, it configures the p2p2 interface properly.

Are you sure the lustre startup didn't mess up the interfaces?

Also, please make sure that lustre is only using the 10.x.x.x interfaces
to communicate. It should NOT use the control network interfaces (128.110.x.x).
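
As an illustration of that rule, the addresses a service is configured with can be sorted by network before it is started. The sketch below is an assumption-laden example, not part of the Emulab or Lustre tooling: the helper names are hypothetical, and the prefix lengths (/8 and /16) are inferred from the 10.x.x.x and 128.110.x.x patterns mentioned above.

```python
import ipaddress

# Ranges inferred from the thread: experiment links use 10.x.x.x and the
# control network uses 128.110.x.x. The exact prefix lengths are assumptions.
EXPERIMENT_NET = ipaddress.ip_network("10.0.0.0/8")
CONTROL_NET = ipaddress.ip_network("128.110.0.0/16")


def classify(address):
    """Return 'experiment', 'control', or 'other' for an IPv4 address string."""
    ip = ipaddress.ip_address(address)
    if ip in EXPERIMENT_NET:
        return "experiment"
    if ip in CONTROL_NET:
        return "control"
    return "other"


def control_network_addresses(addresses):
    """Pick out the addresses that would send traffic over the control network."""
    return [a for a in addresses if classify(a) == "control"]


configured = ["10.10.1.14", "128.110.96.88"]
print(control_network_addresses(configured))  # prints ['128.110.96.88']
```

Any address reported by `control_network_addresses` is one that the service should be moved off before running a workload.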

On Wed, Oct 11, 2017 at 03:11:47PM -0700, 代栋 wrote:
> Hi, I am running an experiment in Utah APT (experiment Id: daidong-QV29290
> <https://www.cloudlab.us/status.php?uuid=6df55c6b-ae3c-11e7-b179-90e2ba22fee4>).

代栋

Oct 11, 2017, 7:19:47 PM
to cloudlab-users
Thanks, Leigh and Mike, for your quick replies.

I did not do anything special after booting it yesterday. In fact, I noticed the network issue right after initializing the profile. That is also why I used the control network interface to configure Lustre: the 10.x.x.x interface does not work. I will NOT run any workload on Lustre until this network issue is solved and I have switched back to the internal network.

Note that this disk image was created manually, with all the kernel patches that Lustre needs applied. Do you think this might be the reason? Could you please try booting a machine from the same disk image and check whether there is a problem with it? The image URN is: urn:publicid:IDN+emulab.net+image+NMSU-Cloud:lustre10

thanks,
- Dong

Mike Hibler

Oct 11, 2017, 7:26:16 PM
to 代栋, cloudlab-users
What standard CloudLab image did you customize to make your image?

代栋

Oct 11, 2017, 7:38:33 PM
to cloudlab-users
Hi, Mike, let me check and get back to you soon.

thanks,
- Dong

代栋

Oct 12, 2017, 12:36:05 PM
to cloudlab-users
Hi, Mike,

The standard CloudLab image is urn:publicid:IDN+emulab.net+image+emulab-ops:CENTOS7-64-STD

Also, I noticed that on lustre-server-1 the p2p2 interface is already configured correctly. Would you please let me know which script I can run on the other machines to make them work?

thanks,
- Dong