[Rocks-Discuss] Network configuration problems?


Josh Anderson

Nov 8, 2011, 4:48:03 PM
to npaci-rocks...@sdsc.edu
Hi, I am new to Rocks. I have installed everything with 7 compute nodes. Everything seems to work fine, but after about 3 hours eth0 goes down: I cannot ssh into the compute nodes from the frontend, and my simulations obviously stop. ssh reports "no route to host". I cannot find what is wrong. The only way I can fix this is to do a hard restart on the frontend, and then the system works for about 3 hours again.

1.) Is this a hardware issue with my NIC?

2.) In my network configuration, under the hardware tab, it identifies 4 devices: eth0, eth1, peth0, and peth1. It seems that peth0 is somehow linked to eth0. I am unsure why the peth devices exist at all.

Does anyone have any ideas how to troubleshoot this issue?

Thanks in advance,

Josh

Philip Papadopoulos

Nov 8, 2011, 6:55:34 PM
to Discussion of Rocks Clusters
It sounds like you are trying (or have already tried) to use the network GUI to
change the network configuration.
Network configuration on a frontend requires some careful balancing that the
network GUI doesn't necessarily understand.

Please do the following:
# rocks sync host network <name of your frontend>
and then reboot.

Please send the contents
of /etc/sysconfig/network-scripts/ifcfg-eth0
and /etc/sysconfig/network-scripts/ifcfg-eth1


The peth0 device is there because you have the Xen roll installed; it is part
of the standard Xen bridging construction.
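For context: the classic Xen network-bridge script renames the physical NIC by prefixing "p" and hands the original name to a bridge, which is why peth0 shadows eth0. A minimal sketch of just that naming convention (illustrative shell, not the actual Xen script):

```shell
# Illustrative only: under Xen's network-bridge script the physical NIC is
# renamed with a "p" prefix (eth0 -> peth0) while a bridge takes over the
# original name, so host traffic still appears to flow through eth0.
physical_name() {
  echo "p$1"
}
physical_name eth0   # peth0
physical_name eth1   # peth1
```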

-P


--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)



Josh Anderson

Nov 8, 2011, 8:17:40 PM
to Discussion of Rocks Clusters
I ran the commands. The file contents are listed below. I would like to note
that I switched the two NICs and connected my original eth1 to my cluster
network and my original eth0 to the internet connection. Now it seems that
the internet connection fails every 3 hours. This leads me to believe it is
the extra NIC I installed. I just do not understand how to troubleshoot this
problem.

contents of /etc/sysconfig/network-scripts/ifcfg-eth0:
DEVICE=eth0
HWADDR=00:e0:b3:10:dd:76
IPADDR=128.104.176.177
NETMASK=255.255.252.0
BOOTPROTO=none
ONBOOT=yes
MTU=1500
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes
GATEWAY=128.104.176.1


contents of /etc/sysconfig/network-scripts/ifcfg-eth1:
TYPE=Ethernet
DEVICE=eth1
HWADDR=d4:85:64:98:9a:cd
BOOTPROTO=none
NETMASK=255.255.0.0
IPADDR=10.1.1.1
ONBOOT=yes
USERCTL=no
IPV6INIT=no
PEERDNS=yes

Philip Papadopoulos

Nov 8, 2011, 8:45:12 PM
to Discussion of Rocks Clusters
Can you also send the output of
# rocks list host interface <name of your frontend>
# rocks list network

Thanks,
Phil



Josh Anderson

Nov 11, 2011, 1:54:20 PM
to Discussion of Rocks Clusters
Sorry for the late reply.
Below are the outputs of the requested commands.

[root@efm ~]# rocks list host interface efm
SUBNET  IFACE MAC               IP              NETMASK       MODULE NAME VLAN OPTIONS CHANNEL
private eth0  00:E0:B3:10:DD:76 10.1.1.1        255.255.0.0   ------ efm  ---- ------- -------
public  eth1  D4:85:64:98:9A:CD 128.104.176.177 255.255.252.0 ------ efm  ---- ------- -------


[root@efm ~]# rocks list network
NETWORK SUBNET NETMASK MTU DNSZONE SERVEDNS
private: 10.1.0.0 255.255.0.0 1500 local True
public: 128.104.176.0 255.255.252.0 1500 cluster.org False

Josh

Philip Papadopoulos

Nov 13, 2011, 10:55:12 PM
to Discussion of Rocks Clusters
I would be very suspicious of your eth0 hardware. According to
http://www.coffer.com/mac_find/?string=00%3AE0%3AB3%3A10%3ADD%3A76

your NIC vendor is EtherWAN. They probably do fine as desktop NICs with low
network utilization, but may not handle heavy load. Is this a replaceable
NIC? If so, you might try a standard Intel e1000 NIC.
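The vendor lookup above works off the OUI, the first three octets of the MAC address, which identify the hardware manufacturer. A minimal illustration of extracting that prefix (pure string handling; nothing here queries a registry):

```shell
# Extract the vendor OUI (first three octets) from a MAC address and
# uppercase it; 00:E0:B3 is the prefix fed to the lookup site above.
mac="00:e0:b3:10:dd:76"
oui=$(echo "$mac" | cut -d: -f1-3 | tr 'a-f' 'A-F')
echo "$oui"   # 00:E0:B3
```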

-P

