[compute-0-37.local][[17044,1],15][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.122.1 failed: Connection refused (111)
We don't have a node with the IP address 192.168.122.1; all our nodes
are 192.168.6...
rocks run host hostname works on all the hosts, and SELinux is disabled.
Is there something else in OpenMPI that I need to configure?
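One quick way to find which node and interface actually own 192.168.122.1 is to query every host for it; a sketch, assuming rocks run host accepts a quoted command the same way the hostname test above does:

    # Ask every node which interface carries a 192.168.122.x address;
    # grep -B1 also prints the interface-name line just above the match.
    rocks run host "/sbin/ifconfig -a | grep -B1 'inet addr:192.168.122'"

Any host that prints a match is carrying the extra interface that OpenMPI is picking up.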
On Fri, Sep 23, 2011 at 3:32 PM, Doll, Margaret Ann <margar...@brown.edu> wrote:
Did you include the Xen roll during installation?
What IP addresses are in your hostfile? Do you even use one? What IPs
do your exec/submit nodes have in SGE? Somehow, something picked up
the 192.168.X.X addresses.
Ian
On Fri, Sep 23, 2011 at 12:37 PM, Doll, Margaret Ann
--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:394 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:159564 (155.8 KiB)
virbr0 had the "offending" connection with the user's job.
I executed
yum groupremove "Virtualization"
Hopefully that solves the last of our problems.
Have a nice weekend. Thanks.
I am not sure "yum groupremove Virtualization" will solve your
problems. Any time a node reboots, the virtualization will return.
Ian
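A sketch of one way to keep virbr0 from coming back at boot, assuming it is being created by libvirt's "default" network (the usual CentOS/RHEL layout of that era; service names may differ on your nodes):

    # Run on each affected node; the libvirt "default" network is what
    # normally creates virbr0 on 192.168.122.1.
    virsh net-destroy default                 # tear virbr0 down now
    virsh net-autostart default --disable     # don't recreate it at the next boot
    service libvirtd stop                     # stop the libvirt daemon itself
    chkconfig libvirtd off                    # keep it from starting on boot

If the compute nodes are set to reinstall when they reboot, these by-hand changes (like the yum groupremove) would also need to be folded into the node kickstart customization, for example an extend-compute.xml post section, to persist.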
Then you could try to add these two lines to the bottom of your
/share/apps/openmpi/etc/openmpi-mca-params.conf
file [if not yet there]:
btl = tcp,sm,self
btl_tcp_if_include = eth0
I assume you use eth0 for MPI.
If not, change the second line accordingly.
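For what it's worth, the same restriction can be tried per job on the mpirun command line before touching the shared config file. A sketch, assuming eth0 is the private interface; my_hosts and my_mpi_program are placeholders:

    # Limit OpenMPI's TCP BTL to eth0 for a single run.
    mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0 \
           -np 16 -hostfile my_hosts ./my_mpi_program

    # Or exclude the unwanted interfaces instead of listing the good one
    # (don't combine btl_tcp_if_include and btl_tcp_if_exclude):
    mpirun --mca btl tcp,sm,self --mca btl_tcp_if_exclude lo,virbr0 \
           -np 16 -hostfile my_hosts ./my_mpi_program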
Also, in another email Ian noted that you're using
routable IP addresses [192.167.X.X] for your compute nodes.
Was that a typo in your email?
Did you mean 192.168.X.X, perhaps?
Maybe I am missing something, but anyway ...
Are these addresses associated with eth0?
Or are they associated with another interface,
and perhaps on the Internet?
It may help avoid confusion if you post the output of
rocks list network
and
rocks list host interface
I hope this helps,
Gus Correa
> Doll, Margaret Ann wrote:
> > We use tcp over Ethernet.
> >
> > On Fri, Sep 23, 2011 at 5:16 PM, Gus Correa <g...@ldeo.columbia.edu>
> wrote:
> >
> >> Do you use Infiniband, tcp over Ethernet, or something else
> >> for MPI communication?
> >>
> >>
>
> Then you could try to add these two lines to the bottom of your
> /share/apps/openmpi/etc/openmpi-mca-params.conf
> file [if not yet there]:
>
> btl = tcp,sm,self
> btl_tcp_if_include = eth0
>
Thanks, I added these lines.
>
> I assume you use eth0 for MPI.
> If not, change the second line accordingly.
>
>
> Also, in another email Ian noted that you're using
> routable IP addresses [192.167.X.X] for your compute nodes.
> Was that a typo in your email?
> Did you mean 192.168.X.X, perhaps?
>
Mistype on my part.
eth0 Link encap:Ethernet HWaddr 00:04:23:D9:87:8C
inet addr:192.168.6.1 Bcast:192.168.6.255 Mask:255.255.255.0
> Maybe I am missing something, but anyway ...
> Are these addresses associated to eth0?
> Or are they associated to another interface,
> and perhaps on the Internet?
>
> It may help avoid confusion if you post the output of
>
> rocks list network
>
# rocks list network
NETWORK SUBNET NETMASK MTU DNSZONE SERVEDNS
private: 192.168.6.0 255.255.255.0 1500 local True
public: 128.148.229.0 255.255.255.192 1500 hetchem.brown.edu False
>
> and
>
> rocks list host interface
>
> # rocks list host interface
> HOST SUBNET IFACE MAC IP NETMASK
> MODULE NAME VLAN OPTIONS CHANNEL
> ted: private eth0 00:04:23:D9:87:8C 192.168.6.1 255.255.255.0
> ------ ted ---- ------- -------
> ted: public eth1 00:04:23:D9:87:8D 128.148.229.2 255.255.255.192
> ------ ted ---- ------- -------
> compute-0-20: private eth0 00:30:48:7d:d3:32 192.168.6.237 255.255.255.0
> ------ compute-0-20 ---- ------- -------
> compute-0-20: ------- eth1 00:30:48:7d:d3:33 ------------- ---------------
> ------ ------------ ---- ------- -------
> power-0-18: private ----- 00:30:48:99:59:80 192.168.6.239 255.255.255.0
> ------ compute-0-18 ---- ------- -------
> power-0-17: private ----- 00:30:48:99:59:75 192.168.6.240 255.255.255.0
> ------ compute-0-17 ---- ------- -------
> compute-0-16: private eth0 00:30:48:7d:d2:de 192.168.6.241 255.255.255.0
> ------ compute-0-16 ---- ------- -------
> compute-0-16: ------- eth1 00:30:48:7d:d2:df ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-15: private eth0 00:30:48:7d:d2:d2 192.168.6.242 255.255.255.0
> ------ compute-0-15 ---- ------- -------
> compute-0-15: ------- eth1 00:30:48:7d:d2:d3 ------------- ---------------
> ------ ------------ ---- ------- -------
> power-0-14: private ----- 00:30:48:99:58:b3 192.168.6.243 255.255.255.0
> ------ compute-0-14 ---- ------- -------
> power-0-13: private ----- 00:30:48:99:58:f5 192.168.6.246 255.255.255.0
> ------ compute-0-13 ---- ------- -------
> compute-0-12: private eth0 00:30:48:79:cb:1a 192.168.6.244 255.255.255.0
> ------ compute-0-12 ---- ------- -------
> compute-0-12: ------- eth1 00:30:48:79:cb:1b ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-11: private eth0 00:30:48:79:ca:72 192.168.6.245 255.255.255.0
> ------ compute-0-11 ---- ------- -------
> compute-0-11: ------- eth1 00:30:48:79:ca:73 ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-9: private eth0 00:30:48:7c:3e:2a 192.168.6.247 255.255.255.0
> ------ compute-0-9 ---- ------- -------
> compute-0-9: ------- eth1 00:30:48:7c:3e:2b ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-8: private eth0 00:30:48:7b:51:c2 192.168.6.248 255.255.255.0
> ------ compute-0-8 ---- ------- -------
> compute-0-8: ------- eth1 00:30:48:7b:51:c3 ------------- ---------------
> ------ ------------ ---- ------- -------
> power-0-7: private ----- 00:30:48:8c:43:42 192.168.6.249 255.255.255.0
> ------ compute-0-7 ---- ------- -------
> power-0-6: private ----- 00:30:48:94:2f:31 192.168.6.250 255.255.255.0
> ------ compute-0-6 ---- ------- -------
> power-0-3: private ----- 00:30:48:8c:43:3c 192.168.6.251 255.255.255.0
> ------ compute-0-3 ---- ------- -------
> compute-0-1: ------- eth0 00:30:48:79:c8:db ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-1: private eth1 00:30:48:79:c8:da 192.168.6.252 255.255.255.0
> ------ compute-0-1 ---- ------- -------
> power-0-2: private ----- 00:30:48:8c:43:21 192.168.6.253 255.255.255.0
> ------ compute-0-2 ---- ------- -------
> power-0-3: private ----- 00:30:48:8c:43:3c 192.168.6.251 255.255.255.0
> ------ compute-0-3 ---- ------- -------
> compute-0-0: private eth0 00:30:48:79:76:3a 192.168.6.254 255.255.255.0
> ------ compute-0-0 ---- ------- -------
> compute-0-0: ------- eth1 00:30:48:79:76:3b ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-19: private eth0 00:30:48:7d:d3:2e 192.168.6.238 255.255.255.0
> ------ compute-0-19 ---- ------- -------
> compute-0-19: ------- eth1 00:30:48:7d:d3:2f ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-1: ------- eth0 00:30:48:79:c8:db ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-1: private eth1 00:30:48:79:c8:da 192.168.6.252 255.255.255.0
> ------ compute-0-1 ---- ------- -------
> power-0-2: private ----- 00:30:48:8c:43:21 192.168.6.253 255.255.255.0
> ------ compute-0-2 ---- ------- -------
> power-0-21: private ----- 00:30:48:9a:e8:9a 192.168.6.236 255.255.255.0
> ------ compute-0-21 ---- ------- -------
> power-0-22: private ----- 00:30:48:99:59:6c 192.168.6.235 255.255.255.0
> ------ compute-0-22 ---- ------- -------
> compute-0-23: private eth0 00:30:48:7a:15:a4 192.168.6.234 255.255.255.0
> ------ compute-0-23 ---- ------- -------
> compute-0-23: ------- eth1 00:30:48:7a:15:a5 ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-24: private eth0 00:30:48:7d:d2:a8 192.168.6.233 255.255.255.0
> ------ compute-0-24 ---- ------- -------
> compute-0-24: ------- eth1 00:30:48:7d:d2:a9 ------------- ---------------
> ------ ------------ ---- ------- -------
> power-0-25: private ----- 00:30:48:99:59:54 192.168.6.232 255.255.255.0
> ------ compute-0-25 ---- ------- -------
> power-0-26: private ----- 00:30:48:99:58:c0 192.168.6.231 255.255.255.0
> ------ compute-0-26 ---- ------- -------
> compute-0-27: private eth0 00:30:48:7d:d2:aa 192.168.6.230 255.255.255.0
> ------ compute-0-27 ---- ------- -------
> compute-0-27: ------- eth1 00:30:48:7d:d2:ab ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-28: private eth0 00:30:48:7d:d1:a8 192.168.6.229 255.255.255.0
> ------ compute-0-28 ---- ------- -------
> compute-0-28: ------- eth1 00:30:48:7d:d1:a9 ------------- ---------------
> ------ ------------ ---- ------- -------
> power-0-29: private ----- 00:30:48:99:58:b1 192.168.6.228 255.255.255.0
> ------ compute-0-29 ---- ------- -------
> power-0-30: private ----- 00:30:48:99:58:b9 192.168.6.227 255.255.255.0
> ------ compute-0-30 ---- ------- -------
> compute-0-31: private eth0 00:30:48:7d:d1:ac 192.168.6.226 255.255.255.0
> ------ compute-0-31 ---- ------- -------
> compute-0-31: ------- eth1 00:30:48:7d:d1:ad ------------- ---------------
> ------ ------------ ---- ------- -------
> power-0-32: private ----- 00:30:48:8c:42:07 192.168.6.225 255.255.255.0
> ------ compute-0-32 ---- ------- -------
> compute-0-33: private eth0 00:30:48:7d:d1:aa 192.168.6.224 255.255.255.0
> ------ compute-0-33 ---- ------- -------
> compute-0-33: ------- eth1 00:30:48:7d:d1:ab ------------- ---------------
> ------ ------------ ---- ------- -------
> power-0-34: private ----- 00:30:48:99:59:67 192.168.6.223 255.255.255.0
> ------ compute-0-34 ---- ------- -------
> power-0-35: private ----- 00:30:48:99:59:79 192.168.6.222 255.255.255.0
> ------ compute-0-35 ---- ------- -------
> compute-0-36: private eth0 00:30:48:7d:d2:c0 192.168.6.221 255.255.255.0
> ------ compute-0-36 ---- ------- -------
> compute-0-36: ------- eth1 00:30:48:7d:d2:c1 ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-37: private eth0 00:30:48:7d:d2:c4 192.168.6.220 255.255.255.0
> ------ compute-0-37 ---- ------- -------
> compute-0-37: ------- eth1 00:30:48:7d:d2:c5 ------------- ---------------
> ------ ------------ ---- ------- -------
> power-0-38: private ----- 00:30:48:94:37:c7 192.168.6.219 255.255.255.0
> ------ compute-0-38 ---- ------- -------
> power-0-39: private ----- 00:30:48:94:2c:c3 192.168.6.218 255.255.255.0
> ------ compute-0-39 ---- ------- -------
> compute-0-40: private eth0 00:30:48:7c:7f:be 192.168.6.217 255.255.255.0
> ------ compute-0-40 ---- ------- -------
> compute-0-40: ------- eth1 00:30:48:7c:7f:bf ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-41: private eth0 00:30:48:7c:7f:b8 192.168.6.216 255.255.255.0
> ------ compute-0-41 ---- ------- -------
> compute-0-41: ------- eth1 00:30:48:7c:7f:b9 ------------- ---------------
> ------ ------------ ---- ------- -------
> power-0-42: private ----- 00:30:48:94:38:af 192.168.6.215 255.255.255.0
> ------ compute-0-42 ---- ------- -------
> power-0-43: private ----- 00:30:48:94:37:45 192.168.6.214 255.255.255.0
> ------ compute-0-43 ---- ------- -------
> compute-0-44: private eth0 00:30:48:7d:c5:aa 192.168.6.213 255.255.255.0
> ------ compute-0-44 ---- ------- -------
> compute-0-44: ------- eth1 00:30:48:7d:c5:ab ------------- ---------------
> ------ ------------ ---- ------- -------
> compute-0-45: private eth0 00:30:48:7c:42:42 192.168.6.212 255.255.255.0
> ------ compute-0-45 ---- ------- -------
> compute-0-45: ------- eth1 00:30:48:7c:42:43 ------------- ---------------
> ------ ------------ ---- ------- -------
Gus Correa