I went ahead and played with Rocks and then installed it on my small
Cray. I installed kernel, base, web, bio, ganglia, hpc, sge, service
pack and modules. Everything went very smoothly. The compute nodes
came up and it all looks good.
I'm now trying to get infiniband support to work.
I followed the instructions at
http://flakrat.blogspot.com/2011/02/building-mellanox-ofed-152-for-rocks-54.html
(printed them out; the site is blocked at work!),
substituting my kernel 2.6.18-194.17.4.el5 for the kernel mentioned in the
docs (2.6.18-194.17.1.el5). I used the
MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5.iso even though there is a newer one
available, because the newer one seemed to be for a newer kernel and I
didn't know if it would work.
The only line I didn't follow in the instructions was rm
/root//root/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5.iso, as it seemed not to
work and removing the iso image left nothing for the later script to add
(inject) kernel support.
When I run
cd /share/apps/mellanox/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.4.el5
# ./mlnxofedinstall --force -hpc
The installation starts (there is one error)
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, or Distribution IB packages
will be removed.
Uninstalling the previous version of OFED
Starting MLNX_OFED_LINUX-1.5.2-2.0.0 installation ...
Installing mpi-selector RPM
Preparing...
##################################################
...
dapl-devel-static
##################################################
mlnxofed-docs
##################################################
error: unpacking of archive failed: cpio: Bad magic
ofed-scripts
##################################################
...
then
...
opensm
##################################################
Device (15b3:6732):
02:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX VPI
PCIe 2.0 5GT/s - IB DDR / 10GigE] (rev a0)
Link Width: 8x
PCI Link Speed: 5Gb/s
Error: Firmware configuration file for /dev/mst/mt26418_pci_cr0 is not
found
Skip firmware update for /dev/mst/mt26418_pci_cr0.
Configuring /etc/security/limits.conf.
warning: /etc/infiniband/openib.conf saved as
/etc/infiniband/openib.conf.rpmsave
Looking further in the "Install OFED 1.5.X on a Rocks 5.3 cluster" blog
instructions, I ran:
[root@ABLethCX1 mellanox]# /sbin/chkconfig --add openibd
[root@ABLethCX1 mellanox]# /sbin/chkconfig openibd on
[root@ABLethCX1 mellanox]# /sbin/service openibd start
Loading Mellanox HCA driver: [FAILED]
Loading Mellanox MLX4_IB HCA driver: [FAILED]
Loading QLogic QIB driver: [FAILED]
Loading cxgb3 driver: [FAILED]
Loading nes driver: [FAILED]
Loading HCA driver and Access Layer: [FAILED]
Please open an issue in the http://bugs.openfabrics.org and attach
/tmp/ib_debug_info.log
ibv_devinfo gives me:
[root@ABLethCX1 ~]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.6.000
node_guid: 0030:48ff:ffcd:fb3c
sys_image_guid: 0030:48ff:ffcd:fb3f
vendor_id: 0x02c9
vendor_part_id: 26418
hw_ver: 0xA0
board_id: SM_2081000001000
phys_port_cnt: 1
port: 1
state: PORT_INIT (2)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: IB
Any ideas where I went wrong or how to remedy this?
Any help appreciated.
Bob Forster
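
The ibv_devinfo output above reports state PORT_INIT with sm_lid 0. That combination commonly means the physical link is up but no subnet manager has configured the port yet; whether that is the cause in this case is not confirmed by the thread. A minimal sketch for pulling those two fields out of the output follows. The helper name ib_port_summary is hypothetical and not part of OFED.

```shell
#!/bin/sh
# Summarise port state and SM LID from `ibv_devinfo` output read on stdin.
# ib_port_summary is a hypothetical helper name, not an OFED tool.
ib_port_summary() {
    awk '
        $1 == "state:"  { state = $2 }
        $1 == "sm_lid:" { sm_lid = $2 }
        END {
            printf "state=%s sm_lid=%s\n", state, sm_lid
            # PORT_INIT with SM LID 0: link is up, port not yet configured.
            if (state == "PORT_INIT" && sm_lid == "0")
                print "hint: link is up but no subnet manager has configured the port"
        }
    '
}

# Usage on a node where ibv_devinfo is installed:
#   ibv_devinfo | ib_port_summary
```

The function reads from stdin so it can be tried against saved output as well as a live node. On a host with more than one port it reports only the last port listed.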
I took Steve Jones' (Stanford) advice and used the UCSD Triton
InfiniBand OFED roll. Attached are my instructions and scripts for
adding and configuring InfiniBand support to our Rocks 5.4 cluster
(which has a separate, integrated CentOS NAS). Hopefully, the
comments in the scripts are self-explanatory and will enable you to
pick and choose what you want them to do.
Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Install InfiniBand OFED.pdf
Type: application/pdf
Size: 60582 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20110603/e089ff12/InstallInfiniBandOFED.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib-nas-5.4.2.sh
Type: application/octet-stream
Size: 6408 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20110603/e089ff12/ipoib-nas-5.4.2.sh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib-rocks-5.4.2.sh
Type: application/octet-stream
Size: 11296 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20110603/e089ff12/ipoib-rocks-5.4.2.sh
> I took Steve Jones' (Stanford) advice and used the UCSD Triton
> InfiniBand OFED roll. Attached are my instructions and scripts for
> adding and configuring InfiniBand support to our Rocks 5.4 cluster
> (which has a separate, integrated CentOS NAS). Hopefully, the
> comments in the scripts are self-explanatory and will enable you to
> pick and choose what you want them to do.
Thanks for the information! One question: if I am not interested in
IP-over-IB, is the how-to finished at "Configure the InfiniBand OFED Services"?
And would my next step be to compile the MPI roll from Triton?
Thanks!
Adrian
-P
--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)
Thanks!
Adrian
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
> On 3 Jun 2011, at 11:16 AM, Larry Baker wrote:
>
>> Robert,
>>
>> I took Steve Jones' (Stanford) advice and used the UCSD Triton
>> InfiniBand OFED roll. Attached are my instructions and scripts for
>> adding and configuring InfiniBand support to our Rocks 5.4 cluster
>> (which has a separate, integrated CentOS NAS). Hopefully, the
>> comments in the scripts are self-explanatory and will enable you to
>> pick and choose what you want them to do.
>>
>> Larry Baker
>> US Geological Survey
>> 650-329-5608
>> ba...@usgs.gov
>>
On 6/3/2011 3:39 PM, Adrian Sevcenco wrote:
> On 06/03/2011 10:14 PM, Larry Baker wrote:
>> I neglected to mention that my instructions disable the Linux InfiniBand
>> subnet manager because our switch has an integrated subnet manager. You
>> will need to run a subnet manager somewhere -- the front end (or a
>> separate NAS, for example, or both) makes logical sense.
> Hi! Do you have InfiniBand on all nodes?
> Does anyone have an idea how I can customize extend-compute.xml in order
> to install OFED only on WNs with InfiniBand?
In extend-compute.xml, check for the InfiniBand HCA devices.
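
A rough sketch of such a check, for use in a post-install shell section. It assumes lspci is available at that point and that the adapter shows up with an "InfiniBand" class string, as in the lspci output posted in this thread; a VPI card running in Ethernet mode may be listed differently. The helper name has_ib_hca is hypothetical.

```shell
#!/bin/sh
# has_ib_hca is a hypothetical helper: it reads `lspci` output on stdin
# and succeeds (exit 0) if any line describes an InfiniBand controller.
has_ib_hca() {
    grep -q 'InfiniBand'
}

# Usage (assumption: lspci exists in the post-install environment):
#   if lspci | has_ib_hca; then
#       echo "IB HCA found, installing OFED"
#   fi
```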
On Jun 3, 2011, at 12:39 PM, Adrian Sevcenco wrote:
> On 06/03/2011 10:14 PM, Larry Baker wrote:
>> I neglected to mention that my instructions disable the Linux InfiniBand
>> subnet manager because our switch has an integrated subnet manager. You
>> will need to run a subnet manager somewhere -- the front end (or a
>> separate NAS, for example, or both) makes logical sense.
> Hi! Do you have InfiniBand on all nodes?
> Does anyone have an idea how I can customize extend-compute.xml in order
> to install OFED only on WNs with InfiniBand?
If you're using the Triton OFED roll, you can set the 'disable_ofed' attribute to control which hosts get OFED installed. For example,
$ rocks set host attr compute-0-0 disable_ofed True
--Rick
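
To apply the same attribute to several hosts, the command can be wrapped in a loop. This is only a sketch: disable_ofed_on and the DRY_RUN variable are hypothetical, the host names are placeholders, and only the `rocks set host attr ... disable_ofed True` command itself comes from the message above.

```shell
#!/bin/sh
# Print (dry run) or execute the attribute change for each host argument.
# disable_ofed_on and DRY_RUN are hypothetical names for illustration.
disable_ofed_on() {
    for host in "$@"; do
        # With DRY_RUN set, the command is echoed instead of executed.
        ${DRY_RUN:+echo} rocks set host attr "$host" disable_ofed True
    done
}

# Usage (host names are placeholders):
#   DRY_RUN=1 disable_ofed_on compute-0-0 compute-0-1   # print only
#   disable_ofed_on compute-0-0 compute-0-1             # run for real
```

Running with DRY_RUN=1 first shows exactly which commands would be issued before anything is changed on the front end.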
Adrian
>
> --Rick
Bob
[root@ABLethCX1 ~]# lspci -v | grep Mellanox
02:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0
5GT/s - IB DDR / 10GigE] (rev a0)
Subsystem: Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0
5GT/s - IB DDR / 10GigE]
>> I took Steve Jones' (Stanford) advice and used the UCSD Triton
>> InfiniBand OFED roll. Attached are my instructions and scripts for
>> adding and configuring InfiniBand support to our Rocks 5.4 cluster
>> (which has a separate, integrated CentOS NAS). Hopefully, the
>> comments in the scripts are self-explanatory and will enable you to
>> pick and choose what you want them to do.
> Thanks for the information! One question: if I am not interested in
> IP-over-IB, is the how-to finished at "Configure the InfiniBand OFED
> Services"?
> And would my next step be to compile the MPI roll from Triton?
>
> Thanks!
> Adrian
>
Your proposal sounds correct to me. That is, you would not need to
run either of the ipoib*.sh scripts. You might want to inspect the
OpenIB configuration file, /etc/infiniband/openib.conf, to disable the
features you don't want. This is done in my script:
> #
> # Configure InfiniBand OFED kernel modules
> #
> file="/etc/infiniband/openib.conf"
> echo -e "\n" \
> "Configure InfiniBand OFED kernel modules\n"
> if [ ! -f "${file}.original" ] ; then
> echo \cp "${file}" "${file}.original" ;
> \cp "${file}" "${file}.original" ;
> fi
> echo \cp "${file}.original" "${file}"
> \cp "${file}.original" "${file}"
> echo \
> sed -i -e '/^MLX4_VNIC_LOAD\>/s/\<yes\>/no/' \
> -e '/^IPOIB_LOAD\>/s/\<no\>/yes/' \
> "${file}"
> sed -i -e '/^MLX4_VNIC_LOAD\>/s/\<yes\>/no/' \
> -e '/^IPOIB_LOAD\>/s/\<no\>/yes/' \
> "${file}"
I think you will want to leave IPOIB_LOAD set to "no", and you will
also want to change MLX4_VNIC_LOAD to "no", as I did.
Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov
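
For a setup without IP-over-IB, the script excerpt above can be adapted so that both settings end up as "no". The following is a sketch, not Larry's script: set_openib_flag is a hypothetical helper, and it assumes openib.conf uses plain KEY=value lines for IPOIB_LOAD and MLX4_VNIC_LOAD.

```shell
#!/bin/sh
# set_openib_flag FILE KEY VALUE
# Hypothetical helper: rewrite the KEY=... line in an openib.conf-style file.
# The result is written back with cat so the file keeps its owner and mode.
set_openib_flag() {
    file=$1 key=$2 value=$3
    sed "s/^${key}=.*/${key}=${value}/" "$file" > "${file}.tmp" &&
        cat "${file}.tmp" > "$file" &&
        rm -f "${file}.tmp"
}

# Usage (keeps a one-time backup, as the script excerpt above does):
#   conf=/etc/infiniband/openib.conf
#   [ -f "$conf.original" ] || cp "$conf" "$conf.original"
#   set_openib_flag "$conf" IPOIB_LOAD no
#   set_openib_flag "$conf" MLX4_VNIC_LOAD no
```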
> I think you will want to leave IPOIB_LOAD set to "no", and you will also
> want to change MLX4_VNIC_LOAD to "no", as I did.
great, thanks for info!!
Adrian
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
>
--
----------------------------------------------
Adrian Sevcenco |
Institute of Space Sciences - ISS, Romania |
adrian.sevcenco at {cern.ch,spacescience.ro} |
----------------------------------------------
Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov
On 3 Jun 2011, at 2:06 PM, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.
I believe the boards are manufactured by Mellanox and used/configured by the OEM. The board_id starts with SM_208..., and I found references elsewhere suggesting this is SuperMicro. There is nothing on the switch, and I don't think I want to pull the computer apart! Anyway, I went through the Mellanox firmware site and found the instructions on how to update the OEM Mellanox boards. However, do I need to do this? The update is to version 2.8.0600 and the board in my machine has 2.6.000.
Before I do anything else I'll wait until my computer expert is available on Monday, and then look at the UCSD Triton install instructions that you sent. On a quick perusal I don't see that install updating the firmware. Is that correct?
Bob
>>> [root@ABLethCX1 ~]# lspci -v | grep Mellanox
>>>> [root@ABLethCX1 ~]# ibv_devinfo
On Jun 3, 2011, at 3:44 PM, "Forster, Robert" <Robert....@agr.gc.ca> wrote:
> Before I do anything else I'll wait 'til my computer expert is available Monday, and then look at the UCSD Triton install instructions that you sent. On a quick perusal I don't see that install updating the firmware. Is that correct?
The Triton OFED roll does not touch the firmware on the HCAs. I haven't looked at anyone else's install notes, but the roll only deals with software.
--Rick
Could you let me know when git.rocksclusters.org is up again, or if
there are any other places to get the Triton OFED and MPI/IB files?
Thanks,
Bob Forster
-----Original Message-----
From: npaci-rocks-dis...@sdsc.edu
[mailto:npaci-rocks-dis...@sdsc.edu] On Behalf Of Philip
Papadopoulos
Sent: Friday, June 03, 2011 1:14 PM
To: Discussion of Rocks Clusters
Subject: Re: [Rocks-Discuss] Building Infiniband support for Rocks 5.4
If the Rocks git server won't be back today, I'll post the files for you later this morning.
--Rick
I've bundled the current version of the OFED roll source, along with the OFED-1.5.3.1 source. You can pull it down from:
http://users.sdsc.edu/~rpwagner/ofed-triton.tar.gz
If you need the BXOFED features, how to swap out the version of OFED is spelled out in the README, but I could also post a separate version.
--Rick
I downloaded the source files. Do you know anything about the MPI/IB
roll, or should I recompile Open MPI with IB support enabled?
During the first install attempt all the MPI files were uninstalled. I
assume installing a roll installs binaries and doesn't actually
recompile anything.
Thanks Bob
Because we (the Triton developers) build several implementations of MPI with various compilers, we chose to limit the OFED roll to build only the IB libraries and kernel modules. If you want MPI to build, try modifying src/ofed/ofed.conf; you want to change lines like openmpi_gcc=n to openmpi_gcc=y. You'll also need to modify nodes/ofed-common.xml, and add the new packages that get built.
The Triton mpi_ib roll may work for you out of the box. For the future, we're developing a single MPI roll, that will build ISOs using different compilers and for different fabrics. (It does this by adjusting the build environment using modules.)
--Rick
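
The ofed.conf change described above can be scripted. This is a sketch under the assumption that the file holds one name=y or name=n entry per line, as in the openmpi_gcc example; enable_ofed_pkg is a hypothetical helper, and the matching edit to nodes/ofed-common.xml still has to be done by hand.

```shell
#!/bin/sh
# enable_ofed_pkg FILE NAME
# Hypothetical helper: flip NAME=n to NAME=y in an ofed.conf-style file.
# Lines already set to y, and all other entries, are left untouched.
enable_ofed_pkg() {
    file=$1 name=$2
    sed "s/^${name}=n\$/${name}=y/" "$file" > "${file}.tmp" &&
        cat "${file}.tmp" > "$file" &&
        rm -f "${file}.tmp"
}

# Usage from the top of the roll source tree:
#   enable_ofed_pkg src/ofed/ofed.conf openmpi_gcc
```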
-P
Cheers,