[Rocks-Discuss] Rocks, IB and PXE booting


Lloyd Brown

Oct 11, 2011, 2:45:56 PM
to Discussion of Rocks Clusters
Is anyone in the Rocks community working on using InfiniBand for
booting/installing? Something like Mellanox's FlexBoot? For the most
part, I think a cluster like ours, at least, could get by with just IB
(including IPoIB, of course), but the integration with Rocks for PXE and
install is something I'm less sure about.

I'd love to hear about anyone doing this, or something similar, to use
Rocks on a single-fabric cluster (preferably IB-based).


--
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

Ian Kaufman

Oct 11, 2011, 2:57:44 PM
to Discussion of Rocks Clusters
Most PXEboot/provisioning solutions that I know of use Busybox, which
in turn uses udhcp, and udhcp does not have IB support.

Someone with a deep understanding of ROCKS would have to chime in to
see how the ROCKS image is sent over and installed. I think Kickstart
also uses Busybox.

However, I know that a patch to udhcp for IB support is being tested,
and hopefully it will get committed.

Ian

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu

Lloyd Brown

Oct 11, 2011, 6:20:39 PM
to npaci-rocks...@sdsc.edu
Interesting. Just so I'm sure I understand, this is not talking about
the process of PXE and getting the kernel/initrd, but rather the
installation process afterward? Basically, you're saying that
Kickstart/Anaconda (or some component thereof) is the hard part?

Thanks,

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

Ian Kaufman

Oct 11, 2011, 6:38:42 PM
to Discussion of Rocks Clusters
It's the kernel/initrd phase: that mini-kernel uses Busybox and udhcp
to start the process, and that is where IB support does not exist.
Again, I believe RHEL-based systems have a busybox-anaconda package
that is used during the pre-installation phase of Kickstart.

Ian

Philip Papadopoulos

Oct 11, 2011, 11:08:05 PM
to Discussion of Rocks Clusters
It comes down to "does the installation kernel recognize the adapter as
a common networking device?"
It may be possible, but it will require a deep dive into Anaconda's
methodology of detecting networking hardware.

An additional issue will be the very large number of services that have to
be initiated to get an IB network to actually work.
I'm not familiar with FlexBoot, but if it is PXE-like, that is only about
1/8th of the way there. Once Linux is up and running in
install mode, it has to be able to configure the network.

If I were to attempt something like this, I would focus on getting the IPoIB
to actually function in the install environment.
Network installers understand IP, they don't understand IB.

-P


--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)

Ian Kaufman

Oct 12, 2011, 12:54:20 AM
to Discussion of Rocks Clusters
The only way to do this is by using IPoIB. However, not all DHCP
servers can deal with IB protocols and hardware addresses. There is a patch
to support IB for ISC DHCP, and dnsmasq has IB support built in. But
Busybox's udhcp does not have any IB support. The patch is in testing
by some people I know at LBNL/NERSC.

I am using QLogic DDR interfaces, and was able to use iPXE to build
the correct driver and flash the cards. I was able to get through the
initial PXEboot process, but when the Busybox kernel loaded, the IB
interface would drop. After some digging, I found out that Busybox's
udhcp did not have any IB support. So, for now, I provision using
GigE, but use IB for data transfer and Grid Engine communication.

Ian

Lloyd Brown

Oct 12, 2011, 10:31:36 AM
to npaci-rocks...@sdsc.edu
This has been an interesting discussion. I appreciate the community
insight. Looks like I'll have to do some more digging/testing, etc.

From what I've been able to read (that's as far as I've gotten so far),
Mellanox FlexBoot includes patches for dhcpd, several kernel/initrd
combinations, and a bootloader image for HCAs that's at least based on
iPXE. They also have a document that describes (at least partially) the
process of installing CentOS, RHEL, and SLES11 over IPoIB, so someone
there is thinking about it. Refs:
http://www.mellanox.com/related-docs/prod_software/FlexBoot_user_manual.pdf
and
http://www.mellanox.com/related-docs/prod_software/Linux_PXE_Installation_over_IPoIB_README.txt

In the end, this is more exploratory for me, since we're a long way away
from our next cluster purchase. But I'll see what I can do as far as
testing with stock CentOS/RHEL, and maybe we can resurrect this thread then.

Thanks again for all the discussion. I'll be interested to see where
this goes long term.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

Andrus, Brian Contractor

Oct 12, 2011, 3:15:43 PM
to Discussion of Rocks Clusters
Just a quick 2 cents...

While we have no issue using the DHCP server to assign IP addresses once the IPoIB module is in, I would think the first hurdle would be a boot PROM.
Without something on the card that will get said address and then request an image, you would be dead in the water.

It would be nice to have a quick boot PROM on an IB card that could do that. Maybe emulate PXE or BOOTP enough to get/load the image. Of course, you still need to get that IPoIB support into the image as needed, but that is probably easier than getting an IB boot PROM :)

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

Ian Kaufman

Oct 19, 2011, 2:12:24 PM
to Discussion of Rocks Clusters
OK, I did a little more digging.

Indeed, the RHEL Kickstart process uses Busybox, which in turn uses
the udhcp client, which does not have IB support at this time. So, you
will not be able to provision a system over single-wire IB until the
IB patch for udhcp is integrated. You will have to provision using
GigE.

In a nutshell, this is how it goes:

PXEboot grabs the mini install kernel that uses Busybox/udhcp. Using
iPXE (and Mellanox has this capability by default, IIRC), this can be
done over IB using IPoIB and either dnsmasq's DHCP server or the
latest/patched ISC DHCP server (I don't recall which version of ISC
DHCP includes IB support). It looks like Mellanox supplies a patched
ISC DHCP package.
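For what it's worth, a per-host entry in an IB-aware ISC dhcpd.conf would
presumably look something like the sketch below. DHCP over IPoIB matches
clients on the DHCP client identifier rather than a 6-byte Ethernet MAC
(the 20-byte IPoIB hardware address doesn't fit in the usual field), so the
identifier, addresses and filename here are placeholders only; check your
DHCP server logs for the value your HCA actually sends:

  # hypothetical dhcpd.conf fragment for one IPoIB-booted node
  host compute-0-0 {
    # match on the client identifier the HCA presents (GUID-derived)
    option dhcp-client-identifier ff:00:00:00:00:00:02:00:00:02:c9:00:00:02:c9:03:00:11:22:33;
    fixed-address 10.1.255.254;
    next-server 10.1.1.1;        # TFTP server holding pxelinux.0
    filename "pxelinux.0";
  }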

The PXEboot/tftp transfer loads the Busybox-based initrd/kernel (which
is what runs the Kickstart script in RAM to format the disks, set up
networking, select/install the packages, etc.). Busybox uses the udhcp
client, which requests the IP address again so that it can communicate
with the server. This is where you lose IB and IPoIB support. It looks
like Mellanox has their own Busybox/udhcp patch for IB/IPoIB support,
which is why you copy their vmlinuz and initrd files over:

http://www.mellanox.com/related-docs/prod_software/Linux_PXE_Installation_over_IPoIB_README.txt

Once Kickstart has finished, the full kernel/initrd is installed and
the system reboots, and this should have IB/IPoIB support
(assuming the correct packages were selected, if necessary).
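(To make that last point concrete: on a CentOS/RHEL 5 era system the
Kickstart file needs something roughly like the fragment below. The package
and service names are placeholders - they vary by release and by which OFED
stack you use - so check them against your own tree.)

  %packages
  @base
  openib            # distro OFED core package (name varies by release)
  libibverbs
  librdmacm

  %post
  # bring the IB stack and an IPoIB interface up at boot;
  # the address below is obviously a placeholder
  chkconfig openibd on
  cat > /etc/sysconfig/network-scripts/ifcfg-ib0 << 'EOF'
  DEVICE=ib0
  ONBOOT=yes
  BOOTPROTO=static
  IPADDR=10.170.47.100
  NETMASK=255.255.255.192
  EOF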

I had quite a lot of fun trying to do all of this before I figured out
where the problem was ...

However, I kind of like isolating the management of the cluster to
GigE and leaving the IB infrastructure for just data transfer and
GridEngine traffic.

Ian

Lloyd Brown

Oct 19, 2011, 3:05:14 PM
to npaci-rocks...@sdsc.edu
Ian,

This is extremely interesting. Thank you for looking into this. Were
you able to successfully boot the full anaconda/kickstart, using the
Mellanox kernel/initrd? That's the stage I keep getting stuck at, where
it's asking me to select the network (and only shows eth{0,1}), even
when I use their kernel/initrd.

Chances are I'm missing something really obvious in the setup. In the
end, I'm not really all that good with Anaconda/Kickstart and kernel
parameters.

As far as motivation goes, I certainly understand the desire to have
separate fabrics. Given the relatively low cost of a basic 1GbE fabric,
it's pretty hard to argue for removing it (you only really save the cost
of a few cables, and a few switches; the NICs are usually onboard).

But I also know that there is a movement in the enterprise space to
flatten/merge networks, and IB is one fabric that may fit the bill. Our
university is experimenting with an IB-only setup for its VMware ESX farm,
for example, and having some pretty good results. They love not having
4 Ethernet connections, at least 2 FCs, etc.

I don't know if we'll ever be there too. It's just exploratory right
now. Trying to understand what's possible and what isn't.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

Steve Swanekamp

Oct 19, 2011, 3:50:09 PM
to Discussion of Rocks Clusters
Hi Rocks Gurus,
I would like to NFS mount a disk farm across the compute nodes. Is there a
way to do this on the front end so that all the nodes get the required NFS
directives? Is this type of thing better done using the 411 service? If so,
then what are the commands to change the 411 service to mount the new disks
across the compute nodes? Thanks...
Steve


evm

Oct 19, 2011, 4:13:38 PM
to Discussion of Rocks Clusters
I would be interested in the answer as well.

We have some NAS boxes that I currently NFS mount on a second interface
on the compute nodes (and a 3rd interface on the head node) that has a
dedicated SAN switch, if you will. The configuration is done manually, and
I would rather do it as part of a node reload using standard tools
rather than a manually run script.

Inquiring minds want to know!

Thanks, Ethan


--
Ethan VanMatre
Research Systems Engineer
Center for Coastal Margin Observation and Prediction
Oregon Health Science University
503-748-1157
e...@stccmop.org

Jon Forrest

Oct 19, 2011, 4:29:05 PM
to npaci-rocks...@sdsc.edu
On 10/19/2011 1:13 PM, evm wrote:
> I would be interested in the answer as well.
>
> We have some NAS boxes that I currently NFS mount on a second
> interface on the compute nodes (and a 3rd interface on the head node)
> that has a dedicated SAN switch if you will. The configuration is
> done manually and I would rather do it as part of a node reload using
> standard tools rather than a manually run script.

One way to do this is to modify extend-compute.xml so that
it puts whatever you want into /etc/fstab on the compute nodes.
In this case it could be static mounts. Or, if you want
to use the automounter, you could do what's necessary
to modify the auto.* files.
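
For example, a minimal extend-compute.xml fragment for the static-mount
route might look something like this (the NAS host name and export path
are made up, and you should check the <file> tag syntax against your
Rocks release):

  <post>
  <!-- append a static NFS mount to /etc/fstab on every compute node;
       nas-0-0 and /export/data are placeholders -->
  <file name="/etc/fstab" mode="append">
  nas-0-0:/export/data  /data  nfs  defaults  0 0
  </file>
  mkdir -p /data
  </post>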

Jon

Ian Kaufman

Oct 19, 2011, 4:26:19 PM
to Discussion of Rocks Clusters
Hi Lloyd,

My road was a bit tougher. I have Qlogic DDR cards, so I had to
install iPXE and build my own ROM, and then flash the cards. That part
actually worked and I was able to get the initial PXE/tftp process
running. So, I did not try playing with the Mellanox stuff. But, I was
able to watch the traffic and logs and see that PXE/tftp was
successful, and the initrd/vmlinuz was transferred over, but then the
interface dropped out. It took a bit of time to find out the root
cause.

Interestingly, I was trying to build up a cluster using both Warewulf
3 and Perceus 1.6. My research dug up that an older version of Perceus
had addressed this issue by including a patched Busybox/udhcp client,
and that patch was provided by one of the original Perceus/Warewulf
guys. Unfortunately, udhcp had changed so much that the patch was no
longer valid. Fortunately, I know the developer personally, and had
already been working with him. When I presented my findings, it was an
"Ah ..." moment for him as well. He had one of his team work up a new
udhcp patch, which they will test and then send back up to the Busybox
team.

Anyway, were you able to successfully PXE and transfer the images over
tftp? Are you stuck at the Kickstart process? I suspect that something
in the Kickstart file isn't set properly, or the initrd/vmlinuz
provided by Mellanox is not detecting your hardware completely enough
to bring up the IB interface. Or, perhaps, the IB network is not
seeing things correctly.

Ian

Bart Brashers

Oct 19, 2011, 4:38:05 PM
to Discussion of Rocks Clusters
The Rocks way of doing this is to use the auto-mounter and the 411 service to distribute /etc/auto.something (after you've populated it).

It's best to have a direct connection between the compute nodes and the external NFS box, as Ethan has. Otherwise, all traffic will go through the FE and it could become a bottleneck for throughput (in addition to slowing down your FE).

You could choose to mount your external SAN/NAS/SNA/ANS/TLA as a directory under /home, in which case you simply have to edit /etc/auto.home to add a line, then do a "rocks sync users" to distribute it to the nodes.

If you want to mount it under /data, then you can create a new file /etc/auto.data, add a line for it in /etc/auto.master, populate /etc/auto.data with lines like those found in /etc/auto.home (i.e., one line for each directory to be found under /data), and test on the FE. Once it's good, add /etc/auto.data to /var/411/Files.mk, and do a "make -C /var/411 force". Don't forget to "mkdir /data" on your FE, and add a line doing that to extend-compute.xml.
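
Spelled out as commands on the FE, that is roughly the following (the NAS
name, export path and mount options are placeholders, and the Files.mk
syntax may differ slightly between Rocks releases):

  mkdir /data
  echo "/data  /etc/auto.data  --timeout=1200" >> /etc/auto.master
  # one line per directory to appear under /data, same style as /etc/auto.home
  echo "scratch  -nfsvers=3  nas-0-0.local:/export/scratch" >> /etc/auto.data
  service autofs reload                    # test on the FE first
  echo "FILES += /etc/auto.data" >> /var/411/Files.mk
  make -C /var/411 force                   # push the new map to the nodes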

The usual Linux rules apply regarding the exporting box. It probably has some security checks (is the host requesting the mount known, and does it reverse-DNS correctly? Is the username known, and has that user been given rights to mount?).

An alternative way of doing this is to NOT use the auto-mounter, and add lines to /etc/fstab to make static NFS-mounts. Some users like this, some do not. Add the line by hand to the FE, tweaking it until it works, then edit extend-compute.xml to add the same line to /etc/fstab on all the nodes, and re-install them.

Bart Brashers



Lloyd Brown

Oct 19, 2011, 6:01:59 PM
to npaci-rocks...@sdsc.edu
Ian,

Wow. That's a lot more work than I realized. I'm impressed that you'd
work that hard at it. Thanks.

Also, this is getting a little off-topic relative to Rocks. We can take
the discussion off-list if you want.

At this point, I'm not entirely sure what my hangup is. As I said, I'm
not that good with Anaconda/kickstart and initrds. But my current
objective was just to get to a rescue prompt, so I'm not sure whether it
really needs to get the kickstart file or not.

Here's my current pxelinux.cfg for this host:

> kernel CentOS/CentOS5.6Mellanox/vmlinuz.CentOS5.6x64
> append rescue initrd=CentOS/CentOS5.6Mellanox/initrd.CentOS5.6x64 ksdevice=ib0

What I see is this:
- The host's BIOS/POST starts up, gets to the point of loading the
iPXE/FlexBoot ROM image from the HCA
- The iPXE image correctly brings up the IB interface, and DHCPs from
another host running Mellanox's patched DHCP server. The dhcp logs show
the lease being issued.
- The iPXE image pulls the pxelinux.0 image, then pulls the pxelinux
config, then the kernel/initrd, all successfully over IPoIB.
- The kernel starts, and then starts prompting for Language, Keyboard
layout, etc. When it gets to the point of asking where the media is
("Local CDROM", "Hard Drive", "NFS image", etc.), I choose NFS (I have
the contents of the ISOs on an NFS server, both on TCP/IP/Ethernet
network, and on IPoIB network).
- I get prompted to choose which network to use to get to the NFS server
(eth0 or eth1). I'd hoped to see ib0 here, but no luck. I choose eth0,
it DHCPs, etc., and we move on.
- If I specify the IPoIB address on the NFS server, I get an error about
not being able to contact the NFS server.
- If I specify the TCP/IP/Ethernet address of the NFS server, it finds
it, pulls the image it needs, and drops me to a rescue prompt, as expected.
- When I get to the prompt, I poke around and discover that:
-- the kernel modules (e.g. "ib_ipoib", "mlx4_ib", etc.) are loaded
-- the "ib0" and "ib1" interfaces show up in "ifconfig -a", meaning the
interfaces exist, but they're not up, meaning they don't show up in
"ifconfig", and don't have IP addresses assigned.
-- if I try "dhclient ib0", the dhcp server never sees the request, and
the client times out; this may be your udhcp patch that's needed; I'm
not sure
-- if I statically assign an IP using "ifconfig ib0 IPADDR netmask
255.255.255.0 up", it comes up, and I can ping the dhcp/nfs server over
IPoIB
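
(For the record, the manual workaround from that rescue prompt is roughly
the following; the addresses and NFS mount options are placeholders, and I
haven't actually verified the mount step, just the ping:)

  # bring ib0 up by hand, since dhclient over IPoIB gets no answer
  ifconfig ib0 10.170.47.100 netmask 255.255.255.0 up
  ping -c 3 10.170.47.10                  # the dhcp/nfs server, over IPoIB
  # in principle the install tree could then be mounted manually:
  mkdir -p /mnt/source
  mount -t nfs -o ro,nolock 10.170.47.10:/export/centos5.6 /mnt/source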

Just for fun, I also tried adding the "ip=" and "netmask=" on the append
line, hoping that with the "ksdevice=ib0", it would work, and we'd know
it was just a dhcp client issue. Unfortunately not. In that
configuration, I couldn't get either NFS server IP to work, and so
couldn't drop to a prompt to see what's going on. I suspect it's
putting the IP onto eth0 or something, but I don't know that.


Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

Ian Kaufman

Oct 19, 2011, 7:06:50 PM
to Discussion of Rocks Clusters
Hi Lloyd,


> Wow.  That's a lot more work than I realized.  I'm impressed that you'd
> work that hard at it.  Thanks.

It's an all-new cluster with all new hardware: 2x100TB storage systems, QDR
interfaces for storage and switches, DDR on the redundant frontends, and
94 dual hexacore nodes (4 half-U systems in 2U racks). I wanted to move
to Ubuntu, but Perceus 1.6 support is a little flat right now, 1.7
isn't out (nor is 2.0), and Warewulf 3 (Warewulf development has split
from Perceus) doesn't really support Ubuntu yet (I was offering up a
lot of code changes to support Ubuntu, dnsmasq and IB). The cluster is
almost ready for final testing.

>
> Also, this is getting a little off-topic relative to Rocks.  We can take
> the discussion off-list if you want.

I will leave that up to the list - but I suspect Phil, Anoop and
others might be interested in how the whole PXE/Kickstart-over-IB effort
runs its course. So, while not entirely ROCKS-specific, there is some
relevance in the case where someone wants to deploy a ROCKS cluster
over IB. But, like I said, if Phil asks us to take it off-list, we can do
so - he knows where my office is and how to get hold of me, after all.

>
> At this point, I'm not entirely sure what my hangup is.  As I said, I'm
> not that good with Anaconda/kickstart and initrds.  But my current
> objective was just to get to a rescue prompt, so I'm not sure whether it
> really needs to get the kickstart file or not.
>
> Here's my current pxelinux.cfg for this host:
>
>>         kernel CentOS/CentOS5.6Mellanox/vmlinuz.CentOS5.6x64
>>         append rescue initrd=CentOS/CentOS5.6Mellanox/initrd.CentOS5.6x64 ksdevice=ib0
>
> What I see is this:
> - The host's BIOS/POST starts up, gets to the point of loading the
> iPXE/FlexBoot ROM image from the HCA
> - The iPXE image correctly brings up the IB interface, and DHCPs from
> another host running Mellanox's patched DHCP server.  The dhcp logs show
> the lease being issued.
> - The iPXE image pulls the pxelinux.0 image, then pulls the pxelinux
> config, then the kernel/initrd, all successfully over IPoIB.

So, this is as far as I got too - even the part where the Busybox kernel loads up.

> - The kernel starts, and then starts prompting for Language, Keyboard
> layout, etc.  When it gets to the point of asking where the media is
> ("Local CDROM", "Hard Drive", "NFS image", etc.), I choose NFS (I have
> the contents of the ISOs on an NFS server, both on TCP/IP/Ethernet
> network, and on IPoIB network).
> - I get prompted to choose which network to use to get to the NFS server
> (eth0 or eth1).  I'd hoped to see ib0 here, but no luck.  I choose eth0,
> it DHCPs, etc., and we move on.
> - If I specify the IPoIB address on the NFS server, I get an error about
> not being able to contact the NFS server.
> - If I specify the TCP/IP/Ethernet address of the NFS server, it finds
> it, pulls the image it needs, and drops me to a rescue prompt, as expected.
> - When I get to the prompt, I poke around and discover that:
> -- the kernel modules (eg. "ib_ipoib", "mlx4_ib", etc.) are loaded
> -- the "ib0" and "ib1" interfaces show up in "ifconfig -a", meaning the
> interfaces exist, but they're not up, meaning they don't show up in
> "ifconfig", and don't have IP addresses assigned.
> -- if I try "dhclient ib0", the dhcp server never sees the request, and
> the client times out; this may be your udhcp patch that's needed; I'm
> not sure

Yeah - this is kind of what I suspect. The patch to udhcp is not in
place, and therefore the udhcp client in the Busybox kernel cannot
make the DHCP request over IB. I had to rebuild my initrd and vmlinuz
to support the Qlogic interfaces, but I still could not get any DHCP
requests working. Hence all the digging to find out all I could
regarding Busybox, udhcp and IB. I even downloaded the Perceus and
Warewulf sources, and was updating/patching the Perceus code that
builds the Busybox and udhcp stuff, which is how I found out about the
old patch, and that the udhcp code had changed enough to make the
patch invalid. But, the researchers who bought the cluster were
getting impatient (and rightfully so), so I changed tactics. At some
point I may go back and test the patch out if the Warewulf guys don't
beat me to it.

> -- if I statically assign an IP using "ifconfig ib0 IPADDR netmask
> 255.255.255.0 up", it comes up, and I can ping the dhcp/nfs server over
> IPoIB

Yep - sure sounds like udhcp is the culprit.

Larry Baker

Oct 19, 2011, 7:30:39 PM
to Steve Swanekamp, Discussion of Rocks Clusters
Bart brings up most of the important issues around the choices to be
made.

As far as doing things the "Rocks" way, Jon mentioned modifying extend-
compute.xml. I did things a little differently to add IP-over-IB and
an NFS NAS to our cluster. (It doesn't really matter that the new subnet
is IP-over-IB; this is all IP subnet configuration stuff.) Attached
is a script (I've posted it before) that uses the existing compute
node network definitions from the database to create another subnet
with a one-to-one mapping of host ids (the variable part of a network
address) to the new subnet. It adds firewall rules for the new
subnet. It duplicates exports from the front end over the new
subnet. I also create, but don't automatically use, automounter files
for the new subnet. (The 411 scripts already send all auto.* files to
the compute nodes.) Standard Rocks compute node installs then take
care of the rest, since all the info is in the database.

Here's what the top of the script I wrote (ipoib-rocks-5.4.2.sh) looks
like. You can edit it to change what you want to do differently. I
have not yet tried to install Rocks 5.4.3. (The firewall rules
commands have changed for Rocks 5.4.3.)

> #!/bin/sh
> #
> # This script will configure the IP-over-IB network, assign IP-over-IB network IP
> # addresses, and replicate the services from the private network on the IP-over-IB
> # network.  The hostid portion of the private network IP address, plus an optional bias,
> # is used to assign the IP address on the IP-over-IB network.  A warning is issued if the
> # netmasks differ, i.e., if the range of hostids is not the same on the private network
> # and the IP-over-IB network.
> #
> # Fully qualified domain names for the IP-over-IB IP addresses are defined in /etc/hosts
> # (on the front end only) by replacing .local with .ib; e.g., compute-0-0.local is the
> # private IP address and compute-0-0.ib is the IP-over-IB IP address.
> #
> # Select the host names or groups for IP-over-IB (rocks iterate host ${ipoib_hosts})
> #
> # Default is all hosts except the front end
> #ipoib_hosts=
> ipoib_hosts="frontend compute nas"
> #
> # Rocks database components will be created, as needed, or modified to match the settings
> # provided here.
> #
> ipoib_iface=ib0
> ipoib_name=ib
> ipoib_subnet=10.170.47.64
> ipoib_netmask=255.255.255.192
> ipoib_bias=0
> #
> # Do not make any changes below this line
> #
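
(A hypothetical run, plus a couple of rocks commands to verify that the
new subnet and interfaces landed in the database, looks like this:)

  sh ./ipoib-rocks-5.4.2.sh
  rocks list network                        # should now include the "ib" subnet
  rocks list host interface compute-0-0     # ib0 should show its IP-over-IB address
  rocks sync config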

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

[Attachment: ipoib-rocks-5.4.2.sh (11296 bytes) -
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20111019/4f414d75/ipoib-rocks-5.4.2.sh]

Larry Baker

Oct 19, 2011, 7:51:36 PM
to Lloyd Brown, Discussion of Rocks Clusters
Lloyd,

Besides the interesting intellectual exercise you and Ian have gone
through, I hope you know you can make a bootable USB stick that will
get you to a rescue prompt from a CentOS LiveCD image. You can also
add a read-write overlay to put things like Mellanox firmware updates
on it. That's how I update the Mellanox firmware on my IB cards. No
need to pollute Rocks with the Mellanox IB stack (it can mess up other
IB stacks and OpenMPI libraries that have already been installed).

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Lloyd Brown

Oct 20, 2011, 10:33:55 AM
to Discussion of Rocks Clusters
Larry,

Yes. I do know that can be done. Admittedly, I've done things like
that more often with Ubuntu, to get it running on my netbook (without an
optical drive), than with CentOS for my cluster of servers. In
general, we've preferred to use either in-OS tools or PXE-bootable
images for things like firmware updates or hardware diagnostics. That
way we don't have to walk down to the server room, and it's much more
scalable than the "Virtual Media" feature that some server management controllers offer.

And yes, I know that the Mellanox IB stuff can mess some things up,
including deleting any packages whose names match "*openmpi*". In
general, I've built OFED (not MLNXOFED) and OpenMPI myself, and just put
them in "<package>..</package>" in the extend-compute.xml, and called it
good.

Thanks for the warning, though. I do appreciate it.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

Lloyd Brown

Oct 20, 2011, 5:28:47 PM
to npaci-rocks...@sdsc.edu
Ian,

Do you know if this patch is going to make it into the udhcp codebase?
If that's all it takes to get this next step working, that'd be great to
get it in.

I realize that even if it is included, the whole process of getting it
released, and adopted into RH's environment, etc., will take a long
time. But, it'll never happen if it doesn't get started.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

Ian Kaufman

Oct 20, 2011, 5:40:55 PM
to Discussion of Rocks Clusters
Hi Lloyd,

If it tests out, then yes, the Warewulf 3 team will submit it upstream
to the Busybox/udhcp team. Then they (Busybox/udhcp) will have to
decide whether to include it.

Ian
