Warewulf vs xCAT for Torque-based environment of 20 nodes on CentOS


Vince Forgetta

Apr 7, 2015, 3:09:53 PM
to ware...@lbl.gov
Hi,

I am quite keen on using Warewulf on a 20-node cluster I admin, but I also came across xCAT. From my limited comparison, based on reading tutorials on how to deploy the two systems, Warewulf seems more straightforward (Warewulf: http://www.admin-magazine.com/HPC/Articles/Warewulf-Cluster-Manager-Master-and-Compute-Nodes [multiple parts], and xCAT: http://sumavi.com/books/xcat-administrators-guide), whereas xCAT seems to have more features and better documentation (?). My intention is to use one of these to deploy and manage a cluster running Torque. Optionally, I would also like to manage 4 additional nodes with SSH login and RStudio Server for data analysis. All nodes are connected to a NAS in RAID 10. I would prefer to have IPMI access to all nodes as well, but this is not required.

My requirements seem quite simple, so I would assume that Warewulf is a good fit. However, I hesitate to make the leap, thinking that xCAT may have a crucial feature I will need.

So, is there anyone with experience using these two systems who would like to share their thoughts?

Thanks in advance for your comments.

Vince 

Gregory M. Kurtzer

Apr 7, 2015, 5:28:30 PM
to Warewulf
Hi Vince,

You may find that many of the people on this list are a bit biased (notably myself), but with that said, Warewulf is designed to be straightforward and not overly complex. I can't state that xCAT won't have a crucial feature that you may need in the future, but I haven't come across any. As a matter of fact, Warewulf pioneered the needed features that xCAT later not only emulated them but actually required Warewulf as a dependency to implement stateless node provisioning (google wareCAT).

My point is that a solution being feature-full does not guarantee it will have the features you need. ;)

With that said, at this point xCAT is a very robust but extremely complicated solution. You will have to become quite familiar with the xCAT documents if you wish to use it effectively. In contrast, the problem with Warewulf is that its documentation is very thin. That said, we have a new documentation project underway led by Doug Eadline, and Warewulf is much simpler in design (no doubt somewhat reflective of the simple mind that created it LOL) and thus easier to use.

We run Warewulf in a pretty complicated manner, with a single WW master running multiple clusters within a shared environment, including some special nodes (e.g. login/interactive nodes, DTNs, GPUs, etc.), and we have not run into any missing features. Yet...

Hope that helps!

Greg

--
Gregory M. Kurtzer
Technical Lead and HPC Systems Architect
High Performance Computing Services (HPCS)
University of California
Lawrence Berkeley National Laboratory
One Cyclotron Road, Berkeley, CA 94720

Jess Cannata

Apr 7, 2015, 5:32:54 PM
to ware...@lbl.gov
I have not run xCAT, but I have used a variety of other cluster management toolkits, and Warewulf is pretty easy to get running despite the missing documentation. Plus, Warewulf has a great mailing list with users and developers who are happy to help when troubles appear.

--
Jess Cannata
R Systems
512-410-9690

Joseph Gonzalez

Apr 7, 2015, 11:14:49 PM
to ware...@lbl.gov
I have not used xCAT either, but I did attempt to install two different versions of Rocks 6.x before I finally gave up and went with Warewulf (WW 3.5 on CentOS 6.5). At the time, I had just the most basic knowledge of working in a Linux environment, but with the help of a writeup by Jeff Layton, http://www.admin-magazine.com/HPC/Articles/Warewulf-Cluster-Manager-Master-and-Compute-Nodes, things went very easily. All told, I was able to get a similarly sized cluster, 24 nodes, running SLURM in approximately 3 months, again with a limited knowledge of Linux, let alone system administration. Also, this list was very helpful with some InfiniBand issues I was having. Good luck.




--
/*
Joseph Gonzalez, 
PhD Student, Materials Simulation Laboratory
Department of Physics, University of South Florida, Tampa, FL 33620

*/

Steve Pritchett

Apr 8, 2015, 7:13:57 AM
to ware...@lbl.gov

I've discussed xCAT with a few admins, even recently. What I've gotten out of those talks is that "xCAT is great... once you get it running." When I last looked at xCAT myself, some time ago now (Perceus was seriously lacking for our needs), it was complex enough that I would rather have written my own provisioner. I even started playing a bit with that idea before the new Warewulf was released, after which I didn't feel the need to continue with it.

If you have a basic understanding of Linux administration, Warewulf is pretty straightforward. Even if you're doing some really crazy stuff, most of its shortcomings can be hacked around easily enough. If you do have issues getting up and running, or come up with something out of the norm that you would like to try, this mailing list is extremely helpful in working through any issues you may find.

At the moment, we use Warewulf for provisioning a couple thousand systems in a multi-tenant environment, with various VLAN and air-gap separation, and a mix of Torque, SLURM, etc. configurations.

The things I would like to see that Warewulf currently lacks are a way to provision onto a software RAID (last I knew this was still in the works), and a way to overlay a large group of files (such as an application installation) over a VNFS during deployment, though this can be done afterwards in about a billion other ways.

Hope this helps!
Steve

Vince Forgetta

Apr 8, 2015, 9:20:51 AM
to ware...@lbl.gov
Hi all,

I am truly appreciative of the replies, and based on your feedback I am keen on giving Warewulf an earnest go. I will document my experience in the hope that it may contribute to the documentation effort. I expect I will be back here with a question or two :)

Vince