[slurm-dev] What cluster provisioning system do you use?

Bjørn-Helge Mevik

Mar 15, 2016, 8:40:33 AM
to slurm-dev

I apologize for the slightly off-topic subject, but I could not think of
a better forum to ask. If you know of a more proper place to ask this,
I'd be happy to know about it.

We are currently in the design phase for a new cluster that is going to
be set up next year. We have so far used Rocks (on top of CentOS) for
cluster provisioning. However, Rocks doesn't support CentOS >= 7, and it
doesn't look like it will in the near future. Also for other reasons,
we are looking for alternatives to Rocks.

So, what are you using for cluster provisioning?

- Rocks?
- A different provisioning tool?
- A locally developed solution?

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo

Chris Samuel

Mar 15, 2016, 8:45:08 AM
to slurm-dev

On Tue, 15 Mar 2016 05:40:29 AM Bjørn-Helge Mevik wrote:

> I apologize for the slightly off-topic subject, but I could not think of
> a better forum to ask. If you know of a more proper place to ask this,
> I'd be happy to know about it.

http://beowulf.org/

There's actually a very recent thread that is relevant here:

http://beowulf.org/pipermail/beowulf/2016-March/033520.html

We use xCAT (open source). https://xcat.org/

All the best,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci

Ryan Novosielski

Mar 15, 2016, 9:13:20 AM
to slurm-dev

> On Mar 15, 2016, at 08:44, Chris Samuel <sam...@unimelb.edu.au> wrote:
>
>> On Tue, 15 Mar 2016 05:40:29 AM Bjørn-Helge Mevik wrote:
>>
>> I apologize for the slightly off-topic subject, but I could not think of
>> a better forum to ask. If you know of a more proper place to ask this,
>> I'd be happy to know about it.
>
> http://beowulf.org/
>
> There's actually a very recent thread that is relevant here:
>
> http://beowulf.org/pipermail/beowulf/2016-March/033520.html
>
> We use xCAT (open source). https://xcat.org/

As a matter of fact, that thread also contains survey results from, I believe, the Supercomputing conference, and maybe other places, showing what other people are using.

I am using Warewulf on our clusters. Support for CentOS 7 is preproduction at the moment, but I believe they are planning a release any day now. A final push of testing was recently announced. I am using it anyway, with the only ill effects being some misplaced warning messages.

Our clusters also make very small use of xCAT, mostly because that is what IBM/Lenovo uses for certain things. My sense is that xCAT is significantly more featureful, at the cost of being more complex to configure.

=R=

John Hearns

Mar 15, 2016, 9:21:12 AM
to slurm-dev
Bjorn

You should definitely be looking at Bright Cluster Manager.

I set up a Bright cluster last week with CentOS 7.2 and Slurm.
Bright works right out of the box with Slurm, which is set up automatically as you provision the nodes.
It also has the power-saving scripts etc. all set up.

Please send me an email off-list and we can discuss.

I am also happy to let you log in to our cluster remotely and 'test drive' it.
CentOS 7, Slurm, Mellanox FDR InfiniBand, and we have Xeon Phi too.

Loris Bennett

Mar 15, 2016, 9:50:11 AM
to slurm-dev

Hi Bjørn-Helge,

Bjørn-Helge Mevik <b.h....@usit.uio.no>
writes:

> I apologize for the slightly off-topic subject, but I could not think of
> a better forum to ask. If you know of a more proper place to ask this,
> I'd be happy to know about it.
>
> We are currently in the design phase for a new cluster that is going to
> be set up next year. We have so far used Rocks (on top of CentOS) for
> cluster provisioning. However, Rocks doesn't support CentOS >= 7, and it
> doesn't look like it will in the near future. Also for other reasons,
> we are looking for alternatives to Rocks.
>
> So, what are you using for cluster provisioning?
>
> - Rocks?
> - A different provisioning tool?
> - A locally developed solution?

We currently use Bright Cluster Manager, but are looking to move away
from this due to cost, lack of an update path from our current set-up,
and the fact that the integration with Slurm locked us to version 2.2.7
for a long time until we decided to do without the integration and
installed an up-to-date version.

I am currently setting up a test cluster and shall be looking at

- Warewulf
- DRBL
- maybe xCAT

I would also be interested in other options.

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris....@fu-berlin.de

Roland Fehrenbacher

Mar 15, 2016, 10:15:13 AM
to slurm-dev, Bjørn-Helge Mevik

>>>>> "BH" == Bjørn-Helge Mevik <b.h....@usit.uio.no> writes:

BH> I apologize for the slightly off-topic subject, but I could not
BH> think of a better forum to ask. If you know of a more proper
BH> place to ask this, I'd be happy to know about it.

BH> We are currently in the design phase for a new cluster that is
BH> going to be set up next year. We have so far used Rocks (on top
BH> of CentOS) for cluster provisioning. However, Rocks doesn't
BH> support CentOS >= 7, and it doesn't look like it will in the
BH> near future. Also for other reasons, we are looking for
BH> alternatives to Rocks.

BH> So, what are you using for cluster provisioning?

BH> - Rocks?
BH> - A different provisioning tool?
BH> - A locally developed solution?

If you want a simple yet powerful solution with a guaranteed upgrade
path, have a look at Qlustar. It has an excellent management interface,
tons of goodies such as fully integrated Lustre and BeeGFS, and comes
with regular security updates. It currently ships Slurm 15.08.5.

With a reasonable Internet connection, you'll have a cluster headnode,
including a virtual demo cluster, set up in 30 min.

https://qlustar.com/download

Best,

Roland

-------
http://www.q-leap.com / http://qlustar.com
--- HPC / Storage / Cloud Linux Cluster OS ---

Daniel Letai

Mar 15, 2016, 10:29:19 AM
to slurm-dev
Another vote for xCAT here. I've been using it for ~3 years now, on installations ranging from 8 to 1k+ nodes.
Once you get to know xCAT it's quite easy to manage, although familiarity with Perl will help in any troubleshooting or customization (not required, you can do without, but it helps).

That said, I've been meaning for a long time to look at Foreman (http://theforeman.org/), especially given its good integration with Puppet/Chef/Salt etc., which are becoming increasingly relevant to large cluster management.

xCAT's integration with configuration management tools is somewhat lacking, from my experience.

Aaron Knister

Mar 15, 2016, 11:09:09 AM
to slurm-dev

+1 for xCAT. We use xCAT to manage 3k+ nodes, and while it can be a little complex, it works very well. I've used xCAT on 3 different systems now and I quite like it.


Jared David Baker

Mar 15, 2016, 11:19:19 AM
to slurm-dev

+1 for xCAT as well. We’ve enjoyed it for the most part. We have found a few bugs, but the xCAT team is good to work with, just like the Slurm team. I will note that, like Daniel, we’re looking for better integration with configuration management software stacks as the number of support nodes (i.e., not compute nodes) grows and we need additional services beyond job scheduling, storage, etc. I have used Rocks, Warewulf, and trialed Bright, but we’ve selected xCAT, with Warewulf as a highly preferable secondary option.

 

-Jared

 


Rouhani, Hossein

Mar 15, 2016, 11:24:14 AM
to slurm-dev, HPC_Engineering

Hi Bjørn-Helge,

We also have good experience with xCAT and have done many cluster configurations across Europe. Please let us know if you need help or assistance. We also offer intensive xCAT training alongside other HPC services.

Regards,
Hossein
transtec ag

John Hearns

Mar 15, 2016, 11:48:10 AM
to slurm-dev

> I am currently setting up a test cluster and shall be looking at
>
> - Warewulf

If you like Warewulf, you could look at OpenHPC, which uses Warewulf for the provisioning.
The Slurm version on my OpenHPC server is 15.08.6, and this came from the OpenHPC repositories.

Trey Dockendorf

Mar 15, 2016, 1:00:21 PM
to slurm-dev
Not specifically a cluster provisioning system, but we use Foreman + Puppet to provision our systems, which are stateful. The deploy time is kind of lengthy, ~45 minutes per node, but the results are consistent and changes can be made to systems after deployment via Puppet. Puppet operates in master-less mode, so modules and data come from an NFS mount, which allows Puppet to scale beyond the limits of the Puppet masters.

- Trey

=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Phone: (979)458-2396 

Marc Rodriguez

Mar 15, 2016, 2:02:24 PM
to slurm-dev
We use SaltStack and Cobbler with CentOS 7. We tested Katello, but found it unusable with Salt.

I can install a node in less than 20 minutes, and I manage the configuration/partitions, scheduling and accounting with Salt.
--

ALBA Synchrotron

Marc Rodriguez
Systems - Computing Division
 
ALBA SYNCHROTRON LIGHT SOURCE
Ctra. BP 1413 km. 3,3 | 08290 | Cerdanyola del Vallès| Barcelona | Spain
(+34) 93 592 40 81
www.albasynchrotron.es | marc.ro...@cells.es
 

Bill Barth

Mar 15, 2016, 4:24:13 PM
to slurm-dev

We use something that was originally home-grown in house but is now open
source (https://github.com/hpcsi/losf) after having moved off of Rocks
long ago. It's based on Cobbler but has recently been updated to support
both CentOS and SUSE. Either way, management of RPMs is at its core. We
have had more than 10k nodes under management across a handful of clusters
at once with this system.

Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445