Vicker, Darby (JSC-EG311)
Feb 12, 2018, 9:47:52 AM
to Slurm User Community List, slurm...@schedmd.com
We recently brought a new cluster online with the desire to federate it with our existing cluster. See the full story here:
https://bugs.schedmd.com/show_bug.cgi?id=4512
There are some fairly large limitations to federation, the biggest of which (for us anyway) was:
> The current implementation assumes all systems in the federation
> are largely identical. We hope to address this in future versions.
I initially thought this would be a showstopper for us, but we were able to modify our job_submit.lua to work around the issue for our use case. We haven't actually federated our two clusters yet; we are still testing things out with -M submissions. If that works out, we will probably federate them.
My initial solution for getting Slurm working on multiple clusters also involved setting SLURM_CONF to a different location (we have Slurm installed on an NFS share that is mounted on both clusters). As pointed out in Bug 4573, this isn't a good solution for multi-cluster operation because, by default anyway, environment variables are exported to the job, so a job that starts on the other cluster inherits the submitting cluster's SLURM_CONF, which confuses Slurm. The solution I chose was to configure Slurm to always look in /etc/slurm/ and make that directory a symlink to the proper Slurm configuration directory for that cluster. That seems to work well for us.
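For illustration only, here is a minimal Python sketch of the symlink idea, not our actual setup script. The function name point_config_symlink and the directory names in the demo (cluster_a, cluster_b, slurm) are hypothetical; on a real node the link would be /etc/slurm and the target would be that cluster's configuration directory on the shared install. It assumes a POSIX system where renaming over an existing symlink replaces the link itself.

```python
import os
import tempfile


def point_config_symlink(link_path: str, target_dir: str) -> None:
    """Make link_path a symlink to target_dir, replacing an existing symlink."""
    # Refuse to clobber a real file or directory sitting at the link path.
    if os.path.lexists(link_path) and not os.path.islink(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    # Create the new link under a temporary name, then rename it over the
    # old one so the path never disappears while it is being switched.
    tmp_link = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(target_dir, tmp_link)
    os.replace(tmp_link, link_path)


if __name__ == "__main__":
    # Demo in a scratch directory; the names below are made up.
    base = tempfile.mkdtemp()
    conf_a = os.path.join(base, "cluster_a")
    conf_b = os.path.join(base, "cluster_b")
    os.mkdir(conf_a)
    os.mkdir(conf_b)
    link = os.path.join(base, "slurm")

    point_config_symlink(link, conf_a)   # first node setup
    point_config_symlink(link, conf_b)   # re-pointing an existing link
    print(os.readlink(link))
```

With the link in place on every node, nothing has to set SLURM_CONF, so there is no cluster-specific path in the environment to leak into a job that starts on the other cluster.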
Our plugins are pretty much the same between our two clusters so I'm not sure about that question.
Hope that helps.
Darby
-----Original Message-----
From: slurm-users <slurm-use...@lists.schedmd.com> on behalf of Yair Yarom <ir...@cs.huji.ac.il>
Reply-To: Slurm User Community List <slurm...@lists.schedmd.com>
Date: Monday, February 12, 2018 at 4:15 AM
To: "slurm...@schedmd.com" <slurm...@schedmd.com>
Subject: [slurm-users] Should I join the federation?