[slurm-dev] Trouble loading alternate slurmconf by env or switch


Wiegand, Paul

Nov 17, 2015, 5:50:00 PM
to slurm-dev

Greetings,

We are running CentOS 7 (i.e., systemd) with SLURM 15.08.3, and everything works fine as long as slurm.conf is in the location I specified at build time. However, if I set the SLURM_CONF environment variable and/or pass the "-f" switch when starting the daemon, SLURM commands (e.g., sinfo) still seem to look in the build-time location, even when SLURM_CONF points to the new, valid location.

I got the same behavior with SLURM 14.11.7.

The full story is more complicated than that, but rather than tediously recount everything I've tried: is there anyone out there doing this successfully who could give me some advice? Alternatively, can anyone verify that there are issues (or, at least, that the documented methods are insufficient to achieve my goals)?

Thanks,
Paul.

Trey Dockendorf

Nov 17, 2015, 6:49:03 PM
to slurm-dev
SLURM_CONF works in 14.03.10 with an RPM-based installation. It seems to work in 15.08.3 too. My test was very basic.

$ scontrol --version
slurm 14.03.10
$ scontrol ping
Slurmctld(primary/backup) at batch01/(NULL) are UP/DOWN
$ export SLURM_CONF=/home/admin/etc/slurm-node-dev/slurm.conf
$ scontrol ping
Slurmctld(primary/backup) at batch02/(NULL) are UP/DOWN

The minor errors below are expected, since the second ping went from a 15.08.3 client to a 14.03.10 controller.

$ scontrol --version
slurm 15.08.3
$ scontrol ping
Slurmctld(primary/backup) at batch02/(NULL) are UP/DOWN
$ export SLURM_CONF=/home/admin/etc/slurm-node/slurm.conf
$ scontrol ping
scontrol: error: slurm_receive_msg: Zero Bytes were transmitted or received
Slurmctld(primary/backup) at batch01/(NULL) are DOWN/DOWN

I don't know if the systemd aspect changes things, but on EL6, using init.d startup, I set the path of our configs in /etc/sysconfig/slurm like this:

CONFDIR="/etc/slurm"
SLURMCTLD_OPTIONS="-f /etc/slurm/slurm.conf"
SLURMD_OPTIONS="-f /etc/slurm/slurm.conf -M"
export SLURM_CONF="/etc/slurm/slurm.conf"
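
For reference, the lookup order under discussion can be sketched in shell. This is an illustration only, assuming the usual precedence: an explicit "-f" argument to a daemon first, then SLURM_CONF, then the build-time default. The function name resolve_slurm_conf and the BUILD_DEFAULT value are hypothetical and not part of SLURM.

```shell
#!/bin/sh
# Hypothetical helper that mimics the order in which a config path is chosen.
# BUILD_DEFAULT stands in for the path compiled in at build time (assumed value).
BUILD_DEFAULT="/etc/slurm/slurm.conf"

resolve_slurm_conf() {
    # $1: optional path given with -f (daemons only; may be empty)
    if [ -n "${1:-}" ]; then
        # An explicit -f argument wins over everything else.
        printf '%s\n' "$1"
    elif [ -n "${SLURM_CONF:-}" ]; then
        # Otherwise the environment variable is used, if set.
        printf '%s\n' "$SLURM_CONF"
    else
        # Fall back to the build-time default.
        printf '%s\n' "$BUILD_DEFAULT"
    fi
}
```

Calling resolve_slurm_conf with no argument and SLURM_CONF unset prints the assumed default; exporting SLURM_CONF or passing a path changes the result accordingly. Client commands such as sinfo only see the last two cases, which is why SLURM_CONF has to be exported in the shell that runs them.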

- Trey

=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Phone: (979)458-2396 

Wiegand, Paul

Nov 17, 2015, 10:39:00 PM
to slurm-dev

Is CONFDIR required? I don't see it in the documentation.

I copied your parameters, and it seems to work now. I appreciate your help!

Paul.

Trey Dockendorf

Nov 17, 2015, 11:11:04 PM
to slurm-dev
I believe I set CONFDIR in the past when the value differed from the location set at build time.  The init.d scripts in 14.03 and 15.08 have this check:

if [ ! -f $CONFDIR/slurm.conf ]; then
   echo "Could not find $CONFDIR/slurm.conf. Bad path?"
   exit 1
fi

The default value of CONFDIR is set when SLURM is built, based on --sysconfdir, I believe. So if you are using a different path and the original path doesn't exist, you likely need to set CONFDIR. This likely only applies to the init.d scripts. From looking at the systemd files used in 15.08 [1][2], it looks like SLURMD_OPTIONS and SLURMCTLD_OPTIONS are all you need; I don't see CONFDIR referenced. I'm surprised /etc/sysconfig/slurm worked, since it appears the 15.08.4 systemd services reference specific files, as set in EnvironmentFile.
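
The init.d-style lookup described above can be sketched as a small shell function. It is a simplified illustration, not the actual init script: the function name check_confdir is hypothetical, and both the default directory and the sysconfig file are passed in as arguments so that no real path is assumed.

```shell
#!/bin/sh
# Sketch of the init.d-style lookup: a build-time default directory that an
# optional sysconfig file may override by setting CONFDIR.
check_confdir() {
    # $1: build-time default directory
    # $2: optional sysconfig file (use a path containing a slash)
    CONFDIR="$1"
    if [ -n "${2:-}" ] && [ -f "$2" ]; then
        # Sourcing the file lets it replace CONFDIR with another directory.
        . "$2"
    fi
    if [ ! -f "$CONFDIR/slurm.conf" ]; then
        echo "Could not find $CONFDIR/slurm.conf. Bad path?"
        return 1
    fi
    echo "Using $CONFDIR/slurm.conf"
}
```

If the default directory does not contain slurm.conf and the sysconfig file does not set CONFDIR, the check fails even when "-f" and SLURM_CONF point to a valid file, which matches the behavior discussed in this thread.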

- Trey



Wiegand, Paul

Nov 18, 2015, 8:02:57 AM
to slurm-dev

This is it exactly. Thank you very much.

Paul.

Wiegand, Paul

Nov 18, 2015, 8:39:07 AM
to slurm-dev

Perhaps I should be less terse.

I set CONFDIR in the proper systemd environment files and in the profile.d files (probably unnecessary), and otherwise did what you described (which is more or less what I had done before), and now it works.

So it appears to me that CONFDIR is still needed with systemd. The check in the service file you reference is just an initial check; I had (and still have) mine hard-pathed, and that wasn't throwing errors, so I guess that was correct. The only difference I see from what I was doing before is CONFDIR.
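
One way to catch this kind of mismatch early is a small consistency check on the environment file. The sketch below is an assumption-laden illustration: check_slurm_env is a hypothetical helper, and it assumes the file contains only simple KEY="value" lines that sh can source. It only verifies that CONFDIR and SLURM_CONF agree with each other.

```shell
#!/bin/sh
# Hypothetical consistency check for an environment file that sets
# CONFDIR and SLURM_CONF. Assumes simple KEY="value" lines.
check_slurm_env() {
    # $1: path to the environment file (use a path containing a slash)
    (
        # Run in a subshell so the caller's environment is left untouched.
        unset CONFDIR SLURM_CONF
        . "$1" || exit 2
        if [ -z "${CONFDIR:-}" ] || [ -z "${SLURM_CONF:-}" ]; then
            echo "CONFDIR or SLURM_CONF is not set in $1"
            exit 1
        fi
        if [ "$SLURM_CONF" != "$CONFDIR/slurm.conf" ]; then
            echo "Mismatch: SLURM_CONF=$SLURM_CONF but CONFDIR=$CONFDIR"
            exit 1
        fi
        echo "Consistent: $SLURM_CONF"
    )
}
```

The function returns a non-zero status when either variable is missing or the two disagree, so it can be run before restarting the daemons.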


Thanks again,

Paul



