[slurm-users] Issues with pam_slurm_adopt


Nicolas Greneche

Apr 8, 2022, 1:54:04 PM4/8/22
to slurm...@lists.schedmd.com
Hi,

I have an issue with pam_slurm_adopt since moving from 21.08.5 to
21.08.6: it no longer works.

When I log in directly to the node with the root account:

Apr 8 19:06:49 magi46 pam_slurm_adopt[20400]: Ignoring root user
Apr 8 19:06:49 magi46 sshd[20400]: Accepted publickey for root from
172.16.0.3 port 50884 ssh2: ...
Apr 8 19:06:49 magi46 sshd[20400]: pam_unix(sshd:session): session
opened for user root(uid=0) by (uid=0)

Everything is OK.

I submit a very simple job, an infinite loop, to keep the first compute
node busy:

nicolas.greneche@magi3:~/test-bullseye/infinite$ cat infinite.slurm
#!/bin/bash
#SBATCH --job-name=infinite
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --nodes=1
srun infinite.sh
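
infinite.sh itself is not shown above; it is just a trivial keep-busy
loop, something like the following sketch (the exact contents may differ):

#!/bin/bash
# Loop forever so the allocated node stays busy
while true; do
    sleep 60
done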

nicolas.greneche@magi3:~/test-bullseye/infinite$ sbatch infinite.slurm
Submitted batch job 203

nicolas.greneche@magi3:~/test-bullseye/infinite$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
203 COMPUTE infinite nicolas. R 0:03 1 magi46

I now have a job running on the node. When I try to log in to the node
with the same regular account:

nicolas.greneche@magi3:~/test-bullseye/infinite$ ssh magi46
Access denied by pam_slurm_adopt: you have no active jobs on this node
Connection closed by 172.16.0.46 port 22

In auth.log, we can see that the job (JOBID 203) is found, but the PAM
module decides that I have no running job on the node:

Apr 8 19:11:32 magi46 sshd[20542]: pam_access(sshd:account): access
denied for user `nicolas.greneche' from `172.16.0.3'
Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug2:
_establish_config_source: using config_file=/run/slurm/conf/slurm.conf
(cached)
Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug: slurm_conf_init:
using config_file=/run/slurm/conf/slurm.conf
Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug: Reading
slurm.conf file: /run/slurm/conf/slurm.conf
Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug: Reading
cgroup.conf file /run/slurm/conf/cgroup.conf
Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found
StepId=203.batch
Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found StepId=203.0
Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: send_user_msg: Access
denied by pam_slurm_adopt: you have no active jobs on this node
Apr 8 19:11:32 magi46 sshd[20542]: fatal: Access denied for user
nicolas.greneche by PAM account configuration [preauth]

I may have missed something; if you have any tips, I'd be delighted to hear them.

As appendices, here are the sshd PAM configuration on the compute nodes
and the slurm.conf:

root@magi46:~# cat /etc/pam.d/sshd
@include common-auth
account required pam_nologin.so
account required pam_access.so
account required pam_slurm_adopt.so log_level=debug5

@include common-account
session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so close
session required pam_loginuid.so
session optional pam_keyinit.so force revoke

@include common-session
session optional pam_motd.so motd=/run/motd.dynamic
session optional pam_motd.so noupdate
session optional pam_mail.so standard noenv
session required pam_limits.so
session required pam_env.so
session required pam_env.so user_readenv=1 envfile=/etc/default/locale
session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so open

@include common-password

root@slurmctld:~# cat /etc/slurm/slurm.conf
ClusterName=magi
ControlMachine=slurmctld
SlurmUser=slurm
AuthType=auth/munge

MailProg=/usr/bin/mail
SlurmdDebug=debug

StateSaveLocation=/var/slurm
SlurmdSpoolDir=/var/slurm
SlurmctldPidFile=/var/slurm/slurmctld.pid
SlurmdPidFile=/var/slurm/slurmd.pid
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldParameters=enable_configless

AccountingStorageHost=slurmctld
JobAcctGatherType=jobacct_gather/linux
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=associations
JobRequeue=0
SlurmdTimeout=600

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU

TmpFS=/scratch

GresTypes=gpu
PriorityType="priority/multifactor"

Nodename=magi3 Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
Nodename=magi[107] Boards=1 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=92000 State=UNKNOWN
Nodename=magi[46-53] Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=64000 State=UNKNOWN

PartitionName=MISC-56c Nodes=magi107 Priority=3000 MaxTime=INFINITE State=UP
PartitionName=COMPUTE Nodes=magi[46-53] Priority=3000 MaxTime=INFINITE State=UP Default=YES

Thank you,

--
Nicolas Greneche
USPN
Support à la recherche / RSSI
https://www-magi.univ-paris13.fr

Brian Andrus

Apr 8, 2022, 3:43:57 PM4/8/22
to slurm...@lists.schedmd.com
Check SELinux.

Run "getenforce" on the node; if it reports Enforcing, try "setenforce 0".

Slurm doesn't play well if SELinux is enabled.
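
For reference, a quick way to check and temporarily relax SELinux on a
node (assuming the standard SELinux utilities are installed; setenforce 0
does not persist across reboots):

getenforce      # prints Enforcing, Permissive, or Disabled
setenforce 0    # switch to permissive mode until the next reboot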

Brian Andrus

Nicolas Greneche

Apr 8, 2022, 4:41:16 PM4/8/22
to slurm...@lists.schedmd.com
Hi Brian,

Thanks, but SELinux is in neither strict nor targeted mode; I'm running
Slurm on Debian Bullseye with SELinux and AppArmor disabled.

Thank you for your suggestion,

Brian Andrus

Apr 8, 2022, 4:47:09 PM4/8/22
to slurm...@lists.schedmd.com
OK. Next I would check that the UID of the user is the same on the
compute node as on the head node.

It looks like it is identifying the job but doesn't see it as yours.

Brian Andrus

Nicolas Greneche

Apr 8, 2022, 4:56:13 PM4/8/22
to slurm...@lists.schedmd.com
Yes, they are all stored in an LDAP directory:

root@magi3:~# id nicolas.greneche
uid=6001(nicolas.greneche) gid=6001(nicolas.greneche)
groupes=6001(nicolas.greneche)

root@magi46:~# id nicolas.greneche
uid=6001(nicolas.greneche) gid=6001(nicolas.greneche)
groupes=6001(nicolas.greneche)

UIDs are consistent across the whole cluster.

Juergen Salk

Apr 8, 2022, 5:56:31 PM4/8/22
to Slurm User Community List
Hi Nicolas,

It looks like you have pam_access.so placed in your PAM stack *before*
pam_slurm_adopt.so, so this may get in your way. In fact, the logs
indicate that it is pam_access and not pam_slurm_adopt that denies access
in the first place:

Apr 8 19:11:32 magi46 sshd[20542]: pam_access(sshd:account): access denied for user `nicolas.greneche' from `172.16.0.3'

Maybe the following web page is useful for setting up your PAM stack
with pam_slurm_adopt:

https://slurm.schedmd.com/pam_slurm_adopt.html

--- snip ---

If you always want to allow access for an administrative group (e.g.,
wheel), stack the pam_access module after pam_slurm_adopt. A success
with pam_slurm_adopt is sufficient to allow access, but the pam_access
module can allow others, such as administrative staff, access even
without jobs on that node:

account sufficient pam_slurm_adopt.so
account required pam_access.so

--- snip ---

We did it that way and it works fine for us. There is just one drawback,
though: administrative users who are allowed to access compute nodes
without having jobs on them always get an annoying message from
pam_slurm_adopt when doing so, even though the login succeeds:

Access denied by pam_slurm_adopt: you have no active jobs on this node

We've gotten used to it, but now that I see it on the web page, maybe
I'll take a look at the alternative approach with pam_listfile.so.
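
The pam_listfile variant would look roughly like this (just a sketch
based on that page; the file /etc/ssh/allowed_users and the admin
usernames listed in it are site-specific assumptions):

account sufficient pam_listfile.so item=user sense=allow onerr=fail file=/etc/ssh/allowed_users
account required pam_slurm_adopt.so

Because pam_listfile.so is marked sufficient and runs first, listed
administrators are admitted before pam_slurm_adopt is consulted, so they
never see that message.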

Best regards
Jürgen



Nicolas Greneche

Apr 22, 2022, 2:13:29 PM4/22/22
to slurm...@lists.schedmd.com
Hi Juergen,

I found what went wrong. I forgot to specify:

PrologFlags=contain

before:

ProctrackType=proctrack/cgroup

My bad; it is specified in the documentation here:

https://slurm.schedmd.com/pam_slurm_adopt.html#SLURM_CONFIG
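
For the record, the relevant lines of my slurm.conf now read as follows
(only the two settings that matter here; the rest is unchanged):

PrologFlags=contain
ProctrackType=proctrack/cgroup

With PrologFlags=contain, Slurm creates the "extern" step at job start,
which is what pam_slurm_adopt adopts incoming ssh sessions into.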

Many thanks for all your kind responses.