[slurm-users] _refresh_assoc_mgr_qos_list: no new list given back keeping cached one

joao.damas--- via slurm-users

May 15, 2024, 9:01:35 AM
to slurm...@lists.schedmd.com
Hi all,

We are setting up a simple Slurm cluster (version 23.11.6). Following the documentation, we are starting without accounting or slurmdbd. The slurm.conf is minimal:
```
ClusterName=Develop
SlurmctldHost=head

# Slurm configuration
AuthType=auth/munge
CryptoType=crypto/munge
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld

# Nodes
NodeName=worker1 CoresPerSocket=2 Sockets=1 ThreadsPerCore=1
NodeName=worker2 CoresPerSocket=2 Sockets=1 ThreadsPerCore=1

# Partitions
PartitionName=develop Default=YES MaxTime=UNLIMITED Nodes="worker1,worker2"
```

When running a simple `srun sleep 10`, all works well and the log file shows:

[2024-05-15T12:34:12.741] sched: _slurm_rpc_allocate_resources JobId=1 NodeList=worker1 usec=549
[2024-05-15T12:34:22.775] _job_complete: JobId=1 WEXITSTATUS 0
[2024-05-15T12:34:22.775] _job_complete: JobId=1 done

But when we put the same sleep command in a script and submit it with `sbatch test.sh`, the log shows:

[2024-05-15T12:35:39.916] _slurm_rpc_submit_batch_job: JobId=2 InitPrio=1 usec=368
[2024-05-15T12:35:40.000] error: _refresh_assoc_mgr_qos_list: no new list given back keeping cached one.
[2024-05-15T12:35:40.000] sched: JobId=2 has invalid account
[2024-05-15T12:35:40.145] sched/backfill: _start_job: Started JobId=2 in develop on worker1
[2024-05-15T12:35:50.172] _job_complete: JobId=2 WEXITSTATUS 0
[2024-05-15T12:35:50.172] _job_complete: JobId=2 done
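
For anyone trying to reproduce this: the contents of `test.sh` are not shown in this post, so the following is only a minimal sketch of what such a batch script might look like. The job name and output file name are assumptions made for illustration; the partition name `develop` comes from the slurm.conf above.

```shell
# Write a minimal batch script equivalent to `srun sleep 10`.
# The #SBATCH values below are illustrative assumptions, not the original script.
cat > test.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=sleep-test        # hypothetical job name
#SBATCH --partition=develop          # partition defined in slurm.conf above
#SBATCH --output=sleep-test-%j.out   # %j expands to the job ID
sleep 10
EOF
```

The script would then be submitted with `sbatch test.sh`, as described above.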

We have the same account with the UID and GID, as stated in the documentation. Looking at the function that seems to emit that error (https://github.com/SchedMD/slurm/blob/e9f28ede27795f525e62f998cb2d40931d884e8b/src/common/assoc_mgr.c#L1952), it looks as if some accounting setup is expected. We do not have slurmdbd set up, and the documentation states that we should test basic functionality before implementing that daemon.

Any tips? Thanks in advance.
João

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

J D via slurm-users

May 16, 2024, 5:16:33 AM
to slurm...@lists.schedmd.com
I figured out that the mailing list may not be appropriate for this message, so I've created a bug report instead: https://bugs.schedmd.com/show_bug.cgi?id=19894

andreas.wiedholz--- via slurm-users

Jul 15, 2024, 8:34:19 AM
to slurm...@lists.schedmd.com
Hi João,

Did you get this problem solved? I have exactly the same problem and would be very interested in a solution.

Help would be greatly appreciated!

Thank you and best regards,
Andi