[slurm-users] Why does sacct display the wrong username while the UID is right?


taleint...@sjtu.edu.cn

Mar 12, 2022, 11:00:44 PM
to slurm...@lists.schedmd.com

Hi all:

 

We have run into a strange bug when querying job history with sacct. As shown below, we tried to list the jobs of user hpczbzt, and sacct does select the right jobs, the ones belonging to this user. However, the username is displayed as phywht.

 

> sacct -X --user=hpczbzt --format=jobid%16,jobidraw,user,uid,partition,start,end,AllocCPUS,state%20

           JobID JobIDRaw          User    UID  Partition               Start                 End  AllocCPUS                State
---------------- ------------ --------- ------ ---------- ------------------- ------------------- ---------- --------------------
         9882328 9882328         phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9882330 9882330         phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9882332 9882332         phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9882335 9882335         phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9882337 9882337         phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9884211 9884211         phywht   6270       a100 2022-03-12T23:56:02 2022-03-13T00:13:43          8    CANCELLED by 6270
         9884265 9884265         phywht   6270       a100 2022-03-13T00:14:22             Unknown          8              RUNNING
         9884308 9884308         phywht   6270    64c512g 2022-03-13T01:18:44 2022-03-13T01:37:04          4    CANCELLED by 6270
         9884413 9884413         phywht   6270    64c512g 2022-03-13T04:52:06 2022-03-13T05:59:49         40            COMPLETED
         9884431 9884431         phywht   6270       a100 2022-03-13T06:09:02 2022-03-13T09:32:45          8            COMPLETED
         9887011 9887011         phywht   6270 debug64c5+ 2022-03-13T11:06:44 2022-03-13T11:07:41          1    CANCELLED by 6270

 

The UID shown by sacct (6270) is correct: it belongs to hpczbzt, while the actual UID of phywht is 6272, as shown below:

 

> id phywht
uid=6272(phywht) gid=6272(phywht) groups=6272(phywht)

> id hpczbzt
uid=6270(hpczbzt) gid=6270(hpczbzt) groups=6270(hpczbzt)
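To spot every affected job rather than reading the table by eye, sacct's parsable output can be checked against NSS. The following is a minimal bash sketch, not something from the original report: the function name check_sacct_users is hypothetical, and it assumes getent is available on the node where sacct runs. It reads JobID|User|UID lines and prints each job whose User column differs from the name the UID currently resolves to.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare sacct's User column with the current NSS name
# of the recorded UID. Expects "JobID|User|UID" lines on stdin, e.g. from
#   sacct -X -n -P --user=hpczbzt --format=jobid,user,uid
check_sacct_users() {
  local jobid user uid nss_name
  # "|| [ -n ... ]" also handles a final line without a trailing newline
  while IFS='|' read -r jobid user uid || [ -n "$jobid" ]; do
    [ -z "$uid" ] && continue                       # skip blank/incomplete lines
    nss_name=$(getent passwd "$uid" | cut -d: -f1)  # name NSS returns right now
    if [ "$user" != "$nss_name" ]; then
      printf '%s: sacct user=%s, uid=%s resolves to %s\n' \
        "$jobid" "$user" "$uid" "${nss_name:-<unknown>}"
    fi
  done
}
```

Usage: `sacct -X -n -P --user=hpczbzt --format=jobid,user,uid | check_sacct_users`. With the data above, every listed job would be expected to be reported, since UID 6270 resolves to hpczbzt while sacct prints phywht.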

 

Both system accounts are stored in LDAP, and we have checked that they resolve consistently on both the slurmctld and the slurmdbd nodes. Moreover, scontrol and squeue show the correct username, hpczbzt:

 

> scontrol show job 9884265
JobId=9884265 JobName=af_test_session
   UserId=hpczbzt(6270) GroupId=hpczbzt(6270) MCS_label=N/A
   Priority=519 Nice=0 Account=acct-phywht QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
……

> squeue --user=hpczbzt
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           9884265      a100 af_test_  hpczbzt  R   11:43:46      1 gpu04
           9882328      dgx2 repeat_V  hpczbzt  R    7:07:56      1 vol05
……

 

Does anyone have an idea why only sacct displays the wrong username?

Rémi Palancher

Mar 18, 2022, 4:21:34 AM
to Slurm User Community List
Hi,

On Sunday, March 13, 2022 at 04:59, <taleint...@sjtu.edu.cn> wrote:

> Hi all:
>
> […]
>
> So is there any guess about why only sacct display the wrong username?

My guess is that sacct reports the username found in the cluster_assoc_table of the SlurmDBD database, which is linked to cluster_job_table through the id_assoc field. The username in the output may not go through NSS resolution at all.
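To check this guess directly in the database, one can look at the association row a job points to. Below is a bash sketch that only prints the SQL, so it can be reviewed before being piped into the mysql client. Assumptions, not taken from the thread: the accounting database uses the default name slurm_acct_db; the per-cluster tables are named <cluster>_job_table and <cluster>_assoc_table; the column names id_job, id_user, id_assoc, user and acct match your Slurm version's schema (verify with DESCRIBE); mycluster is a placeholder cluster name; and assoc_join_query is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print SQL joining one job to the association row
# whose "user" column is (per the hypothesis) what sacct displays.
# Assumed schema: <cluster>_job_table / <cluster>_assoc_table in slurm_acct_db.
assoc_join_query() {
  local cluster="$1" jobid="$2"
  # Both values are pasted into SQL, so accept only plain identifiers/numbers
  [[ "$cluster" =~ ^[A-Za-z0-9_]+$ ]] || { echo "bad cluster name" >&2; return 1; }
  [[ "$jobid" =~ ^[0-9]+$ ]] || { echo "bad job id" >&2; return 1; }
  cat <<SQL
SELECT j.id_job, j.id_user, j.id_assoc, a.user, a.acct
FROM ${cluster}_job_table AS j
JOIN ${cluster}_assoc_table AS a ON a.id_assoc = j.id_assoc
WHERE j.id_job = ${jobid};
SQL
}
```

Usage: `assoc_join_query mycluster 9884265 | mysql slurm_acct_db`. If the row shows id_user 6270 next to a.user phywht, the displayed name comes from the association record rather than from a UID lookup, which is consistent with the explanation above.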

Did the UID of phywht change at some point? That would explain why these jobs are associated with this user in the SlurmDBD database.
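One way to test the UID-change hypothesis is to group jobs by the UID recorded on the job and the user of the association it points to, with the first and last submit times of each pair. This is a bash sketch that prints SQL for review, under these assumptions (not from the thread): the accounting database is the default slurm_acct_db; the tables are named <cluster>_job_table and <cluster>_assoc_table; columns id_user, id_assoc, user and time_submit exist as named, with time_submit a Unix timestamp (verify with DESCRIBE); mycluster is a placeholder; uid_assoc_history_query is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print SQL listing (job UID, association user) pairs
# with job counts and first/last submit times, for the given user names.
uid_assoc_history_query() {
  local cluster="$1"; shift
  [[ "$cluster" =~ ^[A-Za-z0-9_]+$ ]] || { echo "bad cluster name" >&2; return 1; }
  local u list=""
  for u in "$@"; do
    # User names are quoted into SQL, so restrict them to a safe character set
    [[ "$u" =~ ^[A-Za-z0-9._-]+$ ]] || { echo "bad user name: $u" >&2; return 1; }
    list+="${list:+, }'$u'"
  done
  [ -n "$list" ] || { echo "no user names given" >&2; return 1; }
  cat <<SQL
SELECT j.id_user, a.user, COUNT(*) AS jobs,
       FROM_UNIXTIME(MIN(j.time_submit)) AS first_submit,
       FROM_UNIXTIME(MAX(j.time_submit)) AS last_submit
FROM ${cluster}_job_table AS j
JOIN ${cluster}_assoc_table AS a ON a.id_assoc = j.id_assoc
WHERE a.user IN (${list})
GROUP BY j.id_user, a.user
ORDER BY a.user, first_submit;
SQL
}
```

Usage: `uid_assoc_history_query mycluster phywht hpczbzt | mysql slurm_acct_db`. A row pairing UID 6270 with phywht, and the time window it covers, would show when jobs from that UID started being attached to phywht's association.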

--
Rémi Palancher
Rackslab: Open Source Solutions for HPC Operations
https://rackslab.io

