Hi all,
We have encountered a strange bug when querying job history with sacct. As shown below, we list the jobs of user hpczbzt, and sacct does select the right jobs belonging to this user. However, the username is displayed as phywht.
> sacct -X --user=hpczbzt --format=jobid%16,jobidraw,user,uid,partition,start,end,AllocCPUS,state%20
           JobID     JobIDRaw      User    UID  Partition               Start                 End  AllocCPUS                State
---------------- ------------ --------- ------ ---------- ------------------- ------------------- ---------- --------------------
         9882328      9882328    phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9882330      9882330    phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9882332      9882332    phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9882335      9882335    phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9882337      9882337    phywht   6270       dgx2 2022-03-13T04:50:12             Unknown          6              RUNNING
         9884211      9884211    phywht   6270       a100 2022-03-12T23:56:02 2022-03-13T00:13:43          8    CANCELLED by 6270
         9884265      9884265    phywht   6270       a100 2022-03-13T00:14:22             Unknown          8              RUNNING
         9884308      9884308    phywht   6270    64c512g 2022-03-13T01:18:44 2022-03-13T01:37:04          4    CANCELLED by 6270
         9884413      9884413    phywht   6270    64c512g 2022-03-13T04:52:06 2022-03-13T05:59:49         40            COMPLETED
         9884431      9884431    phywht   6270       a100 2022-03-13T06:09:02 2022-03-13T09:32:45          8            COMPLETED
         9887011      9887011    phywht   6270 debug64c5+ 2022-03-13T11:06:44 2022-03-13T11:07:41          1    CANCELLED by 6270
The UID shown by sacct (6270) is correct: it belongs to hpczbzt, whereas the actual UID of phywht is 6272, as shown below:
> id phywht
uid=6272(phywht) gid=6272(phywht) groups=6272(phywht)
> id hpczbzt
uid=6270(hpczbzt) gid=6270(hpczbzt) groups=6270(hpczbzt)
Both system accounts are stored in LDAP, and we have checked that they resolve consistently on both the slurmctld and slurmdbd nodes. What’s more, scontrol and squeue show the correct username, hpczbzt:
> scontrol show job 9884265
JobId=9884265 JobName=af_test_session
UserId=hpczbzt(6270) GroupId=hpczbzt(6270) MCS_label=N/A
Priority=519 Nice=0 Account=acct-phywht QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
……
> squeue --user=hpczbzt
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
9884265 a100 af_test_ hpczbzt R 11:43:46 1 gpu04
9882328 dgx2 repeat_V hpczbzt R 7:07:56 1 vol05
……
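For reference, the consistency check mentioned above can be sketched roughly as follows and run on both the slurmctld and slurmdbd nodes. This is only a sketch: name_for_uid and uid_for_name are hypothetical helper names introduced for illustration, and the script only queries the operating system's name service (via getent and id), which is not necessarily the source sacct uses for its User column.

```shell
#!/bin/sh
# Hypothetical helper: print the login name that the local name service
# (files, LDAP, ...) returns for a numeric UID; prints nothing if unknown.
name_for_uid() {
    getent passwd "$1" | cut -d: -f1
}

# Hypothetical helper: print the numeric UID for a login name;
# prints nothing if the name is unknown.
uid_for_name() {
    id -u "$1" 2>/dev/null
}

# Look up both directions for the two accounts from this post.
for uid in 6270 6272; do
    echo "uid $uid -> $(name_for_uid "$uid")"
done
for name in hpczbzt phywht; do
    echo "name $name -> $(uid_for_name "$name")"
done
```

If the output is identical on both nodes and matches the id output above, the name service itself can probably be ruled out as the cause.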
Does anyone have an idea why only sacct displays the wrong username?