[slurm-users] slurmrestd connect to 192.168.87.113:6819 Connection refused


shaobo liu via slurm-users

Apr 12, 2024, 5:36:42 AM
to slurm...@lists.schedmd.com
Hi, Slurm is configured with a primary and a backup controller. Requests to the slurmrestd API fail with the error below. What could be the cause?

# scontrol ping
Slurmctld(primary) at node003 is UP
Slurmctld(backup) at node113 is UP


# systemctl status slurmrestd.service
● slurmrestd.service - Slurm REST daemon
     Loaded: loaded (/lib/systemd/system/slurmrestd.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2024-04-12 17:07:08 CST; 21min ago
   Main PID: 705425 (slurmrestd)
      Tasks: 21 (limit: 629145)
     Memory: 20.3M
     CGroup: /system.slice/slurmrestd.service
             └─705425 /usr/sbin/slurmrestd -f /etc/slurm/slurm.conf unix:/var/spool/slurm/slurmrestd.socket 0.0.0.0:6820 -vvv

Apr 12 17:08:46 node003 slurmrestd[705425]: debug2: _slurm_connect: failed to connect to 192.168.87.113:6819: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: debug2: Error connecting slurm stream socket at 192.168.87.113:6819: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:node113:6819: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error: Sending PersistInit msg: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error: slurm_rest_auth_p_get_db_conn: unable to connect to slurmdbd: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error: init_connection[v0.0.39]:[[2.0.1.191]:50652] rc[7000]=Unable to connect to database -> openapi_get_db_conn() failed to open slurmdb connecti>
Apr 12 17:08:46 node003 slurmrestd[705425]: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:node113:6819: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: error: Sending PersistInit msg: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: error: slurm_rest_auth_p_get_db_conn: unable to connect to slurmdbd: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: error: init_connection[v0.0.39]:[[2.0.1.191]:50652] rc[7000]=Unable to connect to database -> openapi_get_db_conn() failed to open slurmdb connection

Nico Derl via slurm-users

Apr 12, 2024, 8:20:18 AM
to shaobo liu, Slurm Users
Hey,
Are slurmctld and slurmrestd on separate machines? Can you reach them manually? Could a firewall or a closed port be in the way?
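
For reference, a minimal sketch of that manual reachability check, using only the Python standard library. The function name `can_connect` is made up for this example; the host and port to test would be the ones from the log above (192.168.87.113 and 6819).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration. A refused connection, a timeout,
    or an unreachable host all surface as OSError and yield False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_connect("192.168.87.113", 6819)` on the slurmrestd host would show whether anything accepts connections on that port. It says nothing about why a connection is refused.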


Apr 12, 2024, 11:36, from slurm...@lists.schedmd.com:

shaobo liu via slurm-users

Apr 14, 2024, 11:01:30 PM
to nico...@tutanota.com, slurm...@lists.schedmd.com
Thanks, I found the cause: the REST API token had expired.
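
A rough sketch of how an expired token could be spotted ahead of time, assuming the token is a JWT carrying a standard `exp` claim (the thread does not say which auth type is in use, so that is an assumption). The helper names `jwt_expiry` and `is_expired` are made up for this example, and the payload is decoded without verifying the signature, so this is for inspection only.

```python
import base64
import json
import time


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (Unix seconds) from a JWT payload.

    Assumes a three-part JWT with an `exp` claim. The signature is NOT
    verified; this only reads the payload.
    """
    payload_b64 = token.split(".")[1]
    # JWTs drop base64 padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def is_expired(token: str, now=None) -> bool:
    """Return True if the token's `exp` time has been reached."""
    current = time.time() if now is None else now
    return current >= jwt_expiry(token)
```

Calling `is_expired(token)` with the token string a client sends to slurmrestd would show whether a fresh token needs to be issued.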

On Fri, Apr 12, 2024 at 22:56, <nico...@tutanota.com> wrote:
If you mean that slurmdbd isn't using 6819 because you selected a different port, make sure that DbdPort in slurmdbd.conf and AccountingStoragePort in slurm.conf both reflect that.
It must be getting the 6819 from somewhere.
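
A small sketch of that consistency check, under a few assumptions: both files use plain `Key=Value` lines with `#` comments, keys are matched case-insensitively, and an unset port falls back to 6819. The helper names `conf_value` and `ports_match` are made up for this example, and taking the last occurrence of a key is a simplification of this sketch, not a statement about how Slurm parses its files.

```python
def conf_value(conf_text: str, key: str, default: str) -> str:
    """Return the value of `key` from Slurm-style `Key=Value` text.

    Simplified parser for illustration: strips `#` comments, matches the
    key case-insensitively, and keeps the last occurrence it sees.
    """
    value = default
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, _, val = line.partition("=")
        if name.strip().lower() == key.lower():
            value = val.strip()
    return value


def ports_match(slurm_conf_text: str, slurmdbd_conf_text: str) -> bool:
    """Compare AccountingStoragePort (slurm.conf) with DbdPort (slurmdbd.conf)."""
    acct_port = conf_value(slurm_conf_text, "AccountingStoragePort", "6819")
    dbd_port = conf_value(slurmdbd_conf_text, "DbdPort", "6819")
    return acct_port == dbd_port
```

To use it, read `/etc/slurm/slurm.conf` (the path from the slurmrestd command line above) and your slurmdbd.conf into strings and pass them to `ports_match`.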


Apr 12, 2024, 16:05, from dspa...@gmail.com:
slurmctld and slurmrestd are on the same machine, and there is no firewall. The secondary slurmdbd is in backup mode and does not listen on port 6819.

OS: Ubuntu 20.04
SLURM: 23.11.0

On Fri, Apr 12, 2024 at 20:18, <nico...@tutanota.com> wrote: