[slurm-users] Unexpected missing socket error


Ozeryan, Vladimir via slurm-users

Oct 6, 2025, 11:37:39 AM
to slurm...@lists.schedmd.com

Hello everyone,

 

Not sure if you guys have heard this tune already, but has anyone come across a solution for the “Unexpected missing socket error” message?
There is nothing useful in the logs, but the message appears on both the compute nodes and the Slurm controller node.

 

Thank you,

 

Vlad Ozeryan

AMDS – AB1 Linux-Support

Vladimir...@jhuapl.edu

Ext. 23966

 

Tilman Hoffbauer via slurm-users

Oct 6, 2025, 12:23:34 PM
to slurm...@lists.schedmd.com

Hello,

we had this issue previously - it was connected to timeouts, where the socket disappeared due to a timeout before a reply could be sent back. In our case the cause was link-local multicast name resolution (LLMNR) being enabled by default in systemd-resolved, which was evident from slow calls to `getent hosts <hostname>`.
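
To check whether slow name resolution is also the problem on your nodes, something like the following could help. It is only a minimal sketch, assuming Python 3 is available on the node; the function names, the 1-second threshold, and the host list are made up for illustration and are not part of Slurm. It times lookups via `socket.getaddrinfo`, which should go through the system resolver configuration in much the same way as `getent hosts`.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Time a single name lookup; return (seconds_elapsed, resolved_ok)."""
    start = time.perf_counter()
    try:
        resolver(hostname, None)
        ok = True
    except socket.gaierror:
        # The name did not resolve; the elapsed time is still informative.
        ok = False
    return time.perf_counter() - start, ok


def slow_hosts(hostnames, threshold_s=1.0, resolver=socket.getaddrinfo):
    """Return {hostname: seconds} for every lookup slower than threshold_s.

    threshold_s is an arbitrary illustrative cutoff, not a Slurm setting.
    """
    slow = {}
    for name in hostnames:
        elapsed, _ = time_lookup(name, resolver)
        if elapsed > threshold_s:
            slow[name] = elapsed
    return slow


if __name__ == "__main__":
    # Placeholder list: replace with your controller and compute node names.
    for host in ["localhost"]:
        elapsed, ok = time_lookup(host)
        status = "ok" if ok else "FAILED"
        print(f"{host}: {elapsed:.3f}s ({status})")
```

Lookups that take on the order of seconds rather than milliseconds would point in the same direction as what we saw. If LLMNR does turn out to be involved, systemd-resolved has an `LLMNR=` option in the `[Resolve]` section of `resolved.conf`; whether turning it off is appropriate depends on your site.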

Hope this helps,
Tilman Hoffbauer

John Hearns via slurm-users

Oct 7, 2025, 9:34:02 AM
to Tilman Hoffbauer, slurm...@lists.schedmd.com
As usual - everything is a DNS (name resolution) problem.

If your data centre is on fire, then your DNS server is burning. So it is a DNS problem.

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com