[slurm-users] Error "slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"


Gestió Servidors

Nov 30, 2021, 6:24:37 AM
to slurm...@lists.schedmd.com

Hello,

 

Over the last few days, my nodes have been showing the error “slurm_receive_msg_and_forward: Zero Bytes were transmitted or received”. After reviewing the whole configuration, I noticed that the problem is the time difference between the nodes and the server. If a node's clock is wrong (ahead of or behind the server's), the slurmd daemon starts, but users can’t run “squeue” or “sinfo”. After running “date MMDDhhmm” (with the server's time) and then “hwclock --systohc” on each node, the slurmd daemons run perfectly and users can submit jobs and query the queues.
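
As a rough illustration of the check described above, here is a minimal Python sketch that compares a node's clock reading with the server's and flags the node when the difference exceeds a tolerance. The names `clock_skew` and `within_tolerance` and the 300-second `MAX_SKEW` value are assumptions made up for this example; they are not Slurm parameters.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance, chosen only for illustration; not a Slurm setting.
MAX_SKEW = timedelta(seconds=300)


def clock_skew(server_time: datetime, node_time: datetime) -> timedelta:
    """Return node_time - server_time (positive means the node is ahead)."""
    return node_time - server_time


def within_tolerance(server_time: datetime, node_time: datetime,
                     max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the node's clock is within max_skew of the server's clock."""
    return abs(clock_skew(server_time, node_time)) <= max_skew


# Sample readings, invented for the example.
server = datetime(2021, 11, 30, 6, 24, 37, tzinfo=timezone.utc)
node_ok = server + timedelta(seconds=2)   # node two seconds ahead
node_bad = server - timedelta(hours=1)    # node one hour behind

print(within_tolerance(server, node_ok))
print(within_tolerance(server, node_bad))
```

In practice the two readings would have to be collected from the server and the node at nearly the same moment, otherwise the collection delay itself shows up as skew.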

 

I know I can use “ntpd” or similar. However, when I configure my slurmctld server as an NTP server, it does serve its date/time, but when the nodes try to synchronize with it, the stratum shows as 16, so the nodes can't synchronize, and I don't know why.
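
For context on the stratum value: in an NTP packet the stratum is carried in the second byte of the header and, as far as I understand RFC 5905, values 1 to 15 denote a synchronized source while 16 means unsynchronized. The following is a minimal sketch that decodes those header fields from a raw 48-byte packet. The function names and the two sample packets are hypothetical; the packets are built by hand, not captured from a real server.

```python
import struct


def parse_ntp_header(packet: bytes) -> dict:
    """Decode leap indicator, version, mode and stratum from an NTP packet."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    first, stratum = struct.unpack("!BB", packet[:2])
    return {
        "leap": first >> 6,             # 3 means "clock unsynchronized"
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,            # 4 means "server"
        "stratum": stratum,
    }


def is_synchronized(header: dict) -> bool:
    """Treat only stratum 1..15 with a known leap state as usable."""
    return header["leap"] != 3 and 1 <= header["stratum"] <= 15


# Hand-built sample packets (header bytes followed by zero padding).
good = bytes([0x24, 2]) + bytes(46)   # LI=0, VN=4, mode=4, stratum 2
bad = bytes([0xE4, 16]) + bytes(46)   # LI=3, VN=4, mode=4, stratum 16

print(is_synchronized(parse_ntp_header(good)))
print(is_synchronized(parse_ntp_header(bad)))
```

A server that reports itself as unsynchronized is rejected by clients no matter how correct its local clock looks, which would match the behaviour described above.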

 

My question is: is there any configuration parameter that allows SLURM to work correctly regardless of the server's date/time?

 

Thanks.

Nicolas Greneche

Nov 30, 2021, 6:43:05 AM
to Slurm User Community List
Hi,

I had the same issue with ntpd. The NTP service on my clients did not synchronize because the offset from the NTP server was too large.

Maybe you can synchronize once with ntpdate before starting the NTP service on your clients.

Regards,

Gestió Servidors

Dec 1, 2021, 8:51:26 AM
to nicolas....@univ-paris13.fr, slurm...@lists.schedmd.com

Hi,

 

I can’t synchronize beforehand with “ntpdate” because when I run “ntpdate -s my_NTP_server”, I only get the message “ntpdate: no server suitable for synchronization found”…

 

Thanks.

--

Daniel Ruiz Molina
Tècnic Mitjà Informàtic

Arquitectura de Computadors i Sistemes Operatius
Escola d'Enginyeria

Edifici Q - Despatx QC/3052 - Carrer de les Sitges
Campus de la UAB · 08193 Bellaterra
(Cerdanyola del Vallès) · Barcelona · Spain

+34 93 581 35 44
www.uab.cat
Daniel Ruiz at UAB

 

Christopher Samuel

Dec 1, 2021, 11:28:32 PM
to slurm...@lists.schedmd.com
On 12/1/21 5:51 am, Gestió Servidors wrote:

> I can’t synchronize beforehand with “ntpdate” because when I run “ntpdate -s
> my_NTP_server”, I only get the message “ntpdate: no server suitable for
> synchronization found”…

Yeah, you'll need to make sure your NTP infrastructure is working first.
There is useful information (including NTP background info) here:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-configuring_ntp_using_ntpd

and for chronyd (rather than ntpd):

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-configuring_ntp_using_the_chrony_suite

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
