Automatically disabling hyperthreading on compute nodes


Bo Langgaard Lind

Jan 8, 2021, 10:44:12 AM
to google-cloud-slurm-discuss
Has anyone found a good way to make sure that compute nodes start up with hyperthreading disabled?

Our very own Wyatt Gorman has something going on here:

And there are some good discussions here:

-Bo

Joseph Schoonover

Jan 8, 2021, 10:52:04 AM
to Bo Langgaard Lind, google-cloud-slurm-discuss
Hey Bo,
This marketplace solution, which branched off the schedmd/slurm-gcp repo about a year and a half ago, has an option to set up partitions with hyperthreading disabled. Enabling this option uses Wyatt's scripts to disable hyperthreading during the startup process for compute nodes. 
You can set this option during deployment, or post-deployment using the cluster-services CLI that comes with that solution.
Screenshot_2021-01-08_08-48-44.png

Dr. Joseph Schoonover
Chief Executive Officer
Senior Research Software Engineer
j...@fluidnumerics.com

--
You received this message because you are subscribed to the Google Groups "google-cloud-slurm-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-slurm-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-slurm-discuss/d0743ba1-e85e-4f07-9da8-96ff08f7a6a2n%40googlegroups.com.

Bo Langgaard Lind

Jan 14, 2021, 5:23:48 AM
to google-cloud-slurm-discuss
I've gotten a bit further.

On an already running compute node instance, issuing the following command as root successfully disables hyperthreading:

echo off > /sys/devices/system/cpu/smt/control

but I have not found a way to perform this step automatically upon instance creation (from the compute node image).
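
For anyone trying to automate the sysfs write above, here is a minimal sketch of a helper that a boot-time script could call. The function name disable_smt and its optional path argument are my own illustrative assumptions, not part of slurm-gcp, and where it gets hooked in (startup script, systemd unit, image build) depends on the deployment.

```shell
#!/bin/sh
# Sketch only: switch SMT off through sysfs when the kernel exposes the
# control file. disable_smt is a hypothetical helper; the optional path
# argument exists so the function can be tried against a scratch file.
disable_smt() {
    control="${1:-/sys/devices/system/cpu/smt/control}"

    # Bail out if the control file is missing or cannot be written.
    if [ ! -w "$control" ]; then
        echo "SMT control file not writable: $control" >&2
        return 1
    fi

    state=$(cat "$control")
    case "$state" in
        off|forceoff|notsupported|notimplemented)
            # Already off, or SMT cannot be controlled: nothing to do.
            return 0
            ;;
    esac

    echo off > "$control"
}
```

Calling disable_smt with no argument, as root, acts on the real control file. Note that the node's CPU count as seen by Slurm changes once this runs, so the Slurm configuration has to agree with it.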

Another way is to pass an option to the kernel being booted:

sed -i 's/noop/noop nosmt\=force/g' /boot/efi/EFI/centos/grub.cfg 

This works when inserted into scripts/startup.sh, but the dynamically created node then never shows up as "ready".

The best way I've found to check whether hyperthreading is enabled is to run lscpu and look for the "Off-line CPU(s) list:" line, which shows up once the sibling threads are offline.
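
The lscpu check can be scripted. A minimal sketch follows; offline_cpus is a hypothetical helper name, and it assumes lscpu prints the label exactly as quoted above (optionally indented, as newer util-linux versions do). Off-line CPUs can have other causes than SMT being off, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Sketch only: read lscpu output on stdin and print the off-line CPU list
# (for example "15-29"). Prints nothing when no such line is present.
# offline_cpus is a hypothetical helper name.
offline_cpus() {
    awk -F: '
        /^[ \t]*Off-line CPU\(s\) list:/ {
            gsub(/[ \t]/, "", $2)   # strip the padding around the value
            print $2
        }'
}
```

On a node this would be used as: lscpu | offline_cpus. An empty result means lscpu reported no off-line CPUs. Kernels that provide /sys/devices/system/cpu/smt/control also provide /sys/devices/system/cpu/smt/active, which reads 0 or 1 and may be a simpler thing to test.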

Thoughts on how to succeed with either method would be greatly appreciated.

Bo Langgaard Lind

Jan 14, 2021, 6:24:54 AM
to google-cloud-slurm-discuss
Figured it out.

In /apps/slurm/current/etc/slurm.conf, the following line:

NodeName=DEFAULT CPUs=30 RealMemory=118880 State=UNKNOWN

needs to have its CPUs value cut in half (30 to 15 here) to match the compute node instances once hyperthreading is off. This can be done either by modifying setup.py or by editing the file manually.
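
If you go the manual route, the edit could be scripted roughly like this. It is a sketch under assumptions: halve_node_cpus is a hypothetical helper, it only touches CPUs=N tokens on lines beginning with NodeName=, and it is not idempotent (running it twice halves the value twice). A slurm.conf that describes nodes with Sockets, CoresPerSocket, or ThreadsPerCore instead would need a different edit.

```shell
#!/bin/sh
# Sketch only: halve every CPUs=N value on NodeName= lines of a slurm.conf.
# halve_node_cpus is a hypothetical helper; pass the config path as $1.
# Not idempotent: each run halves the value again.
halve_node_cpus() {
    conf="$1"
    tmp=$(mktemp) || return 1

    if awk '
        /^NodeName=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^CPUs=[0-9]+$/) {
                    split($i, kv, "=")
                    $i = "CPUs=" int(kv[2] / 2)
                }
            }
        }
        { print }
    ' "$conf" > "$tmp"; then
        # Write back through the original file to keep its ownership and mode.
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi

    rm -f "$tmp"
    return "$status"
}
```

For example: halve_node_cpus /apps/slurm/current/etc/slurm.conf. Slurm would still need to be told to pick up the changed configuration afterwards.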
