You should determine whether the node has hung or whether only its network
has gone down. Try logging into the node console and see whether it is
responsive.
This thread:
https://groups.google.com/g/cloudlab-users/c/FnKBA1eHgBg
describes the two most common reasons a node appears to lock up.
One is installing NetworkManager, explicitly or implicitly (pulled in as a
dependency), which interferes with our network configuration scripts. The
other is installing, explicitly or implicitly, NVIDIA packages intended for a
desktop environment, which attempt to suspend the processor.
If your current experiment on the Utah CloudLab cluster uses the same profile,
then NetworkManager does appear to be installed.
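As a quick check for both causes, a sketch like the following lists installed packages matching them. It assumes a dpkg-based image (Debian/Ubuntu); the function names are hypothetical, and the name patterns (`network-manager`, plus anything starting with `nvidia-`) are my own heuristic, not an official list. The prefix match will also flag NVIDIA packages that are harmless on a server.

```python
import subprocess

# Heuristic patterns (assumption): "network-manager" is the Debian/Ubuntu
# package for NetworkManager; "nvidia-" catches NVIDIA driver/desktop packages.
SUSPECT_EXACT = {"network-manager"}
SUSPECT_PREFIXES = ("nvidia-",)


def parse_installed(dpkg_output):
    """Keep only fully installed packages ("ii"); skip e.g. "rc" (removed, config left)."""
    names = []
    for line in dpkg_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("ii"):
            names.append(parts[1])
    return names


def installed_packages():
    """Ask dpkg-query for each package's abbreviated status and name."""
    out = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${db:Status-Abbrev} ${Package}\n"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_installed(out)


def suspect_packages(names):
    """Return, sorted, the names matching either lock-up pattern."""
    return sorted(
        n for n in names
        if n in SUSPECT_EXACT or n.startswith(SUSPECT_PREFIXES)
    )
```

On the node, `print(suspect_packages(installed_packages()))` lists the matches. A hit is only a lead to investigate, not proof that the package caused the hang.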
BTW, I notice that you are running a web server. If you do, make sure it is
not listening on the Internet-facing control network interface or, if that is
necessary, that any web-based services are protected by strong credentials.
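To see what a node is exposing, the sketch below parses `ss -tln` output and reports TCP listeners bound either to a wildcard address (which includes the control interface) or to a control-network address you pass in. The function names and the example addresses are hypothetical; look up your node's actual control-network IP yourself (for example with `ip addr`). It does not account for any firewall rules.

```python
import subprocess

# Wildcard bind addresses as printed by ss: IPv4 any, IPv6 any, dual-stack any.
WILDCARDS = {"0.0.0.0", "::", "*"}


def run_ss():
    """Return raw output of `ss -tln` (TCP, listening sockets, numeric)."""
    return subprocess.run(
        ["ss", "-tln"], capture_output=True, text=True, check=True
    ).stdout


def parse_listeners(ss_output):
    """Extract (address, port) pairs from the Local Address:Port column."""
    result = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue  # skip the header and any non-listening rows
        host, _, port = fields[3].rpartition(":")
        # Drop a "%iface" scope suffix, then IPv6 brackets: "[::]" -> "::".
        host = host.split("%", 1)[0].strip("[]")
        result.append((host, port))
    return result


def exposed_listeners(listeners, control_ip):
    """Listeners reachable via the control interface (wildcard or its IP)."""
    return [(h, p) for h, p in listeners if h in WILDCARDS or h == control_ip]
```

For example, `print(exposed_listeners(parse_listeners(run_ss()), "128.110.0.5"))`, substituting your own control IP for that placeholder. A service that only needs to be reached from inside the experiment could instead be bound to its experiment-network address or to 127.0.0.1.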