Can you say more about what you were doing in your experiment? (You can email
porta...@cloudlab.us if you want to take this offline.)
Did you have a custom kernel or custom kernel modules? Are you using any
specific CPU features? I ask because I did a quick check of two of the nodes
and they both showed a Machine Check error while they were in your experiment.
On Sun, Mar 22, 2026 at 09:08:16PM -0700, Leonid Kondrashov wrote:
> Hello,
>
> During my experiment on the r6615 nodes at Clemson, I observed that some nodes
> rebooted at random times (experiment:
> https://www.cloudlab.us/status.php?uuid=029e88bc-1ed0-4e86-93a2-5ba77374a1da#).
> Our experimental setup doesn't support graceful recovery on reboot, so the
> reboots themselves are problematic for me.
>
> ...
>
> Regards,
> Leonid
>
> --
> You received this message because you are subscribed to the Google Groups
> "cloudlab-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to cloudlab-user...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/cloudlab-users/a229ec89-e02d-4cf0-8e93-85cfd380f711n%40googlegroups.com.