Wisconsin Cluster Down?


Jaylen Wang

Dec 21, 2025, 11:33:13 AM
to cloudlab-users
Hi Cloudlab Admins,

I was having trouble reaching the nodes in my experiment on the Wisconsin cluster (experiment: https://www.cloudlab.us/status.php?uuid=0534976b-dceb-11f0-bc80-e4434b2381fc). Their statuses are shown as unknown and unreachable. The reservation page also says the cluster is currently unavailable (I've attached an image). Apologies if this is planned, though I was unable to find any information about a planned outage in Wisconsin for today.

[Attachment: Screenshot 2025-12-21 at 11.30.46 AM.png]

Thanks,
Jaylen

Johannes Freischuetz

Dec 21, 2025, 2:03:30 PM
to cloudlab-users
I am having the same issue. I cannot ssh into my machines, and I am not able to power cycle the nodes.

Thanks,
Johannes

ajma...@gmail.com

Dec 21, 2025, 3:14:38 PM
to cloudlab-users
Hi Jaylen, Johannes,

This was not planned (at least to my knowledge).  I'm not sure what happened, but I am trying to find out more information from our contacts at Wisconsin.  From what I can make of things, there was a power outage around 10 AM CST that lasted until around 11:20 AM CST.  While many nodes appear to have come back following this, there are some that did not.  For example, Johannes, your node did not come back up and I am unable to reach it even over the management interface.  Jaylen, ditto for your node0 in your experiment, but your node1 is completely reachable.  I'll send an update here once I have more information, and apologies for the inconvenience.

Best,
 - Aleks 

ajma...@gmail.com

Dec 22, 2025, 12:41:13 AM
to cloudlab-users
Just as an update, there are still a handful of switches that have not come back up and require in-person investigation.  We hope to have somebody there tomorrow to bring them back online.  This appears to be impacting access to at least some of the c220g5 nodes as well as some experiment net connectivity.  The switch state on the rest of the switches should have been restored this afternoon.

Johannes Freischuetz

Dec 22, 2025, 1:13:24 AM
to cloudlab-users
My node is back up, but my drive seems to be reset back to the original image. Is this related?
If it isn't, is there any way to recover this?

Mike Hibler

Dec 22, 2025, 9:00:15 AM
to 'Johannes Freischuetz' via cloudlab-users
At around 14:34 CST yesterday, a "reload" operation was issued.
Since the node was inaccessible at that time, it did not actually happen
until the switch connecting it to the infrastructure (the control net)
came back up at around 23:48 last night. I see nothing in the infrastructure
logs that would have triggered a reload. Did you do it while you were trying
to bring the node back to life?

A reload issued from within an experiment will only clobber the boot disk,
so your filesystem on the second disk is intact. But your home directory on
the boot disk was reset.
