Wisconsin c4130-110433 becomes unresponsive frequently


Wei Luo

Nov 30, 2022, 4:52:48 PM
to cloudlab-users
Hi,

I have the same issue described in https://groups.google.com/g/cloudlab-users/c/Or-wYwEY5xM/m/KvUzmKbMFAAJ, and a reboot is required every 10-15 minutes.
I have tried all the network settings, including both NetworkManager-wait-online.service and sudo systemctl disable NetworkManager, but the node still becomes unresponsive frequently. I previously had the same issue with a Clemson c4130 (which I asked about yesterday), and it was solved by disabling NetworkManager; however, that is not working on the current Wisconsin node. Thank you!

Here's my node information: c4130-110433


David M Johnson

Nov 30, 2022, 6:55:50 PM
to cloudla...@googlegroups.com
On 11/30/22 14:52, Wei Luo wrote:
> Hi,
>
> I have the same issue stated
> in https://groups.google.com/g/cloudlab-users/c/Or-wYwEY5xM/m/KvUzmKbMFAAJ
> and reboot is required every 10 - 15min.
> and I have tried all the network settings including both
> *NetworkManager-wait-online.service* and *sudo systemctl disable
> NetworkManager* but it is still unresponsive frequently. I previously have
> the same issue with Clemson c4130 (where I asked yesterday) but it is
> solved by disabling NetworkManager, however it is not working on the
> current Wisconsin Node. Thank you!

Unfortunately, there was nothing significant on the console, and the
console was unresponsive. I did a hard power off, waited, and powered it
back on, and it is now booting up. Were you using all four GPUs? Maybe
this is the instability problem Leigh mentioned.

David

Wei Luo

Nov 30, 2022, 6:57:20 PM
to cloudlab-users
Hi David,

Thank you for the reply, but I was only using 1 GPU.

Best,
Wei

Wei Luo

Nov 30, 2022, 8:46:11 PM
to cloudlab-users
Hi David,

It is currently still unresponsive. I hadn't tried to log in to the node until now, but it is already unresponsive even though I haven't logged in or used it.

Best,
Wei

Mike Hibler

Nov 30, 2022, 9:57:13 PM
to cloudla...@googlegroups.com
There is nothing obviously wrong with it. I ran the diagnostics I could
remotely. We will have to schedule it out of service and have someone at
Wisconsin look at it more closely.

I do notice you are running an X server on the graphics hardware. You
probably want to uninstall that. Probably not a problem, but who knows
what it might be suspending or tweaking on its own.