Node frequently goes offline


peter.wa...@gmail.com

Aug 17, 2022, 11:42:09 PM
to cloudlab-users
Hi admins,
Node c240g5-110101.wisc.cloudlab.us goes down from time to time for no apparent reason. Can you help me? Thank you!

Leigh Stoller

Aug 18, 2022, 10:02:03 AM
to cloudla...@googlegroups.com

> Hi admins,
> node c240g5-110101.wisc.cloudlab.us goes down from time to time with no reason, can you help me? Thank you!

Hi. We are looking at the node. Looks like it locked up and we are
not able to get it back into a working state. Stay tuned for more
info.

Thanks
Leigh


Leigh Stoller

Aug 18, 2022, 12:47:39 PM
to cloudla...@googlegroups.com

> Hi admins,
> node c240g5-110101.wisc.cloudlab.us goes down from time to time with no reason, can you help me? Thank you!

Hi. This node is back online. Can you start up whatever it was you
were doing again so we can watch it?

Is whatever you are doing GPU intensive?

Thanks
Leigh


Juncheng Yang

Aug 18, 2022, 2:19:38 PM
to cloudla...@googlegroups.com
Thank you, Leigh! It happened while I was trying to install the cuDNN library. I will try to do this again.

peter.wa...@gmail.com

Aug 18, 2022, 2:31:17 PM
to cloudlab-users
I just tried to log in, but it seems the node is still down. 

Leigh Stoller

Aug 18, 2022, 2:34:18 PM
to cloudla...@googlegroups.com

> I just tried to log in, but it seems the node is still down.

Yep, it is dead again. I have scheduled it to go out of service;
we will have to run some diagnostics, which we cannot do while it
is allocated to your experiment. Go ahead and terminate the
experiment, please.

Thanks
Leigh

Mike Hibler

Aug 18, 2022, 2:39:12 PM
to cloudla...@googlegroups.com
Does this only happen during/after installation of the CUDA library?
Or at other times as well?

Juncheng Yang

Aug 18, 2022, 2:53:26 PM
to cloudla...@googlegroups.com
Thank you, Mike and Leigh!
The first time, it happened while the node was idle after I had installed the CUDA library.
The second time was while I was installing the library.
The third time (after Leigh brought it back online), it went dead before I even tried to connect to it.
