I'm facing the same issue on the following c4130 node in the Wisconsin cluster for the past few days. The machine becomes unresponsive every 15-20 minutes, and a reboot from the dashboard is necessary to bring it back to a usable state.
Could you please let me know if I'm missing something in the configuration?
to cloudla...@googlegroups.com
Good question. The same problem followed you to a different node at
a different cluster. :-) The thing to do at this point is to tell us
what packages you installed and what config changes you made to the
node.
Thanks
Leigh
Rajesh Shashi Kumar
Apr 11, 2022, 6:18:09 PM
to cloudlab-users
Thank you for the quick reply. I only installed CUDA on top of the provided RSPEC:
RSPEC used:

# Import the Portal object.
import geni.portal as portal
# Import the ProtoGENI library.
import geni.rspec.pg as pg
# Import the Emulab specific extensions.
import geni.rspec.emulab as emulab

# Create a portal object.
pc = portal.Context()

# Create a Request object to start building the RSpec.
request = pc.makeRequestRSpec()

# Print the generated rspec.
pc.printRequestRSpec(request)
Thanks, Rajesh
Leigh Stoller
Apr 11, 2022, 6:19:09 PM
to cloudla...@googlegroups.com
> On Apr 11, 2022, at 3:18 PM, 'Rajesh Shashi Kumar' via cloudlab-users <cloudla...@googlegroups.com> wrote:
>
> Thank you for the quick reply. I only installed CUDA on top of the provided RSPEC:
>
The fix is to run `systemctl disable NetworkManager` before installing.
I am still seeing the same issue. Please let me know whether this is the correct workaround.
Thanks, Rajesh
Mike Hibler
Apr 12, 2022, 12:40:25 AM
to 'Rajesh Shashi Kumar' via cloudlab-users
I have the console working again for this particular node. If it hangs
again, see if you can connect to the console and get a login prompt.
Rajesh Shashi Kumar
Apr 12, 2022, 1:51:58 AM
to cloudlab-users
Hi,
It is unresponsive again. Connecting to the console from the CloudLab dashboard does not seem to work. I am leaving it without a reboot for now in case that helps.
Just to double-check, here's what I did:
sudo -s
systemctl disable NetworkManager
<install CUDA>
Thank you for your time, Rajesh
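A minimal sketch, in Python like the RSpec above, of checking NetworkManager's state before starting the CUDA install. It assumes a systemd host where `systemctl is-enabled` and `systemctl is-active` are available; the helper names `systemctl_query` and `safe_to_install` are hypothetical. Keep in mind that `systemctl disable` by itself only removes the unit from the boot sequence and leaves an already-running service up until the next reboot, while `systemctl disable --now` also stops it immediately.

```python
import shutil
import subprocess

UNIT = "NetworkManager"  # systemd unit discussed in this thread


def systemctl_query(verb: str, unit: str = UNIT) -> str:
    """Run `systemctl <verb> <unit>` and return its trimmed stdout.

    `is-enabled` / `is-active` exit non-zero when the unit is disabled or
    inactive, so a non-zero exit code is deliberately not treated as an error.
    """
    result = subprocess.run(
        ["systemctl", verb, unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def safe_to_install(is_enabled: str, is_active: str) -> bool:
    """Hypothetical guard: proceed only if the unit is neither enabled
    at boot nor currently running."""
    # `is-enabled` can also report e.g. "masked", "static" or "disabled";
    # only these states mean the unit will start on the next boot.
    enabled_at_boot = is_enabled in ("enabled", "enabled-runtime")
    # "deactivating" still counts as running until the stop completes.
    running_now = is_active in ("active", "activating", "reloading", "deactivating")
    return not enabled_at_boot and not running_now


if __name__ == "__main__":
    if shutil.which("systemctl") is None:
        print("systemctl not found; this check assumes a systemd host")
    else:
        enabled = systemctl_query("is-enabled")
        active = systemctl_query("is-active")
        print(f"{UNIT}: is-enabled={enabled!r}, is-active={active!r}")
        if safe_to_install(enabled, active):
            print("OK to proceed with the CUDA install")
        else:
            print(f"Still enabled or running; try: sudo systemctl disable --now {UNIT}")
```

Checking both queries matters because a unit can be disabled for the next boot yet still be running in the current one.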
Mike Hibler
Apr 12, 2022, 10:05:42 AM
to 'Rajesh Shashi Kumar' via cloudlab-users
Are you using all four GPUs on the Wisconsin node? Maybe you should try
using only one or two and see what happens. It is possible there is a thermal
issue. I don't see NetworkManager running, so I assume that part is set up correctly.
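One way to act on the one-or-two-GPU suggestion and the thermal hypothesis, sketched in Python under stated assumptions. It restricts a CUDA job to a subset of the GPUs with the `CUDA_VISIBLE_DEVICES` environment variable and samples per-GPU temperature and power with `nvidia-smi --query-gpu=index,temperature.gpu,power.draw --format=csv,noheader,nounits`. The 85 °C alert threshold, the 10-second poll interval, the helper names, and `./my_cuda_job` (a stand-in for whatever workload was running when the node hung) are illustrative assumptions. They are not CloudLab or NVIDIA guidance.

```python
import os
import shutil
import subprocess
import time
from typing import List, Optional, Tuple

# Illustrative threshold (assumption), not an NVIDIA or CloudLab figure.
TEMP_ALERT_C = 85.0

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def _to_float(field: str) -> Optional[float]:
    """nvidia-smi prints e.g. '[N/A]' for fields a GPU does not report."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_gpu_rows(csv_text: str) -> List[Tuple[int, Optional[float], Optional[float]]]:
    """Turn 'index, temp, power' lines into (index, temp_c, power_w) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, temp, power = (part.strip() for part in line.split(","))
        rows.append((int(index), _to_float(temp), _to_float(power)))
    return rows


def hot_gpus(rows, limit: float = TEMP_ALERT_C) -> List[int]:
    """Indices of GPUs at or above the (assumed) alert temperature."""
    return [idx for idx, temp, _ in rows if temp is not None and temp >= limit]


def env_for_gpus(gpu_ids: List[int]) -> dict:
    """Copy of the environment that exposes only `gpu_ids` to CUDA programs."""
    env = dict(os.environ)
    # CUDA enumerates fastest-first by default; PCI order matches nvidia-smi.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


if __name__ == "__main__":
    # Hypothetical workload launch on GPUs 0 and 1 only:
    # subprocess.Popen(["./my_cuda_job"], env=env_for_gpus([0, 1]))
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; run this on the GPU node")
    else:
        for _ in range(3):  # a few samples; loop longer while reproducing the hang
            out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
            rows = parse_gpu_rows(out)
            print(rows, "hot:", hot_gpus(rows))
            time.sleep(10)
```

Setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` is a design choice so that the IDs passed to `env_for_gpus` refer to the same GPUs that `nvidia-smi` reports by index. Logging these samples to a file, rather than printing them, would let them survive a hang for inspection after the reboot.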