Unable to connect to node through SSH


Xinji Jiang

Feb 14, 2025, 4:09:13 PM
to cloudlab-users
Hello,

I recently started an experiment on the Clemson cluster, and the SSH connection drops frequently, about 10 minutes after everything is set up.

I got the error message
Unable to connect to clnode299.clemson.cloudlab.us:22

My experiment link is

Thanks,
Xinji

Leigh Stoller

Feb 14, 2025, 6:52:19 PM
to cloudlab-users

> On Feb 14, 2025, at 1:09 PM, Xinji Jiang <jian...@purdue.edu> wrote:
>
> I recently started an experiment on the Clemson cluster, and the SSH connection drops frequently, about 10 minutes after everything is set up.
>
> I got the error message
> Unable to connect to clnode299.clemson.cloudlab.us:22

Hi. Can you tell us whether you have installed any software packages,
a new kernel, etc.?

Note that you can power cycle the node by clicking on it in the
Topology diagram; there is a context menu with that option.

Thanks
Leigh


Xinji Jiang

Feb 15, 2025, 2:13:03 PM
to cloudlab-users
Hello,

Here are the packages that I installed:
ffmpeg
libgflags-dev
libgoogle-glog-dev
libboost-all-dev
libavcodec-dev
libavformat-dev
libswscale-dev
libdouble-conversion-dev
libfmt-dev
libevent-dev
libssl-dev
cmake
mahimahi
nginx
libnginx-mod-rtmp

I tried running it on Clemson's c6320 and r6525 nodes and faced the same issue. I tried rebooting the node, but the connection still drops after a while.

However, I have the same environment set up on clusters other than Clemson and never faced the issue there.

Another thing worth noting is that on Clemson, when I try to use my own disk image from emulab.net, experiment setup always fails with the error
Experiment setup on the Cloudlab Clemson cluster failed: Could not import GeniSlices/disk_image_name
I had to create another image on clemson.cloudlab.us to make it work. This problem also doesn't exist on other clusters.

Thanks,
Xinji

Mike Hibler

Feb 15, 2025, 4:04:26 PM
to cloudla...@googlegroups.com
You should determine whether the node hangs or whether the network is
just taken down. Try logging into the node console and see if it is
responsive.
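
Before going to the console, a quick probe from your own machine can narrow things down a little. The following is only a sketch in Python; the function name `probe_tcp` and the 5-second timeout are illustrative choices, not part of any CloudLab tooling. A "refused" result suggests the node is up but nothing is listening on the port, while a "timeout" is consistent with either a hung node or a downed network, so the console check is still needed to tell those two apart.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a single TCP connection attempt to host:port."""
    try:
        # create_connection resolves the name and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout
        return "timeout"
    except OSError as exc:
        # DNS failure, no route to host, and similar errors
        return f"error: {exc}"


# Example (hostname taken from the error message earlier in the thread):
#   print(probe_tcp("clnode299.clemson.cloudlab.us"))
```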

This thread:

https://groups.google.com/g/cloudlab-users/c/FnKBA1eHgBg

talks about the two most common causes for a node to seemingly lock up.
One is explicitly or implicitly installing NetworkManager which interferes
with our network configuration scripts. The other is explicitly or implicitly
installing nvidia packages intended for a desktop environment that attempt
to suspend the processor.

If your current experiment on the Utah Cloudlab cluster uses the same profile,
then it does appear that NetworkManager is installed.
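
To check for NetworkManager yourself, something like the following Python sketch could be run on the node. It assumes a Debian/Ubuntu image where the package is named `network-manager` and the systemd unit is `NetworkManager`; the helper names `run`, `dpkg_installed`, and `network_manager_report` are made up for this example.

```python
import shutil
import subprocess


def run(cmd):
    """Run a command; return (exit_code, stdout), or (None, "") if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        return None, ""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def dpkg_installed(status_line: str) -> bool:
    """dpkg-query reports 'install ok installed' for an installed package."""
    return status_line.strip().endswith("ok installed")


def network_manager_report() -> dict:
    # Assumption: Debian/Ubuntu package and unit names
    _, status = run(["dpkg-query", "-W", "-f=${Status}", "network-manager"])
    _, state = run(["systemctl", "is-active", "NetworkManager"])
    return {"installed": dpkg_installed(status), "service_state": state or "unknown"}


# Example, on the node:
#   print(network_manager_report())
```

If the package turns out to be installed, it was most likely pulled in as a dependency of something else, which is the "implicitly installing" case mentioned above.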

BTW, I notice that you are running a web server. If you do this you should
ensure that it is not listening on the Internet-facing control network
interface or, if that is needed, that you have strong credentials on any
web-based services.
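
One way to see whether the web server is bound to all interfaces (and therefore also to the control network) is to inspect the listening sockets on the node. This is a rough Python sketch that parses the output of `ss -H -tln` from iproute2. The ports 80 and 1935 are assumptions (the usual HTTP and RTMP defaults), and the function names are illustrative.

```python
import subprocess

# Local addresses that mean "listening on every interface"
WILDCARDS = {"0.0.0.0", "*", "[::]", "::"}


def wildcard_listeners(ss_output: str, ports: set[int]) -> list[int]:
    """Return the watched ports that are bound to all interfaces."""
    exposed = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        addr, _, port = fields[3].rpartition(":")
        if port.isdigit() and int(port) in ports and addr in WILDCARDS:
            exposed.add(int(port))
    return sorted(exposed)


def listening_sockets() -> str:
    # -H: no header, -t: TCP, -l: listening only, -n: numeric ports
    result = subprocess.run(
        ["ss", "-H", "-tln"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example, on the node (80 and 1935 are assumed ports):
#   print(wildcard_listeners(listening_sockets(), {80, 1935}))
```

Any port this reports is reachable on every interface, including the control network. Giving the nginx `listen` directive an explicit address on the experiment network (or 127.0.0.1) instead of a bare port is one way to narrow that.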