Node stops accepting SSH connections, and POWDER Top Processes/Shell stop working


Ryan W. West

Jun 3, 2022, 12:24:06 AM
to Powder Users
Hi,

Some of my nodes are getting into a strange state where I can no longer SSH into them unless I power cycle the node from POWDER's experiment page. Neither the embedded Shell nor the 'Top Processes' toolbar option works within the experiment; for the shell, it says "No existing session". Would anyone know how to solve this?

Interestingly, this seems to happen when I run a command with 'sudo'. Doing that causes all further input via SSH to fail for that session, though any background processes that were already sending output to stdout continue to send it. At this point, I can't SSH in at all (doing so with the verbose option enabled gives exactly this issue, which suggests the server is malfunctioning: https://superuser.com/questions/1374076/what-does-it-mean-if-ssh-hangs-after-connection-established). SSH sessions that were already open to the node continue to function properly until I run a command with sudo or su in them, and then the same thing happens.
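
For anyone trying to reproduce this, one way to tell "TCP connects but sshd never answers" apart from "node unreachable" is to open a plain socket to port 22 and wait for the server's SSH identification line. The following is only a minimal sketch: the function name `probe_ssh`, the default timeout, and the result labels are made up for illustration and are not part of POWDER or OpenSSH.

```python
import socket


def probe_ssh(host, port=22, timeout=5.0):
    """Classify how an SSH endpoint responds (hypothetical helper).

    Returns one of:
      "banner:<text>"  TCP connected and the server sent its identification line
      "hung"           TCP connected but nothing arrived before the timeout
      "closed"         TCP connected but the server closed without sending data
      "unreachable"    the TCP connection itself failed or timed out
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        sock.settimeout(timeout)
        try:
            # A healthy SSH server speaks first, e.g. "SSH-2.0-OpenSSH_...".
            data = sock.recv(256)
        except socket.timeout:
            return "hung"
        except OSError:
            return "closed"
    if not data:
        return "closed"
    return "banner:" + data.decode("ascii", errors="replace").strip()
```

Calling `probe_ssh("<node hostname>")` from a machine outside the experiment and getting "hung" would be consistent with the symptom described above, where the connection is established but the server never responds.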

I've used POWDER a lot and have never had an issue with sudo before, but I'm now also maintaining 4+ simultaneous ssh sessions per server and running lots of programs on them at once. Still, RAM/CPU usage doesn't seem high enough for something to go majorly wrong.
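
Since the node has to be power cycled once it hangs, resource usage just before the hang is hard to inspect afterwards. A rough sketch of a logger that appends load averages and available memory to a file is shown here; it assumes a Linux node with `/proc/meminfo`, and the function names and log format are hypothetical.

```python
import os
import time


def read_mem_available_kib(meminfo_path="/proc/meminfo"):
    """Return MemAvailable in KiB from a Linux meminfo file, or None if absent."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    return None


def format_sample(load1, load5, load15, mem_kib, now=None):
    """Build one log line: local timestamp, load averages, available memory."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))
    return (f"{stamp} load={load1:.2f},{load5:.2f},{load15:.2f} "
            f"mem_available_kib={mem_kib}")


def log_samples(path, interval=5.0, count=None):
    """Append a sample every `interval` seconds; run until killed if count is None."""
    n = 0
    while count is None or n < count:
        line = format_sample(*os.getloadavg(), read_mem_available_kib())
        with open(path, "a") as f:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the line to disk before a possible hang
        n += 1
        if count is None or n < count:
            time.sleep(interval)
```

Started in the background before running the sudo command (for example `log_samples("/tmp/load.log")`, where the path is just a placeholder), the last lines of the file after a power cycle would show whether load or memory was in fact unusual when the node stopped responding.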

The server console log (which I could still open) is here, if it stays alive: https://www.emulab.net/spewconlog.php3?node_id=pc08-fort&key=5223796f81b6e6b9b16de4855b026259014ffdc8

Any help is appreciated. Thanks!

Ryan West

Ryan W. West

Jun 3, 2022, 12:27:29 AM
to Powder Users
Looks like that log link doesn't work, so here's the downloaded log.
spewconlog.txt

Leigh Stoller

Jun 3, 2022, 8:20:24 AM
to powder...@googlegroups.com

> Any help is appreciated. Thanks!

Hi. How about telling us the experiment (a link to the status page)
and the node that is causing problems. :-)

Also tell us what packages you have installed, did you upgrade the
kernel, etc.

Thanks
Leigh


Ryan W. West

Jun 3, 2022, 2:49:10 PM
to Powder Users

Yes, thanks. I had to create a new experiment since the last one expired; this one is good for the rest of today and I can extend it if needed. https://www.powderwireless.net/status.php?uuid=42c8b9b5-e34a-11ec-b318-e4434b2381fc. node-1 is currently in this state and cannot be reached over ssh. I don't think the kernel was upgraded... the profile is here: https://gitlab.flux.utah.edu/ryanwwest/d5g

In this case, the state where new ssh sessions won't connect and any use of 'sudo' locks up the existing session occurred when I ran this command in an ssh session: "cd /local/repository/bin/ && sudo bash gen-configs.sh; sudo bash start-gnb.sh". (The scripts called here first run some python scripts, then start an OAI gNB with the command "cd /z/openairinterface5g/cmake_targets/ && sudo RFSIMULATOR=server ./ran_build/build/nr-softmodem --rfsim --sa -O /local/repository/etc/gnb.conf". This isn't particularly important, though, as other commands seem to trigger this state as well.) When I ran the above command, my gNB simulator program started and ran, but the next sudo command broke the ssh session (without explicitly closing the connection).

Docker is also installed with some other OAI 5G Core docker images like spgwu-tiny, but I don't think any were running at that time. They were previously running on this system. Also running are a few python Flask API programs and Hyperledger Sawtooth validator blockchain programs.

Leigh Stoller

Jun 3, 2022, 3:05:42 PM
to powder...@googlegroups.com

> Yes, thanks. I had to create a new experiment since the last expired, this one is good for the rest of today and I can extend it if needed. https://www.powderwireless.net/status.php?uuid=42c8b9b5-e34a-11ec-b318-e4434b2381fc. node-1 is currently in this state and cannot have ssh connected. I don't think the kernel is upgraded... the profile is here: https://gitlab.flux.utah.edu/ryanwwest/d5g.

Yes, it is hung. After a reboot, sudo works fine. I think that sudo is
just hanging for you after something else (in the network stack) goes
bad. I noticed that upon reboot, the load immediately spikes to 20 while
a bunch of compiler processes run, but I do not know anything about all
the Powder stuff that gets installed.

Gonna have to pass this off to someone who knows all the Powder
related packages and docker.

Leigh
