Scripts executed by "(Re)run Espec" button are terminated when jFed GUI is closed

Erik Pohle

Mar 25, 2022, 5:17:54 AM
to fed4fire-ex...@googlegroups.com
Hello,

I'm using the jFed GUI to reserve some computation nodes connected in a
network. I am then using the (Re)run Espec button to upload my
experiment, e.g. some binaries and scripts that execute and time them.
This works well and is much more convenient than uploading everything
manually (and faster as well).

However, I noticed that script execution is terminated as soon as I
close the jFed GUI. So if I have a long-running experiment that I want
to run overnight, I have to keep my PC with the jFed GUI running as well.

Is there an option (similar to "Recover experiment") for ESpecs started
from the (Re)run Espec button?

Thanks and best regards,

Erik

Wim Van de Meerssche

Mar 27, 2022, 1:26:15 AM
to Erik Pohle, fed4fire-ex...@googlegroups.com
Hi Erik,

jFed's ESpecs are really only meant for bootstrapping. The idea is that they set up the basics needed and then hand over to an automation platform like Ansible, and/or to experiment orchestration scripts.
Of course, you can use jFed to do more, but you should be aware of its limitations.

In this case, the limitation is that jFed starts all scripts over an SSH connection that jFed itself opens, so that connection breaks when jFed closes.
Because of this, the default Unix behaviour applies: when an SSH connection breaks, the shell running in that SSH connection at the other end terminates. When a shell terminates, the HUP (hangup) signal is sent to all processes that are still running in that shell. If these processes do not handle or ignore the HUP signal, they are terminated.

So there are 2 easy fixes:
- Make your scripts handle and ignore the HUP signal. Make sure you also run your scripts in the background.
- Run your scripts in a way that something else handles and ignores the HUP signal for them. Either use the "nohup" tool, or better, start your processes in tmux.
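
As a rough illustration of both fixes, here is a minimal sketch. The long_job function, the 2-second sleep and the log file names are hypothetical stand-ins for a real experiment. The script sends HUP to both jobs itself to imitate a broken SSH connection, and the final wait is only there so the demo can be checked; an ESpec script would normally return right after starting the jobs.

```shell
#!/bin/sh
# Hypothetical stand-in for a long-running experiment.
long_job() {
    sleep 2
    echo "finished"
}

logdir=$(mktemp -d)

# Fix 1: ignore HUP inside the script itself and run the work in the
# background. An ignored signal stays ignored in child processes.
(
    trap '' HUP
    long_job > "$logdir/fix1.log" 2>&1
) &
fix1_pid=$!

# Fix 2: let nohup ignore HUP on behalf of an unmodified command.
nohup sh -c 'sleep 2; echo "finished"' \
    < /dev/null > "$logdir/fix2.log" 2>&1 &
fix2_pid=$!

# Imitate the hangup that a closed SSH connection would cause.
sleep 1
kill -HUP "$fix1_pid" "$fix2_pid"

# Demo only: wait for both jobs, then show that they ran to completion.
wait "$fix1_pid" "$fix2_pid"
cat "$logdir/fix1.log" "$logdir/fix2.log"
```

Redirecting output to a file matters in both cases: once the SSH connection is gone, nothing is left to read the script's standard output.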

tmux is the recommended way. It not only prevents the scripts you run in it from being stopped, it also allows you to reconnect to the node later and access the shell the scripts are running in, to see their output etc.

To use tmux in this way, start your command with:
    tmux new-session -d -s <my-session-name> "<my-command>"

For example:
    tmux new-session -d -s demo "while true; do sleep 1; echo -n 'hello world at '; date; done"

You can reconnect to that command's shell with:
    tmux a -t demo
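
If the tmux command is started from a script that may be re-run, a small guard can avoid starting the same experiment twice. The following is only a sketch: the session name "experiment", the command and the log path under /tmp are hypothetical placeholders. Note that by default a tmux session ends when its command exits, so the sketch also copies the output to a log file with tee.

```shell
#!/bin/sh
# Hypothetical session name and command; replace with the real experiment.
session="experiment"
cmd='for i in 1 2 3; do echo "run $i"; sleep 1; done 2>&1 | tee /tmp/experiment.log'

if ! command -v tmux >/dev/null 2>&1; then
    # tmux may first need to be installed on the node.
    echo "tmux is not installed on this node" >&2
    status="missing"
elif tmux has-session -t "$session" 2>/dev/null; then
    # A session with this name is already running; do not start a second copy.
    echo "session '$session' already exists"
    status="exists"
else
    # Start detached (-d) so the script returns immediately.
    tmux new-session -d -s "$session" "$cmd"
    status="started"
fi
echo "status: $status"
```

While the session is still running, "tmux ls" lists it and "tmux a -t experiment" attaches to it; after it has ended, the log file still holds the output.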


Best regards,
Wim

--
Wim Van de Meerssche
Ghent University - imec
IDLab
iGent Tower - Department of Information Technology
Technologiepark-Zwijnaarde 126, B-9052 Ghent, Belgium
Tel: +32 9 33 14940
Email: wim.vande...@UGent.be
Web: IDLab.UGent.be
Web: IDLab.technology

Erik Pohle

Mar 28, 2022, 12:31:07 PM
to Wim Van de Meerssche, fed4fire-ex...@googlegroups.com

Hi Wim,

Thank you very much. I guess I'll be using tmux then.

Best regards,

Erik
