Not a SIGINT stop signal?

Ashley Thornton

Apr 26, 2017, 11:30:07 PM
to Gurobi Optimization
Hi,

I've got a problem running on a cluster node that I need to interrupt. However, I want to do so gracefully. I know from using Gurobi on my local computer that if I hit CTRL-C during the MIP exploration phase, it interrupts the process and returns the best solution found so far.

From my understanding, CTRL-C sends a SIGINT signal to Gurobi. However, when I tried to do that via the cluster management program Slurm (scancel -s INT <job id>), it didn't interrupt the solve process at all.

I know from other times that when I've used scancel (default is SIGTERM) it's just killed the entire process and I've lost the best found solution.

What does the Gurobi solver actually listen for to trigger the graceful-exit behavior I described?

Kind regards,
Ashley

Tobias Achterberg

Apr 27, 2017, 4:05:06 AM
to gur...@googlegroups.com
As you have guessed, Gurobi listens to the SIGINT signal, using

signal(SIGINT, our_signal_handler);

But it only does so when used through the command line tool gurobi_cl. If you
call Gurobi from your own program via the library, then you can catch this
signal yourself and call GRBterminate(model) from your signal handler.

The implementation in the Python shell is a little bit different.

If you are using gurobi_cl on your SLURM cluster and sending the signal does not
work, then something odd is going on that I don't understand right now.
It should work. But note that sometimes it can take a while until Gurobi reacts
to the signal because it only checks the termination flag at certain points in
the algorithm where it is possible to exit gracefully. This should usually be
often enough to interrupt almost immediately, but there are counter-examples, in
particular for very large models.
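
The pattern described above (a handler that only requests termination, and an algorithm that checks the request at safe points) can be sketched in plain Python. This is a minimal illustration and not Gurobi's implementation: `TerminationFlag`, `install_sigint_flag`, and `toy_search` are hypothetical names, and the loop is a stand-in for a real solve.

```python
import signal


class TerminationFlag:
    """Records that a stop was requested; the handler does nothing else."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Keep the handler tiny: only set the flag and return.
        self.requested = True


def install_sigint_flag(flag):
    """Route SIGINT to the flag. Must be called from the main thread."""
    return signal.signal(signal.SIGINT, flag.handler)


def toy_search(candidates, flag, after_step=None):
    """Hypothetical stand-in for a solver loop: track the smallest value seen.

    The flag is checked once per iteration, i.e. only at a "safe point",
    so a stop request takes effect at the next check, not instantly.
    """
    best = None
    for step, value in enumerate(candidates):
        if flag.requested:
            break  # exit gracefully, keeping the best value found so far
        if best is None or value < best:
            best = value
        if after_step is not None:
            after_step(step)
    return best
```

In a real script you would call `install_sigint_flag(flag)` once before starting the search. When calling Gurobi through the library, the handler would request termination with GRBterminate(model) as mentioned above, rather than setting a flag of your own.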


Regards,

Tobias

Renan Garcia

Apr 27, 2017, 6:59:29 AM
to gur...@googlegroups.com
Note that some Slurm commands don't forward the SIGINT signal to the running job by default, but you can enable this with options. For example, the -X option does this for srun (see https://computing.llnl.gov/tutorials/linux_clusters/man/srun.txt), and you could also enable it via the SLURM_DISABLE_STATUS environment variable.

Ashley Thornton

May 2, 2017, 8:41:19 AM
to Gurobi Optimization
Thanks for the reply. I had built the model through a Python script and had submitted that script as the job to the cluster. To solve the problem (for anyone who finds this later), I ssh'd onto the actual cluster node running the job, found the pid (process ID) of the Python script with 'ps ux', and then sent a SIGINT to it with 'kill -INT <pid>'.

Which worked nicely. :)
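
For anyone who wants to try the same idea without a cluster, here is a rough, self-contained sketch of the workflow on a POSIX system. It assumes nothing about Gurobi: `CHILD_SOURCE` is a hypothetical stand-in for the long-running job, and `send_sigint` and `run_demo` are illustrative names. `os.kill(pid, signal.SIGINT)` is the Python counterpart of `kill -INT <pid>`.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for the long-running job: it installs its own SIGINT
# handler, announces that it is ready, and works until asked to stop.
CHILD_SOURCE = r'''
import signal, time
stop = False
def handler(signum, frame):
    global stop
    stop = True
signal.signal(signal.SIGINT, handler)
print("ready", flush=True)
steps = 0
while not stop:
    steps += 1
    time.sleep(0.01)
print("interrupted", flush=True)
'''


def send_sigint(pid):
    """Python equivalent of `kill -INT <pid>`."""
    os.kill(pid, signal.SIGINT)


def run_demo():
    """Start the stand-in job, interrupt it, and return (exit code, output lines)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    first = proc.stdout.readline().strip()  # wait until the handler is installed
    send_sigint(proc.pid)
    rest = proc.stdout.read()  # returns once the child closes stdout on exit
    proc.wait(timeout=30)
    return proc.returncode, [first] + rest.split()
```

Calling `run_demo()` starts the stand-in job, waits for it to report that it is ready, interrupts it, and collects what it printed. The signal has to reach the process that actually installed the handler, which is why sending it directly to the script's pid worked here.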
