Aug 21, 2020, 8:30:36 AM
So, here is what is happening (we wanted to eliminate all external issues before getting back to you).
We have defined an axlJobIntfc custom subclass which does the following:
On submitjob, we call a script with system() that launches the job on our remote cluster.
getjobstatus gets called every 5 seconds; in it, we query the status with another script (only every 30 seconds).
axlJobIntfcHealthMethod has been changed to translate our cluster manager's job status to "alive" or "dead" as follows:
SUBMITTED (until our scheduler starts the job): alive
PROCESSING (job is running): alive
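To make the logic above concrete, here is a rough Python sketch of it (not the actual SKILL code). All names are hypothetical. Two assumptions are baked in: COMPLETED maps to "dead", and the 5-second poll reuses a cached status until 30 seconds have passed since the last real query.

```python
import time
from typing import Callable, Optional

# Mapping from cluster-manager status to health.
# SUBMITTED and PROCESSING come from the description above;
# COMPLETED -> "dead" is an assumption.
HEALTH_BY_STATUS = {
    "SUBMITTED": "alive",
    "PROCESSING": "alive",
    "COMPLETED": "dead",  # assumption
}


def health_for(status: str) -> str:
    """Translate a cluster status into "alive" or "dead".

    Unknown statuses are treated as "dead" (an assumption for this sketch).
    """
    return HEALTH_BY_STATUS.get(status, "dead")


class ThrottledStatus:
    """Caches the last status so the real query runs at most once per interval."""

    def __init__(
        self,
        query: Callable[[], str],        # hypothetical wrapper around the status script
        min_interval: float = 30.0,      # seconds between real queries
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._query = query
        self._min_interval = min_interval
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_status: Optional[str] = None

    def get(self) -> str:
        """Return the cached status, refreshing it if the interval has elapsed."""
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self._min_interval:
            self._last_status = self._query()
            self._last_time = now
        return self._last_status
```

With this shape, a poll every 5 seconds would call `get()` each time and pass the result through `health_for()`, while the status script itself runs only once per 30-second window.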
We are able to launch this configuration with Cadence Virtuoso on a test solve.
The job gets scheduled on our cluster and runs.
The log shows the correct statuses: it goes from SUBMITTED to PROCESSING to COMPLETED, and the remote job finishes.
However, once it completes, Cadence does not seem to accept that the solve completed. It relaunches the job again and again, three times in total, and then gives up with a warning dialog.
Note that the remote job writes its output files to a commonly mounted network share, so the solver's output files are seen as if they were local.
We are not able to understand what we need to do to tell Cadence that the solve completed. Our assumption was that setting the "dead" health and the presence of the solved files should have done the trick.
What are we missing?
Thanks in advance