Hi,
I am trying to tune the parameters of my algorithm with IRACE on a Compute Canada cluster, so I have to run IRACE in batch mode: the script "target-runner-slurm" submits jobs to the Slurm queue, and the script "target-evaluator" evaluates the results those jobs produce.
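To illustrate the evaluator side of this setup, here is a minimal Python sketch of a "target-evaluator". It assumes that IRACE has already waited for the Slurm jobs to finish before calling it, that it receives the configuration ID, instance ID and seed as its first three arguments, and that each job writes its output to a file named by the hypothetical scheme `c<config>-<instance>-<seed>.stdout`. The helper names are illustrative, not part of IRACE. The key design choice is that every failure ends with a message on stderr and a non-zero exit status rather than a silent wait.

```python
#!/usr/bin/env python3
import sys
import time


def output_path(config_id, instance_id, seed):
    # Hypothetical naming scheme: it must match the --output file that
    # target-runner-slurm passes to sbatch for the same run.
    return f"c{config_id}-{instance_id}-{seed}.stdout"


def parse_cost(text):
    # The cost is assumed to be the first token of the last non-empty line.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty output")
    return float(lines[-1].split()[0])


def read_cost(path, attempts=12, delay=5.0):
    # Retry a bounded number of times (shared cluster file systems can make
    # a finished job's output visible with some delay), then give up with an
    # error instead of waiting forever.
    last_error = None
    for attempt in range(attempts):
        try:
            with open(path) as fh:
                return parse_cost(fh.read())
        except (OSError, ValueError) as exc:
            last_error = exc
            if attempt + 1 < attempts:
                time.sleep(delay)
    raise RuntimeError(f"no valid cost in {path}: {last_error}")


def main(argv):
    # Assumed argument order: configuration ID, instance ID, seed, ...
    path = output_path(argv[1], argv[2], argv[3])
    try:
        cost = read_cost(path)
    except RuntimeError as exc:
        # A message on stderr plus a non-zero exit status should make the
        # failure visible to IRACE instead of going unnoticed.
        print(f"target-evaluator: {exc}", file=sys.stderr)
        return 1
    print(cost)
    return 0


# Run only when invoked with (at least) the three arguments used above.
if __name__ == "__main__" and len(sys.argv) > 3:
    sys.exit(main(sys.argv))
```

The file naming and the retry count are placeholders; they would have to agree with whatever "target-runner-slurm" actually does.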
When IRACE runs in the Slurm environment, it submits a group of jobs to the queue, waits until they finish, analyzes their results, and then submits the next group. However, at some point during the run IRACE stops: it submits a group of jobs, those jobs finish, but it never submits the next group and makes no further progress.
The main problem is that I do not get any error message; IRACE simply stops submitting jobs and stops running. To find the cause of this problem, I have done the following:
*I checked the running processes to see whether "target-runner-slurm" or "target-evaluator" was stuck in an infinite loop, but neither script is running.
*I keep all the *.stdout files produced during the IRACE run; I checked them, and all of them contain the result obtained by my algorithm.
*I also keep all the *.stderr files; all of them contain the text "Picked up JAVA_TOOL_OPTIONS: -Xmx2g", which is printed to the terminal whenever I run my algorithm (it is implemented in Java).
*Each job produces a file "slurm-<Job ID>.txt". I checked all the Slurm files produced during the IRACE run (one per job submitted by IRACE), and all of them contain the text "OK".
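The manual checks listed above could be repeated automatically after each stall with a small script. The following is a minimal Python sketch, assuming all *.stdout, *.stderr and slurm-*.txt files sit in one directory (passed as the first argument) and that the result is the last non-empty line of each *.stdout file. It reports any output file without a numeric result, any stderr file containing something other than the JAVA_TOOL_OPTIONS notice, and any Slurm file lacking "OK". The function names are illustrative only.

```python
import glob
import os
import sys

# Notice the JVM prints on stderr because JAVA_TOOL_OPTIONS is set;
# it is expected, so it is ignored below.
JAVA_NOTICE = "Picked up JAVA_TOOL_OPTIONS:"


def last_value(text):
    """Return the last non-empty line's first token as a float, or None."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None
    try:
        return float(lines[-1].split()[0])
    except ValueError:
        return None


def unexpected_stderr(text):
    """Return stderr lines other than the JAVA_TOOL_OPTIONS notice."""
    return [ln for ln in text.splitlines()
            if ln.strip() and not ln.startswith(JAVA_NOTICE)]


def scan(directory):
    """Map each suspicious file name to a short description of the problem."""
    problems = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.stdout"))):
        with open(path) as fh:
            if last_value(fh.read()) is None:
                problems[os.path.basename(path)] = "no numeric result on last line"
    for path in sorted(glob.glob(os.path.join(directory, "*.stderr"))):
        with open(path) as fh:
            extra = unexpected_stderr(fh.read())
        if extra:
            problems[os.path.basename(path)] = "unexpected stderr: " + extra[0]
    for path in sorted(glob.glob(os.path.join(directory, "slurm-*.txt"))):
        with open(path) as fh:
            if "OK" not in fh.read():
                problems[os.path.basename(path)] = 'no "OK" marker'
    return problems


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, issue in scan(target).items():
        print(f"{name}: {issue}")
```

An empty report would only confirm the manual checks; it cannot show whether IRACE itself is still waiting on the queue.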
I could not find where the problem lies. What do you think could be causing it?