problem with checkForRemoteErrors(val)

Anisha Maharani

Dec 12, 2024, 11:51:18 AM
to The irace package: Iterated Racing for Automatic Configuration
Hello,

I was running irace with parallelization (parallel = 20) on a personal computer (24 physical cores, 32 logical processors, 128 GB RAM), and I encountered the following problem:

Error in checkForRemoteErrors(val) :
  3 nodes produced errors; first error: == irace == The output of targetRunner should not be more than two numbers!
== irace == The call to targetRunner was:
C:/Users/DOCUME~1/irace/IRACE_~1/target-runner.bat 71 33 259406559 C:/Users/DOCUME~1/irace/IRACE_~1/trainingInstances/053_p100_e0.3_r1_m0.8_3_h.txt  --alphaMin88 --alphaMax307
== irace == The output was:
The system cannot find the file cfiles\c71-33-259406559.stdout.
ECHO is off.
== irace == This is not a bug in irace, but means that something failed when running the command(s) above or they were terminated before completion. Try to run the command(s) above from the execution directory 'C:/Users/DOCUME~1/irace/IRACE_~1/AlgoReduceTest' to investigate the issue. See also Appendix B (targetRunner troubleshooting checklist) of the User Guide (https://cran.r-project.org/package=irace/vignettes/irace-package.pdf).
>

This is what I put in target-runner.bat to set the stdout and stderr files:

SET "stdout=cfiles\c%config_id%-%instance_id%-%seed%.stdout"
SET "stderr=cfiles\c%config_id%-%instance_id%-%seed%.stderr"

I have the impression that several nodes in the parallelization are trying to access the same stdout file at the same time. Could you help me with this error?

Many thanks in advance.

Best regards,
Anisha

Anisha Maharani

Dec 12, 2024, 12:02:48 PM
to The irace package: Iterated Racing for Automatic Configuration
I forgot to mention another question. I read in the newest version of the user guide that, apparently, the calls to targetRunner cannot be printed with debugLevel = 2 (or more) when running in parallel.
I would like to understand what is happening at each iteration of the race (for example, how many nodes/cores are used, and which configuration and instance are tested on each). Is there any way to obtain this?

You mention in the user guide: "If you need to understand how irace calls targetRunner when running in parallel, you can implement a logging mechanism able to handle parallelism directly inside the targetRunner", but to be honest I have no idea how to do so. Could you give me some more guidance on this?

Thank you.

Best regards,
Anisha 

Manuel López-Ibáñez

Dec 12, 2024, 4:52:03 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi Anisha,

The combination of config_id, instance_id and seed should be unique for one run of irace. If your target-runner is using those values to create unique files, then it cannot happen that multiple calls to target-runner try to access the same stdout file. But you can check that yourself, by testing whether the file already exists before writing to it in the target-runner (you may need to clean up any files left by a previous run of irace).

As the error message says, this is not a bug in irace but a problem with either target-runner.bat or the software called by target-runner.bat. You need to run target-runner.bat outside irace and debug it. The error printed by target-runner.bat, "The system cannot find the file cfiles\c71-33-259406559.stdout.", suggests that target-runner.bat is trying to read this file but cannot find it, maybe because something deleted it or because the command responsible for creating it never did. I cannot help you further because the error is not in irace.

I also do not know how to write BAT files for Windows. The example provided with irace (https://github.com/MLopez-Ibanez/irace/blob/master/inst/templates/windows/target-runner.bat) was contributed by another user. Please note that you do not need to use BAT for the target-runner on Windows; you can use Python, R, C/C++, ... anything that can be executed as a command from the Windows shell.
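For illustration only, a very rough sketch of a target-runner written in R could look like the following (this is not the official template; the algorithm command, its arguments and the way the cost is extracted are placeholders you would need to adapt):

  #!/usr/bin/env Rscript
  # Sketch of a target-runner in R. irace calls it as:
  #   target-runner <config_id> <instance_id> <seed> <instance> <parameters...>
  args <- commandArgs(trailingOnly = TRUE)
  config_id   <- args[1]
  instance_id <- args[2]
  seed        <- args[3]
  instance    <- args[4]
  params      <- args[-(1:4)]

  # config_id/instance_id/seed is unique per run of irace, so these file names are unique too.
  stdout_file <- sprintf("cfiles/c%s-%s-%s.stdout", config_id, instance_id, seed)
  stderr_file <- sprintf("cfiles/c%s-%s-%s.stderr", config_id, instance_id, seed)

  # Placeholder command: replace "my_algorithm" and its arguments with your own.
  status <- system2("my_algorithm",
                    args = c("--instance", instance, "--seed", seed, params),
                    stdout = stdout_file, stderr = stderr_file)

  if (status != 0L || !file.exists(stdout_file)) {
    cat("error: the command above failed or produced no output\n")
    quit(status = 1)
  }

  # Placeholder cost extraction: here the cost is assumed to be the last line of the output.
  cost <- tail(readLines(stdout_file), n = 1)
  cat(cost, "\n")  # irace expects one number (cost) or two numbers (cost and time).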

Best wishes,

Manuel.

Manuel López-Ibáñez

Dec 12, 2024, 5:34:30 PM
to The irace package: Iterated Racing for Automatic Configuration
The number of cores used will vary during a run of irace. Sometimes irace has to wait until the last launched configuration finishes before deciding what to launch next, thus using only one core. Using the future package (https://future.futureverse.org/) within irace would improve its parallelism, but I don't have the free time to take on such a project. Contributions by motivated users are welcome.

I am not sure how you can know which configuration/instance pair is running on each core. The operating system decides how to schedule the running processes and may move them from one core to another while they are running.

If you mean keeping track of which configuration/instance pairs are running and which ones have completed, you can track that easily by creating a unique ".stderr" file (like the example target-runner.bat does), writing something like "Started running" just after creating it and "DONE" just before exiting. You can also print the current date/time before each message (like irace does). I don't know how you would do that on Windows, but it should be possible (it is trivial on Linux). If you find a way to write to the same file from parallel processes without messing up its content, then you could use a single log file to track start/done (this is very easy on Linux, but I have no idea how it works on Windows).
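For example, if the target-runner were written in R (or any other language that can write files), the idea would be roughly the following (the file name convention is just the one used earlier in this thread; adapt as needed):

  log_file <- sprintf("cfiles/c%s-%s-%s.log", config_id, instance_id, seed)
  cat(format(Sys.time()), "Started running\n", file = log_file)

  # ... launch the actual run here ...

  cat(format(Sys.time()), "DONE\n", file = log_file, append = TRUE)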

I hope the above is helpful!

Best wishes,

Manuel.

Anisha Maharani

Dec 24, 2024, 9:49:28 AM
to The irace package: Iterated Racing for Automatic Configuration
Hi Manuel,

Many thanks for your answer.

I tried to find the cause of the error and tried many different solutions, but it still did not work. I tried to clean up the cfiles folder before every irace run, but that did not work either.
It is strange to me, since the corresponding "cfiles\c71-33-259406559.stdout" exists in the folder, contains a correct output, and has never been deleted by any means. Moreover, the corresponding stderr file did not report any error.
I tried to run the same target-runner on a cluster (single node, multiple cores), and it works perfectly fine. So I guess there is some other problem that I cannot pin down when I run it on my personal computer.

Thank you also for your answer regarding the log file; I managed to get it working.

Since I am now working on a computer cluster (SLURM), I have a question regarding the use of MPI to run irace. I need to use several compute nodes because, due to the limited time per user allowed on the cluster, I need a high degree of parallelization (i.e., more than the maximum number of CPUs on a single node). So the cluster administrator suggested I use MPI for this. I managed to follow and run the example here: https://docs.alliancecan.ca/wiki/R#Rmpi to learn how to work with MPI and Rscript on the cluster, but I don't know how to extend this to irace.

What I do now to run irace on the cluster with a single node and multiple cores is to create a bash file that calls my Rscript, in which I run irace from the scenario file, as in the "slurm" example in the irace documentation.
But I have no idea how to extend this when I want to use MPI across several compute nodes in the cluster. To be honest, I am having trouble understanding the command in "parallel-irace-mpi". Do I need to put this command in the bash file or in the specified target-runner? Moreover, is it correct to use "parallel-irace-mpi" as a reference, or should I use other examples in this case?

I hope my questions are clear enough. Any feedback from you would be very much appreciated.

Thank you again and hope you have a nice holiday season!

Best regards,
Anisha 

Manuel López-Ibáñez

Dec 24, 2024, 1:04:27 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi Anisha,

Sorry, I am not sure why you need to call an Rscript to use irace. I also do not understand what "I run irace from the scenario file" means. The scenario file is a file (or a list of options) given to irace.

Fortunately, running irace with MPI under SLURM is quite easy. In the job.sh example here: https://docs.alliancecan.ca/wiki/R#Rmpi , change the last line from:

  mpirun -np 1 R CMD BATCH test.R test.txt


to:

  mpirun -np 1 ~/bin/irace --exec-dir=~/local/irace-execdir/ --parallel $((SLURM_NTASKS-1)) --mpi 1

$SLURM_NTASKS is expanded automatically to the value you give to --ntasks=. We subtract one from that value because irace itself consumes one task.

I usually add a symbolic link at ~/bin/irace pointing to the location where the irace executable is installed. Otherwise, replace ~/bin/irace with the path to the irace executable. It will be something like: ~/local/R_libs/4.1/irace/bin/irace. How to find this path is explained in the user guide and in the quick-install intro: https://mlopez-ibanez.github.io/irace/#installing-the-irace-package
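If it helps, you can also ask R itself where the executable is installed (system.file() is part of base R, so this is just a convenience snippet):

  # Print the full path of the command-line irace executable inside the installed package.
  cat(file.path(system.file(package = "irace"), "bin", "irace"), "\n")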

You can add additional irace options to the line above (see the user guide), for example --scenario "tuning-scenario.txt". I added --exec-dir above, but you can define execDir in the scenario file instead. It is important that it is a location that is writable from the computing nodes. You may also want to redirect the output of job.sh to some file you can read later; how to do that should be explained in the documentation of your cluster. Also, make sure that you use the same R version when installing irace as the version that you load within job.sh. Otherwise, you may get strange errors.

parallel-irace-mpi was designed for a different type of job system (SGE) and it is a bit fancier than the above.

I hope the above helps. Have a nice holiday break!

Best regards,

Manuel.

Anisha Maharani

Dec 24, 2024, 1:59:33 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi Manuel,

Firstly, thank you for your prompt response.

I am sorry that I was not being clear before. When I tried to run irace on a single node, I followed the example here: https://github.com/MLopez-Ibanez/irace/tree/master/inst/examples/slurm . So, in my understanding, I need to have a bash file in which I call the Rscript where I define everything needed to run irace (scenario, parameters, target-runner, etc.).

OK, I think I get the main idea. Just to clarify: if I run the above line of code, I don't need to call Rscript or change anything in the target-runner file? My target-runner file is currently just an adaptation of the ACOTSP example.

I am sorry for my questions as I am very new to all of this.

Thank you again for your help!

Best regards,
Anisha 

Manuel López-Ibáñez

Dec 25, 2024, 2:55:32 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi,

irace is designed so that you do not need to know R to use it, so you can do almost everything using the command-line 'irace' executable that comes with the package. However, if you want to do everything in R, that is also possible. You just need to do the R equivalent of "--parallel $((SLURM_NTASKS-1)) --mpi 1" within run_irace.R.
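As a rough sketch (assuming a recent irace version; older versions use irace.main() instead of irace_main(), and "scenario.txt" is just an example file name):

  library(irace)
  # Read the scenario and override the parallelization options, mirroring
  # "--parallel $((SLURM_NTASKS-1)) --mpi 1" on the command line.
  scenario <- readScenario(filename = "scenario.txt")
  scenario$parallel <- as.integer(Sys.getenv("SLURM_NTASKS")) - 1L
  scenario$mpi <- TRUE
  irace_main(scenario = scenario)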

Best regards,

Manuel.