Problem in parallelizing irace

Rachel Yuan

Apr 1, 2023, 11:18:26 PM4/1/23
to The irace package: Iterated Racing for Automatic Configuration
Hi there,

I am working on a project that needs parameter tuning with irace. I have many datasets (around 100), and to get results faster, our team uses Compute Canada for the large ones. I was calling irace through the Rscript command. However, as our data grow, the running time has become longer, now exceeding the time limit allowed on Compute Canada. So a single Rscript run is no longer sufficient, and splitting jobs across many large datasets has become a struggle.

As far as I know, both the parameter space and the datasets are fixed within a single irace run, so there is no way for us to divide the parameter space or the datasets across parallel job submissions (we need to run all 100 datasets together in one irace run). The user guide lists four options for parallelizing irace runs, but I am not very familiar with them.

So I wonder whether there is a way to parallelize irace for cluster jobs (Compute Canada uses Slurm) such that, given a fixed parameter space and fixed datasets, it produces many array jobs (10-20, for example) that run irace concurrently while still evaluating the same parameter space and the same datasets. Or is there a way to submit a few jobs that evaluate part of the candidate configurations, collect their results back, and then submit further jobs that evaluate the remaining candidates and produce the final results based on the first batch? Would batchmode or batchtools work in this case?

I am not sure if I have explained the problem clearly. If not, please let me know, and thank you very much for your help!

Best,
Rachel

Manuel López-Ibáñez

Apr 3, 2023, 6:48:55 AM4/3/23
to The irace package: Iterated Racing for Automatic Configuration
Hi Rachel,

I'm not sure I completely understand your setup and its limitations. What does it mean to "split the parameter space" or "split the datasets"? In any case, there are four main options:

The simplest form of parallelization is to execute the calls to target-runner in parallel. If you can reserve N CPUs in a single machine, then you simply submit the job to the cluster and use the irace option "--parallel N", so that irace evaluates multiple calls to target-runner in parallel. This means that you submit 1 job (= 1 run of irace) to the cluster, but the job uses many CPUs within the same machine.
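For instance, a minimal sketch of a Slurm submission script for this option; the module name, time limit, and N=64 are assumptions you must adapt to your cluster:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=64    # N CPUs on a single machine
#SBATCH --time=24:00:00       # hypothetical time limit
module load r                 # module names vary per cluster
/path/to/irace/bin/irace --parallel 64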

The second option is to use MPI to use CPUs from multiple machines. This means that you submit 1 job (= 1 run of irace), but the job is set up to use MPI. For this you need to install Rmpi: https://docs.alliancecan.ca/wiki/R#Rmpi (make sure that it works). Then replace:

 R CMD BATCH test.R test.txt

with whatever command you use to call irace, for example:

/path/to/irace/bin/irace --mpi 1 --parallel N

or 

Rscript launch_irace.R

(In launch_irace.R, you need to use the scenario options mpi=TRUE and parallel=N)
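For reference, a sketch of what launch_irace.R might contain ("scenario.txt" is an assumed file name):

library(irace)
# Read the scenario file and enable MPI-based parallelization.
scenario <- readScenario("scenario.txt")
scenario$mpi <- TRUE
scenario$parallel <- 64   # N: number of MPI processes
irace.main(scenario = scenario)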

N here is the number of CPUs to use (ideally as many as you can get; at least 64) and it will be the same as

#SBATCH --ntasks=N # number of MPI processes
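Putting it together, a sketch of the submission script (the module names and the Rmpi launch convention follow the Alliance documentation linked above and are assumptions to adapt to your cluster):

#!/bin/bash
#SBATCH --ntasks=64           # number of MPI processes (N)
#SBATCH --mem-per-cpu=2G      # hypothetical; adjust to your runs
module load gcc openmpi r     # module names vary per cluster
# Rmpi is usually launched as a single process that spawns the workers:
mpirun -np 1 Rscript launch_irace.R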

The third option is to run irace on the submission node with the scenario option --batchmode=slurm (or whatever system your cluster uses). You need to make sure that irace will not get killed when you log out. In this mode, irace is the one submitting jobs to the cluster and waiting for them to finish; that is, irace will submit many jobs, wait for them to complete, then submit more. For this, you need to write a target-runner similar to https://github.com/MLopez-Ibanez/irace/blob/master/inst/examples/batchmode-cluster/target-runner-slurm that submits one run of your target algorithm (NOT of irace) to the cluster. You also need to write a target-evaluator (https://github.com/MLopez-Ibanez/irace/blob/master/inst/examples/batchmode-cluster/target-evaluator) that will be called when the target-runner job is done, collect the available results, and print the cost. A sketch of such a target-runner follows.
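A heavily simplified sketch of what such a target-runner might look like (the linked example in the irace repository is the authoritative version; the paths, time limit, and output-file naming here are assumptions):

#!/bin/bash
# irace calls: target-runner <config-id> <instance-id> <seed> <instance> <params...>
CONFIG_ID=$1; INSTANCE_ID=$2; SEED=$3; INSTANCE=$4
shift 4
OUT="c${CONFIG_ID}-${INSTANCE_ID}.out"
# Submit one run of the target algorithm. "sbatch --parsable" prints only
# the job ID, which irace uses to monitor the queue.
sbatch --parsable <<EOF
#!/bin/bash
#SBATCH --time=01:00:00       # hypothetical per-run time limit
/path/to/target-algorithm --instance "$INSTANCE" --seed $SEED $* > "$OUT"
EOF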

The 4th option is to write your own function targetRunnerParallel(). In this function, you can use whatever R/Python packages you wish for the parallelization, for example https://mllg.github.io/batchtools/ or https://future.batchtools.futureverse.org/, or anything else that you find easy to work with. irace will call the function with the list of executions it wants to perform, and the function must return the results. If you use this 4th option, please let me know, as we don't have an example of this in the documentation and it would be great to have one if you are willing to share your code.
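To illustrate the shape of such a function, a minimal sketch using the base R parallel package; the four-argument signature and the fields of each experiment are assumed from the irace user guide (check your irace version), and run_one() is a hypothetical placeholder for your own evaluation code:

targetRunnerParallel <- function(experiments, exec.target.runner,
                                 scenario, target.runner) {
  # Each element of 'experiments' describes one run: a candidate
  # configuration, an instance and a seed. Evaluate them in parallel
  # forked processes and return one result list per experiment.
  parallel::mclapply(experiments, function(e) {
    # run_one() is hypothetical: run your algorithm, return its cost.
    list(cost = run_one(e$configuration, e$instance, e$seed), time = NA)
  }, mc.cores = scenario$parallel)
}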

There are other things you can do to reduce the problem. If your algorithm reports progress over time, you can use capping to avoid running bad configurations until the end, terminating them earlier: https://www.sciencedirect.com/science/article/pii/S0305054821003300

Best wishes,

Manuel.

Rachel Yuan

Apr 5, 2023, 7:38:44 PM4/5/23
to The irace package: Iterated Racing for Automatic Configuration
Hi Manuel,

Thank you so much for your detailed descriptions!

I am mostly interested in using option 2 and option 3, but I have a few more questions about how to use them. 

1) For option 2, I assume that using multiple machines should run faster than using a single machine. I am in the process of installing Rmpi. Just wondering, is there an example of using mpirun with irace?

2) For option 3, with "batchmode = slurm" enabled in the scenario list, what does it mean for jobs to be submitted serially? Say the first group of 10 jobs is submitted, and these jobs evaluate 10 candidate configurations. Once they finish and send back their results, another group of 10 jobs is submitted to evaluate another 10 candidate configurations, and when all the results have been returned, irace produces the best configuration. Is that what it means?

Also, I wonder how fast option 3 is compared with option 2 in terms of parallel computation time (my highest priority at this point is to minimize the running time).

Thanks,
Rachel

Manuel López-Ibáñez

Apr 6, 2023, 7:37:17 AM4/6/23
to The irace package: Iterated Racing for Automatic Configuration
On Thursday, 6 April 2023 at 00:38:44 UTC+1 rachel....@gmail.com wrote:

1) For option 2, I assume that using multiple machines should run faster than using a single machine. I am in the process of installing Rmpi. Just wondering, is there an example of using mpirun with irace?

Not necessarily. If your cluster has nodes with 64 or 128 CPUs, then it will be faster to run on a single node with 128 CPUs than on 8 nodes with 16 CPUs each. The communication between nodes required by MPI can be very slow, depending on the cluster.

This is an example of launching irace with MPI on an SGE cluster: https://github.com/MLopez-Ibanez/irace/blob/master/inst/bin/parallel-irace-mpi
If you manage to make it work with SLURM, please share the submission script.

2) For option 3, with "batchmode = slurm" enabled in the scenario list, what does it mean for jobs to be submitted serially? Say the first group of 10 jobs is submitted, and these jobs evaluate 10 candidate configurations. Once they finish and send back their results, another group of 10 jobs is submitted to evaluate another 10 candidate configurations, and when all the results have been returned, irace produces the best configuration. Is that what it means?

Yes, irace will submit jobs (so the target-runner must call sbatch and return the jobID), monitor the jobs in the queue, and submit more when needed. In this mode, the --parallel option controls the maximum number of jobs allowed in the queue (your cluster may impose a maximum; otherwise, do not use the option and irace will submit as many as needed).

Also, I wonder how fast option 3 is compared with option 2 in terms of parallel computation time (my highest priority at this point is to minimize the running time).

Once irace is running, option 1 is the fastest. Option 2 can be faster if you can request many more CPUs than with option 1. However, depending on your cluster's configuration, requesting many CPUs may mean waiting in the queue a long time until all the requested CPUs are available.

Option 3 is the slowest, BUT it may start running as soon as 1 CPU is available, so it may actually finish earlier than the other options if there are always a few CPUs free in the system.

Personally, I would advise checking in your cluster's documentation how many CPUs a single node has, then running option 1 with that number of CPUs. If that number is 64 or more, it should be enough, unless a single run of irace evaluates thousands of configurations. Otherwise, investigate option 2; and if option 2 doesn't work, investigate option 3.

Best,

Manuel.

Rachel Yuan

Apr 6, 2023, 2:07:37 PM4/6/23
to The irace package: Iterated Racing for Automatic Configuration

Got it. Thank you very much, Manuel! I will check the number of CPUs per node in our clusters and go from there to see which option I should pursue.

Thanks once again!

Rachel