How to work with datasets in irace


Phoebe Liu

Feb 27, 2023, 6:43:00 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi there,

I have a question about importing datasets into irace. For example, I assign a dataset dataset_A with 10 rows and 10 columns as the instances in the scenario (scenario$instances = dataset_A). However, when the targetRunner function runs with dataset_A as instances, it seems that only the first column is used. I tested this by returning experiment$instance as the cost value, and only the values of the first column are shown. Is this expected? My ultimate goal is to use the full dataset_A, because my computation needs all 10 columns. How should I import it so that all 10 columns are available?

Thank you in advance.

Best regards,

Phoebe

Manuel López-Ibáñez

Feb 28, 2023, 3:33:43 AM
to The irace package: Iterated Racing for Automatic Configuration
If your instance is a dataset, then either put the dataset in a file and give irace the name of that file as an instance (see the examples with the TSP problem in the user guide), or, directly in R, call:

scenario <- list(targetRunner = target.runner,
                 instances = list(dataset),  
                 maxExperiments = 200,  
                 logFile = "") 

If you have two datasets, you can call:

scenario <- list(targetRunner = target.runner,
                 instances = list(dataset1, dataset2),  
                 maxExperiments = 200,  
                 logFile = "") 


and so on.

Complete example:

library(irace)

data <- matrix(runif(1000), nrow=500)
# target runner function
target.runner <- function(experiment, scenario) {
 
  instance <- experiment$instance
  configuration <- experiment$configuration
  # This is an example that does not do anything useful, just illustrates how to use a dataset.
  res <- list(value = sum(instance[, as.numeric(configuration[["mode"]])]))
  return(list(cost = res$value))
   
}

parameters <- readParameters(text='
mode "" c (1,2)
')
# scenario
scenario <- list(targetRunner = target.runner,
                 instances = list(data),  
                 maxExperiments = 200,  
                 logFile = "")  

# check that the scenario is valid.
checkIraceScenario(scenario, parameters = parameters)
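
A small base-R check may help show why the list() wrapper matters. This is only a sketch and does not call irace; it assumes, consistent with the answer above, that each top-level element of scenario$instances is treated as one instance. The name dataset_A is taken from the question and filled with random values here.

```r
# A matrix wrapped in list() is one element that holds every column.
data <- matrix(runif(1000), nrow = 500)
instances <- list(data)
length(instances)      # number of elements in the wrapper list
dim(instances[[1]])    # dimensions of the matrix stored inside

# A data frame is itself a list of columns, so passing it unwrapped
# exposes one element per column rather than one dataset.
dataset_A <- as.data.frame(matrix(runif(100), nrow = 10))
length(dataset_A)          # one element per column
length(list(dataset_A))    # a single element: the whole data frame
```

Under that assumption, instances = list(dataset_A) hands the whole table to each targetRunner call, while instances = dataset_A does not.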

Phoebe Liu

Feb 28, 2023, 5:46:52 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi Manuel,

Thank you so much for your detailed explanation! It is so helpful. The code has run successfully now.
 
I notice that in the example, the instances can include multiple datasets (instances = list(dataset1, dataset2)). I have some follow-up questions about this example. Suppose I want to include three datasets, run the algorithm on each of them, and summarize the final result by averaging the results from the three datasets. Should the code be something like this:

//

library(irace)


# target runner function
target.runner <- function(experiment, scenario) {
 
  instance <- experiment$instance
  configuration <- experiment$configuration

  # run algorithm on each dataset
  res <- list()
  for (i in 1:3) {
    res[[i]] <- my_algo(instance[[i]])
  }

  res_final <- list("value" = mean(sapply(res, mean)))     # ensure the final result is a single value

  return(list(cost =  res_final$value))

}

# scenario
scenario <- list(targetRunner = target.runner,
                 instances = list(dataset1, dataset2, dataset3),  

                 maxExperiments = 200,  
                 logFile = "")  

# check that the scenario is valid.
checkIraceScenario(scenario, parameters = parameters)

//

I am not sure if for-loop should be used within the target.runner function or not.

In addition, to make it run faster, I want to use parallel computing and then modify the code for the scenario as follows while keeping the other parts the same:

//

# get cores
n.cores <- parallel::detectCores()

# scenario
scenario <- list(targetRunner = target.runner,
                 instances = list(dataset1, dataset2, dataset3),  
                 maxExperiments = 200,  
                 parallel = n.cores,
                 logFile = "")  

//

In this way, if n.cores = 12, I assume that each dataset will take 4 cores and be applied to the algorithm in parallel. Do I understand it correctly?

Thank you once again!

Best regards,

Phoebe

Phoebe Liu

Mar 1, 2023, 12:19:43 AM
to The irace package: Iterated Racing for Automatic Configuration
Hi Manuel,

I was also able to use "trainInstancesDir" and "trainInstancesFile" to load the data. Unfortunately, of the 3 datasets, only one was read in and run. My code and train-instances.txt file look like this:

//

# train-instances.txt file under Instances folder

dataset1.csv
dataset2.csv
dataset3.csv


# scenario
scenario <- list(targetRunner = target_runner,
                 trainInstancesDir = "~/Instances",
                 trainInstancesFile = "train-instances.txt",
                 maxExperiments = 200,  
                 # parallel = n.cores,
                 logFile = "")   


# target_runner function

target_runner <- function(experiment, scenario) {

 
  instance <- experiment$instance
 
  configuration <- experiment$configuration
 
  data <- read.csv(instance)
 
  res <- list("value" =  my_algo(data, as.numeric(configuration(..))))
  
  return(list(cost = res$value))
 
}

//

Running this with checkIraceScenario() always gave me the same algorithm output twice, from one of the three datasets (because targetRunner is executed only twice in checkIraceScenario(..)). It looks like only one dataset is read in and used whenever target_runner runs.

The configuration is all good, but I am really troubled by how to read in all the datasets and apply the algo to each of them for an output. The final results will be the averaged outputs from three datasets. Did I misunderstand or miss anything inside the function?

Thanks a lot!

Phoebe




Manuel López-Ibáñez

Mar 1, 2023, 4:49:52 AM
to The irace package: Iterated Racing for Automatic Configuration
Dear Phoebe,

'checkIraceScenario()' only performs a couple of evaluations to quickly detect errors in the setup. If you run a short irace run you should see irace using all instances.

Note that irace does not simply average the output of the instances. Irace evaluates each configuration on a number of instances (this number is adaptively set) and performs statistical tests to detect when one configuration is significantly worse than the best known. Please read the original irace paper for all the details: https://doi.org/10.1016/j.orp.2016.09.002

Hence, target_runner should return the output of one instance and not the average over multiple instances. Irace will use the individual outputs to decide what to do (and sometimes it may resort to averaging if needed).
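
As a rough sketch of "one call, one instance, one cost", the runner below reads a single CSV file per call and returns a single number. It is exercised by hand, without irace. my_algo is a hypothetical stand-in for the real algorithm, and the file contents and the "mode" parameter are made up for illustration.

```r
# Hypothetical stand-in for the user's algorithm: one number per dataset.
my_algo <- function(data, mode) mean(data[[mode]])

# One call evaluates one configuration on one instance (here, one CSV path).
target_runner <- function(experiment, scenario) {
  data <- read.csv(experiment$instance)
  mode <- as.numeric(experiment$configuration[["mode"]])
  list(cost = my_algo(data, mode))
}

# Hand-made check: write two small CSV files, then evaluate each separately,
# the way separate target_runner calls would.
paths <- file.path(tempdir(), c("dataset1.csv", "dataset2.csv"))
write.csv(data.frame(a = 1:4, b = 5:8), paths[1], row.names = FALSE)
write.csv(data.frame(a = c(10, 20), b = c(30, 40)), paths[2], row.names = FALSE)

config <- list(mode = "2")
costs <- sapply(paths, function(p) {
  target_runner(list(instance = p, configuration = config), scenario = NULL)$cost
})
```

Each element of costs belongs to one dataset; combining them across instances is left to irace rather than done inside target_runner.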

Also, if your ultimate performance metric (the metric that you would use to decide that one configuration is better than another) is mean performance over instances, then you may wish to set up irace to use testType="t-test" (see the user guide for a discussion about testType).
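
For illustration, a scenario list with that option set might look like the sketch below. The runner and the two datasets are placeholders invented here so that the fragment stands alone; only the testType line is the point.

```r
# Placeholder runner and datasets, for illustration only.
target.runner <- function(experiment, scenario) {
  list(cost = mean(experiment$instance))
}
data1 <- matrix(runif(100), nrow = 50)
data2 <- matrix(runif(100), nrow = 50)

scenario <- list(targetRunner = target.runner,
                 instances = list(data1, data2),
                 maxExperiments = 200,
                 testType = "t-test",   # statistical test based on mean cost
                 logFile = "")
```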

> if n.cores = 12, I assume that each dataset will take 4 cores and be applied to the algorithm in parallel. Do I understand it correctly?

Not exactly. irace parallelizes at the level of instances, so it will run 12 candidate configurations in parallel on the same instance. It would be possible to have a more advanced (ideally asynchronous) parallelization that evaluates multiple instances simultaneously, but nobody has implemented that as far as I know (pull requests/forks are welcome: https://github.com/MLopez-Ibanez/irace).

Cheers,

Manuel.

Phoebe Liu

Mar 1, 2023, 1:58:06 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi Manuel,

Thank you for your reply. I apologize for my confusing words. My goal is to import all the datasets into the target_runner function, run my algorithm on each of them, and then I will take an average of the results. It is all within the target_runner function. In this case, when one set of parameters is provided, each dataset will output a result based on my algorithm, and there will be three outputs due to three datasets. I will average the results myself as a final result for the cost value for this particular set of parameters within the target_runner function. Then, after exploring all kinds of combinations of parameters, the best parameter will be found. Is it possible to do that? The algorithm is the same, except that the input is different, which includes different datasets and different configurations, leading to different results. I will summarize these results (take average) myself in the target_runner function.

I am able to load multiple datasets using the scenario list, but I am not sure how experiment$instance works with these datasets in the target_runner function. Does experiment$instance import one dataset or all of them? Is it possible to run the algorithm on these datasets (by a loop or in parallel etc.) and generate outputs so that I can do something with these outputs in the target_runner function, given a set of parameters? If so, could you provide some suggestions on what I can do to loop over these datasets on my algorithm in the target_runner function?

Thank you very much!

Phoebe

Phoebe Liu

Mar 3, 2023, 1:51:53 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi Manuel,

I am sorry for troubling you again. I tried to follow your example, but I am still stuck on running results on multiple datasets. I will use your example to illustrate my problem.

//

library(irace)

# target runner function
target.runner <- function(experiment, scenario) {
  instance <- experiment$instance
  configuration <- experiment$configuration
  res <- list("value" = colMeans(instance))
  return(list(cost = res$value))
}

# generate datasets
data1 <- matrix(runif(1000), nrow=50)
data2 <- matrix(runif(1000), nrow=50)


# scenario
scenario <- list(targetRunner = target.runner,
                 instances = list(data1, data2),

                 maxExperiments = 200,  
                 logFile = "")  

# parameters (parameters don't contribute to the results here)
parameters <- readParameters(text='
mode "" c (1,2)
')

# check that the scenario is valid
checkIraceScenario(scenario, parameters = parameters)

//

As the scenario says, both data1 and data2 are imported. However, the results are only the column means for either data1 or data2 as if only one dataset is imported, while my hope is that this target.runner function could produce and display the column means results for both data1 and data2. The format of the results is flexible, such as a list of 4 columns representing the column means of both datasets. This is the core issue of what I have experienced previously. Could you please help me see where I was wrong on that (data importing or mean calculation in the function etc.)? I am really struggling with this, and it stands in the way of moving my project forward. If this is cleared, I should be able to run this myself on my own algorithm. 

Thank you once again for your help! 

Best regards,

Phoebe

Manuel López-Ibáñez

Mar 4, 2023, 4:32:48 AM
to The irace package: Iterated Racing for Automatic Configuration
target.runner must return a single number in 'cost'. You cannot return 4 numbers or 4 columns. How should irace compare those numbers or columns to decide which parameter configuration is better?

If you need both datasets within a single target.runner call, then combine them into a single dataset or a list of datasets:

single_instance <- list(data1,data2)
 
and pass it to irace as a single instance as discussed earlier.

instances = list(single_instance),
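
A target.runner for such a combined instance could look like the following sketch. It is run by hand here, without irace. my_algo is a hypothetical stand-in for the real algorithm, and the tiny matrices and the "mode" parameter are chosen only so the arithmetic is easy to follow.

```r
# Hypothetical stand-in for the user's algorithm: one number per dataset.
my_algo <- function(data, mode) mean(data[, mode])

data1 <- matrix(1:6, nrow = 3)      # columns: (1, 2, 3) and (4, 5, 6)
data2 <- matrix(11:16, nrow = 3)    # columns: (11, 12, 13) and (14, 15, 16)

# Both datasets travel together as ONE instance.
single_instance <- list(data1, data2)

target.runner <- function(experiment, scenario) {
  datasets <- experiment$instance   # the list of datasets
  mode <- as.numeric(experiment$configuration[["mode"]])
  # Run the algorithm on every dataset, then collapse to a single cost.
  per_dataset <- vapply(datasets, my_algo, numeric(1), mode = mode)
  list(cost = mean(per_dataset))
}

# Hand-made call that mimics what the runner receives for mode = 1.
result <- target.runner(
  list(instance = single_instance, configuration = list(mode = "1")),
  scenario = NULL
)
```

With this layout irace sees only one instance, so the averaging happens entirely inside target.runner and the per-instance comparison described earlier in the thread does not apply across the datasets.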

Best,

Manuel.

Phoebe Liu

Mar 7, 2023, 2:01:25 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi Manuel,

Thank you for your help. Now it works. The datasets were loaded and run. 

Best,

Phoebe
