Job status on PBS clusters


benjam...@gmail.com

Mar 5, 2020, 6:21:06 AM
to The irace package: Iterated Racing for Automatic Configuration
Hello,

I am currently using irace to configure an algorithm on a PBS cluster, using the "--batchmode pbs" option.
It turns out that irace fails to retrieve the status of jobs on such clusters, and ends up waiting for them much longer than needed.
The issue comes from the pbs.job.finished function, which simply calls qstat and returns its exit status.
This fails when the job is completed, but has not been archived yet.

I believe changing the function to something like this should fix the issue in most cases:

pbs.job.finished <- function(jobid){
 qstat_out <- system (paste0("qstat ", jobid), ignore.stdout=FALSE,
                      ignore.stderr=TRUE, intern=TRUE, wait=TRUE)
 return(grep(" C ", qstat_out))
}


I have however been unable to rebuild the package from source to test it.
For some reason, vignettes can be neither built nor ignored, and build fails with the following error:

Warning in file(con, "w") :
  cannot open file '/home/user/R/x86_64-pc-linux-gnu-library/3.4/irace/doc/index.html': No such file or directory
Error in file(con, "w") : cannot open the connection
ERROR: installing vignettes failed
* removing ‘/home/user/R/x86_64-pc-linux-gnu-library/3.4/irace’
Warning message:
In install.packages(".", repos = NULL, type = "source") :
  installation of package ‘.’ had non-zero exit status


I may be missing something, though, as I am not that familiar with R's environment.

Thanks!
Best regards.

Manuel López-Ibáñez

Mar 5, 2020, 6:43:41 AM
Dear Benjamin,

Should the grep call be "grepl" instead? Also, could the job be completed and removed from the queue before irace has time to check? See the torque.job.finished function and the comments there. Perhaps we should use similar code for PBS?
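
To illustrate the difference, here is a minimal sketch using made-up qstat lines (not the package's code, and real qstat output varies between clusters). grep returns the positions of matching elements, which is integer(0) when nothing matches and so cannot be used directly as a TRUE/FALSE result; grepl returns one logical per line, which any() reduces to a single value.

```r
# Made-up qstat summary lines, for illustration only.
running  <- "1234.server  irace-job  user  00:00:01 R batch"
complete <- "1234.server  irace-job  user  00:00:01 C batch"

# grep() returns match positions: integer(0) for no match, 1 for a match.
idx_running  <- grep(" C ", running)
idx_complete <- grep(" C ", complete)

# grepl() returns logicals; any() collapses them to a single TRUE/FALSE.
# any(logical(0)) is FALSE, so empty output yields FALSE rather than an error.
done_running  <- any(grepl(" C ", running))
done_complete <- any(grepl(" C ", complete))
done_empty    <- any(grepl(" C ", character(0)))
```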

I'm not sure why it fails to build. There are no HTML vignettes, and the vignette is already built in the source package (in inst/doc/), so there is no need to build it. Perhaps you can disable building vignettes in RStudio. On the command line, it is simply:

R CMD build  '--no-build-vignettes'

Cheers,

Manuel.

benjam...@gmail.com

Mar 5, 2020, 10:33:39 AM
Actually, it seems the cluster I am working with uses Torque to manage resources. The job file format is pretty similar and misled me, sorry for the confusion.
Anyway, running the same code with "--batchmode torque" does not seem to solve the problem, although I'd expect torque.job.finished to work this time.

Building the package does indeed work with the '--no-build-vignettes' option.
For the record: to install the package, I had to use:
R CMD INSTALL '--libs-only' 
For some reason, the '--no-doc' and '--no-html' options were not enough.

Manuel López-Ibáñez

Mar 5, 2020, 1:18:41 PM
Do you mean that the pristine package (not the version that you installed) using --batchmode torque does not work?

I would also have expected the torque code to work.

Can you find out what is different in your output from what irace expects?

Note that for "qstat JOBID", irace expects something like

JOBID C something

If irace is waiting, it is because qstat returned something, just not what was expected.


benjam...@gmail.com

Mar 10, 2020, 7:01:14 AM
I think I found what is failing here. irace expects to find text matching the regex produced by paste0(jobid, ".*\\sC\\s"), where jobid is the full job ID.
But qstat truncates job IDs in its output (keeping only the first 29 characters in my case, though this may be cluster-specific).
Because irace then cannot find the "job completed" pattern, it keeps waiting for jobs until they are eventually removed from the queue.
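
The mismatch can be reproduced without a cluster. The following is only a sketch: the job ID and the qstat summary line are invented for illustration, and the 29-character cut is the width observed on this particular cluster.

```r
# Hypothetical job ID, longer than the 29 characters kept by qstat here.
jobid <- "1234567.headnode.cluster.example.org"

# Made-up summary line with the ID truncated to 29 characters.
line <- paste(substr(jobid, 1, 29), "irace-job", "user",
              "00:00:01", "C", "batch")

# Pattern built from the full ID: no match, so the job looks unfinished.
full_id_match <- grepl(paste0(jobid, ".*\\sC\\s"), line)

# Pattern on the state column only: matches the completed state.
state_match <- grepl("\\sC\\s", line)
```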

Manuel López-Ibáñez

Mar 10, 2020, 7:04:50 AM

Is there a way in torque to avoid this truncation? We need to match the specific jobid.

benjam...@gmail.com

Mar 10, 2020, 2:47:01 PM
This Stack Overflow topic suggests that no simple command-line option keeps qstat from truncating job names.
Most answers seem to retrieve and parse an XML version of qstat's output.
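
One possible alternative to parsing XML, offered only as an untested sketch: the full listing printed by "qstat -f JOBID" normally contains a line of the form "job_state = C", which does not depend on the truncated summary columns. The helper name and the sample text below are made up for illustration, and the exact layout may differ between PBS and Torque versions.

```r
# Hypothetical helper: extract the job state from "qstat -f" style output.
job.state.from.full <- function(qstat_full) {
  # Find the "job_state = X" line and keep only the state letter.
  hit <- grep("^\\s*job_state\\s*=", qstat_full, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  trimws(sub("^\\s*job_state\\s*=\\s*", "", hit[1]))
}

# Made-up sample; real output has many more fields.
sample_out <- c("Job Id: 1234567.headnode.cluster.example.org",
                "    Job_Name = irace-job",
                "    job_state = C")
state <- job.state.from.full(sample_out)
```

A returned NA here only means that no job_state line was found; how to interpret that case is left open.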

It may not be necessary to recover the full job name from qstat's output, though, since the job ID is passed as an argument to the command in the first place; job IDs are only truncated in the summary that qstat prints.
Could it be assumed that matching " C " is enough?
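
A rough sketch of what such a check could look like, assuming that matching the state column alone is acceptable. This is not irace's actual torque.job.finished code: the function names and the run_qstat hook are hypothetical, and the hook exists only so the logic can be tried without a cluster. Treating a failed or empty qstat call as "finished" is also an assumption, covering the case where the job has already left the queue.

```r
# Default runner: calls qstat for one job and captures its output.
run.qstat <- function(jobid) {
  suppressWarnings(system(paste0("qstat ", jobid), intern = TRUE,
                          ignore.stderr = TRUE))
}

# Sketch: TRUE if the job looks finished, FALSE otherwise.
job.finished.sketch <- function(jobid, run_qstat = run.qstat) {
  out <- run_qstat(jobid)
  # Assumption: a non-zero exit status or empty output means the job is
  # no longer known to the queue, so it is treated as finished.
  status <- attr(out, "status")
  if (length(out) == 0L || (!is.null(status) && status != 0L)) return(TRUE)
  # Otherwise look only at the state column, ignoring the (truncated) ID.
  any(grepl("\\sC\\s", out))
}

# Fake runners with made-up output, standing in for a real cluster.
fake_completed <- function(id) "1234.server irace-job user 00:00:01 C batch"
fake_running   <- function(id) "1234.server irace-job user 00:00:01 R batch"
fake_gone      <- function(id) character(0)

res_completed <- job.finished.sketch("1234.server", fake_completed)
res_running   <- job.finished.sketch("1234.server", fake_running)
res_gone      <- job.finished.sketch("1234.server", fake_gone)
```

One caveat of this design: a qstat failure unrelated to the job (for example, an unreachable server) would also be reported as finished.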



Manuel López-Ibáñez

Mar 10, 2020, 3:57:58 PM
I guess so. I cannot test it, as I don't have access to a PBS/Torque cluster.

Best,

Manuel.