segfault, memory not mapped


M. Wagner

Jul 30, 2017, 8:07:15 AM
to The irace package: Iterated Racing for Automatic Configuration
Hi irace team!

Have you come across the following before?

-----------------------------------------------------------------------------------------------------------
 *** caught segfault ***
address 0x20, cause 'memory not mapped'

Traceback:
 1: mcfork(detached)
 2: mcparallel(FUN(X[[nexti]], ...), mc.set.seed = mc.set.seed, silent = mc.silent)
 3: parallel::mclapply(experiments, exec.target.runner, mc.preschedule = !scenario$loadBalancing,     mc.cores = parallel, scenario = scenario)
 4: execute.experiments(experiments[which.exps], scenario)
 5: race.wrapper(configurations = configurations, instance.idx = race.instances[current.task],     which.alive = which.alive, which.exe = which.exe, parameters = parameters,     scenario = scenario)
 6: race(scenario = scenario, configurations = testConfigurations,     parameters = parameters, maxExp = currentBudget, minSurvival = minSurvival,     elite.data = elite.data, elitistNewInstances = if (indexIteration >         1) scenario$elitistNewInstances else 0)
 7: irace(scenario = scenario, parameters = parameters)
 8: irace.main(scenario)
 9: irace.cmdline()
aborting ...
/var/spool/slurmd/job112982/slurm_script: line 37: 28856 Segmentation fault      $(tail -n+$new_index $1 | head -n1)
-----------------------------------------------------------------------------------------------------------

I have a batch of irace runs running in parallel on a SLURM cluster, and I get this error from time to time. It is not deterministic: it happens once in iteration 1, the next time in iteration 4, and sometimes never.

Do you think this is an issue with irace (or something else), or with me and the cluster's software? :p

Software:
Ubuntu 16.04.2 LTS
irace 2.3.1807
R 3.2.3

Best wishes,
Markus

Manuel López-Ibáñez

Jul 30, 2017, 4:15:55 PM
to The irace package: Iterated Racing for Automatic Configuration
Hi Markus,

This is a segfault crash in R, and the error happens at the call to target-runner, but it is difficult to say what may be causing it. Is this the only output? In theory, irace cannot crash R, because we do not load any compiled code. Also, in theory, your target-runner cannot crash irace, much less crash R; you should get the usual error saying "this is not a bug in irace...". Still, several things may cause R to crash:

* irace is installed (byte-compiled) using one version of R (on your submit node) and loaded with a (perhaps older) version of R (on your execution node).

* The cluster system (via system limits or cluster limits) is killing the R process (or some child of the R process) because either irace or your program is consuming too much memory, or too much disk space or spawning too many children or...

* A bug in R. Those exist, but I don't know of one that matches this.
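
One way to narrow down the first two points is to log, from inside the job script, which R actually runs on the execution node and which limits apply to the process. The following is only a minimal sketch: the function name log_node_env is hypothetical, the R lines are skipped when Rscript is not on the PATH, and limits enforced by the scheduler itself may not show up in ulimit.

```shell
#!/bin/sh
# Hypothetical helper: print facts about the execution node so that the
# logs of crashing and non-crashing jobs can be compared afterwards.
log_node_env() {
    echo "host=$(uname -n)"
    if command -v Rscript >/dev/null 2>&1; then
        # R version that will actually load irace on this node.
        echo "R=$(Rscript -e 'cat(R.version.string)' 2>/dev/null)"
        # The "Built" field records the R version irace was installed with;
        # it is left empty if irace cannot be found.
        echo "irace_built=$(Rscript -e 'cat(utils::packageDescription("irace")$Built)' 2>/dev/null)"
    else
        echo "R=not-found"
    fi
    # Per-process limits that this shell and its children inherit.
    echo "limits:"
    ulimit -a
}

log_node_env
```

Calling log_node_env at the top of the job script and comparing the R and irace_built lines across nodes should show whether the installation and execution versions differ.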

Does it ever happen when running outside SLURM?

Could you try to reproduce the crash when irace is running under valgrind? Just prepend "valgrind --error-exitcode=1 --log-file='irace-%p' --trace-children=yes" to your call to irace.
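
As a rough sketch of what that could look like in a job script: the wrapper name run_under_valgrind and the variable VG_FLAGS are hypothetical, and the commented-out irace call is a placeholder for however irace is actually started. Only the valgrind options come from the suggestion above.

```shell
#!/bin/sh
# Valgrind options from the suggestion above. %p expands to the process ID,
# so every traced process writes its own irace-<pid> log file.
VG_FLAGS="--error-exitcode=1 --log-file=irace-%p --trace-children=yes"

# Hypothetical wrapper: run the given command under valgrind when it is
# installed, otherwise run the command unchanged.
run_under_valgrind() {
    if command -v valgrind >/dev/null 2>&1; then
        # VG_FLAGS is intentionally unquoted so it splits into separate options.
        valgrind $VG_FLAGS "$@"
    else
        echo "valgrind not found, running without it" >&2
        "$@"
    fi
}

# Placeholder invocation; replace it with the actual call to irace.
# run_under_valgrind irace --scenario scenario.txt
```

Note that programs run considerably slower under valgrind, so it may be worth trying this on a shortened run first.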

Cheers,

Manuel.

M. Wagner

Aug 1, 2017, 4:22:27 AM
to Manuel López-Ibáñez, The irace package: Iterated Racing for Automatic Configuration
Hi Manuel,

Thanks a lot for this.

On 31 Jul 2017, at 5:45 am, Manuel López-Ibáñez <manuel.lo...@manchester.ac.uk> wrote:

> Hi Markus,
>
> This is a segfault crash in R and the error is happening at the call to target-runner, but it is difficult to say what may be causing it. Is this the only output? In theory, irace cannot crash R, because we do not load any compiled code. Also, in theory, your target-runner cannot crash irace and much less crash R, you should get the usual error saying "this is not a bug in irace...".

On another cluster, I am getting "this is not a bug in irace..." messages, and I am living with them, although I don’t like them. The segfault messages are new on this cluster.

> Still, several things may cause R to crash:
>
> * irace is installed (byte-compiled) using one version of R (in your submit node) and loaded with a (perhaps older) version of R (in your execution node).

I am asking my admins to update R now. As I said, this happens randomly. Maybe the configuration of the compute nodes is not consistent...

> * The cluster system (via system limits or cluster limits) is killing the R process (or some child of the R process) because either irace or your program is consuming too much memory, or too much disk space or spawning too many children or...
>
> * A bug in R. Those exist but I don't know about this one.
>
> Does it ever happen when running outside SLURM?

I have not noticed it. I know that I should debug this on my laptop. There is a slight technical problem: the crash probability is about 1/3 in runs that consume a couple of CPU weeks, so it is a little tricky to debug this on a 2-core laptop.

> Could you try to reproduce the crash when irace is running under valgrind? Just prepend: "valgrind --error-exitcode=1 --log-file='irace-%p' --trace-children=yes" to your call to irace.

I am asking my admins to install valgrind. Will let you know, but this will take a while…

Cheers,
Markus





M. Wagner

Aug 18, 2017, 8:33:15 AM
to The irace package: Iterated Racing for Automatic Configuration, manuel.lo...@manchester.ac.uk


On Tuesday, 1 August 2017 17:52:27 UTC+9:30, M. Wagner wrote:

> On another cluster, I am getting "this is not a bug in irace..." messages, and I am living with them, although I don’t like them. The segfault messages are new on this cluster.

FYI, my best guess right now is an old R version in combination with an unreliable file system; maybe this combination leads to rather strange errors.
The cause of my problems might be a completely different one, however.

 


> I am asking my admins to update R now. As I said, this happens randomly. Maybe the configuration of the compute nodes is not consistent...

My admins have not gotten back to me, so I have no further information on this yet.
 


> I am asking my admins to install valgrind. Will let you know, but this will take a while…

Similarly here: I have no further information on this yet.

Cheers,
Markus