Issues with Parallel


Juan Ossa-Moreno

Aug 12, 2016, 4:32:48 AM
to simmer-devel
Hello

I am trying to use the parallel package to run a simulation several times. However, I get this error:


Error: cannot allocate vector of size 48.9 Mb

I tried to run it again and got:

Error: cannot allocate vector of size 39.4 Mb

I did a search on Google and found suggestions such as running gc() or restarting R before running the model, or running the model on a machine with more RAM. The first did not work and the latter is not an option in the short term. Is there any way of getting around this? Perhaps by setting some parameters on mclapply?

Regards

Juan

Iñaki Úcar

Aug 12, 2016, 6:35:24 AM
to simmer-devel
Wow, are you running out of RAM? Can you share your code with us (or a similar one with the same effect)? I would like to test it in order to rule out memory leaks. It would also be useful if you could provide more details about your system: RAM amount, OS, R version and simmer version.

Since yesterday, there is a new version of simmer on CRAN (3.4.1) in which I solved some memory leaks. Please update the package if you have an older one.

If there are no memory leaks, then either your simulation is huge, or you are trying to run too many replicas at the same time, or a combination of both. Either way, if the final amount of data does not fit into your system's RAM, you won't be able to process it with standard tools; you'll need to save it to disk and use advanced tools such as the bigmemory package. Therefore, you'd better batch up your replicas.

For instance, if you want, say, 1000 replicas, and you have a CPU with 4 cores with hyperthreading, you cannot get any performance gain beyond 8 replicas at a time. So you can set up a loop with an mclapply of 8 replicas and save the data to disk, another 8 replicas and save to disk, and so on, up to your 1000 replicas.
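A sketch of that batching pattern (run_replica is a placeholder for a function that runs one simmer replica and returns a data frame; mclapply forks, so this assumes a Unix-like OS):

```
library(parallel)

# hypothetical stand-in for one simmer replica
run_replica <- function(i) data.frame(replication = i)

n_replicas <- 1000
n_cores <- 8  # 4 physical cores with hyperthreading

batches <- split(seq_len(n_replicas), ceiling(seq_len(n_replicas) / n_cores))
for (batch in batches) {
  results <- mclapply(batch, run_replica, mc.cores = n_cores)
  # save this batch to disk and free the memory before the next one
  saveRDS(do.call(rbind, results), paste0("batch_", batch[1], ".rds"))
  rm(results); gc()
}
```

This keeps at most one batch of results in memory at any time.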

But still, I would appreciate it if you could provide us with the code, because I would like to rule out (or fix) possible memory leaks.

Regards,
Iñaki

--
You received this message because you are subscribed to the Google Groups "simmer-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to simmer-devel+unsubscribe@googlegroups.com.
To post to this group, send email to simmer...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/simmer-devel/2bfe2522-a1cd-4baf-9d8c-686417fe0f0e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Iñaki Úcar

Sep 5, 2016, 9:35:12 AM
to simmer-devel
Hi Juan, 

Any news on this issue?

Regards,
Iñaki

Iñaki Úcar

Sep 30, 2016, 2:10:17 PM
to simmer-devel, Juan Ossa Moreno

Hi Juan,

I’ve analysed the code you sent me privately. The conclusions may be valuable for someone else, so I’ll reply through the mailing list.

First of all, I've ruled out memory leaks (in the current version of simmer on CRAN). What happens is that your simulation is huge: you are generating a lot of data, so you need to follow an alternative approach.

  1. Keep in mind that mclapply creates new workers by forking the current process, so each worker inherits a copy of the whole environment. Start with an environment as clean as possible. Ideally, you should launch a script containing only the necessary steps with Rscript. If you need to work interactively in RStudio, for instance, remove unwanted variables with rm() and call the garbage collector with gc() before running the simulation.
  2. wrap() is a nice tool, but it cannot be used when the simulation environment grows huge. For each run, the best approach is to extract only the data you need, add the replication index and write the data to disk.

An example of this:

library(simmer)
library(parallel)
# other stuff

simulation <- function(i) {
  des <- simmer()

  # define trajectories
  # add resources
  # add generators
  # run

  # save arrivals to disk
  A <- get_mon_arrivals(des, per_resource = TRUE)
  A$replication <- i
  write.csv(A, file = paste0("arrivals_", i, ".csv"), row.names = FALSE)

  # save attributes to disk
  B <- get_mon_attributes(des)
  B$replication <- i
  write.csv(B, file = paste0("attributes_", i, ".csv"), row.names = FALSE)

  # save resources to disk
  C <- get_mon_resources(des)
  C$replication <- i
  write.csv(C, file = paste0("resources_", i, ".csv"), row.names = FALSE)
}

mclapply(1:100, simulation, mc.preschedule = FALSE)

# load data
# analyse data

This way, you should be able to run as many replications as you want (provided that the volume of data generated * number of cores of your computer stays below your RAM capacity, which in my case is true). Then, once the simulation is done, you’ll have the data for all the replicas on disk, and you’ll see whether it fits in RAM. If not, you’ll need special packages for the analysis part, but that’s another story…
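For the loading step, if the combined data does fit in RAM, the per-replica CSV files written above can be read back and bound into a single data frame, for example:

```
# collect all the arrivals_<i>.csv files written by the workers
files <- list.files(pattern = "^arrivals_.*\\.csv$")
arrivals <- do.call(rbind, lapply(files, read.csv))
```

The same pattern applies to the attributes and resources files.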

Regards,
Iñaki

Juan Ossa-Moreno

Oct 8, 2016, 3:40:02 AM
to simmer-devel
Hi Iñaki

Thanks a lot for this! I will extract only the info I need and avoid wrap().

Regards

Juan