[R] untaring files in parallel with foreach and doSNOW?


arielob

Jul 24, 2012, 12:06:19 PM
to r-h...@r-project.org
Hello,

I'm running some code whose first step requires untarring many files.
This takes a long time, and I'd like to do it in parallel if possible.
If disk read speed is the bottleneck, I guess I should not expect an
improvement, but perhaps it's the processor, so I want to try this out.

I'm working on Windows 7 with R 2.15.1 and the latest foreach and doSNOW
packages; see sessionInfo() below. Thanks in advance for any input!

# With lapply it works (i.e. each .tar.gz file is decompressed into several
# directories with the files of interest inside)
lapply(tar.files.vector, FUN=untar)

# It also works with foreach in serial mode:
foreach(i=1:length(tar.files.vector)) %do% untar(tar.files.vector[i])

# However, foreach in parallel mode gives an error:
foreach(i=1:length(tar.files.vector)) %dopar% untar(tar.files.vector[i])

Error in untar(tar.files.vector[i]) :
task 1 failed - "cannot open the connection"

Any ideas on how to address this problem (with these packages or other
ones)?

Thanks in advance.

Ariel

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] doSNOW_1.0.6 snow_0.3-10 iterators_1.0.6 foreach_1.4.0
[5] raster_2.0-08 rgdal_0.7-12 sp_0.9-99

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.1 grid_2.15.1 lattice_0.20-6




--
View this message in context: http://r.789695.n4.nabble.com/untaring-files-in-parallel-with-foreach-and-doSNOW-tp4637614.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

arielob

Jul 24, 2012, 12:42:53 PM
to r-h...@r-project.org
By the way, the code works under MacOS with:

> library(foreach)
> library(doMC)
> registerDoMC()
> foreach(i=tar.files.vector) %dopar% untar(i)

So it has to do with how I write the foreach command on Windows... I tried
reading the vignettes, but they didn't help much with this.

Thanks again,

Ariel
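
For comparison, here is a minimal sketch of the same loop with doSNOW on Windows, where a snow cluster has to be created and registered before %dopar% runs in parallel. The thread does not establish what caused the "cannot open the connection" error. Passing absolute paths and an explicit exdir is only a guess at one possible cause (workers not sharing the master's working directory), not a confirmed fix. The temporary archives, the names demo.dir and out.dir, and the worker count of 2 are made up for illustration.

```r
library(foreach)
library(doSNOW)  # attaches snow, which provides makeCluster() and stopCluster()

# Illustrative data only: build two small .tar.gz archives in a temp directory
demo.dir <- tempfile("dosnow_demo_")
dir.create(demo.dir)
old.wd <- setwd(demo.dir)
for (i in 1:2) {
  src <- sprintf("data_%d", i)
  dir.create(src)
  writeLines(sprintf("contents of file %d", i), file.path(src, "file.txt"))
  tar(sprintf("archive_%d.tar.gz", i), files = src,
      compression = "gzip", tar = "internal")
}
setwd(old.wd)

# Absolute paths (assumption: workers may not share the master's working dir)
tar.files.vector <- list.files(demo.dir, pattern = "\\.tar\\.gz$",
                               full.names = TRUE)
out.dir <- file.path(demo.dir, "extracted")
dir.create(out.dir)

cl <- makeCluster(2, type = "SOCK")  # 2 local socket workers; adjust as needed
registerDoSNOW(cl)  # with no backend registered, %dopar% runs sequentially

res <- foreach(f = tar.files.vector) %dopar%
  untar(f, exdir = out.dir, tar = "internal")

stopCluster(cl)
print(list.files(out.dir, recursive = TRUE))
```

Iterating over the file names directly, as in the MacOS version above, avoids indexing with 1:length(...), which misbehaves when the vector is empty.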




Prof Brian Ripley

Jul 24, 2012, 2:22:17 PM
to arielob, r-h...@r-project.org
On 24/07/2012 17:42, arielob wrote:
> By the way, the code works under MacOS with:
>
>> library(foreach)
>> library(doMC)
>> registerDoMC()
>> foreach(i=tar.files.vector) %dopar% untar(i)
>
> So it has to do with how I write the foreach command on windows... I tried
> reading the vignettes and they didn't help much on this.

So why are you asking us how to use third-party software whose
documentation is not comprehensible to you?

This would be very easy to do in R itself (see package 'parallel'). And
BTW, the example in that package's vignette of installing R packages in
parallel does a lot of untarring in parallel.


--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

arielob

Jul 24, 2012, 4:15:59 PM
to r-h...@r-project.org
Thanks,

I was under the wrong impression that the parallel package only had forking
capabilities (like multicore); I had missed the parLapply() function.


This code now works under Windows:

> library(parallel)
> cl <- makeCluster(4)
> parLapply(cl, X=tar.files.vector, fun=untar)
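
A slightly fuller sketch of the same approach, for anyone who wants to check whether parallel extraction is actually faster on their disk. It times a serial lapply() against parLapply() and shuts the cluster down afterwards. The temporary archives, the directory names (demo.dir, serial.dir, parallel.dir), and the worker count are assumptions for illustration. With archives this small the cluster overhead will likely dominate, so the timings only mean something on real data.

```r
library(parallel)

# Illustrative data only: four small .tar.gz archives in a temp directory
demo.dir <- tempfile("untar_demo_")
dir.create(demo.dir)
old.wd <- setwd(demo.dir)
for (i in 1:4) {
  src <- sprintf("data_%d", i)
  dir.create(src)
  writeLines(sprintf("contents of file %d", i), file.path(src, "file.txt"))
  tar(sprintf("archive_%d.tar.gz", i), files = src,
      compression = "gzip", tar = "internal")
}
setwd(old.wd)

# Absolute archive paths and separate output directories for each run
tar.files.vector <- list.files(demo.dir, pattern = "\\.tar\\.gz$",
                               full.names = TRUE)
serial.dir   <- file.path(demo.dir, "serial")
parallel.dir <- file.path(demo.dir, "parallel")
dir.create(serial.dir)
dir.create(parallel.dir)

# Serial baseline
t.serial <- system.time(
  lapply(tar.files.vector, untar, exdir = serial.dir, tar = "internal")
)

# Parallel run on a local socket cluster (no forking, so usable on Windows)
cl <- makeCluster(2)
t.parallel <- system.time(
  parLapply(cl, tar.files.vector, untar, exdir = parallel.dir,
            tar = "internal")
)
stopCluster(cl)  # release the worker processes

# Elapsed wall-clock seconds for each approach
print(c(serial = t.serial[["elapsed"]], parallel = t.parallel[["elapsed"]]))
```

If the parallel time is not clearly lower on real archives, the disk is probably the bottleneck and adding workers will not help.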




