Any help on this efficiency issue would be greatly appreciated.
I would like to find the most efficient way to run a non-vectorized function (here: the p-value of Fisher's exact test) element by element over 4 matrices with identical dimensions. The result should be an array with the same dimensions containing the corresponding p-values. Please consider some code using a trivial example with 3x4 arrays below. Eventually I would like to run the code on 2e3 x 7e6 arrays, for which someone has already suggested Amazon EC2...
Q1: would you agree that fisher.test() is not vectorizable? e.g. fisher.test( matrix(c(Ax,Ay,Bx,By),ncol=2) ) does not work
Q2: direct use of Ax, Ay, Bx, By as input instead of a (list) transform for the input would seem beneficial for speed
Q3: parallelization of the iterative process seems to make sense.
Q4: a progress bar seems to save peace of mind having no clue of the runtime.
Q5: avoidance of an output transform to get array from vector
Q6: for Q2/3/4/5 plyr seems to be ideal (e.g. maply)
Please also find some solutions below.
solution 1: using mapply
solution 2: using lapply
solution 3: using mclapply
attempt 4: stuck on plyr implementation
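The code for these solutions is not shown in the thread. As a rough illustration of what the mapply approach (solution 1) might look like, here is a minimal sketch. The count matrices and the helper name fisher_p are made up for the example and are not the original poster's data or code.

```r
# Hypothetical 3x4 count matrices (not the original example data)
Ax <- matrix(1:12, nrow = 3)
Ay <- matrix(12:1, nrow = 3)
Bx <- matrix(rep(5, 12), nrow = 3)
By <- matrix(rep(c(2, 9), 6), nrow = 3)

# Hypothetical helper: p-value for one 2x2 table built from four scalar counts
fisher_p <- function(ax, ay, bx, by) {
  fisher.test(matrix(c(ax, ay, bx, by), ncol = 2))$p.value
}

# mapply treats the matrices as plain vectors, walks them element by element,
# and returns a vector of p-values, so no list transform of the input is needed
p <- mapply(fisher_p, Ax, Ay, Bx, By)

# Restore the 3x4 shape; this only sets an attribute and is cheap
dim(p) <- dim(Ax)
```

The same call shape should carry over to parallel::mcmapply for Q3, with the caveat that forking with more than one core is not available on Windows.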
I think you will only get marginal gains through tweaking how you run
the non-vectorised code. If you really want performance improvements,
I think you need to bite the bullet and vectorise fisher.test.
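To give an idea of what vectorising could look like, here is a minimal sketch for the one-sided case only. It relies on the fact that, for a 2x2 table under the null (odds ratio 1), the one-sided Fisher p-value is a hypergeometric tail probability, and phyper is already vectorised. The count matrices are made up for the example. The two-sided default of fisher.test sums the probabilities of all tables no more likely than the observed one, which this sketch does not cover.

```r
# Hypothetical 3x4 count matrices (not the original example data)
Ax <- matrix(1:12, nrow = 3)
Ay <- matrix(12:1, nrow = 3)
Bx <- matrix(rep(5, 12), nrow = 3)
By <- matrix(rep(c(2, 9), 6), nrow = 3)

# Reference: one fisher.test call per cell, alternative = "greater"
p_loop <- mapply(function(ax, ay, bx, by) {
  fisher.test(matrix(c(ax, ay, bx, by), ncol = 2),
              alternative = "greater")$p.value
}, Ax, Ay, Bx, By)

# Vectorised: for the table matrix(c(Ax, Ay, Bx, By), ncol = 2) the first
# column sums to Ax + Ay, the second to Bx + By, and the first row to Ax + Bx.
# P(X >= Ax) is the upper hypergeometric tail evaluated at Ax - 1.
p_vec <- phyper(Ax - 1, Ax + Ay, Bx + By, Ax + Bx, lower.tail = FALSE)
dim(p_vec) <- dim(Ax)  # make sure the result keeps the 3x4 shape
```

On this toy input the two results agree up to floating-point tolerance, and the vectorised version makes a single call into compiled code instead of one fisher.test call per cell.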
On Sun, Sep 2, 2012 at 12:05 PM, philip <pcv...@gmail.com> wrote:
> Dear list members,