|efficient parallel foldMap for lists/sequences||Sebastian Fischer||4/1/11 2:45 AM|
Hello Parallel Haskellers,
(x-post from haskell-cafe)
in parallel programs it is a common pattern to accumulate a list of
results in parallel using an associative operator. This can be seen as
a simple form of the map-reduce pattern where each element of the list
is mapped into a monoid before combining the results using `mconcat`.
This pattern can be abstracted by the `foldMap` function of the
`Foldable` class and we can give a simple parallel implementation
using `par` and `pseq`:
foldMap :: Monoid m => (a -> m) -> [a] -> m
foldMap f  = mempty
foldMap f [x] = f x
foldMap f xs =
let (ys,zs) = splitAt (length xs `div` 2) xs
ys' = foldMap f ys
zs' = foldMap f zs
in zs' `par` (ys' `pseq` (ys' `mappend` zs'))
How can this pattern be implemented in Haskell efficiently?
I plan to investigate the following options and am interested in previous work.
1. Data.Sequence (from containers)
As finger trees are already balanced they look like a good candidate
for parallel consumption. However, the current implementation of the
`Foldable` instance is sequential. I wonder what would be the overhead
and how it could be reduced if it is is too large. I think, I would
first go for an implementation outside of `Foldable` and later
consider the overhead of overloading.
2. Data.Array.Repa (from repa)
provides a `map` and (sequential) `fold` functions. I think that the
main advantage of repa is fusion of successive traversals which is not
necessary for the pattern in question. Hence, I'm not sure whether
it's a good candidate for the job. Also I don't know how to implement
the pattern using repa. Can it be done using `slice` or should it be
done by changing repa itself?
3. Data.Monoid.Reducer (from monoids)
`foldMapReduce` looks promising. I wonder whether it could be used for
parallel reduction and how efficient it would be.
Which approach do you think is most promising? What other options are there?
|Re: efficient parallel foldMap for lists/sequences||Johannes Waldmann||6/16/11 11:23 AM|
Sebastian Fischer <federigo.pescatore@...> writes:
> foldMap :: Monoid m => (a -> m) -> [a] -> m
0. use actual lists: does not really work: length and splitAt are expensive.
> 1. Data.Sequence (from containers)
Indeed. But you have Data.Sequence.splitAt (in log time),
So what I'm using now is Data.Vector(.Unboxed), and this looks good.
|Re: efficient parallel foldMap for lists/sequences||Christopher Brown||6/16/11 1:30 PM|
This looks like a classic divide & conquer pattern to me. What you need to do is
a) only spark of computations when the granularity is high enough to make it worth while
b) only spark of computations that are evaluated to normal forms. Otherwise, Haskell will only apply WHNF evaluation to the lists, which won't be very useful for parallelisation
I would use a version using the new strategies:
(ys'', zs'') = runEval $ do ys'' <- ((rabs ys) `dot` rdeepseq) ys'
zs'' <- ((rabs zs) `dot` rdeepseq) zs'
return (ys'', zs'')
rabs xs | length xs > THRESHOLD = rpar
| otherwise = rseq
in ys'' `mappend` zs''
You will have to define THRESHOLD to be a value that gives you good granularity: you may need to experiment with this.
I hope this gives you some insight.
|Re: efficient parallel foldMap for lists/sequences||Johannes Waldmann||6/16/11 3:19 PM|
> a) only spark of computations when the granularity is high enough to make it
sure, that's what I do:
I tried this with Data.List, and Data.Sequence,
I found this post very helpful:
|Re: efficient parallel foldMap for lists/sequences||Sebastian Fischer||6/17/11 3:08 AM|
Hi Johannes and Chris,
thanks! The version with unboxed vectors looks like a good baseline to
>> a) only spark of computations when the granularity is high enough to make it
Not exactly. Your code sparks of computations as long as there are
The threshold approach can create more sparks than there are