[Haskell-cafe] Increasing parallelism in a frequency counter


Andreas Pauley

May 21, 2015, 5:08:40 PM
to haskel...@haskell.org
Hi everyone,

I've started playing with some parallel programming in Haskell, mostly
by applying the first few examples from Simon Marlow's book to my own
toy problem:
https://github.com/apauley/parallel-frequency

I'm quite pleased with the speedup I have so far, the fastest parallel
implementation is just over twice as fast as its sequential
counterpart.

But looking at the ThreadScope images from the generated event logs I
can see that there are still significant portions of my program that
run on only one core.

Is anyone interested in seeing whether we can increase the parallelism
in this frequency counter?

Kind regards,
Andreas

--
https://keybase.io/apauley
http://za.linkedin.com/in/apauley
http://www.meetup.com/lambda-luminaries
http://www.meetup.com/Bitcoins-Baby-Bitcoins

GPG Public Key: https://keybase.io/apauley/key.asc
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Claude Heiland-Allen

May 23, 2015, 11:13:30 AM
to haskel...@haskell.org
Hi Andreas, all,

On 21/05/15 22:08, Andreas Pauley wrote:
> Hi everyone,
>
> I've started playing with some parallel programming in Haskell, mostly
> by applying the first few examples from Simon Marlow's book to my own
> toy problem:
> https://github.com/apauley/parallel-frequency

Nice.

> I'm quite pleased with the speedup I have so far, the fastest parallel
> implementation is just over twice as fast as its sequential
> counterpart.
>
> But looking at the ThreadScope images from the generated event logs I
> can see that there are still significant portions of my program that
> run on only one core.
>
> Is anyone interested in seeing whether we can increase the parallelism
> in this frequency counter?

If the problem is specified as taking the N most frequently occurring
items in a list:

f :: Ord a => Int -> [a] -> [(a, Int)]

then I cheated: I assumed the input is pre-chunked into a sensible
number of pieces:

f' :: Ord a => Int -> [[a]] -> [(a, Int)]

with which it is much easier to get good parallel speedups.
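A minimal sketch of what I mean by that chunked f' (not the code from the repository; the helper names are made up, and it uses only the containers package bundled with GHC plus par/pseq from GHC.Conc in base, so -threaded and +RTS -N are needed for actual parallelism):

```haskell
import qualified Data.Map.Strict as M
import Data.List (foldl', sortBy)
import Data.Ord (comparing, Down (..))
import GHC.Conc (par, pseq)

-- Count occurrences within one chunk using a strict map
-- (see point 3 below about avoiding (+) thunks).
countChunk :: Ord a => [a] -> M.Map a Int
countChunk = foldl' (\m x -> M.insertWith (+) x 1 m) M.empty

-- Count each chunk (sparking the left map before merging),
-- then take the n most frequent items overall.
f' :: Ord a => Int -> [[a]] -> [(a, Int)]
f' _ []     = []
f' n chunks =
  let combine a b = a `par` b `pseq` M.unionWith (+) a b
      merged      = foldr1 combine (map countChunk chunks)
  in take n (sortBy (comparing (Down . snd)) (M.toList merged))
```

The merge is still a linear fold here; point 5 below is about reshaping exactly that part.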


Anyway, some general optimisation points, not all directly related to
the parallel part of the problem:

0. profile to see what is taking time (and space, it increases GC time)
you can use +RTS -h without having to recompile for profiling

1. lists are memory inefficient (eg: Map keys): replace String with Text

2. lists and Text are bad at random access (eg: splitting at an index)

3. use Data.Map.Strict to avoid building up (+) thunks

4. (take n . sort) is possibly taking a lot of sequential time (see 0)

5. perhaps try parallel reductions instead of linear folds:
a * b * c * d * ... -> ((a * b) * (c * d)) * ...
though the overhead didn't seem worth it in my tests

6. not optimisation related, but consider writing a defaultMain
then each executable source can be reduced to
import DefaultMain (defaultMain)
import FrequencyImplementation (frequency)
main :: IO ()
main = defaultMain frequency
you could even have all the implementations with the same module name
but in different directories, and use hs-src-dirs: to select, then
you would be able to use an identical Main module.
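For point 5, the pairwise reduction could be sketched like this (treeFold is an invented name, using par from GHC.Conc in base; as I said, in my tests the spark overhead didn't obviously pay off, so benchmark it):

```haskell
import GHC.Conc (par)

-- Reduce a list pairwise, ((a*b) * (c*d)) * ..., sparking each pair
-- so independent combines can run on other capabilities.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _ z []  = z
treeFold _ _ [x] = x
treeFold f z xs  = treeFold f z (pairUp xs)
  where
    pairUp (a:b:rest) = let ab = f a b in ab `par` (ab : pairUp rest)
    pairUp rest       = rest
```

For an associative f this gives the same result as a linear fold, but the combine tree has logarithmic depth.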


more on 2:

A bit of a low-level hack, this: I read the whole input file as a
ByteString and split it into an appropriate number of chunks, taking
care not to split UTF-8 sequences, also taking care not to split
mid-word. Each output chunk references the original input without
copying the data. Then I decodeUtf8 each ByteString chunk to a Text.
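Roughly, the splitting part looks like this (a simplified sketch, not my actual word-histogram code; it cuts only at ASCII spaces, which can never occur inside a UTF-8 multi-byte sequence, and the chunks share the input buffer because ByteString's splitAt is O(1)):

```haskell
import qualified Data.ByteString.Char8 as B

-- Split a ByteString into roughly n chunks, extending each cut point
-- to the next space so neither words nor UTF-8 sequences are broken.
-- No bytes are copied: each chunk is a view into the original buffer.
chunksAtSpaces :: Int -> B.ByteString -> [B.ByteString]
chunksAtSpaces n bs = go bs
  where
    target = max 1 (B.length bs `div` n)
    go s
      | B.length s <= target = [s | not (B.null s)]
      | otherwise =
          case B.elemIndex ' ' (B.drop target s) of
            Nothing -> [s]
            Just i  -> let (chunk, rest) = B.splitAt (target + i) s
                       in chunk : go (B.drop 1 rest)  -- drop the space
```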

This reduced peak memory usage (as reported by +RTS -h graphed with
hp2ps) from ~400MB of (:) to ~30MB of ARR_WORDS for the 11MB artamene.txt.


more on 4:

So you have a large Map k Int containing frequencies, and you want the
largest n frequency counts. There is an O(1) Map.splitRoot, which you
can use to parallel divide and conquer - the largest n of the whole map
are going to be among the (n * split count) combined largest n of each
of the split maps. This didn't give me much speed up at all, though.
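In code, that divide and conquer might look like the following (a sketch with an invented name; splitRoot is in containers >= 0.5.4, and the per-piece `largest` calls are the part one could spark in parallel):

```haskell
import qualified Data.Map.Strict as M
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Take the n largest counts: each root piece contributes its own
-- top n (these could be evaluated in parallel), then we keep the
-- top n of those candidates. The overall top n must be among them.
topN :: Ord k => Int -> M.Map k Int -> [(k, Int)]
topN n m = largest (concatMap (largest . M.toList) (M.splitRoot m))
  where
    largest = take n . sortBy (comparing (Down . snd))
```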


I put my implementation at http://code.mathr.co.uk/word-histogram


Thanks,


Claude
--
http://mathr.co.uk

Andreas Pauley

May 24, 2015, 8:00:16 PM
to haskel...@haskell.org, Claude Heiland-Allen
Hi Claude,

On 23 May 2015 at 17:12, Claude Heiland-Allen <cla...@mathr.co.uk> wrote:
>
>
>
> Anyway, some general optimisation points, not all directly related to the
> parallel part of the problem:

Thank you so much for taking the time to implement your own version,
and to explain everything you did.
Your slowest sequential version is about 3 times faster than my
fastest parallel version; I'm really impressed :-)
Thank you also for the general coding tips, I'll definitely do some
cleanup with a defaultMain or similar.

I am busy looking at your implementation together with the
optimisation points you mentioned.
There is a lot for me to absorb, and I'll probably learn more from
comparing your code (and the thinking behind it) with mine than I
would from just reading a book or an article.

Is there a field in computer science similar to comparative
literature? Because I find comparing different implementations of the
same program very useful as a learning tool.

Andrew Butterfield

May 25, 2015, 4:27:20 AM
to Andreas Pauley, Haskell Cafe

> On 24 May 2015, at 22:07, Andreas Pauley <apa...@gmail.com> wrote:
>
> I am busy looking at your implementation together with the
> optimisation points you mentioned.
> There is a lot for me to absorb, and I'll probably learn more from
> comparing your code (and the thinking behind it) with mine than I
> would from just reading a book or an article.
>
> Is there a field in computer science similar to comparative
> literature? Because I find comparing different implementations of the
> same program very useful as a learning tool.


Look at exercism.io - it supports Haskell


Andrew Butterfield
School of Computer Science & Statistics
Trinity College
Dublin 2, Ireland