I have implemented wf10 in C++ in 537 LOC. With the standard C++
allocator the program needs:
By switching to the Hoard allocator, as outlined in Christoph Bartoschek's mail,
this goes down to:
I made the same observation as Christoph: when some of the data is already
in the cache, the time goes down to:
As you can see, the CPU time used is nearly the same, so the only speedup
comes from the disk cache. If run again the values from above come
This is with 14 threads counting the data and 8 threads reading it.
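
The split between reading threads and counting threads can be sketched
roughly as follows. This is only a minimal illustration of the general
producer/consumer pattern, not the actual 537-line program: the names
ChunkQueue and count_words are hypothetical, the input is an in-memory
vector of text chunks rather than a file on disk, and "counting" is
assumed here to mean counting whitespace-separated words.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical blocking queue that hands chunks from readers to counters.
class ChunkQueue {
public:
    void push(std::string chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(chunk));
        }
        ready_.notify_one();
    }

    // Call once, after all reader threads have finished.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a chunk is available; returns false once the queue
    // is closed and fully drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> queue_;
    bool closed_ = false;
};

using Counts = std::unordered_map<std::string, std::size_t>;

// Assumption: chunks stand in for blocks read from disk.
Counts count_words(const std::vector<std::string>& chunks,
                   std::size_t n_readers, std::size_t n_counters) {
    ChunkQueue queue;
    std::vector<Counts> partial(n_counters);  // one private map per counter

    std::vector<std::thread> counters;
    for (std::size_t c = 0; c < n_counters; ++c) {
        counters.emplace_back([&queue, &partial, c] {
            std::string chunk;
            while (queue.pop(chunk)) {
                std::istringstream words(chunk);
                std::string word;
                while (words >> word) ++partial[c][word];
            }
        });
    }

    std::vector<std::thread> readers;
    for (std::size_t r = 0; r < n_readers; ++r) {
        readers.emplace_back([&queue, &chunks, r, n_readers] {
            // Each reader takes every n_readers-th chunk.
            for (std::size_t i = r; i < chunks.size(); i += n_readers)
                queue.push(chunks[i]);
        });
    }

    for (auto& t : readers) t.join();
    queue.close();
    for (auto& t : counters) t.join();

    // Merge the per-thread maps only after all counters are done.
    Counts total;
    for (const auto& p : partial)
        for (const auto& kv : p) total[kv.first] += kv.second;
    return total;
}
```

Calling count_words(chunks, 8, 14) would mirror the thread counts
mentioned above. Giving each counting thread its own map and merging at
the end avoids locking on every increment; whether the real program
does it this way is an assumption.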
Have a nice day,
> Congratulations. This is a very good result.
> For me you have proven that having a good idea to parallelize a
> program is more important than the technical implementation.
> Could you also show us the code?
I have uploaded the code to my website. You can download it here: