I have implemented wf10 in C++ in 537 lines of code. With the standard C++
allocator the program needs:
real 5m31.650s
user 66m29.504s
sys 4m44.607s
By switching to the Hoard allocator, as outlined in Christoph Bartoschek's
mail, this goes down to:
real 4m41.376s
user 54m6.374s
sys 4m5.561s
Like Christoph, I observed that when some of the data is already in the
cache, the time goes down to:
real 4m0.770s
user 53m20.249s
sys 4m29.073s
As you can see, the CPU time used is nearly the same, so only the disk cache
contributes to this speedup. When the program is run again, the earlier
values come up again.
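To make the comparison concrete, here is a small sketch that converts the two Hoard measurements above to seconds and adds up user and sys time. The names (Timing, secs, cpu_seconds, cpu_per_wall) are only chosen for this illustration and are not part of wf10. By this arithmetic the CPU time is about 3492 s without and about 3469 s with the warm cache, a difference of under one percent, while the wall-clock time drops by about 41 s.

```cpp
#include <cassert>
#include <cstdio>

// One measurement as printed by `time`, all values in seconds.
struct Timing { double real, user, sys; };

// Convert the "XmY.ZZZs" notation to seconds.
constexpr double secs(int minutes, double seconds) { return 60.0 * minutes + seconds; }

// Total CPU time consumed by all threads.
double cpu_seconds(const Timing& t) { return t.user + t.sys; }

// Average number of cores kept busy over the run.
double cpu_per_wall(const Timing& t) {
    assert(t.real > 0.0);
    return cpu_seconds(t) / t.real;
}

// The two Hoard runs quoted in this mail.
const Timing hoard_run  {secs(4, 41.376), secs(54, 6.374),  secs(4, 5.561)};
const Timing cached_run {secs(4, 0.770),  secs(53, 20.249), secs(4, 29.073)};

void print_comparison() {
    std::printf("hoard : cpu %.3f s, real %.3f s, cpu/real %.2f\n",
                cpu_seconds(hoard_run), hoard_run.real, cpu_per_wall(hoard_run));
    std::printf("cached: cpu %.3f s, real %.3f s, cpu/real %.2f\n",
                cpu_seconds(cached_run), cached_run.real, cpu_per_wall(cached_run));
}
```

Calling print_comparison() from main prints both lines side by side.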
This is with 14 threads counting the data and 8 threads reading it.
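For readers who want a picture of such a reader/counter split before looking at the real code, here is a minimal sketch. It is not the wf10 source and it makes several assumptions: the readers hand chunks of keys to the counters through one shared blocking queue, each counter keeps a private map, and the maps are merged at the end. ChunkQueue, count_keys and the in-memory inputs vector (standing in for blocks read from disk) are all made up for this example. It needs C++17.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

using Counts = std::unordered_map<std::string, long>;
using Chunk  = std::vector<std::string>;

// Unbounded blocking queue: readers push chunks, counters pop them.
class ChunkQueue {
public:
    void push(Chunk c) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(c)); }
        cv_.notify_one();
    }
    // Call once after all readers have finished.
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until a chunk is available; empty result once closed and drained.
    std::optional<Chunk> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        Chunk c = std::move(q_.front());
        q_.pop();
        return c;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Chunk> q_;
    bool closed_ = false;
};

// Hypothetical driver: reader r takes every n_readers-th block of `inputs`.
Counts count_keys(const std::vector<Chunk>& inputs,
                  std::size_t n_readers, std::size_t n_counters) {
    assert(n_readers > 0 && n_counters > 0);
    ChunkQueue queue;
    std::vector<Counts> partial(n_counters);  // one private map per counter

    std::vector<std::thread> counters;
    for (std::size_t i = 0; i < n_counters; ++i)
        counters.emplace_back([&queue, &partial, i] {
            while (auto chunk = queue.pop())
                for (const auto& key : *chunk) ++partial[i][key];
        });

    std::vector<std::thread> readers;
    for (std::size_t r = 0; r < n_readers; ++r)
        readers.emplace_back([&queue, &inputs, r, n_readers] {
            for (std::size_t b = r; b < inputs.size(); b += n_readers)
                queue.push(inputs[b]);  // a real reader would read from disk here
        });

    for (auto& t : readers) t.join();
    queue.close();                      // no more chunks will arrive
    for (auto& t : counters) t.join();

    Counts total;                       // merge the per-thread results
    for (const auto& p : partial)
        for (const auto& [key, n] : p) total[key] += n;
    return total;
}
```

With the thread counts from above the call would be count_keys(inputs, 8, 14). Keeping one map per counter thread avoids locking on the hot path; only the queue is shared between threads.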
Have a nice day,
Thorsten
> Congratulations. This is a very good result.
Thanks.
> For me you have proven that having a good idea to parallelize a
> program is more important than the technical implementation.
> Could you also show us the code?
I have uploaded the code to my website. You can download it here:
http://www.zagge.de/wf2/wf10.tar.gz