Thanks for including your 'use' line -- that's so much better than
leaving it implied. Please also consider using 'require' instead, or
the :only option to 'use' to make it clear which functions are being
brought in from which lib.
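To make that concrete, here is a minimal sketch of the two styles. The function shared-words is a made-up example and not part of your program, and I'm assuming a Clojure version that ships clojure.string and clojure.set.

```clojure
;; Option 1: 'use' restricted with :only, so only the named function is referred.
(use '[clojure.set :only [intersection]])

;; Option 2: 'require' with an alias, so every call site names its namespace.
(require '[clojure.string :as str])

;; Hypothetical example function, only here to show both styles at a call site.
(defn shared-words
  "Set of lower-cased words that appear in both strings."
  [a b]
  (intersection (set (str/split (str/lower-case a) #"\s+"))
                (set (str/split (str/lower-case b) #"\s+"))))
```

Either way, a reader can tell at a glance that intersection comes from clojure.set and split from clojure.string.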
> (defn split-string-in-two [s]
>   (let [chunk-size (quot (count s) 2)]
>     [(subs s 0 chunk-size), (subs s chunk-size)]))
Might this cut a word in half and produce (slightly) incorrect
results?
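One possible fix, as a rough sketch rather than a drop-in replacement: move the cut forward to the first whitespace character at or after the midpoint. The name split-string-at-whitespace is my own, and I'm assuming words are separated by whitespace.

```clojure
(defn split-string-at-whitespace
  "Splits s into two pieces near the middle without cutting a word.
  The cut lands on the first whitespace character at or after the
  midpoint; if there is none, the whole string goes in the first piece."
  [^String s]
  (let [mid (quot (count s) 2)
        cut (or (first (filter #(Character/isWhitespace (.charAt s (int %)))
                               (range mid (count s))))
                (count s))]
    [(subs s 0 cut) (subs s cut)]))

;; With "hello world foo" the plain midpoint (index 7) falls inside
;; "world"; this version cuts at the space that follows it instead.
```

The two halves can end up unequal in size, which seems like a fair price for correct counts.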
> 1. Is there a better way to do it? Perhaps agents should share some
> data structure?
I've got lots to learn in the realm of parallel program design, but
the sequential summing step at the end stood out to me. Perhaps
another agent that just does incremental summing along the way would
reduce the running time.
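Here is roughly what I have in mind, as an untested-in-anger sketch. All the names (count-and-report, parallel-word-counts, totals) are mine, the \w+ word pattern is an assumption, and I have not measured whether this actually beats summing at the end.

```clojure
(defn count-and-report
  "Agent action: counts the words in chunk, then hands the partial
  counts to the totals agent so merging happens as workers finish."
  [_ chunk totals]
  (let [counts (frequencies (re-seq #"\w+" chunk))]
    ;; A send made inside an action is held until this action completes.
    (send totals #(merge-with + % counts))
    counts))

(defn parallel-word-counts
  "Counts words across chunks, one worker agent per chunk, with a
  separate agent accumulating the running totals."
  [chunks]
  (let [totals  (agent {})
        workers (doall (for [c chunks]
                         (send (agent nil) count-and-report c totals)))]
    (apply await workers)   ; every worker has finished and released its send
    (await totals)          ; every queued merge has been applied
    @totals))
```

The merges still run one at a time inside the totals agent, but they overlap with the counting instead of waiting for all of it to finish.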
Also, the entire file is read in by a single thread -- perhaps the use
of mmap would allow the agents to start counting sooner and reduce
total run time that way. You may want to look at clojure.contrib.mmap.
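I won't try to reproduce the clojure.contrib.mmap API from memory, so the sketch below uses plain java.nio interop instead. The helper names mmap-file and slice->string are my own, and I'm assuming a UTF-8 file small enough to map in one piece (under 2 GB).

```clojure
(import '(java.io RandomAccessFile)
        '(java.nio ByteBuffer)
        '(java.nio.channels FileChannel FileChannel$MapMode)
        '(java.nio.charset StandardCharsets))

(defn mmap-file
  "Maps the whole file read-only. The mapping stays valid after the
  channel is closed."
  [^String path]
  (with-open [raf (RandomAccessFile. path "r")]
    (let [^FileChannel ch (.getChannel raf)]
      (.map ch FileChannel$MapMode/READ_ONLY 0 (.size ch)))))

(defn slice->string
  "Decodes bytes [start, end) of buf as UTF-8. Works on a duplicate so
  each agent can read its own slice without disturbing the others."
  [^ByteBuffer buf start end]
  (let [^ByteBuffer view (.duplicate buf)]
    (.position view (int start))
    (.limit view (int end))
    (str (.decode StandardCharsets/UTF_8 view))))
```

Each agent could then decode and count its own byte range. Note that the offsets are bytes, not characters, so a slice boundary can land inside a multi-byte character as well as inside a word.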
> 2. Despite producing valid results, the program never ends. Why?
When a program uses agents, the top-level of the program is
responsible for determining that all the agents have done what they
need to, and that the program can terminate. At the REPL this can
generally be done with Ctrl-D. In a script, use (shutdown-agents). I
imagine this ought to be done by the same level of code that calls
parallel-top-words, since p-t-w itself can't know that no other agent
work is being done.
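Something along these lines, as a sketch. parallel-top-words-stub is a hypothetical stand-in for your function (I haven't reproduced the real one), and run-as-script is just a name I made up for the top-level entry point.

```clojure
;; Hypothetical stand-in for parallel-top-words: does its work on an
;; agent and returns the word counts.
(defn parallel-top-words-stub [text]
  (let [a (agent {})]
    (send a (fn [_] (frequencies (re-seq #"\w+" text))))
    (await a)
    @a))

(defn run-as-script
  "Top-level entry point: runs the agent-based work, then shuts down
  the agent thread pools so the JVM is free to exit."
  [text]
  (try
    (parallel-top-words-stub text)
    (finally
      ;; After this call no further agent actions can be dispatched.
      (shutdown-agents))))
```

Keeping the call in the outermost function means library code like p-t-w stays usable from programs that still have other agent work pending.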
--Chouser
> On Dec 30, 9:18 am, Mibu <mibu.cloj...@gmail.com> wrote:
>> In an ideal world, standard functions like map, sort, reduce, filter,
>> etc. would know when to parallelize on their own, or even better, the
>> compiler will do it for them.
>
> The former is easier than the latter ;-) Even the smartest
> autoparallelizing compilers rely on manual annotations to expose lack
> of sequential dependencies in loops, but a function's API can
> guarantee that parallelism and let the function's implementation do
> the heavy lifting.
Manual intervention is required not only for the identification of
dependencies, but also for telling the compiler where parallelization
makes sense for gaining efficiency. It is often said that pure
functional programs have the advantage of being autoparallelizable
because there are no hidden dependencies, but there still isn't any
decent autoparallelizing compiler for any functional language at the
moment. One of the reasons is that performance analysis is still very
difficult. Perhaps one day we will have parallel JIT compilers for
this, but that's not for tomorrow.
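For now, the decision stays with the programmer, and in Clojure it can be as small as choosing pmap over map. A minimal sketch, where count-words and both word-counts functions are illustrative names of my own:

```clojure
(defn count-words
  "Word frequencies for one chunk of text."
  [chunk]
  (frequencies (re-seq #"\w+" chunk)))

(defn word-counts
  "Sequential version: map over the chunks, then merge the counts."
  [chunks]
  (apply merge-with + (map count-words chunks)))

(defn word-counts-parallel
  "Same result, but the programmer has decided that each chunk is
  expensive enough to justify pmap's coordination overhead."
  [chunks]
  (apply merge-with + (pmap count-words chunks)))
```

Both return the same map. Whether the parallel one is faster depends on chunk size and core count, which is exactly the performance analysis a compiler would have to do on its own.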
> incredibly frustrating for users (trust me). This means it's better
> to have simpler compilers and smarter libraries.
Definitely!
Konrad.