On Mar 10, 1:28 pm, Stuart Sierra <the.stuart.sie...@gmail.com> wrote:
> Hello all,
> I'm trying to process about a million files. I've adapted the
> parallel map function in 2 ways: 1) I don't return any values; 2) each
> "worker" thread accumulates results and writes them to a file when
> there are no more files to process. This works for a while, but after
> a few hundred thousand iterations the worker threads throw
> "java.lang.Exception: Transaction failed after reaching retry limit."
>
> Here's an outline of my function; any help appreciated.
>
Your wait function is a busy-wait loop - it spins/polls and burns CPU
cycles. You need a different waiting mechanism - a CountDownLatch or
something similar.
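For example, a CountDownLatch lets each worker signal completion once,
and the main thread block until all have done so, with no polling. A
minimal sketch (my own illustration, not code from this thread; the
per-file work is stubbed out with a counter):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchDemo {
    static int runWorkers(int nthreads) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(nthreads);
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < nthreads; i++) {
            new Thread(() -> {
                processed.incrementAndGet(); // stand-in for real per-file work
                done.countDown();            // signal this worker is finished
            }).start();
        }
        done.await(); // blocks until every worker has counted down - no spinning
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(4)); // prints 4
    }
}
```

Unlike a busy-wait, `await` parks the calling thread until the count
reaches zero, so it consumes no CPU while waiting.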
Depending on memory usage, the problem may be better suited to agents
(untested):
(defn process-all [all-files process-one nthreads]
  (let [workers (map (fn [n] (agent nil)) (range nthreads))]
    ;; dorun forces the lazy map so the sends actually happen;
    ;; send-off suits I/O-bound work like file processing
    (dorun (map (fn [job worker]
                  (send-off worker (fn [_] (process-one job))))
                all-files (cycle workers)))
    ;; block until every worker's queue of sends has drained
    (apply await workers)))
You don't say much about the data or the jobs (e.g. how long does it
take to run process-one once?), so it's hard to make recommendations.
Rich