Allow fold to parallelize over lazy sequences


Paul Butcher

Apr 16, 2013, 7:07:18 AM
to cloju...@googlegroups.com
At Stuart Sierra's suggestion, I recently submitted the following patch:

http://dev.clojure.org/jira/browse/CLJ-1197

As this is my first Clojure patch, it's entirely possible that I've missed something. I'd greatly appreciate feedback - is there anything further that I should do to increase the chances of it being accepted?

Many thanks in advance,

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: pa...@paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher

Andy Fingerhut

Apr 16, 2013, 1:06:28 PM
to cloju...@googlegroups.com
As far as the basic mechanics go, the patch is in the correct format, applies cleanly, and passes all tests with the latest Clojure master.  That isn't enough to be accepted, but it is better than the alternative.

Commits to Clojure have averaged around 13 or 14 per month over the last couple of years. Enhancements tend to be given lower priority than bugs.

Mentioning it here on the mailing list can help: it gives other contributors an opportunity to know about it, take a look, and perhaps vote on it if they like it.  Votes may give it some priority over other unvoted enhancements when screeners look through the not-yet-classified tickets.

Andy


Paul Butcher

Apr 17, 2013, 12:01:36 PM
to cloju...@googlegroups.com
Thanks for the feedback, Andy. It sounds like I've done everything I can for now, so I'll just wait patiently for the screeners to get to it.

In the meantime, if anyone has any feedback on the patch (good, bad or indifferent) I'd be very interested to hear it.

Thanks again,


Alan Malloy

Apr 17, 2013, 2:18:19 PM
to cloju...@googlegroups.com
Personally, I don't see much benefit to a foldable lazy-seq. You have to realize it all at once anyway, so why not just dump it into a vector, and then use vector's efficient fold?
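
Something like this (a sketch, assuming the seq fits in memory; some-seq is a placeholder for whatever lazy sequence you have):

(require '[clojure.core.reducers :as r])
;; Realize the whole seq once into a vector, then use the vector's
;; parallel fold. + serves as both reducing and combining fn here.
(r/fold + (vec some-seq))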

Paul Butcher

Apr 17, 2013, 5:41:04 PM
to cloju...@googlegroups.com
On 17 Apr 2013, at 19:18, Alan Malloy <al...@malloys.org> wrote:

Personally, I don't see much benefit to a foldable lazy-seq. You have to realize it all at once anyway, so why not just dump it into a vector, and then use vector's efficient fold?

The use case that led me to implement foldable-seq was working on a 40 GiB sequence (a complete Wikipedia dump). There's no way that would fit into RAM, so realising it all at once is impossible. foldable-seq enables it to be processed in parallel very nicely...

Alan Malloy

Apr 17, 2013, 8:05:27 PM
to cloju...@googlegroups.com
That's not deterministic at all, though: if threads happen to be scheduled in a particular way, the entire sequence can be in memory all at once. It's not a general solution to the problem you ran into. For example, imagine that the sequence you want to fold is the first billion integers, (range 1e9). On my computer, it takes about three minutes to walk over all of them; if the operation you want to perform takes longer than three minutes for each partition, then all the partitions will be realized and put into the fork-join task queue before any chunks can be GCed. 

I might be missing some subtleties regarding the lazy calls to chunk-task/fjfork, such that the problem I describe only happens with short sequences, but I could certainly construct a collection and fold/reduce functions that cause the entire sequence to be in memory at once when folding a lazy sequence with your implementation.

Alan Malloy

Apr 17, 2013, 9:17:05 PM
to cloju...@googlegroups.com
I did a little poking around, to back up my assertion:

(fold 100 + (fn 
              ([] (Thread/sleep 5000) 0)
              ([x y] (+ x (count y))))
      (foldable-seq (repeatedly 1000 #(int-array 10000000))))
OutOfMemoryError Java heap space  clojure.lang.Numbers.int_array (Numbers.java:1092)

So we make a thousand very large arrays, and want to compute the sum of their lengths in parallel, a hundred per chunk. I also sleep for five seconds at the start of each chunk, to ensure that they all get scheduled at the same time; in my tests we run out of heap space even without this sleep, but I wanted to make sure it was reliably reproducible. At any rate, even though the sequence is perfectly lazy, and the sum can easily be computed in a single thread, attempting to parallelize it causes the whole thing (or at least a large portion of it) to be crammed into memory at once, and we run out of heap.

Folding lazy sequences in parallel may help with performance sometimes, but it introduces its own problems. I'm not actually sure whether it's possible, in general, to do any better than falling back to a single-threaded reduce, without introducing breakages like this patch does. I'd love to find out that it is possible, but nothing occurs to me.

Andy Fingerhut

Apr 18, 2013, 1:52:10 AM
to cloju...@googlegroups.com
Isn't it true in general that parallelizing computation for things that are "embarrassingly parallel" takes as much memory as the sum of the things you are running at the same time, whereas doing it sequentially takes less?

Is there something about your example that is worse than that effect?

Thanks,
Andy


Paul Butcher

Apr 18, 2013, 8:23:11 AM
to cloju...@googlegroups.com
What foldable-seq does (or at least, tries to do) is carve off n "chunks" from the start of the sequence (where by default n is 10) and reduce those chunks in parallel. As the result of each reduction becomes available, it's combined into an accumulator and one additional chunk is carved off to be reduced.

This means that the max memory that should ever be in use is n * the size of a chunk, plus whatever working memory the reduce and combine functions require.
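
As an illustration only, here's a sketch of that windowed strategy. This is not the patch's actual implementation: fold-seq-windowed and its parameters are made up for this sketch, and it drives everything with futures rather than the fork/join tasks the patch uses.

(defn fold-seq-windowed
  "Sketch: reduce chunk-size-element chunks of coll in parallel,
  keeping at most `window` chunk reductions in flight at once."
  [chunk-size window combinef reducef coll]
  (let [tasks (map (fn [chunk]
                     (future (reduce reducef (combinef) chunk)))
                   (partition-all chunk-size coll))]
    ;; Pairing each task with the task `window` positions ahead forces the
    ;; pipeline to run `window` chunks in front of the combining loop, so
    ;; only around window * chunk-size elements are realized at any time
    ;; (modulo any chunking in the source seq itself).
    (reduce combinef (combinef)
            (map (fn [task _] @task)
                 tasks
                 (concat (drop window tasks) (repeat nil))))))

Called with chunk-size 10 and window 4, this should stay within the 4 * 10 * array-size bound I describe below.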

In your example, that certainly means it will run out of RAM, because each chunk contains 100 arrays. Multiply that by the 10 chunks that foldable-seq uses by default and you're trying to process all 1000 arrays at once.

But it should be possible to parallelise successfully by choosing different parameters to fold and foldable-seq. I would expect the following, for example, to both parallelise and not run out of RAM:

(fold 10 + (fn ([] 0) ([x y] (+ x (count y))))
      (foldable-seq 4 (repeatedly 1000 #(int-array 10000000))))

Because in this case, each chunk contains 10 arrays, and foldable-seq only creates 4 chunks at a time. So the max memory should be 4 * 10 * array-size.

Having said that, the above *does* give an out of memory error, which implies (damn!) that I have a bug somewhere in my implementation. Thanks for leading me to it - I'll see if I can work out what I've screwed up.


Alan Malloy

Apr 18, 2013, 2:41:21 PM
to cloju...@googlegroups.com
I don't think so, no. In the cold light of morning, foldable-seq seems pretty reasonable to me; I was under the vague impression that he was making all lazy sequences foldable, which seemed like a terrible choice, but as an opt-in measure it's a tradeoff you have to make on purpose, which seems fine.

Paul Butcher

Apr 18, 2013, 6:50:39 PM
to cloju...@googlegroups.com
On 18 Apr 2013, at 19:41, Alan Malloy <al...@malloys.org> wrote:

In the cold light of morning, foldable-seq seems pretty reasonable to me; I was under the vague impression that he was making all lazy sequences foldable, which seemed like a terrible choice,

Oh - that *would* have been a terrible choice! But as you say, it's not what I'm proposing :-)

but as an opt-in measure it's a tradeoff you have to make on purpose, which seems fine.

Cool. I'm still working on why I can't get that example of yours to work without running out of memory (I suspect that I'm falling foul of chunked sequences somewhere, but haven't quite worked out where yet) and will modify the patch when I've diagnosed it.

Thanks for the feedback.

Paul Butcher

Apr 18, 2013, 7:54:36 PM
to cloju...@googlegroups.com
On 18 Apr 2013, at 23:50, Paul Butcher <pa...@paulbutcher.com> wrote:

 I'm still working on why I can't get that example of yours to work without running out of memory 

So I've made some progress with this - but I've found myself deep in the twilight zone in the process. I can reproduce the OutOfMemoryError without any of my foldable-seq code whatsoever. Here's a REPL session that illustrates what I'm seeing (this is all with vanilla Clojure 1.5.1 - none of my changes compiled in):

user=> (require '[clojure.core.reducers :as r])
nil
user=> (reduce (fn [x y] (+ x (count y))) 0 (repeatedly 1000 #(int-array 10000000)))
10000000000
user=> (reduce (fn [x y] (+ x (count y))) 0 (r/map identity (repeatedly 1000 #(int-array 10000000))))
OutOfMemoryError Java heap space  clojure.lang.Numbers.int_array (Numbers.java:1092)

So, I don't get an OOM error if I just use reduce, but I do if I use clojure.core.reducers/map with the identity function.

I have no clue whatsoever why this might be the case - all that clojure.core.reducers/map does is return a reification of CollReduce that forwards to clojure.core.protocols/coll-reduce.

I would be very grateful indeed for any light that anyone could cast on this.

In any case - it looks like the problem is not specific to foldable-seq, as I can reproduce it with the 1.5.1 version of clojure.core.reducers.

Alan Malloy

Apr 18, 2013, 8:56:42 PM
to cloju...@googlegroups.com
r/map creates a closure that holds onto the head of its input sequence, as do most coll-fold implementations. Normally this isn't a problem, because they're holding onto something small and cheap, like a (Range. 1000000) or something, but if you have an actual sequence as your backing store, holding the head will be a huge problem. I hadn't thought of this problem before; I wonder if Rich has a solution for it?
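
To make that concrete, here's a schematic of the shape of the problem (not the actual reducers source; map-like is invented for this illustration):

;; The reified object closes over `coll`, so the entire input remains
;; strongly reachable for as long as the traversal is running; cells
;; that have already been consumed can't be collected.
(defn map-like [f coll]
  (reify clojure.core.protocols/CollReduce
    (coll-reduce [_ rf init]
      (clojure.core.protocols/coll-reduce
        coll (fn [acc x] (rf acc (f x))) init))))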

Paul Butcher

Apr 19, 2013, 6:44:19 AM
to cloju...@googlegroups.com
On 19 Apr 2013, at 01:56, Alan Malloy <al...@malloys.org> wrote:

r/map creates a closure that holds onto the head of its input sequence, as do most coll-fold implementations. Normally this isn't a problem, because they're holding onto something small and cheap, like a (Range. 1000000) or something, but if you have an actual sequence as your backing store, holding the head will be a huge problem. I hadn't thought of this problem before; I wonder if Rich has a solution for it?

Oh - of course it does - I don't know how I didn't see that. Thanks.

That's a pain. I'd be very interested to hear Rich's thoughts.

Paul Butcher

Apr 19, 2013, 10:41:15 AM
to cloju...@googlegroups.com
As a straw-man (I'm sure there must be nicer ways to achieve this), if I modify clojure.core.reducers/folder as follows, then I no longer get an out-of-memory error when reducing over Alan's sequence of arrays:

(defn- deref-and-discard
  "Deref an atom, then reset it to nil so its value can be GCed."
  [a]
  (let [v @a]
    (reset! a nil)
    v))

(defn folder
  "Given a foldable collection, and a transformation function xf,
  returns a foldable collection, where any supplied reducing
  fn will be transformed by xf. xf is a function of reducing fn to
  reducing fn."
  {:added "1.5"}
  ([coll xf]
    ;; Hold coll in an atom and drop the reference on first use, so the
    ;; reified object below doesn't retain the head of the sequence while
    ;; it's being traversed. (This does make the result single-use.)
    (let [coll-atom (atom coll)]
      (reify
        clojure.core.protocols/CollReduce
        (coll-reduce [_ f1]
                     (clojure.core.protocols/coll-reduce (deref-and-discard coll-atom) (xf f1) (f1)))
        (coll-reduce [_ f1 init]
                     (clojure.core.protocols/coll-reduce (deref-and-discard coll-atom) (xf f1) init))

        CollFold
        (coll-fold [_ n combinef reducef]
                   (coll-fold (deref-and-discard coll-atom) n combinef (xf reducef)))))))

