faster flatten?

131 views
Skip to first unread message

Cam

unread,
Jul 12, 2010, 10:46:36 PM7/12/10
to Clojure
Another flatten thread! Sorry..

Hello all, before I realized there was a flatten in the master branch
(and before I looked at contrib) I wrote this pretty standard code:

(defn my-flatten [coll]
(lazy-seq
(when-let [coll (seq coll)]
(let [x (first coll)]
(if (sequential? x)
(concat (my-flatten x) (my-flatten (next coll)))
(cons x (my-flatten (next coll))))))))

(There's very similar versions on the boards. I'm not claiming this is
anything amazing or unique.)

It's not as elegant as what's in core, but in my micro benchmarks (ran
on my laptop; 2.26 core 2 and 4gb ram) it seems to perform a bit
better, _especially_ in the already flattened case. It behaves just
like core/flatten except that it doesn't return an empty list when
passed a map or set, it just returns whatever you gave it but with the
top level converted to a seq. I'm pretty much a clojure noob, so are
there any hidden detractors of this implementation as opposed to the
version introduced in 1.2?

Also, quick note, if you swap the call to sequential? with seqable?
from contrib/core, it flattens maps and sets like you'd expect as
well.
Here is how it looks
user=> (my-flatten #{1 {2 3} 4 [5 6 7 #{8 {9 10}}]})
(1 2 3 4 5 6 7 9 10 8)

And for the micro-benchmarks (using "sequential?"):

user=> (time (dotimes [_ 1e7] (flatten [1 2 3 4])))
"Elapsed time: 14,661.592 msecs"
nil

user=> (time (dotimes [_ 1e7] (my-flatten [1 2 3 4])))
"Elapsed time: 922.268 msecs"
nil

user=> (time (dotimes [_ 1e7] (flatten [1 [2 [3 [4 [5 [6 [7 [8]
[[[9]]] 10 [11] 12 [13 14 [15]]]]]]]]])))
"Elapsed time: 18,147.959 msecs"
nil

user=> (time (dotimes [_ 1e7] (my-flatten [1 [2 [3 [4 [5 [6 [7 [8]
[[[9]]] 10 [11] 12 [13 14 [15]]]]]]]]])))
"Elapsed time: 6,088.914 msecs"
nil

user=> (time (dotimes [_ 1e7] (flatten [[1 2 3 4 5 6 7 8 9 10]])))
"Elapsed time: 11,696.693 msecs"
nil

user=> (time (dotimes [_ 1e7] (my-flatten [[1 2 3 4 5 6 7 8 9 10]])))
"Elapsed time: 1,533.983 msecs"
nil

Thoughts?

Mark Engelberg

unread,
Jul 13, 2010, 1:02:05 AM7/13/10
to clo...@googlegroups.com
Unless you wrap a "doall" around the calls to flatten and my-flatten,
you're not really timing anything relevant.

Stuart Halloway

unread,
Jul 13, 2010, 9:10:34 AM7/13/10
to clo...@googlegroups.com
Hi Cam,

Your tests aren't testing the interesting part without a doall.

That said, my quick tests with doall show your approach faring even better. :-) Also, I think what my-flatten does with Java arrays is intuitive (and the current flatten not so much).

A patch that preserves the semantics of the existing flatten (except for working with Java arrays) would be welcome.

Thanks!
Stu

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

Cam

unread,
Jul 13, 2010, 11:04:34 AM7/13/10
to Clojure
Hi Stuart,

Thanks for checking that out for me! Sorry for not realizing in the
first place.

I of course would be happy to submit a patch. Should I submit that
here or over on the assembla page?

Cam

unread,
Jul 13, 2010, 11:43:00 AM7/13/10
to Clojure
Hi again, I modified my-flatten to return the empty list for sets and
maps as core/flatten does. It doesn't seem to handle Arrays anymore
though. I'm assuming it's because ArrayList and (int-array ...) don't
implement Sequential. None the less should I still submit this
modified version that behaves just like core/flatten?

(defn my-flatten
[coll]
(lazy-seq
(when-let [coll (if (sequential? coll) (seq coll))]
(let [x (first coll)]
(if (sequential? x)
(concat (my-flatten x) (my-flatten (next coll)))
(cons x (my-flatten (next coll))))))))

Might it be worth promoting "seqable?" to core? In that case flatten
would handle pretty much everything you could throw at it like you'd
expect. I don't speak for everyone but when I saw sequential? I
assumed it would have the semantics that seqable? does.

Stuart Halloway

unread,
Jul 13, 2010, 11:49:09 AM7/13/10
to clo...@googlegroups.com
Hi Cam,

The full instructions for joining the team and then submitting a patch are at [1] an [2], but in short:

* send in a CA
* join the Assembla space under you real name
* post a patch there linking to this thread

Thanks!
Stu

[1] http://clojure.org/contributing
[2] http://clojure.org/patches

Stuart Halloway

unread,
Jul 13, 2010, 11:57:10 AM7/13/10
to clo...@googlegroups.com
Hi Cam,

Please submit the modified version, and, if you want, create a separate ticket for "seqable?". I would like to review the latter separately.

Stu

Cam

unread,
Jul 13, 2010, 1:47:58 PM7/13/10
to Clojure
OK, all submitted. The tickets are up for discussion at

http://www.assembla.com/spaces/clojure/support/tickets/400-a-faster-flatten
http://www.assembla.com/spaces/clojure/support/tickets/401-promote--seqable---from-contrib-

I will mail my CA in tomorrow morning.

Thanks Stu and Mark!

On Jul 13, 11:57 am, Stuart Halloway <stuart.hallo...@gmail.com>

miner

unread,
Jul 14, 2010, 1:56:38 PM7/14/10
to Clojure
I think it's worthwhile to have a faster flatten even if it doesn't
look as elegant as the current implementation. You could do a bit of
refactoring and save yourself a call to sequential? since the
recursive calls are guaranteed to have seqs (as a result of next).

Also, I'd prefer flatten to return the argument if it isn't
sequential? so for example, (flatten 10) ==> 10. I think it would be
less likely to give mysterious results, especially with mistaken
arguments. I understand that the current flatten for 1.2 beta doesn't
do this -- I'm just throwing in another suggestion after playing with
it for a while.

Here's my suggestion:

(defn fl1 "faster flatten" [coll]
(letfn [(flcoll [coll]
(lazy-seq
(when-let [c (seq coll)]
(let [x (first c)
nxt (flcoll (next c))]
(if (sequential? x)
(concat (flcoll x) nxt)
(cons x nxt))))))]
(if (sequential? coll) (flcoll coll) coll)))

Cam

unread,
Jul 14, 2010, 2:40:48 PM7/14/10
to Clojure
I definitely like this version a little better. If you change the else
of the if to be just (list), it returns the empty list just as core/
flatten does. Mind if I update the ticket with this patch?

Steve Miner

unread,
Jul 14, 2010, 3:55:13 PM7/14/10
to clo...@googlegroups.com

On Jul 14, 2010, at 2:40 PM, Cam wrote:

> I definitely like this version a little better. If you change the else
> of the if to be just (list), it returns the empty list just as core/
> flatten does. Mind if I update the ticket with this patch?

It's all yours. Really, just a slight change from your code anyway.

I wonder about the call to next. I'm thinking it should be rest instead. (See http://clojure.org/lazy)

I definitely don't like the way (flatten 10) and (flatten {:a 1 :b 2}) return the empty list (in 1.2 beta). I think these are accidents, and I worry that they will obscure higher-level bugs.

My preference is to return the arg if it's not sequential? as I believe it will provide a more useful result at no extra cost. In that case, the argument to flatten was probably a mistake, and it's better if the value shows up somewhere rather than being mysteriously swallowed. On the other hand, it might make sense to return (list arg) on the theory that flatten should always return a seq. I could live with that.

Cam

unread,
Jul 15, 2010, 12:02:01 AM7/15/10
to Clojure
You're right about rest, I didn't see that rest should be preferred to
next when consuming seqs in this style.

I agree that flatten returning the empty list for some of the things
that actually can be "seqed" is odd. I'm sure there's a good reason
somewhere, but it's not to hard to change by just using seqable? from
contrib. That makes flatten handle anything. After benchmarking it
though, it seems like its muuuch slower; so slow that I got bored of
waiting after 5 minutes. Seems like a punt though :/


On Jul 14, 3:55 pm, Steve Miner <stevemi...@gmail.com> wrote:
> On Jul 14, 2010, at 2:40 PM, Cam wrote:
>
> > I definitely like this version a little better. If you change the else
> > of the if to be just (list), it returns the empty list just as core/
> > flatten does. Mind if I update the ticket with this patch?
>
> It's all yours.  Really, just a slight change from your code anyway.
>
> I wonder about the call to next.  I'm thinking it should be rest instead.  (Seehttp://clojure.org/lazy)
Reply all
Reply to author
Forward
0 new messages