(take-by f coll), (drop-by f coll): Problem Comparing Adjacent Items Of A Collection

134 views
Skip to first unread message

Stefan Rohlfing

unread,
Apr 2, 2011, 11:38:26 PM4/2/11
to clo...@googlegroups.com
Dear Clojure Group,

In the style of take-while and drop-while I wrote two functions take-by and drop-by that take an arbitrary function f instead of a predicate and return a lazy sequence based on comparing the result of (f item) of successive items in the collection.

  However, whenever I need to compare adjacent items of a collection, I quickly get lost and the result always looks like a hack. Here, this is especially true for the implementation of drop-by:

(defn take-by [f coll]
  "Starting from the first item in coll,
  returns a lazy sequence of items for which (f item)
  returns the same result."
  (lazy-seq
    (when-let [s (seq coll)]
      (let [x (first s)]
        (cons x (take-by f (when-let [y (second s)]
                             (if (= (f x) (f y))
                               (rest s)
                               nil))))))))


;; (take-by (partial + 2) [])
;; ()
;; (take-by (partial + 2) [2 2 2 3 3 4 4])
;; (2 2 2)



(defn drop-by [f coll]
  "Returns a lazy sequence of the items in coll starting
  from the first item for which (f item) returns the same result."
  (lazy-seq
    (let [s (seq coll), x (first s), y (second s)]
      (if (and s y)
        (if (= (f x) (f y))
          (drop-by f (rest (rest s)))
          (rest s))
        s))))


;; (drop-by (partial + 2) [])
;; ()
;; (drop-by (partial + 2) [2 2 2 3 3 4])
;; (3 3 4)


I am sure there is a standard functional way of comparing adjacent items in a coll and would be glad if someone could point me to it.
Also, any suggestions of making my code a little bit more idiomatic are highly welcome!

Best regards,

Stefan

Ken Wesson

unread,
Apr 3, 2011, 12:16:39 AM4/3/11
to clo...@googlegroups.com
On Sat, Apr 2, 2011 at 11:38 PM, Stefan Rohlfing
<stefan....@gmail.com> wrote:
> I am sure there is a standard functional way of comparing adjacent items in
> a coll and would be glad if someone could point me to it.

(defvar- sentinel (Object.))

(defn take-by [f coll]
(let [fs (map f coll)
ps (map = fs (rest fs))
zs (map #(if %1 %2 sentinel) ps coll)]
(take-while (partial not= sentinel) zs))

user=> (take-by #(mod % 3) [1 4 1 7 34 16 10 2 99 103 42])
(1 4 1 7 34 16)

I leave drop-by as an exercise for the reader, but the punch line here
is (map foo s (rest s)). :)

Ken Wesson

unread,
Apr 3, 2011, 12:21:03 AM4/3/11
to clo...@googlegroups.com
On Sun, Apr 3, 2011 at 12:16 AM, Ken Wesson <kwes...@gmail.com> wrote:
> On Sat, Apr 2, 2011 at 11:38 PM, Stefan Rohlfing
> <stefan....@gmail.com> wrote:
>> I am sure there is a standard functional way of comparing adjacent items in
>> a coll and would be glad if someone could point me to it.
>
> (defvar- sentinel (Object.))
>
> (defn take-by [f coll]
>  (let [fs (map f coll)
>        ps (map = fs (rest fs))
>        zs (map #(if %1 %2 sentinel) ps coll)]
>    (take-while (partial not= sentinel) zs))
>
> user=> (take-by #(mod % 3) [1 4 1 7 34 16 10 2 99 103 42])
> (1 4 1 7 34 16)

Actually, this should include the 10 after the 16. It needs to be

(defn take-by [f coll]
(let [fs (map f coll)
ps (map = fs (rest fs))

zs (map #(if %1 %2 sentinel) ps (rest coll))]
(cons (first coll) (take-while (partial not= sentinel) zs))))

Andreas Kostler

unread,
Apr 3, 2011, 12:24:24 AM4/3/11
to clo...@googlegroups.com
Or more in line with the implementation of take-while:

(defn take-by [f coll]


(lazy-seq
(when-let [s (seq coll)]
(let [x (first s)

y (second s)]
(if (= (f x) (f y))

(cons x (take-by f (rest s)))
(list x))))))

Kind Regards
Andreas

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

--
"Test-driven Dentistry (TDD!) - Not everything should be test driven"
- Michael Fogus
--
**********************************************************
Andreas Koestler, Software Engineer
Leica Geosystems Pty Ltd
270 Gladstone Road, Dutton Park QLD 4102
Main: +61 7 3891 9772 Direct: +61 7 3117 8808
Fax: +61 7 3891 9336
Email: andreas....@leica-geosystems.com

************www.leica-geosystems.com*************

when it has to be right, Leica Geosystems

Please consider the environment before printing this email.

Andreas Kostler

unread,
Apr 3, 2011, 12:31:57 AM4/3/11
to clo...@googlegroups.com
Without bugs this time:
(defn take-by [f coll]

(lazy-seq
(when-let [s (seq coll)]
(let [x (first s)
y (second s)]
(if (and x y (= (f x) (f y)))

(cons x (take-by f (rest s)))
(list x))))))

Andreas Kostler

unread,
Apr 3, 2011, 1:04:04 AM4/3/11
to clo...@googlegroups.com
Another way to skin the cat:

(defn take-by [f coll]
(cons (first coll)
(for [[x y]
(map (fn [x y] [x y]) coll (rest coll))
:while (= (f x) (f y))] x)))

Cheers
Andreas


On 03/04/2011, at 2:21 PM, Ken Wesson wrote:

Ken Wesson

unread,
Apr 3, 2011, 3:17:31 AM4/3/11
to clo...@googlegroups.com
On Sun, Apr 3, 2011 at 1:04 AM, Andreas Kostler
<andreas.koe...@gmail.com> wrote:
>                       (map (fn [x y] [x y]) coll (rest coll))

What's your reason for using (fn [x y] [x y]) here instead of just
vector? Is it more efficient because it isn't variable-arity?

Stefan Rohlfing

unread,
Apr 3, 2011, 3:23:20 AM4/3/11
to clo...@googlegroups.com
@ Ken, Andreas

Thank you for your nice implementations!

As far as I can see, there are two main methods of comparing adjacent items of a list:

1) Take the first item of a coll. Compare the remaining items with an offset of 1:
(map f coll (rest coll))
;; apply some sort of filter

2) Take the first and the second item of a coll. Test if both items exists and then compare:
(let [x (first coll), y (second coll)]
   (and x y (= (f x) (f y))))

The first method works for take-by, but not for drop-while (or so I think).

Using the second method take-by can be changed into drop-by by only changing the last two lines:

(defn drop-by2 [f coll]

  (lazy-seq
    (when-let [s (seq coll)]
      (let [x (first s)
            y (second s)]
        (if (and x y (= (f x) (f y)))
          (drop 2 s)
          (drop 1 s))))))


I am curious to know if one of these methods is the preferred way of doing a comparison in Clojure (aka Python's one way of doing things). The sheer number of possible solutions to the same problem sometimes overwhelms me a bit, so that I'm looking for some orientation ;-)

Best regards,

Stefan

Andreas Kostler

unread,
Apr 3, 2011, 4:08:19 AM4/3/11
to clo...@googlegroups.com
I can't answer that Ken,
I guess I wasn't thinking of vec when I wrote it :)

Andreas Kostler

unread,
Apr 3, 2011, 4:38:11 AM4/3/11
to clo...@googlegroups.com
Hi Stefan,
I am overwhelmed by the 'freedom of choice' Clojure gives you, too. In that respect it's more like ruby than python.
Nevertheless, I think if you can come up with an algorithm using the built in functions over sequences like map, reduce, filter, etc. 
you should do so. 
Either way, it doesn't really matter which way you do it, just do it right and then refine as you go (and gain more experience). 
Just be aware that you have to explicitly loop-recur for tail recursive processes, recursive processes might blow up your call stack pretty quickly.
 
So that would cover 1)
For 2)
I can think of a straight forward tail-recursive process to do what you want:

 (defn drop-by [f coll]
            (loop [coll coll]
               (let [x (first coll)
                    y (second coll)]
                    (if (and x y (= (f x) (f y)))
                        (recur (rest coll))
                        (rest coll)))))

Maybe someone on this list can think of an implementation in terms of sequence functions. 

Cheers
Andreas
P.s. The most powerful tool that comes with Clojure is this community. Use it!

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Stefan Rohlfing

unread,
Apr 3, 2011, 5:34:49 AM4/3/11
to clo...@googlegroups.com
Hi Andreas,

You are right, the helpful Clojure community is a real asset to be proud of!

I like your solution using loop ... recur as it makes the intention of the code a bit more clear.

The pattern
 (lazy-seq
    (when-let [s (seq coll)]
      ...
is used in the implementation of take-while in Clojure Core. loop ... recur cannot be used because the function call is not the tail position.

By the way, my code for drop-by2 in my last post was not correct. So here is the corrected version:


(defn drop-by2 [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [x (first s)
            y (second s)]
        (if (and x y (= (f x) (f y)))
          (drop-by2 f (rest s))
          (rest s))))))

Best regards,

Stefan





Ken Wesson

unread,
Apr 3, 2011, 6:24:58 AM4/3/11
to clo...@googlegroups.com
On Sun, Apr 3, 2011 at 3:23 AM, Stefan Rohlfing
<stefan....@gmail.com> wrote:
> @ Ken, Andreas
>
> Thank you for your nice implementations!
>
> As far as I can see, there are two main methods of comparing adjacent items
> of a list:
>
> 1) Take the first item of a coll. Compare the remaining items with an offset
> of 1:
> (map f coll (rest coll))
> ;; apply some sort of filter
>
> 2) Take the first and the second item of a coll. Test if both items exists
> and then compare:
> (let [x (first coll), y (second coll)]
>    (and x y (= (f x) (f y))))
>
> The first method works for take-by, but not for drop-while (or so I think).

I don't. :)

(defn drop-by [f coll]


(let [fs (map f coll)
ps (map = fs (rest fs))

zs (map list ps (rest coll))]
(map second (drop-while first zs))))

user=> (drop-by #(mod % 3) [1 4 1 7 34 16 10 2 99 103 42])
(2 99 103 42)

I especially like the symmetry of using take-while for the one and
drop-while for the other, under the hood.

Meikel Brandmeyer

unread,
Apr 3, 2011, 7:15:58 AM4/3/11
to Clojure
Hi,

On 3 Apr., 12:24, Ken Wesson <kwess...@gmail.com> wrote:

> I don't. :)
>
> (defn drop-by [f coll]
>   (let [fs (map f coll)
>         ps (map = fs (rest fs))
>         zs (map list ps (rest coll))]
>     (map second (drop-while first zs))))
>
> user=> (drop-by #(mod % 3) [1 4 1 7 34 16 10 2 99 103 42])
> (2 99 103 42)
>
> I especially like the symmetry of using take-while for the one and
> drop-while for the other, under the hood.

Or a bit less involved.

(defn take-by
[f coll]
(lazy-seq
(when-let [s (seq coll)]
(let [fst (first s)
value (f fst)]
(cons fst (take-while #(-> % f (= value)) (rest s)))))))

(defn drop-by
[f coll]
(lazy-seq
(when-let [s (seq coll)]
(let [fst (first s)
value (f fst)]
(drop-while #(-> % f (= value)) (rest s))))))

Sincerely
Meikel

Ken Wesson

unread,
Apr 3, 2011, 7:18:19 AM4/3/11
to clo...@googlegroups.com

"Less" involved?

Meikel Brandmeyer

unread,
Apr 3, 2011, 9:34:53 AM4/3/11
to Clojure
Hi,

On 3 Apr., 13:18, Ken Wesson <kwess...@gmail.com> wrote:

> "Less" involved?

Your solution combines 5 sequences with drop-while-first-map-second
magic which I personally don't find very self-explaining at first
sight. My solution does one step with only local impact and then
delegates to a single well-known library function.

For laziness there is some cost involved, I agree.

Sincerely
Meikel

Alan

unread,
Apr 3, 2011, 4:35:50 PM4/3/11
to Clojure
Isn't all this just a special case of partition-by?

(defn drop-by [f coll]
(apply concat (rest (partition-by f coll))))

(defn take-by [f coll]
(first (partition-by f coll)))

user> (drop-by (partial + 2) [2 2 2 3 3 4])
(3 3 4)
user> (take-by #(mod % 3) [1 4 1 7 34 16 10 2 99 103 42])
(1 4 1 7 34 16 10)

Andreas Kostler

unread,
Apr 3, 2011, 4:50:30 PM4/3/11
to clo...@googlegroups.com
There you go, symmetry and simplicity :)

Roman Sykora

unread,
Apr 3, 2011, 8:45:54 AM4/3/11
to Clojure
Hi

When I read this post in the morning I thought about how I would
implement this function, and the result was:

(defn take-by
[f coll]
(when-let [[x & xs] (seq coll)]
(let [n (f x)]
(lazy-seq (cons x (take-while #(= n (f %)) xs))))))

I didn't post it, because I really am a beginner in clojure and
functional programming and others had already given different answers
that were far more 'complicated' than my solution.

Now, my question is what's the improvement of

> (defn take-by [f coll]
> (let [fs (map f coll)
> ps (map = fs (rest fs))
> zs (map #(if %1 %2 sentinel) ps (rest coll))]
> (cons (first coll) (take-while (partial not= sentinel) zs))))

over

> (defn take-by
>   [f coll]
>   (lazy-seq
>     (when-let [s (seq coll)]
>       (let [fst   (first s)
>             value (f fst)]
>         (cons fst (take-while #(-> % f (= value)) (rest s)))))))

Sincerely
Roman

Alan

unread,
Apr 3, 2011, 5:04:48 PM4/3/11
to Clojure
I like your version, Roman. It seems as efficient as anything, and is
easy to read. For what it's worth, I'd make a small rewrite:

(defn take-by [f coll]
(lazy-seq
(when-let [[x & xs] (seq coll)]
(let [val (f x)]
(cons x (take-while (comp #{val} f) xs))))))

Ken Wesson

unread,
Apr 3, 2011, 5:13:05 PM4/3/11
to clo...@googlegroups.com
On Sun, Apr 3, 2011 at 8:45 AM, Roman Sykora <4rt....@gmail.com> wrote:
> Now, my question is what's the improvement of
>
>> (defn take-by [f coll]
>>   (let [fs (map f coll)
>>          ps (map = fs (rest fs))
>>          zs (map #(if %1 %2 sentinel) ps (rest coll))]
>>     (cons (first coll) (take-while (partial not= sentinel) zs))))
>
> over
>
>> (defn take-by
>>   [f coll]
>>   (lazy-seq
>>     (when-let [s (seq coll)]
>>       (let [fst   (first s)
>>             value (f fst)]
>>         (cons fst (take-while #(-> % f (= value)) (rest s)))))))
>
> Sincerely
> Roman

The former is shorter and, IMO, a bit clearer. The lattter though is
probably more efficient at runtime. So the answer is really "it
depends".

The partition-by implementations are even shorter and clearer, of
course, but I thought it would be instructive to show a way of
implementing that general type of thing (depending on successive
elements of a single seq) without using an existing similar function
(e.g. partition-by) -- where does the first such function come from?
-- and using the basic sequence operations like map, filter, and
reduce.

The more versatilely you can use those sequence functions to solve
problems, the better a Clojure programmer you are, quite frankly. And
so ultimately having all the different variations posted here helps
everyone reading this thread. There are even a couple of interesting
bugs and their fixes in this thread. :)

It's also true that the more you look at and understand code like
what's in this thread, the more you can get into the "functional"
mind-set, making patterns like (map foo x (rest x)) jump quickly to
mind (as well as possibly higher-level functions like partition-by)
when you see certain kinds of problem (like depending on successive
members of a seq).

Of course, part of thinking functionally includes thinking how
patterns like that can be turned into useful abstractions, too. For
example, to get groups of n consecutive elements in general. You could
use (map vector x (rest x)) to get all the pairs of successive
elements; then again we also have (partition 2 1 x) and the general
(partition n 1 x) for this.

What about generalizing (map foo x (rest x) (rest (rest x)) ...)?
Well, the obvious is (map (partial apply foo) (partition n 1 x))) for
this. You could grab this and define it as a function if you find you
need it much:

(defn map-nextn [f n s]
(map (partial apply f) (partition n 1 s)))

How would you implement this without partition? One possibility would be

(defn map-nextn [f n s]
(apply map f (take n (iterate rest s))))

which, interestingly, is actually two characters *shorter*, though I
wouldn't say it was clearer. :)

Cheers.

Roman Sykora

unread,
Apr 4, 2011, 2:09:31 AM4/4/11
to Clojure
Thank you for your time and kind answers. I really enjoy reading this
group and the whole clojure community. Enough flattery ;)

A question remains though (for now).

Would I use

(lazy-seq
(some
(stuff ...)))

instead of

(some
(stuff
(lazy-seq ...)))

because some stuff would only be realized/executed if one uses the
resulting lazy-seq in the former case ?

Sincerely
Roman

Alan

unread,
Apr 4, 2011, 1:27:14 PM4/4/11
to Clojure
Yes. Generally push lazy-seq as high in the chain as you can (but no
higher!), for that reason..

Meikel Brandmeyer

unread,
Apr 4, 2011, 5:55:03 PM4/4/11
to Clojure
Hi,

On 3 Apr., 14:45, Roman Sykora <4rt.f...@gmail.com> wrote:

> Now, my question is what's the improvement of
>
> > (defn take-by [f coll]
> >   (let [fs (map f coll)
> >          ps (map = fs (rest fs))
> >          zs (map #(if %1 %2 sentinel) ps (rest coll))]
> >     (cons (first coll) (take-while (partial not= sentinel) zs))))
>
> over
>
> > (defn take-by
> >   [f coll]
> >   (lazy-seq
> >     (when-let [s (seq coll)]
> >       (let [fst   (first s)
> >             value (f fst)]
> >         (cons fst (take-while #(-> % f (= value)) (rest s)))))))

I summarised my issues with the different solutions. Since things got
a bit philosophical I wrote up a short blog post on my point of view:
http://bit.ly/hInw0J

Sincerely
Meikel

Ken Wesson

unread,
Apr 4, 2011, 7:07:23 PM4/4/11
to clo...@googlegroups.com
> I summarised my issues with the different solutions. Since things got
> a bit philosophical I wrote up a short blog post on my point of view:
> http://bit.ly/hInw0J

I'm surprised to hear that partition-by is not very lazy, and will
hang on an infinite seq. Fortunately, that can be fixed:

(defn partition-by [f s]
(lazy-seq
(let [s (seq s)]
(if s
(let [fs (map f s)
eq (cons true (map = fs (rest fs)))
eqs (seq (map list eq s))
step (fn step [eqs]
(lazy-seq
(let [eqs (seq eqs)]
(if eqs
(cons
(lazy-seq
(cons (second (first eqs))
(map second (take-while first (rest eqs)))))
(step (drop-while first (rest eqs))))))))]
(step eqs))))))

A bit long and convoluted, with no fewer than three applications of
lazy-seq, but fully lazy:

user=> (defn report-seq
[]
(letfn [(this
[i]
(lazy-seq
(prn (str ">" i "<"))
(cons i (this (inc i)))))]
(take 14 (this 0))))
#'user/report-seq
user=> (def q (partition-by #(quot % 3) (report-seq)))
#'user/q
; No elements realized
user=> (def q (first (partition-by #(quot % 3) (report-seq))))
">0<"
#'user/q
; Had to realize 0 to know whether to return nil or a lazy-seq object here
user=> (def q (first (first (partition-by #(quot % 3) (report-seq)))))
">0<"
#'user/q
; Realized only the element it returned
user=> (def q (second (partition-by #(quot % 3) (report-seq))))
">0<"
">1<"
">2<"
">3<"
; Realized up to the first element of the next chunk, to know if that
; chunk existed
user=> (partition-by #(quot % 3) (report-seq))
">0<"
((0">1<"
1">2<"
2">3<"
) (3">4<"
4">5<"
5">6<"
) (6">7<"
7">8<"
8">9<"
) (9">10<"
10">11<"
11">12<"
) (12">13<"
13))
; Each element realized only as it's needed during a full traversal
user=> (take 5 (partition-by #(quot % 3) (iterate inc 0)))
((0 1 2)
(3 4 5)
(6 7 8)
(9 10 11)
(12 13 14))
; Does not blow up on infinite input seq
user=> (defn report-fn [x] (println "Expensive computation would have
been run on: " x) (quot x 3))
#'user/report-fn
user=> (def q (doall (map doall (partition-by report-fn (range 14)))))
Expensive computation would have been run on: 0
Expensive computation would have been run on: 1
Expensive computation would have been run on: 2
Expensive computation would have been run on: 3
Expensive computation would have been run on: 4
Expensive computation would have been run on: 5
Expensive computation would have been run on: 6
Expensive computation would have been run on: 7
Expensive computation would have been run on: 8
Expensive computation would have been run on: 9
Expensive computation would have been run on: 10
Expensive computation would have been run on: 11
Expensive computation would have been run on: 12
Expensive computation would have been run on: 13
#'user/q
; The potentially-expensive f is only run once per item during a full traversal

Ken Wesson

unread,
Apr 4, 2011, 7:27:59 PM4/4/11
to clo...@googlegroups.com
On Mon, Apr 4, 2011 at 7:07 PM, Ken Wesson <kwes...@gmail.com> wrote:
>> I summarised my issues with the different solutions. Since things got
>> a bit philosophical I wrote up a short blog post on my point of view:
>> http://bit.ly/hInw0J
>
> I'm surprised to hear that partition-by is not very lazy, and will
> hang on an infinite seq. Fortunately, that can be fixed

Oh, and it also doesn't hold onto the head:

user=> (nth (my-partition-by #(quot % 3) (iterate inc 0)) 10000000)
; after a minute or so
(30000000
30000001
30000002)
user=> (def q (iterate inc 0))
#'user/q
user=> (nth (my-partition-by #(quot % 3) q) 10000000)
; chugs for over 15 minutes, and then
#<CompilerException java.lang.OutOfMemoryError: Java heap space
(NO_SOURCE_FILE:93)>

Meikel Brandmeyer

unread,
Apr 5, 2011, 2:27:35 AM4/5/11
to Clojure
Hi,

On 5 Apr., 01:27, Ken Wesson <kwess...@gmail.com> wrote:

> Oh, and it also doesn't hold onto the head:

Your version of partition-by is basically equivalent to:

(defn partition-by
[f coll]
(lazy-seq
(when-let [s (seq coll)]
(let [fst (first s)
value (f fst)
[group tail] (split-with #(-> % f (= value)) (rest s))]
(cons (cons fst group) (partition-by f tail))))))

It traverses the input sequence twice: once for the take-while, once
for the drop-while. And it does "kind of" hold onto the head. It
doesn't really hold the head, but it still holds references to the
items of the first group, because the (drop-while first ..) is not
executed immediately. Only when the rest is actually realised with
seq, the references to the first group are dropped. So it is a weaker
type of holding the head. That's what I meant with "you would hold
unto the head *in some way*." With some numbers the effect is maybe
not visible, but holding some huge data structures without any notice
why, is maybe a problem.

Sincerely
Meikel

Ken Wesson

unread,
Apr 5, 2011, 3:44:32 AM4/5/11
to clo...@googlegroups.com
On Tue, Apr 5, 2011 at 2:27 AM, Meikel Brandmeyer <m...@kotka.de> wrote:
> Hi,
>
> On 5 Apr., 01:27, Ken Wesson <kwess...@gmail.com> wrote:
>
>> Oh, and it also doesn't hold onto the head:
>
> Your version of partition-by is basically equivalent to:
>
> (defn partition-by
>  [f coll]
>  (lazy-seq
>    (when-let [s (seq coll)]
>      (let [fst          (first s)
>            value        (f fst)
>            [group tail] (split-with #(-> % f (= value)) (rest s))]
>        (cons (cons fst group) (partition-by f tail))))))

I don't think so. That one applies f to some elements twice:

Expensive computation would have been run on: 0
Expensive computation would have been run on: 1
Expensive computation would have been run on: 2
Expensive computation would have been run on: 3
Expensive computation would have been run on: 3
Expensive computation would have been run on: 4
Expensive computation would have been run on: 5
Expensive computation would have been run on: 6
Expensive computation would have been run on: 6
Expensive computation would have been run on: 7
Expensive computation would have been run on: 8
Expensive computation would have been run on: 9
Expensive computation would have been run on: 9
Expensive computation would have been run on: 10
Expensive computation would have been run on: 11
Expensive computation would have been run on: 12
Expensive computation would have been run on: 12
Expensive computation would have been run on: 13

> It traverses the input sequence twice: once for the take-while, once
> for the drop-while.

Mine only does an = comparison on the second traversal. If f is
expensive, the extra = comparison gets dominated. If the sequence is
short, the extra traversal doesn't matter much. Only when f is very
cheap and the sequence quite long is it potentially an issue.

> And it does "kind of" hold onto the head. It doesn't really hold the head,
> but it still holds references to the items of the first group, because the
> (drop-while first ..) is not executed immediately.

Neither version has that problem:

user=> (defn heavy-structure [n]
[n (byte-array 100000000)])
user=> (defn memrem [] (doall (map heavy-structure (map print (iterate
inc 0)))))
#'user/memrem
user=> (memrem)
012345678910#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:84)>
; was about 1 GB of heap available
user=> (def q (second (doall (my-partition-by alength (map byte-array
[100000000 200000000])))))
#'user/q
user=> (memrem)
012345678#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:90)>
; 800 MB free
user=> (def q (second (doall (meikel-partition-by alength (map
byte-array [100000000 200000000])))))
#'user/q
user=> (memrem)
012345678#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:92)>
; 800 MB free

Both of them are holding onto the second, 200MB byte array but not the
first (or memrem would stop at 7). The second array was realized to
compare its length with the first when finding the new group, and is
still referenced as it is a member of the second group, but the first
group and first array are not held onto.

user=> (def q (nth (first (my-partition-by #(quot (first %) 3) (map
heavy-structure (iterate inc 0)))) 2))
#'user/q
user=> (memrem)
0123456789#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:99)>
user=> (def q (nth (first (meikel-partition-by #(quot (first %) 3)
(map heavy-structure (iterate inc 0)))) 2))
#'user/q
user=> (memrem)
0123456789#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:101)>

This time we generate groups of three 100MB structures and save the
third item of the first group. The memory use this causes is 100MB, as
expected if only that third item is being held onto, and not the head
of the first group. If the head of the first group were being held
while we're deeper into the first group but haven't realized the start
of the second, we'd have 300MB of memory use instead of 100 and memrem
would have crapped out at 7.

And this:

user=> (def q (nth (first (my-partition-by #(quot (first %) 3) (map
heavy-structure (map #(doto % println) (iterate inc 0))))) 2))
0
1
2
#'user/q
user=> (def q (nth (first (x-partition-by #(quot (first %) 3) (map
heavy-structure (map #(doto % println) (iterate inc 0))))) 2))
0
1
2
#'user/q

proves that it's only realizing up to the third item in this scenario;
it really is not realizing the fourth, and the start of the second
group. Yet it's already let go of the head of the first group. If I
understand your claim correctly, it's therefore wrong. Certainly these
tests seem to show it not holding onto extravagantly-large objects any
longer than necessary.

The only awkwardness is that it does eagerly realize the first element
of each group when that group itself is realized as an element of the
outer seq, and I don't really see any easy way to avoid that; and on
that score, both of our partition-bys are vastly superior to the
clojure.core 1.2 version:

user=> (def q (second (partition-by #(quot % 3) (report-seq))))
">0<"
">1<"
">2<"
">3<"

">4<"
">5<"
">6<"
#'user/q

As you can see, that one realizes the entire second group as soon as
the second group itself is realized as an element of the outer seq.
The inner seqs are completely eager. Which I believe is the issue that
motivated this in the first place. :)

So, yours appears to beat clojure.core 1.2's by a large margin, and
mine yours by a smaller one (as it holds onto and realizes nothing
longer or sooner than yours does, and avoids one extra application of
f per group). :)

Meikel Brandmeyer

unread,
Apr 5, 2011, 6:10:17 AM4/5/11
to Clojure
Hi,

Disclaimer: believing this kind of benchmark. No clue whether the
following is really valid.

On 5 Apr., 09:44, Ken Wesson <kwess...@gmail.com> wrote:

> If I understand your claim correctly, it's therefore wrong.

Since it's correct, you don't understand my claim correctly. The
numbers in between are the memrem runs.

Using your partition-by.

user=> (do ...) ; long command snipped
----> Start empty.
----> (def s nil)
0
1
2
3
4
5
6
----> (def s (partition-by alength (map byte-array [100000000
100000000 100000000 200000000])))
0
1
2
3
4
5
6
----> Nothing realised, yet. Get first group.
----> (first s)
0
1
----> Mem used. Drop head, get rest.
----> (def s (rest s))
0
1
----> Head is hold? Realise rest.
----> (def s (seq s))
0
1
2
3
4
----> References gone. Memory freed.
nil

Using core partition-by:

user=> (do ...) ; long command snipped
----> Start empty.
----> (def s nil)
0
1
2
3
4
5
6
----> (def s (clojure.core/partition-by alength (map byte-array
[100000000 100000000 100000000 200000000])))
0
1
2
3
4
5
6
----> Nothing realised, yet. Get first group.
----> (first s)
0
1
----> Mem used. Drop head, get rest.
----> (def s (rest s))
0
1
----> Head is hold? Realise rest.
----> (def s (seq s))
0
1
----> Huh?
nil

This was surprising and turned out to be a bug in core partition-by.
Using fixed core partition-by:

user=> (do ...) ; long command snipped
----> Start empty.
----> (def s nil)
0
1
2
3
4
5
6
----> (def s (fixed-core-partition-by alength (map byte-array
[100000000 100000000 100000000 200000000])))
0
1
2
3
4
5
6
----> Nothing realised, yet. Get first group.
----> (first s)
0
1
----> Mem used. Drop head, get rest.
----> (def s (rest s))
0
1
2
3
4
----> Note that head is *not* hold, references to first are dropped.
nil

Interesting where such discussions can lead: it helped to uncover an
issue on an edge case with core partition-by.

Sincerely
Meikel

Ken Wesson

unread,
Apr 5, 2011, 7:08:01 AM4/5/11
to clo...@googlegroups.com
>> If I understand your claim correctly, it's therefore wrong.
>
> Since it's correct, you don't understand my claim correctly.

That is rather rude and arrogant.

> Using your partition-by.
[snip]

This seems to contradict the test results I posted earlier. Perhaps it
is behaving differently on your system -- what is your Clojure
version? Maybe some issue with locals clearing?

In any event, where in my code are you claiming a reference is being
held? If you continue to stand by your claim, point to the exact spot.

Meikel Brandmeyer

unread,
Apr 5, 2011, 8:09:52 AM4/5/11
to Clojure
Hi,

On 5 Apr., 13:08, Ken Wesson <kwess...@gmail.com> wrote:

> This seems to contradict the test results I posted earlier. Perhaps it
> is behaving differently on your system -- what is your Clojure
> version? Maybe some issue with locals clearing?

Clojure 1.2 running on Windows with JDK 1.6.0_24.

> In any event, where in my code are you claiming a reference is being
> held? If you continue to stand by your claim, point to the exact spot.

(defn partition-by [f s]
(lazy-seq
(let [s (seq s)]
(if s
(let [fs (map f s)
eq (cons true (map = fs (rest fs)))
eqs (seq (map list eq s))
step (fn step [eqs]
(lazy-seq
(let [eqs (seq eqs)]
(if eqs
(cons
(lazy-seq
(cons (second (first eqs))
(map second (take-while first (rest
eqs)))))
(step (drop-while first (rest
eqs))))))))] ; <- The drop-while marks the spot. There is a call to
seq missing here.
(step eqs))))))

Sincerely
Meikel

Ken Wesson

unread,
Apr 5, 2011, 8:25:18 AM4/5/11
to clo...@googlegroups.com
On Tue, Apr 5, 2011 at 8:09 AM, Meikel Brandmeyer <m...@kotka.de> wrote:
> (defn partition-by [f s]
>  (lazy-seq
>    (let [s (seq s)]
>      (if s
>        (let [fs (map f s)
>              eq (cons true (map = fs (rest fs)))
>              eqs (seq (map list eq s))
>              step (fn step [eqs]
>                     (lazy-seq
>                       (let [eqs (seq eqs)]
>                         (if eqs
>                           (cons
>                             (lazy-seq
>                               (cons (second (first eqs))
>                                 (map second (take-while first (rest
> eqs)))))
>                             (step (drop-while first (rest
> eqs))))))))] ; <- The drop-while marks the spot. There is a call to
> seq missing here.
>          (step eqs))))))

I don't think so. I just tested it and adding a call to seq there
causes it to retain more data in memory, longer, not the other way
around.

Meikel Brandmeyer

unread,
Apr 5, 2011, 11:32:00 AM4/5/11
to Clojure
Hi,

On 5 Apr., 14:25, Ken Wesson <kwess...@gmail.com> wrote:

> I don't think so. I just tested it and adding a call to seq there
> causes it to retain more data in memory, longer, not the other way
> around.

I don't understand this from a logical point of view. (seq (drop-
while ...)) can only *remove* references, not *add* them.

In my understanding placing seq there places you at the cross roads.
With seq, you get (ideal) core partition-by: eager groups, no
references to head group in the rest. Without seq, you get: lazy
groups, references to head group in the rest. In the latter case the
references only go away with realisation of the rest. I lean out of
the window and claim: you can't have both—lazy groups and no
references to head group items in the rest.

Using your benchmark approach again:

user=> (do ...) ; lengthy command snipped
----> Start empty.
----> (def s nil)
0
1
2
3
4
5
----> (def s (partition-by-with-seq alength (map byte-array [100000000
100000000 100000000 200000000])))
0
1
2
3
4
5
----> Drop head, get rest.
----> (def s (rest s))
0
1
2
3
----> No references. Memory freed.
nil

user=> (do ...) ; lengthy command snipped
----> Start empty.
----> (def s nil)
0
1
2
3
4
5
----> (def s (partition-by-without-seq alength (map byte-array
[100000000 100000000 100000000 200000000])))
0
1
2
3
4
5
----> Drop head, get rest.
----> (def s (rest s))
0
----> Head group items are hold? Realise rest.
----> (def s (seq s))
0
1
2
3
----> References gone. Memory freed.
nil

This is inline with my expectations purely from the definition (or
rather docstring) of drop-while.

Sincerely
Meikel

Reply all
Reply to author
Forward
0 new messages