(defvar- sentinel (Object.))
(defn take-by [f coll]
(let [fs (map f coll)
ps (map = fs (rest fs))
zs (map #(if %1 %2 sentinel) ps coll)]
(take-while (partial not= sentinel) zs))
user=> (take-by #(mod % 3) [1 4 1 7 34 16 10 2 99 103 42])
(1 4 1 7 34 16)
I leave drop-by as an exercise for the reader, but the punch line here
is (map foo s (rest s)). :)
Actually, this should include the 10 after the 16. It needs to be
(defn take-by [f coll]
(let [fs (map f coll)
ps (map = fs (rest fs))
zs (map #(if %1 %2 sentinel) ps (rest coll))]
(cons (first coll) (take-while (partial not= sentinel) zs))))
(defn take-by [f coll]
(lazy-seq
(when-let [s (seq coll)]
(let [x (first s)
y (second s)]
(if (= (f x) (f y))
(cons x (take-by f (rest s)))
(list x))))))
Kind Regards
Andreas
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
--
"Test-driven Dentistry (TDD!) - Not everything should be test driven"
- Michael Fogus
--
**********************************************************
Andreas Koestler, Software Engineer
Leica Geosystems Pty Ltd
270 Gladstone Road, Dutton Park QLD 4102
Main: +61 7 3891 9772 Direct: +61 7 3117 8808
Fax: +61 7 3891 9336
Email: andreas....@leica-geosystems.com
************www.leica-geosystems.com*************
when it has to be right, Leica Geosystems
Please consider the environment before printing this email.
(defn take-by [f coll]
(cons (first coll)
(for [[x y]
(map (fn [x y] [x y]) coll (rest coll))
:while (= (f x) (f y))] x)))
Cheers
Andreas
On 03/04/2011, at 2:21 PM, Ken Wesson wrote:
What's your reason for using (fn [x y] [x y]) here instead of just
vector? Is it more efficient because it isn't variable-arity?
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
I don't. :)
(defn drop-by [f coll]
(let [fs (map f coll)
ps (map = fs (rest fs))
zs (map list ps (rest coll))]
(map second (drop-while first zs))))
user=> (drop-by #(mod % 3) [1 4 1 7 34 16 10 2 99 103 42])
(2 99 103 42)
I especially like the symmetry of using take-while for the one and
drop-while for the other, under the hood.
"Less" involved?
The former is shorter and, IMO, a bit clearer. The lattter though is
probably more efficient at runtime. So the answer is really "it
depends".
The partition-by implementations are even shorter and clearer, of
course, but I thought it would be instructive to show a way of
implementing that general type of thing (depending on successive
elements of a single seq) without using an existing similar function
(e.g. partition-by) -- where does the first such function come from?
-- and using the basic sequence operations like map, filter, and
reduce.
The more versatilely you can use those sequence functions to solve
problems, the better a Clojure programmer you are, quite frankly. And
so ultimately having all the different variations posted here helps
everyone reading this thread. There are even a couple of interesting
bugs and their fixes in this thread. :)
It's also true that the more you look at and understand code like
what's in this thread, the more you can get into the "functional"
mind-set, making patterns like (map foo x (rest x)) jump quickly to
mind (as well as possibly higher-level functions like partition-by)
when you see certain kinds of problem (like depending on successive
members of a seq).
Of course, part of thinking functionally includes thinking how
patterns like that can be turned into useful abstractions, too. For
example, to get groups of n consecutive elements in general. You could
use (map vector x (rest x)) to get all the pairs of successive
elements; then again we also have (partition 2 1 x) and the general
(partition n 1 x) for this.
What about generalizing (map foo x (rest x) (rest (rest x)) ...)?
Well, the obvious is (map (partial apply foo) (partition n 1 x))) for
this. You could grab this and define it as a function if you find you
need it much:
(defn map-nextn [f n s]
(map (partial apply f) (partition n 1 s)))
How would you implement this without partition? One possibility would be
(defn map-nextn [f n s]
(apply map f (take n (iterate rest s))))
which, interestingly, is actually two characters *shorter*, though I
wouldn't say it was clearer. :)
Cheers.
I'm surprised to hear that partition-by is not very lazy, and will
hang on an infinite seq. Fortunately, that can be fixed:
(defn partition-by [f s]
(lazy-seq
(let [s (seq s)]
(if s
(let [fs (map f s)
eq (cons true (map = fs (rest fs)))
eqs (seq (map list eq s))
step (fn step [eqs]
(lazy-seq
(let [eqs (seq eqs)]
(if eqs
(cons
(lazy-seq
(cons (second (first eqs))
(map second (take-while first (rest eqs)))))
(step (drop-while first (rest eqs))))))))]
(step eqs))))))
A bit long and convoluted, with no fewer than three applications of
lazy-seq, but fully lazy:
user=> (defn report-seq
[]
(letfn [(this
[i]
(lazy-seq
(prn (str ">" i "<"))
(cons i (this (inc i)))))]
(take 14 (this 0))))
#'user/report-seq
user=> (def q (partition-by #(quot % 3) (report-seq)))
#'user/q
; No elements realized
user=> (def q (first (partition-by #(quot % 3) (report-seq))))
">0<"
#'user/q
; Had to realize 0 to know whether to return nil or a lazy-seq object here
user=> (def q (first (first (partition-by #(quot % 3) (report-seq)))))
">0<"
#'user/q
; Realized only the element it returned
user=> (def q (second (partition-by #(quot % 3) (report-seq))))
">0<"
">1<"
">2<"
">3<"
; Realized up to the first element of the next chunk, to know if that
; chunk existed
user=> (partition-by #(quot % 3) (report-seq))
">0<"
((0">1<"
1">2<"
2">3<"
) (3">4<"
4">5<"
5">6<"
) (6">7<"
7">8<"
8">9<"
) (9">10<"
10">11<"
11">12<"
) (12">13<"
13))
; Each element realized only as it's needed during a full traversal
user=> (take 5 (partition-by #(quot % 3) (iterate inc 0)))
((0 1 2)
(3 4 5)
(6 7 8)
(9 10 11)
(12 13 14))
; Does not blow up on infinite input seq
user=> (defn report-fn [x] (println "Expensive computation would have
been run on: " x) (quot x 3))
#'user/report-fn
user=> (def q (doall (map doall (partition-by report-fn (range 14)))))
Expensive computation would have been run on: 0
Expensive computation would have been run on: 1
Expensive computation would have been run on: 2
Expensive computation would have been run on: 3
Expensive computation would have been run on: 4
Expensive computation would have been run on: 5
Expensive computation would have been run on: 6
Expensive computation would have been run on: 7
Expensive computation would have been run on: 8
Expensive computation would have been run on: 9
Expensive computation would have been run on: 10
Expensive computation would have been run on: 11
Expensive computation would have been run on: 12
Expensive computation would have been run on: 13
#'user/q
; The potentially-expensive f is only run once per item during a full traversal
Oh, and it also doesn't hold onto the head:
user=> (nth (my-partition-by #(quot % 3) (iterate inc 0)) 10000000)
; after a minute or so
(30000000
30000001
30000002)
user=> (def q (iterate inc 0))
#'user/q
user=> (nth (my-partition-by #(quot % 3) q) 10000000)
; chugs for over 15 minutes, and then
#<CompilerException java.lang.OutOfMemoryError: Java heap space
(NO_SOURCE_FILE:93)>
I don't think so. That one applies f to some elements twice:
Expensive computation would have been run on: 0
Expensive computation would have been run on: 1
Expensive computation would have been run on: 2
Expensive computation would have been run on: 3
Expensive computation would have been run on: 3
Expensive computation would have been run on: 4
Expensive computation would have been run on: 5
Expensive computation would have been run on: 6
Expensive computation would have been run on: 6
Expensive computation would have been run on: 7
Expensive computation would have been run on: 8
Expensive computation would have been run on: 9
Expensive computation would have been run on: 9
Expensive computation would have been run on: 10
Expensive computation would have been run on: 11
Expensive computation would have been run on: 12
Expensive computation would have been run on: 12
Expensive computation would have been run on: 13
> It traverses the input sequence twice: once for the take-while, once
> for the drop-while.
Mine only does an = comparison on the second traversal. If f is
expensive, the extra = comparison gets dominated. If the sequence is
short, the extra traversal doesn't matter much. Only when f is very
cheap and the sequence quite long is it potentially an issue.
> And it does "kind of" hold onto the head. It doesn't really hold the head,
> but it still holds references to the items of the first group, because the
> (drop-while first ..) is not executed immediately.
Neither version has that problem:
user=> (defn heavy-structure [n]
[n (byte-array 100000000)])
user=> (defn memrem [] (doall (map heavy-structure (map print (iterate
inc 0)))))
#'user/memrem
user=> (memrem)
012345678910#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:84)>
; was about 1 GB of heap available
user=> (def q (second (doall (my-partition-by alength (map byte-array
[100000000 200000000])))))
#'user/q
user=> (memrem)
012345678#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:90)>
; 800 MB free
user=> (def q (second (doall (meikel-partition-by alength (map
byte-array [100000000 200000000])))))
#'user/q
user=> (memrem)
012345678#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:92)>
; 800 MB free
Both of them are holding onto the second, 200MB byte array but not the
first (or memrem would stop at 7). The second array was realized to
compare its length with the first when finding the new group, and is
still referenced as it is a member of the second group, but the first
group and first array are not held onto.
user=> (def q (nth (first (my-partition-by #(quot (first %) 3) (map
heavy-structure (iterate inc 0)))) 2))
#'user/q
user=> (memrem)
0123456789#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:99)>
user=> (def q (nth (first (meikel-partition-by #(quot (first %) 3)
(map heavy-structure (iterate inc 0)))) 2))
#'user/q
user=> (memrem)
0123456789#<CompilerException java.lang.OutOfMemoryError: Java heap
space (NO_SOURCE_FILE:101)>
This time we generate groups of three 100MB structures and save the
third item of the first group. The memory use this causes is 100MB, as
expected if only that third item is being held onto, and not the head
of the first group. If the head of the first group were being held
while we're deeper into the first group but haven't realized the start
of the second, we'd have 300MB of memory use instead of 100 and memrem
would have crapped out at 7.
And this:
user=> (def q (nth (first (my-partition-by #(quot (first %) 3) (map
heavy-structure (map #(doto % println) (iterate inc 0))))) 2))
0
1
2
#'user/q
user=> (def q (nth (first (x-partition-by #(quot (first %) 3) (map
heavy-structure (map #(doto % println) (iterate inc 0))))) 2))
0
1
2
#'user/q
proves that it's only realizing up to the third item in this scenario;
it really is not realizing the fourth, and the start of the second
group. Yet it's already let go of the head of the first group. If I
understand your claim correctly, it's therefore wrong. Certainly these
tests seem to show it not holding onto extravagantly-large objects any
longer than necessary.
The only awkwardness is that it does eagerly realize the first element
of each group when that group itself is realized as an element of the
outer seq, and I don't really see any easy way to avoid that; and on
that score, both of our partition-bys are vastly superior to the
clojure.core 1.2 version:
user=> (def q (second (partition-by #(quot % 3) (report-seq))))
">0<"
">1<"
">2<"
">3<"
">4<"
">5<"
">6<"
#'user/q
As you can see, that one realizes the entire second group as soon as
the second group itself is realized as an element of the outer seq.
The inner seqs are completely eager. Which I believe is the issue that
motivated this in the first place. :)
So, yours appears to beat clojure.core 1.2's by a large margin, and
mine yours by a smaller one (as it holds onto and realizes nothing
longer or sooner than yours does, and avoids one extra application of
f per group). :)
That is rather rude and arrogant.
> Using your partition-by.
[snip]
This seems to contradict the test results I posted earlier. Perhaps it
is behaving differently on your system -- what is your Clojure
version? Maybe some issue with locals clearing?
In any event, where in my code are you claiming a reference is being
held? If you continue to stand by your claim, point to the exact spot.
I don't think so. I just tested it and adding a call to seq there
causes it to retain more data in memory, longer, not the other way
around.