filter-split

4 views
Skip to first unread message

David Sletten

unread,
Mar 7, 2009, 11:29:26 PM3/7/09
to clo...@googlegroups.com
I'm reading the Sequences chapter of Programming Clojure, and Stu
points out that split-with combines the semantics of take-while and
drop-while. But is there a function that does something similar with
"filter"? Namely, rather than simply filtering the elements of a
collection that satisfy a predicate I also want to capture those that
don't. Something like this:
(defn filter-split [pred coll]
(loop [trues '() falses '() coll coll]
(cond (empty? coll)
(vector (reverse trues) (reverse falses))
(pred (first coll))
(recur (cons (first coll) trues) falses (rest coll))
:else
(recur trues (cons (first coll) falses) (rest coll)))))

(filter-split #{\a\e\i\o\u} "is this not pung?") => [(\i \i \o \u)
(\s \space \t \h \s \space \n \t \space \p \n \g \?)]
(filter-split even? (range 10)) => [(0 2 4 6 8) (1 3 5 7 9)]

Aloha,
David Sletten

Jeffrey Straszheim

unread,
Mar 7, 2009, 11:37:04 PM3/7/09
to clo...@googlegroups.com
There is separate in seq_utils in contrib.

e

unread,
Mar 7, 2009, 11:44:25 PM3/7/09
to clo...@googlegroups.com
check the discussion with the subject, "time lies, even with doall".  We came up with something like the following, but some name change change tweaks were suggested.  This thing takes a pred and a collection and returns a list of two collections -- one that passes the pred, and one that fails.

(defn filt-rem [pred coll]
   (loop [l1 () l2 () [f & r] coll]
     (if f
       (if (pred f)
         (recur (conj l1 f) l2 r)
         (recur l1 (conj l2 f) r))
       (list l1 l2))))

Adrian Cuthbertson

unread,
Mar 8, 2009, 12:17:07 AM3/8/09
to clo...@googlegroups.com
That's the beauty of this language - there are many ways to skin the cat!
Here's a version using reduce...

(defn filt-split [pred col]
(reduce (fn [[a b] x] (if (pred x) [(conj a x) b] [a (conj b x)]))
[[] []] col))

(filt-split even? [1 2 3 4 5 6 7 8])
[[2 4 6 8] [1 3 5 7]]

But when you look at separate in clojure.contrib.seq-utils its simple
and elegant;
(defn separate [f s]
[(filter f s) (filter (complement f) s)])

Rgds, Adrian.

David Sletten

unread,
Mar 8, 2009, 12:53:34 AM3/8/09
to clo...@googlegroups.com

On Mar 7, 2009, at 7:17 PM, Adrian Cuthbertson wrote:

>
> That's the beauty of this language - there are many ways to skin
> the cat!

Hmmm...I'm not sure what I'll do with a skinless cat. :)

> Here's a version using reduce...
>
> (defn filt-split [pred col]
> (reduce (fn [[a b] x] (if (pred x) [(conj a x) b] [a (conj b x)]))
> [[] []] col))
>
> (filt-split even? [1 2 3 4 5 6 7 8])
> [[2 4 6 8] [1 3 5 7]]
>

I like that a lot. As long as the repeated creation of the ephemeral
vectors isn't too expensive.

> But when you look at separate in clojure.contrib.seq-utils its simple
> and elegant;
> (defn separate [f s]
> [(filter f s) (filter (complement f) s)])
>

This is exactly what I'm trying to avoid. I don't want to traverse
the collection twice.

Aloha,
David Sletten

David Sletten

unread,
Mar 8, 2009, 1:00:41 AM3/8/09
to clo...@googlegroups.com

On Mar 7, 2009, at 6:44 PM, e wrote:

> check the discussion with the subject, "time lies, even with
> doall". We came up with something like the following, but some
> name change change tweaks were suggested. This thing takes a pred
> and a collection and returns a list of two collections -- one that
> passes the pred, and one that fails.
>
> (defn filt-rem [pred coll]
> (loop [l1 () l2 () [f & r] coll]
> (if f
> (if (pred f)
> (recur (conj l1 f) l2 r)
> (recur l1 (conj l2 f) r))
> (list l1 l2))))
>
>

This is almost identical to what I posted. However, is this the
intended behavior?
(filt-rem identity '(true nil false 8)) => ((true) ())
(filt-split identity '(true nil false 8)) => [[true 8] [nil false]]

(filt-rem even? (range 10)) => ((8 6 4 2 0) (9 7 5 3 1))


(filter-split even? (range 10)) => [(0 2 4 6 8) (1 3 5 7 9)]

> (defn filter-split [pred coll]


> (loop [trues '() falses '() coll coll]
> (cond (empty? coll)
> (vector (reverse trues) (reverse falses))
> (pred (first coll))
> (recur (cons (first coll) trues) falses (rest coll))
> :else
> (recur trues (cons (first coll) falses) (rest coll)))))

Aloha,
David Sletten

Adrian Cuthbertson

unread,
Mar 8, 2009, 1:20:20 AM3/8/09
to clo...@googlegroups.com
> ...repeated creation of the ephemeral vectors isn't too expensive.

With Clojure, although it looks like the immutable vectors are being
re-created on each iteration, under the covers it's really just
pointers being updated and the operations are (about) as efficient as
a loop/recur method.

> ...I don't want to traverse the collection twice.

Yes, I guess that even though each filter clause is lazy they each
will pass through the entire collection once.

Laurent PETIT

unread,
Mar 8, 2009, 4:29:09 AM3/8/09
to clo...@googlegroups.com
It seems to me that neither filt-split nor filter-rem from e are lazy operations (one uses reduce, the other one uses recur).
The version in clojurecontrib seems to preserve the original property of filter of returning a lazy sequence.

My 0,02€,

--
Laurent

2009/3/8 Adrian Cuthbertson <adrian.cu...@gmail.com>

That's the beauty of this language - there are many ways to skin the cat!
Here's a version using reduce...

(defn filt-split [pred col]
 (reduce (fn [[a b] x] (if (pred x) [(conj a x) b] [a (conj b x)]))
[[] []] col))

(filt-split even? [1 2 3 4 5 6 7 8])
[[2 4 6 8] [1 3 5 7]]

But when you look at separate in clojure.contrib.seq-utils its simple
and elegant;
(defn separate [f s]
 [(filter f s) (filter (complement f) s)])
Rgds, Adrian.
- Afficher le texte des messages précédents -

Adrian Cuthbertson

unread,
Mar 8, 2009, 6:11:00 AM3/8/09
to clo...@googlegroups.com
You're absolutely right...

user=> (time (let [[a b] (separate even? (range 1000000))] (nth a 3)))
"Elapsed time: 0.115 msecs"
6
user=> (time (let [[a b] (filt-split even? (range 1000000))] (nth a 3)))
"Elapsed time: 413.614 msecs"
6

and is also more efficient over large sequences...

(time (let [[a b] (filt-split even? (range 100000))] (nth a 49999)))
"Elapsed time: 44.64 msecs"
99998
user=> (time (let [[a b] (separate even? (range 100000))] (nth a 49999)))
"Elapsed time: 24.042 msecs"
99998

I guess there's only one way to skin this cat :-)

Adrian Cuthbertson

unread,
Mar 8, 2009, 6:29:17 AM3/8/09
to clo...@googlegroups.com
Sorry, further to that last example, if you actually consume all of
both even and odd sets then the reduce version is more efficient...

(time (let [[a b] (filt-split even? (range 100000))] [(nth a 49999)
(nth b 49999)]))
"Elapsed time: 36.711 msecs"
[99998 99999]

(time (let [[a b] (separate even? (range 100000))] [(nth a 49999) (nth
b 49999)]))
"Elapsed time: 67.004 msecs"
[99998 99999]

Christophe Grand

unread,
Mar 8, 2009, 6:37:19 AM3/8/09
to clo...@googlegroups.com
The question showed up the other day on #clojure with the additional
constraint to evaluate pred only once per item, here is Rich's solution:

http://paste.lisp.org/display/76458#1

(defn unzip-with [pred coll]
(let [pvs (map #(vector (pred %) %) coll)]
[(for [[p v] pvs :when p] v)
(for [[p v] pvs :when (not p)] v)]))


Christophe

David Sletten a écrit :
--
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.blogspot.com/ (en)


Adrian Cuthbertson

unread,
Mar 8, 2009, 6:44:06 AM3/8/09
to clo...@googlegroups.com
Hmm, on the same (micro admittedly) benchmark as above...

(time (let [[a b] (unzip-with even? (range 100000))] [(nth a 49999)
(nth b 49999)]))
"Elapsed time: 177.797 msecs"
[99998 99999]

that's a bit slower than both the previous versions. The reduce
version does only apply the pred once per item I think?

e

unread,
Mar 8, 2009, 7:10:17 AM3/8/09
to clo...@googlegroups.com

(filt-rem identity '(true nil false 8)) => ((true) ())
(filt-split identity '(true nil false 8)) => [[true 8] [nil false]]

can't speak for this one, cause I don't know enough clojure
 

(filt-rem even? (range 10)) => ((8 6 4 2 0) (9 7 5 3 1))
(filter-split even? (range 10)) => [(0 2 4 6 8) (1 3 5 7 9)]

yeah, we had variations that preserved order, too .... but that reduce one looks even cooler, anyway.




e

unread,
Mar 8, 2009, 7:21:45 AM3/8/09
to clo...@googlegroups.com
This is exactly what I'm trying to avoid. I don't want to traverse
the collection twice.

In that other thread, "Time lies, even with doall", someone helped me figure out a way to get the true time for filter-split, and concluded it was 2- 3 times faster than whats in contrib as expected . . . .you just can't time it, outright for some reason ... so that's a gotcha.

Meikel Brandmeyer

unread,
Mar 8, 2009, 7:48:33 AM3/8/09
to clo...@googlegroups.com
Hi,

Am 08.03.2009 um 11:44 schrieb Adrian Cuthbertson:

> that's a bit slower than both the previous versions. The reduce
> version does only apply the pred once per item I think?

unzip-with is lazy, the reduce version is not. I would
prefer laziness over speed.

Sincerely
Meikel

e

unread,
Mar 8, 2009, 7:53:29 AM3/8/09
to clo...@googlegroups.com

they both probably have their places -- lazy and not.  With the non-lazy one, someone can put the laziness at a higher level and batch in blocks, no?


Sincerely
Meikel


Reply all
Reply to author
Forward
0 new messages