map/filter/remove etc. change underlying structure


Colin Yates

Sep 9, 2016, 6:23:37 AM
to Clojure
Hi all,

So in the spirit of exposing my ignorance to the internet :-), I have just been bitten by a bug due to the behaviour of the core libraries which I find really surprising:

(def v [1 2 3])
(conj v 4)                             ;=> [1 2 3 4]
(conj (map identity v) 4)              ;=> (4 1 2 3)
(conj (remove (constantly false) v) 4) ;=> (4 1 2 3)
(conj (filter identity v) 4)           ;=> (4 1 2 3)

In other words, I was relying on map, remove and filter preserving the semantics (other than laziness) of the input structure: give them a vector and you get a vector-like lazy sequence. This turns out not to be the case.

Now, I know there is mapv which returns a vector but why isn't there a removev and a filterv etc.?

What makes it more onerous for me is the fact conj states that its behaviour differs depending on the concrete type, which is great, but how am I supposed to know which concrete type is returned from map|filter|remove? My assumption was it would be semantically equivalent to the input (i.e. a vector in this case).

The reason I have dodged this until now is that I don't frequently rely on vector semantics, but I am surprised this isn't better documented.

Is it me?

Thanks,

Colin

James Reeves

Sep 9, 2016, 7:32:37 AM
to clo...@googlegroups.com
I find the current behaviour to be perfectly intuitive. I think it would be unnecessarily complex and confusing to have seqs behave differently depending on what data structure they were originally derived from.

- James

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mark Engelberg

Sep 9, 2016, 7:36:17 AM
to clojure
Everything from the Clojure cheatsheet's "Seq in Seq out" section processes the input as a sequence (ignoring its concrete type) and always returns a lazy sequence.  When you pass in a vector v, the very first thing these functions typically do is call `seq` on it, and they process the input using first/next/rest.

I'm not really sure what a "lazy-like vector" would look like.  Nothing like that exists within the set of core Clojure datatypes and no functions return anything like that.

You can use `into` to "pour" the sequence into the collection of your choice.  If you're using `into`, then most of these sequence functions support transducers to avoid allocation of intermediate sequences, providing a speed boost.
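
A small REPL sketch of the two points above (the var names `v` and `mapped` are just for illustration): the result of a sequence function is a lazy seq whatever the input was, and `into` then decides what collection it ends up in.

```clojure
(def v [1 2 3])

;; A sequence function returns a lazy seq, even when given a vector.
(def mapped (map inc v))
(instance? clojure.lang.LazySeq mapped) ;=> true
(vector? mapped)                        ;=> false

;; `into` pours the seq into the collection of your choice.
(into [] mapped)            ;=> [2 3 4]
(= #{2 3 4} (into #{} mapped)) ;=> true
(into '() mapped)           ;=> (4 3 2)  ; a list grows at the front

;; Transducer form: no intermediate lazy seq is allocated.
(into [] (map inc) v)       ;=> [2 3 4]
```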

Mark Engelberg

Sep 9, 2016, 7:36:58 AM
to clojure
Scala behaves more like your intuition, generally assuming you want back the same kind of collection as what you passed in.  It can be a bit of a pain, though, when that's *not* the behavior you want.  Clojure's way puts you in control by always producing a sequence and letting you put it into the collection of your choice.

Mamun

Sep 9, 2016, 7:49:59 AM
to Clojure
To me, changing the type or order shows a lack of facility for a basic task.
In the end, composing tasks also becomes harder.
Have you tried Specter? Why not consider the Specter library?


Br,
Mamun

Colin Yates

Sep 9, 2016, 8:03:45 AM
to clo...@googlegroups.com
I did look at Specter and it looks nice and well engineered, but I never really ran into the sorts of problem it solves, at least not enough to warrant the cost of depending on a new library.

Alex Miller

Sep 9, 2016, 8:11:44 AM
to Clojure

Stuart Sierra

Sep 9, 2016, 8:36:03 AM
to Clojure
Functions like map/filter/remove are not "collection functions" but rather "sequence functions." The collection functions like conj preserve the type of their argument. Sequence functions, by contrast, coerce any argument to a sequence, and always return a sequence.
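
To make the distinction concrete, here is a rough REPL sketch (illustrative only, not from the original post): conj keeps the type of whatever it is given and adds at the natural place for that type, while filter and map hand back a seq no matter what went in.

```clojure
;; Collection function: the result has the same type as the argument.
(conj [1 2 3] 4)      ;=> [1 2 3 4]   ; vector: added at the end
(conj '(1 2 3) 4)     ;=> (4 1 2 3)   ; list: added at the front
(conj {:a 1} [:b 2])  ;=> {:a 1, :b 2}
(vector? (conj [1 2 3] 4)) ;=> true

;; Sequence functions: any seq-able argument in, a seq out.
(seq? (filter odd? [1 2 3]))  ;=> true
(seq? (filter odd? #{1 2 3})) ;=> true
(seq? (map key {:a 1}))       ;=> true
```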

Since Clojure 1.7, transducers provide a convenient way to compose sequence-like operations but produce a non-sequence as a result:

user=> (into [] (comp (map inc) (filter even?)) [1 2 3 4 5])
[2 4 6]

mapv and filterv were convenience functions added before the introduction of transducers, as I recall.
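
For reference, a short sketch of how these fit together. As far as I know clojure.core has mapv and filterv but no removev, so the remove case below uses a transducer with `into` instead (the var name `v` is just for illustration):

```clojure
(def v [1 2 3 4 5])

;; Eager, vector-returning variants in clojure.core.
(mapv inc v)      ;=> [2 3 4 5 6]
(filterv even? v) ;=> [2 4]

;; No removev in core (to my knowledge); into plus a transducer covers it.
(into [] (remove even?) v) ;=> [1 3 5]

;; The result is a real vector, so conj adds at the end.
(conj (into [] (remove even?) v) 6) ;=> [1 3 5 6]
```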
–S

Alan Thompson

Sep 9, 2016, 12:36:22 PM
to clo...@googlegroups.com
Hi Colin,

I too have been bitten by this type of inconsistency in clojure.core functions. The root of the problem is that conj has different behavior for lists and vectors, and that a seq behaves like a list. When map, filter, etc convert the source vector into a seq, the behavior of conj changes accordingly.

In order to avoid this kind of unpredictability, you may wish to explore some of the functions in the Tupelo library. The goal is to make things simpler, more obvious & predictable, and as bulletproof as possible. One example is the append function.  Here is a sample program comparing conj and append:

(ns clj.core
  (:require [tupelo.core :as t] ))
(t/refer-tupelo)

(def v [1 2 3])

(conj v 4)                                  ;=> [1 2 3 4]
(conj (map identity v) 4)                   ;=> (4 1 2 3)
(conj (remove (constantly false) v) 4)      ;=> (4 1 2 3)
(conj (filter identity v) 4)                ;=> (4 1 2 3)

(t/append v 4)                              ;=> [1 2 3 4]
(t/append (map identity v) 4)               ;=> [1 2 3 4]
(t/append (remove (constantly false) v) 4)  ;=> [1 2 3 4]
(t/append (filter identity v) 4)            ;=> [1 2 3 4]

I think simpler and more bulletproof functions can go a long way toward making Clojure easier to use, especially for beginners or when you are uncertain about the exact type of a parameter.

I've pasted the relevant part of the README.txt below.  Enjoy!

Alan

---------------------------------------------------------------------------------------------------------------------------------------------

Adding Values to the Beginning or End of a Sequence

Clojure has the cons, conj, and concat functions, but it is not obvious how they should be used to add a new value to the beginning of a vector or list:

; Add to the end
> (concat [1 2] 3)    ;=> IllegalArgumentException
> (cons   [1 2] 3)    ;=> IllegalArgumentException
> (conj   [1 2] 3)    ;=> [1 2 3]
> (conj   [1 2] 3 4)  ;=> [1 2 3 4]
> (conj  '(1 2) 3)    ;=> (3 1 2)       ; oops
> (conj  '(1 2) 3 4)  ;=> (4 3 1 2)     ; oops

; Add to the beginning
> (conj     1  [2 3] ) ;=> ClassCastException
> (concat   1  [2 3] ) ;=> IllegalArgumentException
> (cons     1  [2 3] ) ;=> (1 2 3)
> (cons   1 2  [3 4] ) ;=> ArityException
> (cons     1 '(2 3) ) ;=> (1 2 3)
> (cons   1 2 '(3 4) ) ;=> ArityException

These failures are irritating and unproductive, and the error messages don’t make it obvious what went wrong. Instead, use the simple prepend and append functions to add new elements to the beginning or end of a sequence, respectively:

  (append [1 2] 3  )   ;=> [1 2 3  ]
  (append [1 2] 3 4)   ;=> [1 2 3 4]

  (prepend   3 [2 1])  ;=> [  3 2 1]
  (prepend 4 3 [2 1])  ;=> [4 3 2 1]

Both prepend and append always return a vector result.
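
For readers without the library at hand, the described behaviour can be approximated with core functions alone. This is only a sketch under that assumption: `append-sketch` and `prepend-sketch` are hypothetical names, and this is not Tupelo's actual implementation (which may validate its arguments differently).

```clojure
;; Hypothetical helpers for illustration; NOT Tupelo's implementation.
(defn append-sketch
  "Returns a vector of the items in coll followed by xs."
  [coll & xs]
  (into (vec coll) xs))

(defn prepend-sketch
  "Returns a vector of the leading args followed by the items
  of the last arg, which must be a collection."
  [& args]
  (into (vec (butlast args)) (last args)))

(append-sketch [1 2] 3 4)                ;=> [1 2 3 4]
(append-sketch (map identity [1 2 3]) 4) ;=> [1 2 3 4]
(prepend-sketch 4 3 [2 1])               ;=> [4 3 2 1]
(prepend-sketch 1 '(2 3))                ;=> [1 2 3]
```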











Alex Miller

Sep 9, 2016, 1:04:44 PM
to Clojure

On Friday, September 9, 2016 at 11:36:22 AM UTC-5, Alan Thompson wrote:
> Hi Colin,
>
> I too have been bitten by this type of inconsistency in clojure.core functions.

I disagree that the problem here is consistency. The core functions are very consistent, but I think it's easy to build an insufficiently detailed mental model of what should happen when you're not aware of the distinction between collection functions (take and return data structures - things like conj, merge, assoc, get) and sequence functions (take and return sequences or really seq-ables - map, filter, etc).
 
> The root of the problem is that conj has different behavior for lists and vectors, and that a seq behaves like a list. When map, filter, etc convert the source vector into a seq, the behavior of conj changes accordingly.

In my opinion, the root of the problem is not being aware enough of when you move from working with data structures (like vectors) into working with sequence abstractions. Becoming more aware of the distinction and when those transitions occur is one of the more subtle aspects of learning Clojure.

I wrote this not too long ago on a very similar question on reddit:


> In order to avoid this kind of unpredictability,

Just to belabor it, everything here is totally predictable already.
 
> you may wish to explore some of the functions in the Tupelo library. The goal is to make things simpler, more obvious & predictable, and as bulletproof as possible. One example is the append function.  Here is a sample program comparing conj and append:
>
> (ns clj.core
>   (:require [tupelo.core :as t] ))
> (t/refer-tupelo)
>
> (def v [1 2 3])
>
> (conj v 4)                                  ;=> [1 2 3 4]
> (conj (map identity v) 4)                   ;=> (4 1 2 3)
> (conj (remove (constantly false) v) 4)      ;=> (4 1 2 3)
> (conj (filter identity v) 4)                ;=> (4 1 2 3)

As I wrote in the link above, I don't ever write code like this. When working with data in terms of seqs (map, remove, filter) you should be thinking in aggregates, not in terms of individual values. Calling conj around a sequence takes you from one level of abstraction down into a lower one. This never comes up when I write Clojure (not exaggerating for effect, it just doesn't).

I can't suggest an alternative here because the example is too narrow. Occasionally (much less now that we have transducers) I will have data in a seq and want to put it in a collection - into, vec, set are all sufficient to do so. Usually I find that either I can just leave it as a seq and continue OR that I can back up and make a collection instead of a seq in the first place (by using transducers, into, etc).
 
> (t/append v 4)                              ;=> [1 2 3 4]
> (t/append (map identity v) 4)               ;=> [1 2 3 4]
> (t/append (remove (constantly false) v) 4)  ;=> [1 2 3 4]
> (t/append (filter identity v) 4)            ;=> [1 2 3 4]


I disagree with everything about this. :) In my opinion you are working against Clojure's strengths in going down this path.
 
> I think simpler and more bulletproof functions can go a long way toward making Clojure easier to use, especially for beginners or when you are uncertain about the exact type of a parameter.

I think more work on understanding the collection and sequence layers would pay far greater dividends than what you are suggesting.

Rangel Spasov

Sep 9, 2016, 7:42:12 PM
to Clojure
When I first started learning Clojure 3.5 years ago I was "bit" by this in my first month or so of doing Clojure but after spending a little bit of time to understand how the sequence abstraction works it was never a problem again. I agree with everything that Alex says here. 

Mars0i

Sep 10, 2016, 1:32:49 AM
to Clojure


On Friday, September 9, 2016 at 6:36:17 AM UTC-5, puzzler wrote:
> ...
> You can use `into` to "pour" the sequence into the collection of your choice.  If you're using `into`, then most of these sequence functions support transducers to avoid allocation of intermediate sequences, providing a speed boost.

I routinely use `vec` for the kind of case that Colin described.  The effect is the same as `(into [] ...)` but it's more concise and doesn't require that extra tenth of a moment to figure out what kind of thing `into` is sending the sequence into.  I have no idea whether this is more or less efficient than using `into`, however.
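
A quick REPL sketch of that equivalence, using Colin's vector (this only compares results and says nothing about relative speed):

```clojure
(def v [1 2 3])

;; vec and (into [] ...) produce equal vectors from the same seq.
(vec (map identity v))     ;=> [1 2 3]
(into [] (map identity v)) ;=> [1 2 3]
(= (vec (map identity v)) (into [] (map identity v))) ;=> true

;; Once it is a vector again, conj appends at the end.
(conj (vec (map identity v)) 4)               ;=> [1 2 3 4]
(conj (vec (remove (constantly false) v)) 4)  ;=> [1 2 3 4]
```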

A succinct summary of the basic idea implicit or explicit in other answers in this thread:
    Most Clojure sequence functions produce lazy sequences.
    If you want anything other than a lazy sequence, convert it (with vec, into, etc.).

Bit me, too, but the correct rule is very simple, and easy to remember--which doesn't mean that I always remember to follow it!

Lazy sequences are the Clojure Way.
I love 'em.
And hate them.
It depends.
But they are the Clojure Way.

(Well--until transducers.)