clojure 1.2 seq fn enhancement FAQ

461 views
Skip to first unread message

Stuart Halloway

unread,
Apr 28, 2010, 4:31:28 PM4/28/10
to clo...@googlegroups.com
After review, several seq functions from clojure-contrib have been
promoted to clojure [1], [2], [3]. Hopefully the FAQ below will answer
the major questions you may have:

1. Is this a breaking change to Clojure?
No. Rich is super-careful to grow Clojure by expansion, not by
breaking change.

2. Is this a breaking change to Contrib?
Yes. Contrib is a community-contribution area. Part of its mission is
to germinate ideas that may get promoted to Clojure, and when that
happens contrib users will have to adapt. Usually this will simply
mean removing a require/use, or changing a fn name.

3. Why did some of the fn names change?
Because they were bad names. :-) Clojure is careful to encourage the
use of fns that will have good performance characteristics, and make
it obvious which fns are "seq-y" (therefore O(n)). rand-elt became
rand-nth to make it clear that its performance is bound to the
performance of nth on the underlying collection. Similarly, includes?
became seq-contains? to make clear that the fn must walk the sequence.
Avoid seq-contains? when you have a set or map!

4. Why did some semantics change?
Because they were not the most generally-desirable semantics. The
contrib group-by returned a sorted map, when in most cases sorting is
overkill, so the new core fn returns a plain ol' map.

5. Couldn't you have left the old fns in place in contrib, to avoid
breakage?
No, for multiple reasons. Name collisions are one. But the bigger one
is that the fns in core have been audited by Rich for performance and
correctness. Leaving the old fns in contrib would be a recipe for
lower-performing (or even, in rare cases, buggy) programs.

6. Why did this break my project?
Hopefully it doesn't. But, it seems that a lot of people are tracking
their apps (via leiningen or maven) to build snapshots of both Clojure
and Contrib. This is a risky development practice, particularly in the
case of Contrib. As a community, we need to develop a saner way to
stay near the edge without being right on it.

7. How do I fix it?
Your code gets smaller. Remove requires or uses that refer to promoted
fns. You already have them in core! For the fns whose names have
changed, update the names in your code.

Comments and questions welcome!

Stu

[1] https://www.assembla.com/spaces/clojure/tickets/312-audit---promote-some-clojure-contrib-seq-fns
[2] http://github.com/richhickey/clojure/tree/a55df92faa0c51c634d93a8d991ccfd2638b108b
[3] http://github.com/richhickey/clojure-contrib/commit/78ee9b3e64c5ac6082fb223fc79292175e8e4f0c

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Douglas Philips

unread,
Apr 28, 2010, 5:06:21 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 4:31 PM, Stuart Halloway wrote:

> After review, several seq functions from clojure-contrib have been
> promoted to clojure [1], [2], [3]. Hopefully the FAQ below will
> answer the major questions you may have:

Cool!

> 3. Why did some of the fn names change?
> ...Similarly, includes? became seq-contains? to make clear that the
> fn must walk the sequence. Avoid seq-contains? when you have a set
> or map!

Wait, what? Why should seq-contains? be slower than contains? Isn't
this exactly the kind of thing that protocols are supposed to be
solving? Your nice video showed that very thing, right?



> 6. Why did this break my project?
> Hopefully it doesn't. But, it seems that a lot of people are
> tracking their apps (via leiningen or maven) to build snapshots of
> both Clojure and Contrib. This is a risky development practice,
> particularly in the case of Contrib. As a community, we need to
> develop a saner way to stay near the edge without being right on it.

Indeed. Tracking the bleeding edge is always dicey... :-(

Thanks!

-Doug

Meikel Brandmeyer

unread,
Apr 28, 2010, 5:14:14 PM4/28/10
to clo...@googlegroups.com
Hi,

On Wed, Apr 28, 2010 at 05:06:21PM -0400, Douglas Philips wrote:

> Wait, what? Why should seq-contains? be slower than contains? Isn't
> this exactly the kind of thing that protocols are supposed to be
> solving? Your nice video showed that very thing, right?

If you have a map or a set looking up a key is fast. So contains? is
fast (read O(1) resp. something like O(log32 N)). seq-contains? does
a linear search through a sequence. That is O(N). Protocols cannot
change this.

Sincerely
Meikel

Douglas Philips

unread,
Apr 28, 2010, 5:26:46 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 5:14 PM, Meikel Brandmeyer wrote:
> If you have a map or a set looking up a key is fast. So contains? is
> fast (read O(1) resp. something like O(log32 N)). seq-contains? does
> a linear search through a sequence. That is O(N). Protocols cannot
> change this.

How is having an optimized version of seq-contains? for sets/maps any
different from what Stuart showed in his video, where reduce can have
a specialization that is faster on some types?

-Doug

Meikel Brandmeyer

unread,
Apr 28, 2010, 5:32:06 PM4/28/10
to clo...@googlegroups.com
Hi,

On Wed, Apr 28, 2010 at 05:26:46PM -0400, Douglas Philips wrote:

> How is having an optimized version of seq-contains? for sets/maps
> any different from what Stuart showed in his video, where reduce can
> have a specialization that is faster on some types?

A faster reduce is still O(N). If you need fast lookup use the correct
data structure. If you do linear search it's gonna be slow, so make it
clear in the code! This makes the developer aware of the implications.

Sincerely
Meikel

Douglas Philips

unread,
Apr 28, 2010, 5:39:37 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 5:32 PM, Meikel Brandmeyer wrote:
> On Wed, Apr 28, 2010 at 05:26:46PM -0400, Douglas Philips wrote:
>
>> How is having an optimized version of seq-contains? for sets/maps
>> any different from what Stuart showed in his video, where reduce can
>> have a specialization that is faster on some types?
>
> A faster reduce is still O(N). If you need fast lookup use the correct
> data structure. If you do linear search it's gonna be slow, so make it
> clear in the code! This makes the developer aware of the implications.


Stuart's comment was to not use seq-contains? on maps or sets.
There is no reason that it cannot be the same speed as contains? if a
set or map is passed in.
Wasn't the whole point of having sequences as an abstraction to get
away from all the cruft that traditional lisp had built up around
concrete types? Why go back to that here? Can't protocols handle
optimizing this case?
If some function I call uses seq-contains? (so that it can get all the
wonderful seq-ness abstractness clojure offers) I should have to know
that internal detail and not pass in a map or set? or worse, fail to
get an optimization in that case? C'mon, this should be an easy
protocol win (or maybe I don't really understand the limits on what
protocols can do.)

-Doug

Meikel Brandmeyer

unread,
Apr 28, 2010, 6:06:54 PM4/28/10
to clo...@googlegroups.com
Hi,

On Wed, Apr 28, 2010 at 05:39:37PM -0400, Douglas Philips wrote:

> Stuart's comment was to not use seq-contains? on maps or sets.
> There is no reason that it cannot be the same speed as contains? if
> a set or map is passed in.

Ah, ok. I misunderstood what you were saying. But I think it doesn't
change the argumentation. If seq-contains? was fast on maps and sets,
people would abandon contains? because seq-contains? is "always right":
works on seqs, fast on maps and sets. And again we are in the situation,
that the developer does not make his intentions explicit.

Sincerely
Meikel

Stuart Halloway

unread,
Apr 28, 2010, 6:10:02 PM4/28/10
to clo...@googlegroups.com
The "seq" in "seq-contains?" says "I want a sequential search."
Protocols *could* make this do something different, but that would
violate the contract of this function.

Another way to put it: If you wrote a protocol to pick a fast
implementation based on type, then seq-contains? would be the fn that
the protocol would call if it couldn't find anything faster.

There have to be primitive things somewhere...

Stu

Douglas Philips

unread,
Apr 28, 2010, 6:37:44 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 6:10 PM, Stuart Halloway wrote:

> The "seq" in "seq-contains?" says "I want a sequential search."
> Protocols *could* make this do something different, but that would
> violate the contract of this function.

How would having it work faster if passed a set or map violate that?
There is no ordering guarantee for those types. Is this a contract to
be slow? Is there something about the comparison being done by seq-
contains? that is different from contains? would seq-contains? somehow
return a different answer? And what about the sorted collections,
where seq-contains? could be fast if the value sought was smaller or
larger than the values contained?

I don't see the distinction(s) you're trying to make here.


> Another way to put it: If you wrote a protocol to pick a fast
> implementation based on type, then seq-contains? would be the fn
> that the protocol would call if it couldn't find anything faster.
>
> There have to be primitive things somewhere...

If so, then why isn't there a vector-first and a list-first and abc-
first that I can use when I 'know' what type I'm working with?

Aren't they all subsumed under 'first' because Clojure understands
that premature optimization based on concrete types was the wrong
thing to do? At least that is the message I got from Rich's videos
explaining why tying all of the "old" lisp goodness to concrete types
was wrong (unnecessarily limiting).

Stuart Halloway

unread,
Apr 28, 2010, 6:55:51 PM4/28/10
to clo...@googlegroups.com
>> Another way to put it: If you wrote a protocol to pick a fast
>> implementation based on type, then seq-contains? would be the fn
>> that the protocol would call if it couldn't find anything faster.
>>
>> There have to be primitive things somewhere...
>
> If so, then why isn't there a vector-first and a list-first and abc-
> first that I can use when I 'know' what type I'm working with?

Specializations are available where there is a clear use case, e.g.
contains?, peek (instead of last), disj, etc.

> Aren't they all subsumed under 'first' because Clojure understands
> that premature optimization based on concrete types was the wrong
> thing to do? At least that is the message I got from Rich's videos
> explaining why tying all of the "old" lisp goodness to concrete
> types was wrong (unnecessarily limiting).

Tying to concrete types is limiting. *Never* having special purpose
fns that know about performance characteristics is also limiting.

contains?/seq-contains? mark a divide where it is very easy for a
language to quietly help programmers do the wrong thing.

Stu

Douglas Philips

unread,
Apr 28, 2010, 7:17:32 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 6:55 PM, Stuart Halloway wrote:
> Specializations are available where there is a clear use case, e.g.
> contains?, peek (instead of last), disj, etc.
...
> Tying to concrete types is limiting. *Never* having special purpose
> fns that know about performance characteristics is also limiting.
>
> contains?/seq-contains? mark a divide where it is very easy for a
> language to quietly help programmers do the wrong thing.

Such as littering my application code with type-checks to decide if it
should call contains? or seq-contains?
That seems exactly the wrong thing to do.

Please explain how making seq-contains? fast for sets and maps is
wrong (exactly how does it violate some contract and what contract,
explicitly is that a violation of) and application code type checking
is right, because I really don't understand why making seq-contains?
fast isn't just a huge win-win.

Douglas Philips

unread,
Apr 28, 2010, 7:27:11 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 6:06 PM, Meikel Brandmeyer wrote:
> Ah, ok. I misunderstood what you were saying. But I think it doesn't
> change the argumentation. If seq-contains? was fast on maps and sets,
> people would abandon contains? because seq-contains? is "always
> right":
> works on seqs, fast on maps and sets. And again we are in the
> situation,
> that the developer does not make his intentions explicit.

What is the intention to be made exlicit here? That some sequences are
better than others? (car/cdr/list from old-school lisps were exactly
this kind of thinking, whether deliberate or not.)

How is it a developer is supposed to say: "only use a set or map" here?
I'm new enough to Clojure that I'm not clear on how duck-type-y the
idioms are.
Should I use assert to make my intention clear?
raise an exception?
just call some function that cares and expect that it will either
assert or raise for me?
Isn't this just the kind of thing that protocols are for? Again, I
think Stuarts video example with reduce is quite apropos. For certain
things, we can make some operations faster, and we can do it, post-
facto, wihin clojure itself.

-Doug

Mark Engelberg

unread,
Apr 28, 2010, 7:38:31 PM4/28/10
to clojure
On Wed, Apr 28, 2010 at 2:39 PM, Douglas Philips <dg...@mac.com> wrote:
> If some function I call uses seq-contains? (so that it can get all the wonderful seq-ness abstractness clojure offers) I should have to know that internal detail and not pass in a map or set? or worse, fail to get an optimization in that case? C'mon, this should be an easy protocol win (or maybe I don't really understand the limits on what protocols can do.)

Doug, I also am not particularly enamored of the name seq-contains? I
agree that it conjures up notions of Scheme's proliferation of names
like list-ref, vector-ref, string-ref, etc.

But I think you're not fully appreciating the complexity of the
situation. This is not just about performance, but a question of two
different possible semantics for "contains?", which is why it can't
*only* be addressed with a protocol.

The fundamental problem is that some Clojure data structures have a
dual nature. For example, a vector can be thought of as a collection
of items, or you can think of it as a mapping from integers to items.
A hash map can be thought of as a map, or as a sequence of key-value
pairs.

The current "contains?" function is really a "contains-key?" function.
It treats the input as a data structure with keys (a map or a set),
and asks whether it contains a given key. This has been a tremendous
source of confusion to some people who apply the contains? function to
a vector, and are startled to discover that it is only asking whether
the item is a *key* in the vector, (i.e., in the case of a vector, a
valid index between 0 and length-1). Arguably, contains-key? would
have been a much better name, and made the semantics much more clear.

There are some people who believe that Clojure would benefit from a
variant of "contains?" that views the input as a collection. But how
to make this clear? That's why the proposal is for a new function
called "seq-contains?". It's not so much a warning of "this thing has
performance like a linear search", as much as, "it views your data as
a sequence". So for example, calling seq-contains? on a vector will
actually test whether the item is an element of the vector collection.
Calling seq-contains? on a map will actually test whether a given
[key value] pair is in the map (rather than just testing for the
existence of a key).

So what are some other options?:

1. Don't include seq-contains? The behavior you want can usually be
achieved by using (some #{item} coll). Disadvantage - if you're
testing to see if the collection contains nil, that won't work.
2. Rename contains? to contains-key? and make a function with the new
semantics called contains? Disadvantage - breaks existing code.
3. Keep one function called contains?, but use protocols to achieve
different semantics for different kind of data structures. So for
sets and maps, contains? would work like a contains-key? function, but
for vectors and other sequences, it would be a more intuitive
contains-item? function. Disadvantage - semantics depends on knowing
what kind of thing you're passing to the function, also might break
existing code if anyone is actually currently using contains? on
vectors.

I'm not crazy about the name seq-contains? (and I'm skeptical that it
even belongs in the core), but all the other solutions I can think of
also have major disadvantages.

That said, you make an excellent point about performance. Even though
the new seq-contains? is intended to treat the underlying data
structure as a sequence from a semantics standpoint, there is
absolutely no reason at all why its performance should be bound by
this. Clearly contains? and seq-contains? have identical semantics
for a set, so for a set data structure the new seq-contains? should
also use protocols to achieve the same speed, so seq-contains?
performs identically on a set as contains? Similarly, even though the
new seq-contains? will have a new meaning for hash maps (testing for a
key-value pair rather than just a key), clearly it is possible to use
protocols to perform this test far faster than a linear sequential
search.

So, if things do move forward with seq-contains? to create a new
semantic notion in the clojure core of how to test for containment, I
absolutely agree that protocols should be used to make this new notion
to continue to have good performance on those sorts of collections for
which it is possible to have superior performance than a sequential
search.

--Mark

Stuart Halloway

unread,
Apr 28, 2010, 7:43:47 PM4/28/10
to clo...@googlegroups.com
Wow, and I thought this was a sore subject before, when there was no
seq-contains? and its absence was always a Top 5 FAQ. :-)

I'll wait for Rich to maybe chime in on seq-contains?. Other than seq-
contains? are people liking the new fns? Anybody having issues we
didn't anticipate?

Stu

Michael Gardner

unread,
Apr 28, 2010, 7:57:22 PM4/28/10
to clo...@googlegroups.com
On Apr 28, 2010, at 6:38 PM, Mark Engelberg wrote:

> 2. Rename contains? to contains-key? and make a function with the new
> semantics called contains? Disadvantage - breaks existing code.

You could also alias contains? to contains-key?, use seq-contains? (or some other new name) for this new function, and deprecate contains? with an eye towards removal down the road. Deprecation has been used successfully even in mature languages, hasn't it?

Rich Hickey

unread,
Apr 28, 2010, 11:04:08 PM4/28/10
to clo...@googlegroups.com


On Apr 28, 7:17 pm, Douglas Philips <d...@mac.com> wrote:
> On 2010 Apr 28, at 6:55 PM, Stuart Halloway wrote:
>
> > Specializations are available where there is a clear use case, e.g.
> > contains?, peek (instead of last), disj, etc.
> ...
> > Tying to concrete types is limiting. *Never* having special purpose
> > fns that know about performance characteristics is also limiting.
>
> > contains?/seq-contains? mark a divide where it is very easy for a
> > language to quietly help programmers do the wrong thing.
>
> Such as littering my application code with type-checks to decide if
it
> should call contains? or seq-contains?
> That seems exactly the wrong thing to do.
>
> Please explain how making seq-contains? fast for sets and maps is
> wrong (exactly how does it violate some contract and what contract,
> explicitly is that a violation of) and application code type checking
> is right, because I really don't understand why making seq-contains?
> fast isn't just a huge win-win.
>

It is an important aspect of Clojure that, in general, performance
guarantees are part of the semantics of functions. In particular,
functions are not supported on data structures where they are not
performant.

You can't just swap out sequences or lists for maps or sets in a
program. So, you need to use the right data structure for the job, and
then the right functions will be available. When it makes sense, some
functions are maximally polymorphic (e.g. seq, or into). But lookup,
under any name, shouldn't be, IMO, so it isn't in Clojure. Similarly
there is no insert at the beginning of vectors, append to end of
lists, lookup for values of maps, insertion in the middle of sequences
etc. I simply don't want to provide the tools for people to write
programs in Clojure with N^2 performance - that benefits no one. But
I've seen such performance degradation in many a Lisp program as
people build upon O(n) functions on lists.

If you have a lookup table in your program, please use a set or map,
or, if by index, a vector. You will have get, contains? or nth
available, performance will be good, and that will be clear from the
code.

seq-contains? exists because sometimes, say in a macro, you have a
known short sequence and need to test for the presence of something.
There's no need to copy it into a set. And people have been rolling
their own includes? etc for this job. Now everyone can use seq-
contains? for that. Its poor performance is indicated by its name and
documentation. Should someone try to take a piece of code based on seq-
contains? and try to scale it, they will see that function as an
obvious bottleneck.

Your type-check argument is a straw man that doesn't hold up in
Clojure practice. People tend to use the right data structures for the
job and don't choose algorithms via type checks.

Clojure is not particularly duck-type-y, in general things are not
unified by the presence of same-named functions but by underlying
shared abstractions. Just because protocols can be used to make things
reach many types doesn't mean they should be used to unify things that
aren't uniform.

If you want to get a feel for how the abstractions in Clojure are
organized, have a look at the slides in the middle of this
presentation (about 30 pages in):

http://clojure.googlegroups.com/web/tutorial.pdf

While you may not agree with it (and everyone disagrees with
contains? :), at least the design is a considered one.

Rich

Douglas Philips

unread,
Apr 29, 2010, 1:39:10 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 28, at 7:38 PM, Mark Engelberg wrote:
> But I think you're not fully appreciating the complexity of the
> situation. This is not just about performance, but a question of two
> different possible semantics for "contains?", which is why it can't
> *only* be addressed with a protocol.

...

> The fundamental problem is that some Clojure data structures have a
> dual nature. For example, a vector can be thought of as a collection
> of items, or you can think of it as a mapping from integers to items.
> A hash map can be thought of as a map, or as a sequence of key-value
> pairs.

Agreed on the hash map. For vectors, Clojure has already decided that
'seq'ing them ignores their indexes.


> The current "contains?" function is really a "contains-key?" function.

Agreed.

> There are some people who believe that Clojure would benefit from a
> variant of "contains?" that views the input as a collection. But how
> to make this clear? That's why the proposal is for a new function
> called "seq-contains?". It's not so much a warning of "this thing has
> performance like a linear search", as much as, "it views your data as
> a sequence".

Yup.

...


> So, if things do move forward with seq-contains? to create a new
> semantic notion in the clojure core of how to test for containment, I
> absolutely agree that protocols should be used to make this new notion
> to continue to have good performance on those sorts of collections for
> which it is possible to have superior performance than a sequential
> search.

Perhaps protocols are a touchy subject and there is another mechanism
to use, but I haven't yet seen a coherent argument against some
mechanism for enhancing performance.

Thanks for the detailed analysis. I guess I had hoped it wouldn't've
been necessary. :)

-Doug

Meikel Brandmeyer

unread,
Apr 29, 2010, 1:48:23 AM4/29/10
to Clojure
Hi,

On 29 Apr., 01:38, Mark Engelberg <mark.engelb...@gmail.com> wrote:

> 1.  Don't include seq-contains?  The behavior you want can usually be
> achieved by using (some #{item} coll).  Disadvantage - if you're
> testing to see if the collection contains nil, that won't work.

Not entirely correct. (some #(= % item) coll) works always and is
basically how seq-contains? is implemented. However there are
opinions that some is not obvious enough.

Sincerely
Meikel

Douglas Philips

unread,
Apr 29, 2010, 1:49:55 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 28, at 11:04 PM, Rich Hickey wrote:
> It is an important aspect of Clojure that, in general, performance
> guarantees are part of the semantics of functions. In particular,
> functions are not supported on data structures where they are not
> performant.

Understood. That isn't the point/question I was trying to raise.
Again, back to the demo of protocols in Stuart's video, the question
isn't "is seq-contains? potentially lying about its performance" but
"why shouldn't any function, including seq-contains? be as fast as it
can be, when it can be?"

...<most of the rest is a slight tangent from my fundamental question,
however interesting it might be for other reasons>...



> And people have been rolling their own includes? etc for this job.
> Now everyone can use seq-contains? for that. Its poor performance is
> indicated by its name and documentation. Should someone try to take
> a piece of code based on seq-contains? and try to scale it, they
> will see that function as an obvious bottleneck.

Really? We've known for a long time that people suck at guessing where
programs have performance bottlenecks (1), and that taking a premature
optimization attitude "obvious bottleneck" is wrong headed (2).

(1) consing is bad!
(2) 20G of garbage per second and the CPU utilization barely breaks
10%? Hmmm, I guess consing isn't so bad and that thinking "consing is
bad" is just another old, preconceived notion proved wrong, so that
'obvious bottleneck' turns out not to be so obvious after all.

Shouldn't seq-contains? be parallelizable, esp. vectors where you can
easily create O(1) slices... or use other techniques when sequences of
certain kinds might have other exploitable properties?


What is the purpose, goal, design-rationale of not making seq-
contains? fast on maps or sets?



> Clojure is not particularly duck-type-y, in general things are not
> unified by the presence of same-named functions but by underlying
> shared abstractions. Just because protocols can be used to make
> things reach many types doesn't mean they should be used to unify
> things that aren't uniform.

Are saying that seq-contains? will not work on sets and maps because
they aren't uniform w.r.t. sequences or that they are uniform, but
protocols won't fit here as an optimization technique because _____
(being brand new, I don't know all the subtleties of protocols), or ???



> If you want to get a feel for how the abstractions in Clojure are
> organized, have a look at the slides in the middle of this
> presentation (about 30 pages in):
>
> http://clojure.googlegroups.com/web/tutorial.pdf

Actually I have seen those slides already, but had lost track of where
I had found them, thanks for the reminder pointer.



> While you may not agree with it (and everyone disagrees with
> contains? :), at least the design is a considered one

I generally agree with the design Clojure embodies as an outlook :).
As to contains? ... I agree with Mark Engelberg's detailed analysis
and conclusions.


-Doug

Meikel Brandmeyer

unread,
Apr 29, 2010, 1:57:23 AM4/29/10
to Clojure
Hi,

On 29 Apr., 01:43, Stuart Halloway <stuart.hallo...@gmail.com> wrote:

> I'll wait for Rich to maybe chime in on seq-contains?. Other than seq-
> contains? are people liking the new fns? Anybody having issues we  
> didn't anticipate?

I was a little bit surprised about map-indexed and keep-indexed.
Why specialised functions when (map f (indexed coll)) just works
as well, but I must confess that having (f idx item) is nicer compared
to (f [idx item]). On the other hand one could do (map #(apply f %)
(indexed coll)), but again this doesn't play nice with #() functions.

One thing I noticed in the implementation of keep-indexed.

keepi is defined as (letfn [(keepi [idx coll] ...)] ...). However in
the
non-chunked case it is called as (keepi f (inc idx) (rest s)) resp.
as (keepi f (rest s)). I think there is something goofy going on here.

Sincerely
Meikel

Mark Engelberg

unread,
Apr 29, 2010, 2:17:50 AM4/29/10
to clojure
On Wed, Apr 28, 2010 at 10:49 PM, Douglas Philips <dg...@mac.com> wrote:
> What is the purpose, goal, design-rationale of not making seq-contains? fast
> on maps or sets?

I think Rich's point is that if seq-contains? is sometimes fast and
sometimes slow, then it makes it harder to reason about a program that
uses seq-contains?. Is this one of the "slow" usages, or one of the
"fast" usages? On the other hand, with the current plan, you know
exactly what performance you're getting. contains? will always be
fast on sets and maps (albeit returning meaningless results for lists,
and counterintuitive results for vectors). seq-contains? always does
a linear search. So if you see seq-contains? in your code, this
serves as a big red flag that either you should be using a set data
structure and contains?, or you better be sure your sequence is
relatively small.

If seq-contains? sometimes had good performance, then people would
start using a mixture of seq-contains? and contains? for sets and
maps, and it would be less clear when you were using seq-contains?
"improperly".

> I generally agree with the design Clojure embodies as an outlook :).
> As to contains? ... I agree with Mark Engelberg's detailed analysis and
> conclusions.

Thanks. But I definitely see the logic of Rich's approach as well.

As a side note, maybe contains? should return an error message when
used on a non-keyed collection like a list. With the confusion that's
sure to result from having both contains? and seq-contains? in the
core with slightly different semantics, applicable inputs, and
performance, an error message for inappropriate uses of contains?
would be very valuable, I think.

ataggart

unread,
Apr 29, 2010, 4:21:54 AM4/29/10
to Clojure
I know it won't matter, but for posterity if nothing else...

Functions named contains-key? and contains-val? would make a lot more
sense to me than the current contains? and new seq-contains?. Anyone
looking at contains-val? should expect it to be O(n). The only
effective difference would be that the test value for contains-val? is
consistently a single value rather than a [key value] tuple for maps.

Lists:
(contains-key? '(:foo :bar) 0)
Exception
(contains-val? '(:foo :bar) :foo)
true

Vectors:
(contains-key? [:foo :bar] 0)
true
(get [:foo :bar] 0)
:foo
(contains-key? [:foo :bar] 2)
false
(contains-val? [:foo :bar] :foo)
true
(contains-val? [:foo :bar] :baz)
false

Maps:
(contains-key? {:foo :bar} :foo)
true
(get {:foo :bar} :foo)
:bar
(contains-key? {:foo :bar} :baz)
false
(contains-val? {:foo :bar} :bar)
true
(contains-val? {:foo :bar} :baz)
false

Sets:
(contains-key? #{:foo :bar} :foo)
true
(get #{:foo :bar} :foo)
:foo
(contains-key? #{:foo :bar} :baz)
false
(contains-val? #{:foo :bar} :foo)
true
(contains-val? #{:foo :bar} :baz)
false

Rich Hickey

unread,
Apr 29, 2010, 6:53:38 AM4/29/10
to clo...@googlegroups.com

On Apr 29, 2010, at 1:57 AM, Meikel Brandmeyer wrote:

> Hi,
>
> On 29 Apr., 01:43, Stuart Halloway <stuart.hallo...@gmail.com> wrote:
>
>> I'll wait for Rich to maybe chime in on seq-contains?. Other than
>> seq-
>> contains? are people liking the new fns? Anybody having issues we
>> didn't anticipate?
>
> I was a little bit surprised about map-indexed and keep-indexed.
> Why specialised functions when (map f (indexed coll)) just works
> as well, but I must confess that having (f idx item) is nicer compared
> to (f [idx item]). On the other hand one could do (map #(apply f %)
> (indexed coll)), but again this doesn't play nice with #() functions.
>
> One thing I noticed in the implementation of keep-indexed.
>
> keepi is defined as (letfn [(keepi [idx coll] ...)] ...). However in
> the
> non-chunked case it is called as (keepi f (inc idx) (rest s)) resp.
> as (keepi f (rest s)). I think there is something goofy going on here.
>

Fixed - thanks.

Rich

Mark J. Reed

unread,
Apr 29, 2010, 7:41:17 AM4/29/10
to clo...@googlegroups.com
I like this proposal. I'd make contains? an alias for contains-key?
with a deprecation warning, and just forget about seq-contains? in
favor of contains-val?
--
Mark J. Reed <mark...@gmail.com>

Laurent PETIT

unread,
Apr 29, 2010, 7:52:34 AM4/29/10
to clo...@googlegroups.com
2010/4/29 Mark J. Reed <mark...@gmail.com>:
> I like this proposal.  I'd make contains? an alias for contains-key?
> with a deprecation warning, and just forget about seq-contains? in
> favor of contains-val?

This makes a lot of sense to me.
(and have, as suggested by ataggart, contains-key? complain if passed a seq)

Stuart Halloway

unread,
Apr 29, 2010, 8:11:00 AM4/29/10
to clo...@googlegroups.com
Thinking about this one.

> I like this proposal. I'd make contains? an alias for contains-key?
> with a deprecation warning, and just forget about seq-contains? in
> favor of contains-val?

Douglas Philips

unread,
Apr 29, 2010, 8:56:49 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 29, at 2:17 AM, Mark Engelberg wrote:
> On Wed, Apr 28, 2010 at 10:49 PM, Douglas Philips wrote:
>> What is the purpose, goal, design-rationale of not making seq-
>> contains? fast
>> on maps or sets?
>
> I think Rich's point is that if seq-contains? is sometimes fast and
> sometimes slow, then it makes it harder to reason about a program that
> uses seq-contains?.

If I'm writing seq-friendly functions, what you're saying is that I am
guaranteed to be always slow?
That if the code using my function(s) passes in a set instead of a
vector there shouldn't be a speed up?



> seq-contains? always does a linear search.

I'm not arguing against that, but apparently it is such a popular
straw man that I don't have to. :)

Is the contract for seq-contains that it will always examine every
element, or is it permitted to stop early if it finds a match? Is it
forbidden for range and seq-contains? to exploit the kind of protocol-
ish benefits that Stuart showed in his video with reduce and strings?



> So if you see seq-contains? in your code, this
> serves as a big red flag that either you should be using a set data
> structure and contains?, or you better be sure your sequence is
> relatively small.

Maybe. But maybe it is a "red flag" that I'm accepting any kind of
seq, and that my code will work for whatever seq type makes sense for
the problem my function is being used to help solve.



> If seq-contains? sometimes had good performance, then people would
> start using a mixture of seq-contains? and contains? for sets and
> maps, and it would be less clear when you were using seq-contains?
> "improperly".

Oh, please. First you say I'm too stoopid to profile my code :), and
that I won't be able to reason about it, and now you're saying that I
will profile it but come to some broken conclusions? I already know
that if I use a "may have to check every element" containment function
that it might <dramatic music/> have to check every element. At least
have the guts to make the slow function guaranteed slow (it ALWAYS
checks every element) so that I don't ever get some potentially
unclear idea about what the performance will be, and that I should
never use it on infinite sequences. Sheesh.

-Doug

Douglas Philips

unread,
Apr 29, 2010, 9:06:59 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 29, at 4:21 AM, ataggart wrote:
> Functions named contains-key? and contains-val? would make a lot more
> sense to me than the current contains? and new seq-contains?. Anyone
> looking at contains-val? should expect it to be O(n). The only
> effective difference would be that the test value for contains-val? is
> consistently a single value rather than a [key value] tuple for maps.

+1

-Doug

Douglas Philips

unread,
Apr 29, 2010, 9:12:35 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 29, at 7:52 AM, Laurent PETIT wrote:
> 2010/4/29 Mark J. Reed <mark...@gmail.com>:
>> I like this proposal. I'd make contains? an alias for contains-key?
>> with a deprecation warning, and just forget about seq-contains? in
>> favor of contains-val?
>
> This makes a lot of sense to me.
> (and have, as suggested by ataggart, contains-key? complain if
> passed a seq)

user=> (assoc '(1 2 3) 3 4)
java.lang.ClassCastException: clojure.lang.PersistentList
(NO_SOURCE_FILE:0)

If contains-key? complains any less cryptically, then perhaps assoc
should to?
(I'm certainly willing to submit potential patches in this area...
assoc's failure to 'protect itself/provide a more useful complaint
message' seems duck-type-y...
it seems that contains-key? should keep in line with assoc, or visa-
versa, whichever is the more clojure-y approach.)

-Doug

Mark Hamstra

unread,
Apr 29, 2010, 10:05:28 AM4/29/10
to Clojure


On Apr 29, 2:17 am, Mark Engelberg <mark.engelb...@gmail.com> wrote:
> On Wed, Apr 28, 2010 at 10:49 PM, Douglas Philips <d...@mac.com> wrote:
> > What is the purpose, goal, design-rationale of not making seq-contains? fast
> > on maps or sets?
>
> I think Rich's point is that if seq-contains? is sometimes fast and
> sometimes slow, then it makes it harder to reason about a program that
> uses seq-contains?.  Is this one of the "slow" usages, or one of the
> "fast" usages?  On the other hand, with the current plan, you know
> exactly what performance you're getting.  contains? will always be
> fast on sets and maps (albeit returning meaningless results for lists,
> and counterintuitive results for vectors).  seq-contains? always does
> a linear search.  So if you see seq-contains? in your code, this
> serves as a big red flag that either you should be using a set data
> structure and contains?, or you better be sure your sequence is
> relatively small.

I don't fully agree with the reasoning here. It seems clear that part
of what contains? means should be "will be fast," so contains? must
only be available for types where this performance can be guaranteed.
What is less clear to me is whether part of what seq-contains? means
should be "will be slow/will be implemented as an O(n) sequential
search," or whether it should be "may be slow." I'm not convinced
that "may be slow" doesn't raise enough of a warning flag and makes
reasoning about a program harder than does "will be slow" -- and "will
be implemented as..." smells bad.

Sean Devlin

unread,
Apr 29, 2010, 10:37:57 AM4/29/10
to Clojure
(send contains-val? inc)

Michael Gardner

unread,
Apr 29, 2010, 12:49:55 PM4/29/10
to clo...@googlegroups.com
On Apr 29, 2010, at 3:21 AM, ataggart wrote:

> I know it won't matter, but for posterity if nothing else...
>
> Functions named contains-key? and contains-val? would make a lot more
> sense to me than the current contains? and new seq-contains?. Anyone
> looking at contains-val? should expect it to be O(n). The only
> effective difference would be that the test value for contains-val? is
> consistently a single value rather than a [key value] tuple for maps.

+1. I can't imagine any use case for looking up a whole [key, value] pair in a hash-map.

MarkSwanson

unread,
Apr 29, 2010, 2:19:38 PM4/29/10
to Clojure


On Apr 29, 4:21 am, ataggart <alex.tagg...@gmail.com> wrote:
> I know it won't matter, but for posterity if nothing else...
>
> Functions named contains-key? and contains-val? would make a lot more
> sense to me than the current contains? and new seq-contains?.  Anyone
> looking at contains-val? should expect it to be O(n).  The only
> effective difference would be that the test value for contains-val? is
> consistently a single value rather than a [key value] tuple for maps.

+1. This is super clear. I find nothing existing or proposed comes
close to this level of clarity.

Boris Mizhen - 迷阵

unread,
Apr 29, 2010, 5:44:01 PM4/29/10
to clo...@googlegroups.com
> +1. I can't imagine any use case for looking up a whole [key, value] pair in a hash-map.
Actually this is quite useful when you want to do something for each
value and need to know the key as well - for example copy some
key/value pairs to another map

Boris

Mark J. Reed

unread,
Apr 29, 2010, 6:38:16 PM4/29/10
to clo...@googlegroups.com
Iterating through the pairs is useful. Asking if a given [k, v] is
included is not - you can just ask if (= (assoc k) v) instead.

It'd be nice if (contains-val) returned the key(s) as its true result,
but probably not useful enough to warrant the complexity of dealing
with false keys, explicit true checks, etc. In CL I would totally
return the key list as a multivalue on top of t, though. :)
--
Mark J. Reed <mark...@gmail.com>

Rich Hickey

unread,
Apr 30, 2010, 7:33:35 AM4/30/10
to clo...@googlegroups.com

On Apr 29, 2010, at 4:21 AM, ataggart wrote:

> I know it won't matter, but for posterity if nothing else...
>
> Functions named contains-key? and contains-val? would make a lot more
> sense to me than the current contains? and new seq-contains?. Anyone
> looking at contains-val? should expect it to be O(n). The only
> effective difference would be that the test value for contains-val? is
> consistently a single value rather than a [key value] tuple for maps.
>


I disagree.

People don't consider sets, vectors, arrays or strings to have 'keys'.
But, like maps, they all support fast lookup of some sort.

Would contains-val? be fast for sets? As a user of sets, I consider
them collections of values, and I absolutely would reach for contains-
val? in any library that had it, for use with sets. If so, and I used
contains-val?, and I moved code from using sets to maps (happens
frequently), or vectors (also happens) my perf would suddenly stink.
If not fast on sets, why not? The reason isn't supported by the name.

The mismatch with the seq values of maps is also disconcerting for
something that would purport to be sequential, as the things returned
by (seq amap) are key+value pairs.

Just because you wouldn't reach for contains? for use with a known
vector doesn't mean your code, or other code built on the abstraction,
won't end up calling it with a vector. And don't think you never use
contains? on a vector/array - you rewrite it every time you write (if
(and (<= 0 i) (< i (count v))) ...)

'contains?' and 'get' abstract over fast lookup. They are polymorphic
on the collection type and on the nature of the looked-up thing. For
maps the looked-up thing is a key, for sets: a value, for vectors,
strings and arrays: an index. Calling it contains-key? doesn't make
them the same, nor add any value.

In Clojure, 'contains?' is about 'get' succeeding. Nothing more or
less. It is not a rummager.

(if (contains? coll x)
(get coll x)
(plan-b))

Renaming contains? is not on the table. For people that understand its
relationship with get, it makes perfect sense (and I don't think I'm
the only one :). And there is a lot of client code. And no one has
come up with a better name that doesn't include caveats. contains? is
internally consistent, is unfamiliar.

I do understand that this use of the word differs from that used in
e.g., Java. But I'll make the same argument to Java devs that I do to
the Lispers, who have seen many more of their prized words repurposed
in Clojure (assoc, loop, do et al):

The words can't mean the same thing forever without trapping us
in the same semantics forever, and there are only so many good words.

Everyone has to realize that this level of polymorphism in Clojure is
unusual. I haven't seen a library with equivalents to get and
contains?. Heck, in Java, Maps aren't even collections! So, should we
adopt their nomenclature because it is familiar?

I agree that contains?'s behavior on vectors is confusing for
newcomers. That's not a reason for it to be different. And that people
need a way to do that rummaging job. They of course can, with 'some'.
But I also agree that 'some', being a higher-order function, would not
necessarily be the tool newcomers would consider for the job of
rummaging for a value. Perhaps it's a good first lesson, as they are
not going to find filter-val etc either.

So, I pulled in 'includes?' from contrib and renamed it seq-contains?

The only options for right now are:

A) I remove seq-contains?
B) I rename seq-contains?

I'm inclined towards A so we can all stop wasting time and energy on
this unnecessary function.

Rich

Laurent PETIT

unread,
Apr 30, 2010, 8:18:05 AM4/30/10
to clo...@googlegroups.com
2010/4/30 Rich Hickey <richh...@gmail.com>:
>
> On Apr 29, 2010, at 4:21 AM, ataggart wrote:
>
>> I know it won't matter, but for posterity if nothing else...
>>
>> Functions named contains-key? and contains-val? would make a lot more
>> sense to me than the current contains? and new seq-contains?.  Anyone
>> looking at contains-val? should expect it to be O(n).  The only
>> effective difference would be that the test value for contains-val? is
>> consistently a single value rather than a [key value] tuple for maps.
>>
>
>
> I disagree.
>
> People don't consider sets, vectors, arrays or strings to have 'keys'. But,
> like maps, they all support fast lookup of some sort.
>
> Would contains-val? be fast for sets?  As a user of sets, I consider them
> collections of values, and I absolutely would reach for contains-val? in any
> library that had it, for use with sets. If so, and I used contains-val?, and
> I moved code from using sets to maps (happens frequently), or vectors (also
> happens) my perf would suddenly stink. If not fast on sets, why not? The
> reason isn't supported by the name.
>
> The mismatch with the seq values of maps is also disconcerting for something
> that would purport to be sequential, as the things returned by (seq amap)
> are key+value pairs.
>
> Just because you wouldn't reach for contains? for use with a known vector
> doesn't mean your code, or other code built on the abstraction, won't end up
> calling it with a vector. And don't think you never use contains? on a
> vector/array - you rewrite it every time you write (if (and (<= 0 i) (< i
> (count v))) ...)
>
> 'contains?' and 'get' abstract over fast lookup. They are polymorphic on the
> collection type and on the nature of the looked-up thing. For maps the
> looked-up thing is a key, for sets: a value, for vectors, strings and
> arrays: an index.  Calling it contains-key? doesn't make them the same, nor
> add any value.
>
> In Clojure, 'contains?' is about 'get' succeeding. Nothing more or less. It
> is not a rummager.

While it sounds soo evident now that you say that explicitly ( the
contains? / get pair ), it may be good to reflect that in the docs of
the functions rather than just keep this knowledge here ?

Stephen C. Gilardi

unread,
Apr 30, 2010, 9:44:48 AM4/30/10
to clo...@googlegroups.com

On Apr 29, 2010, at 2:19 PM, MarkSwanson wrote:

> On Apr 29, 4:21 am, ataggart <alex.tagg...@gmail.com> wrote:
>> I know it won't matter, but for posterity if nothing else...
>>
>> Functions named contains-key? and contains-val? would make a lot more
>> sense to me than the current contains? and new seq-contains?. Anyone
>> looking at contains-val? should expect it to be O(n). The only
>> effective difference would be that the test value for contains-val? is
>> consistently a single value rather than a [key value] tuple for maps.
>
> +1. This is super clear. I find nothing existing or proposed comes
> close to this level of clarity.

inc. those are great names.

--Steve

Christophe Grand

unread,
Apr 30, 2010, 10:06:37 AM4/30/10
to clo...@googlegroups.com
Hi,
On Thu, Apr 29, 2010 at 7:48 AM, Meikel Brandmeyer <m...@kotka.de> wrote:
On 29 Apr., 01:38, Mark Engelberg <mark.engelb...@gmail.com> wrote:

> 1.  Don't include seq-contains?  The behavior you want can usually be
> achieved by using (some #{item} coll).  Disadvantage - if you're
> testing to see if the collection contains nil, that won't work.

Not entirely correct. (some #(= % item) coll) works always and is
basically how seq-contains? is implemented. However there are
opinions that some is not obvious enough.

(some {item true} coll) works with nil too, but usally all the love goes to #(= % item, it's unfair! :-)

Christophe

Phil Hagelberg

unread,
Apr 30, 2010, 12:17:56 PM4/30/10
to clo...@googlegroups.com
On Fri, Apr 30, 2010 at 4:33 AM, Rich Hickey <richh...@gmail.com> wrote:
> On Apr 29, 2010, at 4:21 AM, ataggart wrote:
>> Functions named contains-key? and contains-val? would make a lot more
>> sense to me than the current contains? and new seq-contains?.  Anyone
>> looking at contains-val? should expect it to be O(n).  The only
>> effective difference would be that the test value for contains-val? is
>> consistently a single value rather than a [key value] tuple for maps.
>
> People don't consider sets, vectors, arrays or strings to have 'keys'. But,
> like maps, they all support fast lookup of some sort.

Actually I do consider sets to have keys, since internally they are
implemented using maps, so the exact same semantics apply for their
lookup. They're just maps where the key and value are the same thing:

protected APersistentSet(IPersistentMap impl){
this.impl = impl;
}

public boolean contains(Object key){
return impl.containsKey(key);
}

public Object get(Object key){
return impl.valAt(key);
}

Because of this I would also expect contains-val? to be constant-time
for sets, so I think seq-contains? is a better name.

Anyway, I'm a little weary of this discussion by now, (and I can't be
the only one) so I'd be perfectly happy leaving things as they are. I
am glad that a lot of thought goes into these names though; it
definitely shows.

-Phil

Mark Engelberg

unread,
Apr 30, 2010, 12:37:55 PM4/30/10
to clojure
On Fri, Apr 30, 2010 at 5:18 AM, Laurent PETIT <lauren...@gmail.com> wrote:
> While it sounds soo evident now that you say that explicitly ( the
> contains? / get pair ), it may be good to reflect that in the docs of
> the functions rather than just keep this knowledge here ?

Agreed. This explanation of the relationship between contains? and
get should be reflected in the docs. I *finally* understand why
contains? returns false for lists, rather than an error.

ataggart

unread,
Apr 30, 2010, 2:13:40 PM4/30/10
to Clojure


On Apr 30, 4:33 am, Rich Hickey <richhic...@gmail.com> wrote:
> People don't consider sets, vectors, arrays or strings to have 'keys'.  
> But, like maps, they all support fast lookup of some sort.

But of course we do. I point to the doc for contains? and get:

Usage: (contains? coll key)
Returns true if key is present in the given collection, otherwise
returns false.

Usage: (get map key)
(get map key not-found)
Returns the value mapped to key, not-found or nil if key not present.

Both reference the notion of a "key". If referencing keys in the
documentation doesn't hurt, and makes its usage more clear, I can't
see why that reasoning wouldn't follow to the function name itself,
especially if we allow for a contains-val? (which I know is also


> Would contains-val? be fast for sets?  
As with any abstract method/function the worst-case is what's
documented. Anyone looking at contains-val? should expect worst-case
O(n), just as anyone looking at contains-key? should expect (near)
O(1).

The deeper question is: if an abstract method/function is documented
to be worst-case O(n), then must all implementations be written to
ensure the worst-case performance, even when it could be implemented
in O(1) time?


>As a user of sets, I consider  
> them collections of values, and I absolutely would reach for contains-
> val? in any library that had it, for use with sets. If so, and I used  
> contains-val?, and I moved code from using sets to maps (happens  
> frequently), or vectors (also happens) my perf would suddenly stink.  
> If not fast on sets, why not? The reason isn't supported by the name.

Your perf stinking is a product of you testing for values, and not
keys. The alternative would have been to use contains? with your sets
and watch (contains? s "foo") blow up when you tried to use the same
code against a vector.



> The mismatch with the seq values of maps is also disconcerting for  
> something that would purport to be sequential, as the things returned  
> by (seq amap) are key+value pairs.

Only seq-contains? purports to be related to seq, whereas contains-
val? purports to check if the value is in the collection, and to do so
in worst-case O(n) time. Going back to your earlier point about
swapping out datastructures, (contains-val? v "foo") written against
vectors would work just fine when handed a map; seq-contains? would
give false negatives.


> Renaming contains? is not on the table.

Very well.

Douglas Philips

unread,
Apr 30, 2010, 2:37:55 PM4/30/10
to clo...@googlegroups.com
On 2010 Apr 30, at 7:33 AM, Rich Hickey wrote:
> People don't consider sets, vectors, arrays or strings to have
'keys'.

That is not consistent with the documentation:
Sets: http://clojure.org/data_structures:
Sets support 'removal' with disj, as well as contains? and get, the
latter returning the object that is held in the set which compares
equal to the key, if found ...

http://richhickey.github.com/clojure/clojure.core-api.html:
hash-set ... (hash-set & keys) ... Returns a new hash set with
supplied keys.

and similarly for sorted-set, sorted-set-by.

It rings hollow to say "People don't consider sets, vectors... to have
'keys'" when your own documentation says that keys are what are used
to build sets, or update vectors... and talks about those types using
that term.


>I agree that contains?'s behavior on vectors is confusing for
newcomers. That's not a reason for it to be different. And that people
need a way to do that rummaging job. They of course can, with 'some'.
But I also agree that 'some', being a higher-order function, would not
necessarily be the tool newcomers would consider for the job of
rummaging for a value. Perhaps it's a good first lesson, as they are
not going to find filter-val etc either.

But they will find zero? and wonder, WTF? There be a special function
for #(= 0 %) and not for searching through sequences? (some #(= val
%) ...)


>A) I remove seq-contains?
>B) I rename seq-contains?
Perhaps:
rummage function
Usage: (rummage coll val)
(rummage coll val not-found)
Scans coll looking for val. Returns val if found, else not-found (or
nil).

>I'm inclined towards A so we can all stop wasting time and energy on
this unnecessary function.

An unnecessary function that you just a few paragraphs prior
acknowledged that people need?

Yes, let's do stop "wasting time and energy".
Stuart says that this is FAQ #5, so let's just let it remain that.
And leave the docs as they are, so you can come back again and
thinking about keys and lookup and "misleading expensive operations"
in the wrong way because we only had your docs to read.

But what do I know, I'm just an amateur at wasting time on clojure, I
only have a few measly hours a week. :)

-Doug

Michael Gardner

unread,
Apr 30, 2010, 3:05:46 PM4/30/10
to clo...@googlegroups.com
On Apr 30, 2010, at 6:33 AM, Rich Hickey wrote:

> Would contains-val? be fast for sets? As a user of sets, I consider them collections of values, and I absolutely would reach for contains-val? in any library that had it, for use with sets. If so, and I used contains-val?, and I moved code from using sets to maps (happens frequently), or vectors (also happens) my perf would suddenly stink. If not fast on sets, why not? The reason isn't supported by the name.

You should understand the performance characteristics of the data structures you employ, under whatever operations you subject them to. Being surprised when your code that looks stuff up by value slows down when you switch from sets to something else doesn't strike me as reasonable.

> 'contains?' and 'get' abstract over fast lookup. They are polymorphic on the collection type and on the nature of the looked-up thing. For maps the looked-up thing is a key, for sets: a value, for vectors, strings and arrays: an index. Calling it contains-key? doesn't make them the same, nor add any value.

The objects in a set are both keys and values, and an index *is* a key. The documentation for get and contains? implies this, as others have noted.

> Renaming contains? is not on the table.

I hope you will reconsider this, if not now then at some point in the future.

> For people that understand its relationship with get, it makes perfect sense (and I don't think I'm the only one :).

As would contains-key?, I think.

> And there is a lot of client code.

No need to rename immediately; alias and deprecate.

> And no one has come up with a better name that doesn't include caveats.

I don't see any caveats for contains-key? other than the one addressed above.

> I do understand that this use of the word differs from that used in e.g., Java. But I'll make the same argument to Java devs that I do to the Lispers, who have seen many more of their prized words repurposed in Clojure (assoc, loop, do et al):
>
> The words can't mean the same thing forever without trapping us in the same semantics forever, and there are only so many good words.
>
> Everyone has to realize that this level of polymorphism in Clojure is unusual. I haven't seen a library with equivalents to get and contains?. Heck, in Java, Maps aren't even collections! So, should we adopt their nomenclature because it is familiar?

contains? isn't a bad name because it's used differently in Java. It's bad because it's ambiguous, and because there's a much clearer name available.

> I agree that contains?'s behavior on vectors is confusing for newcomers. That's not a reason for it to be different.

I disagree.

> The only options for right now are:
>
> A) I remove seq-contains?
> B) I rename seq-contains?

Something involving the word 'scan', perhaps?

Sophie

unread,
Apr 30, 2010, 4:49:48 PM4/30/10
to Clojure
On Apr 29, 3:21 am, ataggart <alex.tagg...@gmail.com> wrote:

> Functions named contains-key? and contains-val? would make a lot more
> sense to me than the current contains? and new seq-contains?.

Amen. Even independent of any performance expectations.

Steven E. Harris

unread,
Apr 30, 2010, 6:14:52 PM4/30/10
to clo...@googlegroups.com
Phil Hagelberg <ph...@hagelb.org> writes:

> Actually I do consider sets to have keys, since internally they are
> implemented using maps, so the exact same semantics apply for their
> lookup. They're just maps where the key and value are the same thing:

But that implementation is one of convenience, of possibly admirable
laziness, and it's none of our business. A key is something apart from a
value it refers, but in sets, there's no separate value being referred
to. The value is the only thing in play.

Where we get hung up in software is with the flexibility to define
"equality" or "sufficient sameness" in set implementations by taking
only part of the stored values into account. The same idea doesn't exist
in the mathematical view of sets. Our software would be much clearer if
sets didn't tolerate these "key comparison views", and would instead
force one to use a map in cases where such a "key comparison view"
(being something less than the value itself) is necessary.

--
Steven E. Harris

ataggart

unread,
Apr 30, 2010, 6:29:03 PM4/30/10
to Clojure
Clojure embraces this "laziness":

user=> (get #{:foo :bar} :foo)
:foo

'get uses a "key" to return a "value". A vector is not a map is not a
set, but all of them can have their values accessed in constant-time
using a "key".


On Apr 30, 3:14 pm, "Steven E. Harris" <s...@panix.com> wrote:

Heinz N. Gies

unread,
May 1, 2010, 4:48:00 AM5/1/10
to clo...@googlegroups.com
On Apr 30, 2010, at 14:33 , Rich Hickey wrote:

'contains?' and 'get' abstract over fast lookup. They are polymorphic on the collection type and on the nature of the looked-up thing. For maps the looked-up thing is a key, for sets: a value, for vectors, strings and arrays: an index.  Calling it contains-key? doesn't make them the same, nor add any value.

I think if contains? and get are in 'the same group' the names should reflect that, readability of code and expressive names are very important if someone reads (contains? [1 2 3] 3) he will assume it to be true and I bet about 90% of the people using clojure will make this miss assumption since people think english first and clojure second. if I they read (contains-key? [1 2 3] 3) they will think twice, 'key on a vec, oh index yes so it is false' if they read (contains-val? [1 2 3] 3) they will also think twice, 'val on a vec, ah yes means if it include that value'.


In Clojure, 'contains?' is about 'get' succeeding. Nothing more or less. It is not a rummager.

(if (contains? coll x)
(get coll x)
(plan-b))

Then ow about can-get? since contains does not actually look of the passed thing contains anything it is just looking of you can get stuff from it, you sayed it yourself:

(if (can-get? coll x)
(get coll x)
(plan-c-as-c4)


Renaming contains? is not on the table. For people that understand its relationship with get, it makes perfect sense (and I don't think I'm the only one :). And there is a lot of client code. And no one has come up with a better name that doesn't include caveats. contains? is internally consistent, is unfamiliar.

Sorry I disagree here, I now understand the relationship with get and it makes even less sense to me then before -.- if it has to do with get contains? is a very very bad name. Think about translating stuff to real world examples, if I ask you 'does your bag contains x' I don't want to know if the bag has more then x items I want to know if x is in there. Whenever we use contains we talk about the content in the real world. Even if we use it with indices as in 'does this book contains page 42' we clarify that we are talking about the index by adding 'page' in there, if we ask 'does this book contains 42' people will likely start to look if there is a 42 in the book, and if it is the hitchhikers guide to the galaxy the answer is yes, otherwise most likely no. So pretty pretty please reconsider the decision about renaming contains


I do understand that this use of the word differs from that used in e.g., Java. But I'll make the same argument to Java devs that I do to the Lispers, who have seen many more of their prized words repurposed in Clojure (assoc, loop, do et al):

   The words can't mean the same thing forever without trapping us in the same semantics forever, and there are only so many good words.

But choosing a misleading name just for the sake of chaining the semantics is not the way to go! 

Everyone has to realize that this level of polymorphism in Clojure is unusual. I haven't seen a library with equivalents to get and contains?. Heck, in Java, Maps aren't even collections! So, should we adopt their nomenclature because it is familiar?

Trust me I don't care how java uses contains? that isn't an argument I would raise, but I care about how the human mind, uses contains, and currently it is different to the usage in clojure.

I agree that contains?'s behavior on vectors is confusing for newcomers. That's not a reason for it to be different. And that people need a way to do that rummaging job. They of course can, with 'some'. But I also agree that 'some', being a higher-order function, would not necessarily be the tool newcomers would consider for the job of rummaging for a value. Perhaps it's a good first lesson, as they are not going to find filter-val etc either.

I am pretty certain the behaviour of contains? is confusing even to somewhat versed people, just that at some point people resign and put back comen sense and let clojure do it's thing with contains?, but this isn't a good thing in my  eyes :(. Also 'does this collection contains the value X' is a quite common idiom isn't it? so why force a higher order function onto that?

So, I pulled in 'includes?' from contrib and renamed it seq-contains?

The only options for right now are:

A) I remove seq-contains?
B) I rename seq-contains?
:.(

Regards a sad,
Heinz

Michał Marczyk

unread,
May 1, 2010, 6:01:17 AM5/1/10
to clo...@googlegroups.com
If "contains?" is a sensible name for that function, then surely
"seq-contains?" cannot be a sensible name for a function which checks
a totally different sort of "containment"?

Also, if it is possible to view "seq-contains?" as a sensible enough
name for a core library function with sequential semantics, then
surely "contains?" cannot be intrinsically evocative of a different
notion of "containment", even if it is possible to get used to this
word being attached to such a notion.

Incidentally, I have personally got used to the Clojure meaning of
"contains?" without great difficulty, so I was initially a bit
surprised by this thread exploding to such length... But then
"contains?" is a very desirable name and in hindsight, a controversy
over which op it should be attached to was to be expected. Not just
because it's cool to have such a good name at one's disposal, but
because any other notion of "containment" is bound to get a
substandard name (if indeed it slips into the core lib at all),
because every other "good" name will be totally confused with
contains? by everyone and is therefore out of the game (cf.
includes?).

Using a few variant names (like the proposed ones based on
"contains?", but with disambiguating affixes attached) could be a
reasonable compromise. (Although I'd be a little bit sad that the
"perfect" name is gone, its sacrifice would mean two "reasonable"
names become available.) If that is unacceptable, then perhaps
"seq-contains?" could at least get a name *not* including the
"contains?" part, so that it becomes natural not to assume that it
does roughly the same thing as "contains?". (There's precedent for
prefixes meaning "do mostly the same thing, but in a different
context" in Clojure, e.g. "enumeration-seq", "iterator-seq".)

To end on a punnish note, I'd sort of expect the predicate counterpart
of "get" to be called "got?"... Oh well.

At the end of the day, I'm confident that the overall greatly
considered design of Clojure (including naming conventions) is
internally consistent enough that it can be conveyed to a newcomer
with a reasonable amount of good technical writing. Apparently most
people think the same and are therefore quite prepared to move on. :-)

All the best,
Michał

Olivier Lefevre

unread,
May 3, 2010, 12:51:07 PM5/3/10
to clo...@googlegroups.com
On 4/29/2010 6:49 PM, Michael Gardner wrote:
> +1. I can't imagine any use case for looking up a whole [key, value] pair in a hash-map.

Agreed. The whole point of a map is to provide key-based lookup.
Iterating over all (key, value) pairs in the map makes sense,
not so much looking for one in particular.

Btw I am new to Clojure and I found this whole thread extremely
illuminating. It does seem to go to the heart of some aspects
of Clojure's design.

-- O.L.
Reply all
Reply to author
Forward
0 new messages