clojure 1.2 seq fn enhancement FAQ

512 views
Skip to first unread message

Stuart Halloway

unread,
Apr 28, 2010, 4:31:28 PM4/28/10
to clo...@googlegroups.com
After review, several seq functions from clojure-contrib have been
promoted to clojure [1], [2], [3]. Hopefully the FAQ below will answer
the major questions you may have:

1. Is this a breaking change to Clojure?
No. Rich is super-careful to grow Clojure by expansion, not by
breaking change.

2. Is this a breaking change to Contrib?
Yes. Contrib is a community-contribution area. Part of its mission is
to germinate ideas that may get promoted to Clojure, and when that
happens contrib users will have to adapt. Usually this will simply
mean removing a require/use, or changing a fn name.

3. Why did some of the fn names change?
Because they were bad names. :-) Clojure is careful to encourage the
use of fns that will have good performance characteristics, and make
it obvious which fns are "seq-y" (therefore O(n)). rand-elt became
rand-nth to make it clear that its performance is bound to the
performance of nth on the underlying collection. Similarly, includes?
became seq-contains? to make clear that the fn must walk the sequence.
Avoid seq-contains? when you have a set or map!

4. Why did some semantics change?
Because they were not the most generally-desirable semantics. The
contrib group-by returned a sorted map, when in most cases sorting is
overkill, so the new core fn returns a plain ol' map.

5. Couldn't you have left the old fns in place in contrib, to avoid
breakage?
No, for multiple reasons. Name collisions are one. But the bigger one
is that the fns in core have been audited by Rich for performance and
correctness. Leaving the old fns in contrib would be a recipe for
lower-performing (or even, in rare cases, buggy) programs.

6. Why did this break my project?
Hopefully it doesn't. But, it seems that a lot of people are tracking
their apps (via leiningen or maven) to build snapshots of both Clojure
and Contrib. This is a risky development practice, particularly in the
case of Contrib. As a community, we need to develop a saner way to
stay near the edge without being right on it.

7. How do I fix it?
Your code gets smaller. Remove requires or uses that refer to promoted
fns. You already have them in core! For the fns whose names have
changed, update the names in your code.

Comments and questions welcome!

Stu

[1] https://www.assembla.com/spaces/clojure/tickets/312-audit---promote-some-clojure-contrib-seq-fns
[2] http://github.com/richhickey/clojure/tree/a55df92faa0c51c634d93a8d991ccfd2638b108b
[3] http://github.com/richhickey/clojure-contrib/commit/78ee9b3e64c5ac6082fb223fc79292175e8e4f0c

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Douglas Philips

unread,
Apr 28, 2010, 5:06:21 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 4:31 PM, Stuart Halloway wrote:

> After review, several seq functions from clojure-contrib have been
> promoted to clojure [1], [2], [3]. Hopefully the FAQ below will
> answer the major questions you may have:

Cool!

> 3. Why did some of the fn names change?
> ...Similarly, includes? became seq-contains? to make clear that the
> fn must walk the sequence. Avoid seq-contains? when you have a set
> or map!

Wait, what? Why should seq-contains? be slower than contains? Isn't
this exactly the kind of thing that protocols are supposed to be
solving? Your nice video showed that very thing, right?



> 6. Why did this break my project?
> Hopefully it doesn't. But, it seems that a lot of people are
> tracking their apps (via leiningen or maven) to build snapshots of
> both Clojure and Contrib. This is a risky development practice,
> particularly in the case of Contrib. As a community, we need to
> develop a saner way to stay near the edge without being right on it.

Indeed. Tracking the bleeding edge is always dicey... :-(

Thanks!

-Doug

Meikel Brandmeyer

unread,
Apr 28, 2010, 5:14:14 PM4/28/10
to clo...@googlegroups.com
Hi,

On Wed, Apr 28, 2010 at 05:06:21PM -0400, Douglas Philips wrote:

> Wait, what? Why should seq-contains? be slower than contains? Isn't
> this exactly the kind of thing that protocols are supposed to be
> solving? Your nice video showed that very thing, right?

If you have a map or a set looking up a key is fast. So contains? is
fast (read O(1) resp. something like O(log32 N)). seq-contains? does
a linear search through a sequence. That is O(N). Protocols cannot
change this.

Sincerely
Meikel

Douglas Philips

unread,
Apr 28, 2010, 5:26:46 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 5:14 PM, Meikel Brandmeyer wrote:
> If you have a map or a set looking up a key is fast. So contains? is
> fast (read O(1) resp. something like O(log32 N)). seq-contains? does
> a linear search through a sequence. That is O(N). Protocols cannot
> change this.

How is having an optimized version of seq-contains? for sets/maps any
different from what Stuart showed in his video, where reduce can have
a specialization that is faster on some types?

-Doug

Meikel Brandmeyer

unread,
Apr 28, 2010, 5:32:06 PM4/28/10
to clo...@googlegroups.com
Hi,

On Wed, Apr 28, 2010 at 05:26:46PM -0400, Douglas Philips wrote:

> How is having an optimized version of seq-contains? for sets/maps
> any different from what Stuart showed in his video, where reduce can
> have a specialization that is faster on some types?

A faster reduce is still O(N). If you need fast lookup use the correct
data structure. If you do linear search it's gonna be slow, so make it
clear in the code! This makes the developer aware of the implications.

Sincerely
Meikel

Douglas Philips

unread,
Apr 28, 2010, 5:39:37 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 5:32 PM, Meikel Brandmeyer wrote:
> On Wed, Apr 28, 2010 at 05:26:46PM -0400, Douglas Philips wrote:
>
>> How is having an optimized version of seq-contains? for sets/maps
>> any different from what Stuart showed in his video, where reduce can
>> have a specialization that is faster on some types?
>
> A faster reduce is still O(N). If you need fast lookup use the correct
> data structure. If you do linear search it's gonna be slow, so make it
> clear in the code! This makes the developer aware of the implications.


Stuart's comment was to not use seq-contains? on maps or sets.
There is no reason that it cannot be the same speed as contains? if a
set or map is passed in.
Wasn't the whole point of having sequences as an abstraction to get
away from all the cruft that traditional lisp had built up around
concrete types? Why go back to that here? Can't protocols handle
optimizing this case?
If some function I call uses seq-contains? (so that it can get all the
wonderful seq-ness abstractness clojure offers) I should have to know
that internal detail and not pass in a map or set? or worse, fail to
get an optimization in that case? C'mon, this should be an easy
protocol win (or maybe I don't really understand the limits on what
protocols can do.)

-Doug

Meikel Brandmeyer

unread,
Apr 28, 2010, 6:06:54 PM4/28/10
to clo...@googlegroups.com
Hi,

On Wed, Apr 28, 2010 at 05:39:37PM -0400, Douglas Philips wrote:

> Stuart's comment was to not use seq-contains? on maps or sets.
> There is no reason that it cannot be the same speed as contains? if
> a set or map is passed in.

Ah, ok. I misunderstood what you were saying. But I think it doesn't
change the argumentation. If seq-contains? was fast on maps and sets,
people would abandon contains? because seq-contains? is "always right":
works on seqs, fast on maps and sets. And again we are in the situation,
that the developer does not make his intentions explicit.

Sincerely
Meikel

Stuart Halloway

unread,
Apr 28, 2010, 6:10:02 PM4/28/10
to clo...@googlegroups.com
The "seq" in "seq-contains?" says "I want a sequential search."
Protocols *could* make this do something different, but that would
violate the contract of this function.

Another way to put it: If you wrote a protocol to pick a fast
implementation based on type, then seq-contains? would be the fn that
the protocol would call if it couldn't find anything faster.

There have to be primitive things somewhere...

Stu

Douglas Philips

unread,
Apr 28, 2010, 6:37:44 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 6:10 PM, Stuart Halloway wrote:

> The "seq" in "seq-contains?" says "I want a sequential search."
> Protocols *could* make this do something different, but that would
> violate the contract of this function.

How would having it work faster if passed a set or map violate that?
There is no ordering guarantee for those types. Is this a contract to
be slow? Is there something about the comparison being done by seq-
contains? that is different from contains? would seq-contains? somehow
return a different answer? And what about the sorted collections,
where seq-contains? could be fast if the value sought was smaller or
larger than the values contained?

I don't see the distinction(s) you're trying to make here.


> Another way to put it: If you wrote a protocol to pick a fast
> implementation based on type, then seq-contains? would be the fn
> that the protocol would call if it couldn't find anything faster.
>
> There have to be primitive things somewhere...

If so, then why isn't there a vector-first and a list-first and abc-
first that I can use when I 'know' what type I'm working with?

Aren't they all subsumed under 'first' because Clojure understands
that premature optimization based on concrete types was the wrong
thing to do? At least that is the message I got from Rich's videos
explaining why tying all of the "old" lisp goodness to concrete types
was wrong (unnecessarily limiting).

Stuart Halloway

unread,
Apr 28, 2010, 6:55:51 PM4/28/10
to clo...@googlegroups.com
>> Another way to put it: If you wrote a protocol to pick a fast
>> implementation based on type, then seq-contains? would be the fn
>> that the protocol would call if it couldn't find anything faster.
>>
>> There have to be primitive things somewhere...
>
> If so, then why isn't there a vector-first and a list-first and abc-
> first that I can use when I 'know' what type I'm working with?

Specializations are available where there is a clear use case, e.g.
contains?, peek (instead of last), disj, etc.

> Aren't they all subsumed under 'first' because Clojure understands
> that premature optimization based on concrete types was the wrong
> thing to do? At least that is the message I got from Rich's videos
> explaining why tying all of the "old" lisp goodness to concrete
> types was wrong (unnecessarily limiting).

Tying to concrete types is limiting. *Never* having special purpose
fns that know about performance characteristics is also limiting.

contains?/seq-contains? mark a divide where it is very easy for a
language to quietly help programmers do the wrong thing.

Stu

Douglas Philips

unread,
Apr 28, 2010, 7:17:32 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 6:55 PM, Stuart Halloway wrote:
> Specializations are available where there is a clear use case, e.g.
> contains?, peek (instead of last), disj, etc.
...
> Tying to concrete types is limiting. *Never* having special purpose
> fns that know about performance characteristics is also limiting.
>
> contains?/seq-contains? mark a divide where it is very easy for a
> language to quietly help programmers do the wrong thing.

Such as littering my application code with type-checks to decide if it
should call contains? or seq-contains?
That seems exactly the wrong thing to do.

Please explain how making seq-contains? fast for sets and maps is
wrong (exactly how does it violate some contract and what contract,
explicitly is that a violation of) and application code type checking
is right, because I really don't understand why making seq-contains?
fast isn't just a huge win-win.

Douglas Philips

unread,
Apr 28, 2010, 7:27:11 PM4/28/10
to clo...@googlegroups.com
On 2010 Apr 28, at 6:06 PM, Meikel Brandmeyer wrote:
> Ah, ok. I misunderstood what you were saying. But I think it doesn't
> change the argumentation. If seq-contains? was fast on maps and sets,
> people would abandon contains? because seq-contains? is "always
> right":
> works on seqs, fast on maps and sets. And again we are in the
> situation,
> that the developer does not make his intentions explicit.

What is the intention to be made exlicit here? That some sequences are
better than others? (car/cdr/list from old-school lisps were exactly
this kind of thinking, whether deliberate or not.)

How is it a developer is supposed to say: "only use a set or map" here?
I'm new enough to Clojure that I'm not clear on how duck-type-y the
idioms are.
Should I use assert to make my intention clear?
raise an exception?
just call some function that cares and expect that it will either
assert or raise for me?
Isn't this just the kind of thing that protocols are for? Again, I
think Stuarts video example with reduce is quite apropos. For certain
things, we can make some operations faster, and we can do it, post-
facto, wihin clojure itself.

-Doug

Mark Engelberg

unread,
Apr 28, 2010, 7:38:31 PM4/28/10
to clojure
On Wed, Apr 28, 2010 at 2:39 PM, Douglas Philips <dg...@mac.com> wrote:
> If some function I call uses seq-contains? (so that it can get all the wonderful seq-ness abstractness clojure offers) I should have to know that internal detail and not pass in a map or set? or worse, fail to get an optimization in that case? C'mon, this should be an easy protocol win (or maybe I don't really understand the limits on what protocols can do.)

Doug, I also am not particularly enamored of the name seq-contains? I
agree that it conjures up notions of Scheme's proliferation of names
like list-ref, vector-ref, string-ref, etc.

But I think you're not fully appreciating the complexity of the
situation. This is not just about performance, but a question of two
different possible semantics for "contains?", which is why it can't
*only* be addressed with a protocol.

The fundamental problem is that some Clojure data structures have a
dual nature. For example, a vector can be thought of as a collection
of items, or you can think of it as a mapping from integers to items.
A hash map can be thought of as a map, or as a sequence of key-value
pairs.

The current "contains?" function is really a "contains-key?" function.
It treats the input as a data structure with keys (a map or a set),
and asks whether it contains a given key. This has been a tremendous
source of confusion to some people who apply the contains? function to
a vector, and are startled to discover that it is only asking whether
the item is a *key* in the vector, (i.e., in the case of a vector, a
valid index between 0 and length-1). Arguably, contains-key? would
have been a much better name, and made the semantics much more clear.

There are some people who believe that Clojure would benefit from a
variant of "contains?" that views the input as a collection. But how
to make this clear? That's why the proposal is for a new function
called "seq-contains?". It's not so much a warning of "this thing has
performance like a linear search", as much as, "it views your data as
a sequence". So for example, calling seq-contains? on a vector will
actually test whether the item is an element of the vector collection.
Calling seq-contains? on a map will actually test whether a given
[key value] pair is in the map (rather than just testing for the
existence of a key).

So what are some other options?:

1. Don't include seq-contains? The behavior you want can usually be
achieved by using (some #{item} coll). Disadvantage - if you're
testing to see if the collection contains nil, that won't work.
2. Rename contains? to contains-key? and make a function with the new
semantics called contains? Disadvantage - breaks existing code.
3. Keep one function called contains?, but use protocols to achieve
different semantics for different kind of data structures. So for
sets and maps, contains? would work like a contains-key? function, but
for vectors and other sequences, it would be a more intuitive
contains-item? function. Disadvantage - semantics depends on knowing
what kind of thing you're passing to the function, also might break
existing code if anyone is actually currently using contains? on
vectors.

I'm not crazy about the name seq-contains? (and I'm skeptical that it
even belongs in the core), but all the other solutions I can think of
also have major disadvantages.

That said, you make an excellent point about performance. Even though
the new seq-contains? is intended to treat the underlying data
structure as a sequence from a semantics standpoint, there is
absolutely no reason at all why its performance should be bound by
this. Clearly contains? and seq-contains? have identical semantics
for a set, so for a set data structure the new seq-contains? should
also use protocols to achieve the same speed, so seq-contains?
performs identically on a set as contains? Similarly, even though the
new seq-contains? will have a new meaning for hash maps (testing for a
key-value pair rather than just a key), clearly it is possible to use
protocols to perform this test far faster than a linear sequential
search.

So, if things do move forward with seq-contains? to create a new
semantic notion in the clojure core of how to test for containment, I
absolutely agree that protocols should be used to make this new notion
to continue to have good performance on those sorts of collections for
which it is possible to have superior performance than a sequential
search.

--Mark

Stuart Halloway

unread,
Apr 28, 2010, 7:43:47 PM4/28/10
to clo...@googlegroups.com
Wow, and I thought this was a sore subject before, when there was no
seq-contains? and its absence was always a Top 5 FAQ. :-)

I'll wait for Rich to maybe chime in on seq-contains?. Other than seq-
contains? are people liking the new fns? Anybody having issues we
didn't anticipate?

Stu

Michael Gardner

unread,
Apr 28, 2010, 7:57:22 PM4/28/10
to clo...@googlegroups.com
On Apr 28, 2010, at 6:38 PM, Mark Engelberg wrote:

> 2. Rename contains? to contains-key? and make a function with the new
> semantics called contains? Disadvantage - breaks existing code.

You could also alias contains? to contains-key?, use seq-contains? (or some other new name) for this new function, and deprecate contains? with an eye towards removal down the road. Deprecation has been used successfully even in mature languages, hasn't it?

Rich Hickey

unread,
Apr 28, 2010, 11:04:08 PM4/28/10
to clo...@googlegroups.com


On Apr 28, 7:17 pm, Douglas Philips <d...@mac.com> wrote:
> On 2010 Apr 28, at 6:55 PM, Stuart Halloway wrote:
>
> > Specializations are available where there is a clear use case, e.g.
> > contains?, peek (instead of last), disj, etc.
> ...
> > Tying to concrete types is limiting. *Never* having special purpose
> > fns that know about performance characteristics is also limiting.
>
> > contains?/seq-contains? mark a divide where it is very easy for a
> > language to quietly help programmers do the wrong thing.
>
> Such as littering my application code with type-checks to decide if
it
> should call contains? or seq-contains?
> That seems exactly the wrong thing to do.
>
> Please explain how making seq-contains? fast for sets and maps is
> wrong (exactly how does it violate some contract and what contract,
> explicitly is that a violation of) and application code type checking
> is right, because I really don't understand why making seq-contains?
> fast isn't just a huge win-win.
>

It is an important aspect of Clojure that, in general, performance
guarantees are part of the semantics of functions. In particular,
functions are not supported on data structures where they are not
performant.

You can't just swap out sequences or lists for maps or sets in a
program. So, you need to use the right data structure for the job, and
then the right functions will be available. When it makes sense, some
functions are maximally polymorphic (e.g. seq, or into). But lookup,
under any name, shouldn't be, IMO, so it isn't in Clojure. Similarly
there is no insert at the beginning of vectors, append to end of
lists, lookup for values of maps, insertion in the middle of sequences
etc. I simply don't want to provide the tools for people to write
programs in Clojure with N^2 performance - that benefits no one. But
I've seen such performance degradation in many a Lisp program as
people build upon O(n) functions on lists.

If you have a lookup table in your program, please use a set or map,
or, if by index, a vector. You will have get, contains? or nth
available, performance will be good, and that will be clear from the
code.

seq-contains? exists because sometimes, say in a macro, you have a
known short sequence and need to test for the presence of something.
There's no need to copy it into a set. And people have been rolling
their own includes? etc for this job. Now everyone can use seq-
contains? for that. Its poor performance is indicated by its name and
documentation. Should someone try to take a piece of code based on seq-
contains? and try to scale it, they will see that function as an
obvious bottleneck.

Your type-check argument is a straw man that doesn't hold up in
Clojure practice. People tend to use the right data structures for the
job and don't choose algorithms via type checks.

Clojure is not particularly duck-type-y, in general things are not
unified by the presence of same-named functions but by underlying
shared abstractions. Just because protocols can be used to make things
reach many types doesn't mean they should be used to unify things that
aren't uniform.

If you want to get a feel for how the abstractions in Clojure are
organized, have a look at the slides in the middle of this
presentation (about 30 pages in):

http://clojure.googlegroups.com/web/tutorial.pdf

While you may not agree with it (and everyone disagrees with
contains? :), at least the design is a considered one.

Rich

Douglas Philips

unread,
Apr 29, 2010, 1:39:10 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 28, at 7:38 PM, Mark Engelberg wrote:
> But I think you're not fully appreciating the complexity of the
> situation. This is not just about performance, but a question of two
> different possible semantics for "contains?", which is why it can't
> *only* be addressed with a protocol.

...

> The fundamental problem is that some Clojure data structures have a
> dual nature. For example, a vector can be thought of as a collection
> of items, or you can think of it as a mapping from integers to items.
> A hash map can be thought of as a map, or as a sequence of key-value
> pairs.

Agreed on the hash map. For vectors, Clojure has already decided that
'seq'ing them ignores their indexes.


> The current "contains?" function is really a "contains-key?" function.

Agreed.

> There are some people who believe that Clojure would benefit from a
> variant of "contains?" that views the input as a collection. But how
> to make this clear? That's why the proposal is for a new function
> called "seq-contains?". It's not so much a warning of "this thing has
> performance like a linear search", as much as, "it views your data as
> a sequence".

Yup.

...


> So, if things do move forward with seq-contains? to create a new
> semantic notion in the clojure core of how to test for containment, I
> absolutely agree that protocols should be used to make this new notion
> to continue to have good performance on those sorts of collections for
> which it is possible to have superior performance than a sequential
> search.

Perhaps protocols are a touchy subject and there is another mechanism
to use, but I haven't yet seen a coherent argument against some
mechanism for enhancing performance.

Thanks for the detailed analysis. I guess I had hoped it wouldn't've
been necessary. :)

-Doug

Meikel Brandmeyer

unread,
Apr 29, 2010, 1:48:23 AM4/29/10
to Clojure
Hi,

On 29 Apr., 01:38, Mark Engelberg <mark.engelb...@gmail.com> wrote:

> 1.  Don't include seq-contains?  The behavior you want can usually be
> achieved by using (some #{item} coll).  Disadvantage - if you're
> testing to see if the collection contains nil, that won't work.

Not entirely correct. (some #(= % item) coll) works always and is
basically how seq-contains? is implemented. However there are
opinions that some is not obvious enough.

Sincerely
Meikel

Douglas Philips

unread,
Apr 29, 2010, 1:49:55 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 28, at 11:04 PM, Rich Hickey wrote:
> It is an important aspect of Clojure that, in general, performance
> guarantees are part of the semantics of functions. In particular,
> functions are not supported on data structures where they are not
> performant.

Understood. That isn't the point/question I was trying to raise.
Again, back to the demo of protocols in Stuart's video, the question
isn't "is seq-contains? potentially lying about its performance" but
"why shouldn't any function, including seq-contains? be as fast as it
can be, when it can be?"

...<most of the rest is a slight tangent from my fundamental question,
however interesting it might be for other reasons>...



> And people have been rolling their own includes? etc for this job.
> Now everyone can use seq-contains? for that. Its poor performance is
> indicated by its name and documentation. Should someone try to take
> a piece of code based on seq-contains? and try to scale it, they
> will see that function as an obvious bottleneck.

Really? We've known for a long time that people suck at guessing where
programs have performance bottlenecks (1), and that taking a premature
optimization attitude "obvious bottleneck" is wrong headed (2).

(1) consing is bad!
(2) 20G of garbage per second and the CPU utilization barely breaks
10%? Hmmm, I guess consing isn't so bad and that thinking "consing is
bad" is just another old, preconceived notion proved wrong, so that
'obvious bottleneck' turns out not to be so obvious after all.

Shouldn't seq-contains? be parallelizable, esp. vectors where you can
easily create O(1) slices... or use other techniques when sequences of
certain kinds might have other exploitable properties?


What is the purpose, goal, design-rationale of not making seq-
contains? fast on maps or sets?



> Clojure is not particularly duck-type-y, in general things are not
> unified by the presence of same-named functions but by underlying
> shared abstractions. Just because protocols can be used to make
> things reach many types doesn't mean they should be used to unify
> things that aren't uniform.

Are saying that seq-contains? will not work on sets and maps because
they aren't uniform w.r.t. sequences or that they are uniform, but
protocols won't fit here as an optimization technique because _____
(being brand new, I don't know all the subtleties of protocols), or ???



> If you want to get a feel for how the abstractions in Clojure are
> organized, have a look at the slides in the middle of this
> presentation (about 30 pages in):
>
> http://clojure.googlegroups.com/web/tutorial.pdf

Actually I have seen those slides already, but had lost track of where
I had found them, thanks for the reminder pointer.



> While you may not agree with it (and everyone disagrees with
> contains? :), at least the design is a considered one

I generally agree with the design Clojure embodies as an outlook :).
As to contains? ... I agree with Mark Engelberg's detailed analysis
and conclusions.


-Doug

Meikel Brandmeyer

unread,
Apr 29, 2010, 1:57:23 AM4/29/10
to Clojure
Hi,

On 29 Apr., 01:43, Stuart Halloway <stuart.hallo...@gmail.com> wrote:

> I'll wait for Rich to maybe chime in on seq-contains?. Other than seq-
> contains? are people liking the new fns? Anybody having issues we  
> didn't anticipate?

I was a little bit surprised about map-indexed and keep-indexed.
Why specialised functions when (map f (indexed coll)) just works
as well, but I must confess that having (f idx item) is nicer compared
to (f [idx item]). On the other hand one could do (map #(apply f %)
(indexed coll)), but again this doesn't play nice with #() functions.

One thing I noticed in the implementation of keep-indexed.

keepi is defined as (letfn [(keepi [idx coll] ...)] ...). However in
the
non-chunked case it is called as (keepi f (inc idx) (rest s)) resp.
as (keepi f (rest s)). I think there is something goofy going on here.

Sincerely
Meikel

Mark Engelberg

unread,
Apr 29, 2010, 2:17:50 AM4/29/10
to clojure
On Wed, Apr 28, 2010 at 10:49 PM, Douglas Philips <dg...@mac.com> wrote:
> What is the purpose, goal, design-rationale of not making seq-contains? fast
> on maps or sets?

I think Rich's point is that if seq-contains? is sometimes fast and
sometimes slow, then it makes it harder to reason about a program that
uses seq-contains?. Is this one of the "slow" usages, or one of the
"fast" usages? On the other hand, with the current plan, you know
exactly what performance you're getting. contains? will always be
fast on sets and maps (albeit returning meaningless results for lists,
and counterintuitive results for vectors). seq-contains? always does
a linear search. So if you see seq-contains? in your code, this
serves as a big red flag that either you should be using a set data
structure and contains?, or you better be sure your sequence is
relatively small.

If seq-contains? sometimes had good performance, then people would
start using a mixture of seq-contains? and contains? for sets and
maps, and it would be less clear when you were using seq-contains?
"improperly".

> I generally agree with the design Clojure embodies as an outlook :).
> As to contains? ... I agree with Mark Engelberg's detailed analysis and
> conclusions.

Thanks. But I definitely see the logic of Rich's approach as well.

As a side note, maybe contains? should return an error message when
used on a non-keyed collection like a list. With the confusion that's
sure to result from having both contains? and seq-contains? in the
core with slightly different semantics, applicable inputs, and
performance, an error message for inappropriate uses of contains?
would be very valuable, I think.

ataggart

unread,
Apr 29, 2010, 4:21:54 AM4/29/10
to Clojure
I know it won't matter, but for posterity if nothing else...

Functions named contains-key? and contains-val? would make a lot more
sense to me than the current contains? and new seq-contains?. Anyone
looking at contains-val? should expect it to be O(n). The only
effective difference would be that the test value for contains-val? is
consistently a single value rather than a [key value] tuple for maps.

Lists:
(contains-key? '(:foo :bar) 0)
Exception
(contains-val? '(:foo :bar) :foo)
true

Vectors:
(contains-key? [:foo :bar] 0)
true
(get [:foo :bar] 0)
:foo
(contains-key? [:foo :bar] 2)
false
(contains-val? [:foo :bar] :foo)
true
(contains-val? [:foo :bar] :baz)
false

Maps:
(contains-key? {:foo :bar} :foo)
true
(get {:foo :bar} :foo)
:bar
(contains-key? {:foo :bar} :baz)
false
(contains-val? {:foo :bar} :bar)
true
(contains-val? {:foo :bar} :baz)
false

Sets:
(contains-key? #{:foo :bar} :foo)
true
(get #{:foo :bar} :foo)
:foo
(contains-key? #{:foo :bar} :baz)
false
(contains-val? #{:foo :bar} :foo)
true
(contains-val? #{:foo :bar} :baz)
false

Rich Hickey

unread,
Apr 29, 2010, 6:53:38 AM4/29/10
to clo...@googlegroups.com

On Apr 29, 2010, at 1:57 AM, Meikel Brandmeyer wrote:

> Hi,
>
> On 29 Apr., 01:43, Stuart Halloway <stuart.hallo...@gmail.com> wrote:
>
>> I'll wait for Rich to maybe chime in on seq-contains?. Other than
>> seq-
>> contains? are people liking the new fns? Anybody having issues we
>> didn't anticipate?
>
> I was a little bit surprised about map-indexed and keep-indexed.
> Why specialised functions when (map f (indexed coll)) just works
> as well, but I must confess that having (f idx item) is nicer compared
> to (f [idx item]). On the other hand one could do (map #(apply f %)
> (indexed coll)), but again this doesn't play nice with #() functions.
>
> One thing I noticed in the implementation of keep-indexed.
>
> keepi is defined as (letfn [(keepi [idx coll] ...)] ...). However in
> the
> non-chunked case it is called as (keepi f (inc idx) (rest s)) resp.
> as (keepi f (rest s)). I think there is something goofy going on here.
>

Fixed - thanks.

Rich

Mark J. Reed

unread,
Apr 29, 2010, 7:41:17 AM4/29/10
to clo...@googlegroups.com
I like this proposal. I'd make contains? an alias for contains-key?
with a deprecation warning, and just forget about seq-contains? in
favor of contains-val?
--
Mark J. Reed <mark...@gmail.com>

Laurent PETIT

unread,
Apr 29, 2010, 7:52:34 AM4/29/10
to clo...@googlegroups.com
2010/4/29 Mark J. Reed <mark...@gmail.com>:
> I like this proposal.  I'd make contains? an alias for contains-key?
> with a deprecation warning, and just forget about seq-contains? in
> favor of contains-val?

This makes a lot of sense to me.
(and have, as suggested by ataggart, contains-key? complain if passed a seq)

Stuart Halloway

unread,
Apr 29, 2010, 8:11:00 AM4/29/10
to clo...@googlegroups.com
Thinking about this one.

> I like this proposal. I'd make contains? an alias for contains-key?
> with a deprecation warning, and just forget about seq-contains? in
> favor of contains-val?

Douglas Philips

unread,
Apr 29, 2010, 8:56:49 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 29, at 2:17 AM, Mark Engelberg wrote:
> On Wed, Apr 28, 2010 at 10:49 PM, Douglas Philips wrote:
>> What is the purpose, goal, design-rationale of not making seq-
>> contains? fast
>> on maps or sets?
>
> I think Rich's point is that if seq-contains? is sometimes fast and
> sometimes slow, then it makes it harder to reason about a program that
> uses seq-contains?.

If I'm writing seq-friendly functions, what you're saying is that I am
guaranteed to be always slow?
That if the code using my function(s) passes in a set instead of a
vector there shouldn't be a speed up?



> seq-contains? always does a linear search.

I'm not arguing against that, but apparently it is such a popular
straw man that I don't have to. :)

Is the contract for seq-contains that it will always examine every
element, or is it permitted to stop early if it finds a match? Is it
forbidden for range and seq-contains? to exploit the kind of protocol-
ish benefits that Stuart showed in his video with reduce and strings?



> So if you see seq-contains? in your code, this
> serves as a big red flag that either you should be using a set data
> structure and contains?, or you better be sure your sequence is
> relatively small.

Maybe. But maybe it is a "red flag" that I'm accepting any kind of
seq, and that my code will work for whatever seq type makes sense for
the problem my function is being used to help solve.



> If seq-contains? sometimes had good performance, then people would
> start using a mixture of seq-contains? and contains? for sets and
> maps, and it would be less clear when you were using seq-contains?
> "improperly".

Oh, please. First you say I'm too stoopid to profile my code :), and
that I won't be able to reason about it, and now you're saying that I
will profile it but come to some broken conclusions? I already know
that if I use a "may have to check every element" containment function
that it might <dramatic music/> have to check every element. At least
have the guts to make the slow function guaranteed slow (it ALWAYS
checks every element) so that I don't ever get some potentially
unclear idea about what the performance will be, and that I should
never use it on infinite sequences. Sheesh.

-Doug

Douglas Philips

unread,
Apr 29, 2010, 9:06:59 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 29, at 4:21 AM, ataggart wrote:
> Functions named contains-key? and contains-val? would make a lot more
> sense to me than the current contains? and new seq-contains?. Anyone
> looking at contains-val? should expect it to be O(n). The only
> effective difference would be that the test value for contains-val? is
> consistently a single value rather than a [key value] tuple for maps.

+1

-Doug

Douglas Philips

unread,
Apr 29, 2010, 9:12:35 AM4/29/10
to clo...@googlegroups.com
On 2010 Apr 29, at 7:52 AM, Laurent PETIT wrote:
> 2010/4/29 Mark J. Reed <mark...@gmail.com>:
>> I like this proposal. I'd make contains? an alias for contains-key?
>> with a deprecation warning, and just forget about seq-contains? in
>> favor of contains-val?
>
> This makes a lot of sense to me.
> (and have, as suggested by ataggart, contains-key? complain if
> passed a seq)

user=> (assoc '(1 2 3) 3 4)
java.lang.ClassCastException: clojure.lang.PersistentList
(NO_SOURCE_FILE:0)

If contains-key? complains any less cryptically, then perhaps assoc
should to?
(I'm certainly willing to submit potential patches in this area...
assoc's failure to 'protect itself/provide a more useful complaint
message' seems duck-type-y...
it seems that contains-key? should keep in line with assoc, or visa-
versa, whichever is the more clojure-y approach.)

-Doug

Mark Hamstra

unread,
Apr 29, 2010, 10:05:28 AM4/29/10
to Clojure


On Apr 29, 2:17 am, Mark Engelberg <mark.engelb...@gmail.com> wrote:
> On Wed, Apr 28, 2010 at 10:49 PM, Douglas Philips <d...@mac.com> wrote:
> > What is the purpose, goal, design-rationale of not making seq-contains? fast
> > on maps or sets?
>
> I think Rich's point is that if seq-contains? is sometimes fast and
> sometimes slow, then it makes it harder to reason about a program that
> uses seq-contains?.  Is this one of the "slow" usages, or one of the
> "fast" usages?  On the other hand, with the current plan, you know
> exactly what performance you're getting.  contains? will always be
> fast on sets and maps (albeit returning meaningless results for lists,
> and counterintuitive results for vectors).  seq-contains? always does
> a linear search.  So if you see seq-contains? in your code, this
> serves as a big red flag that either you should be using a set data
> structure and contains?, or you better be sure your sequence is
> relatively small.

I don't fully agree with the reasoning here. It seems clear that part
of what contains? means should be "will be fast," so contains? must
only be available for types where this performance can be guaranteed.
What is less clear to me is whether part of what seq-contains? means
should be "will be slow/will be implemented as an O(n) sequential
search," or whether it should be "may be slow." I'm not convinced
that "may be slow" doesn't raise enough of a warning flag and makes
reasoning about a program harder than does "will be slow" -- and "will
be implemented as..." smells bad.

Sean Devlin

unread,
Apr 29, 2010, 10:37:57 AM4/29/10
to Clojure
(send contains-val? inc)

Michael Gardner

unread,
Apr 29, 2010, 12:49:55 PM4/29/10
to clo...@googlegroups.com
On Apr 29, 2010, at 3:21 AM, ataggart wrote:

> I know it won't matter, but for posterity if nothing else...
>
> Functions named contains-key? and contains-val? would make a lot more
> sense to me than the current contains? and new seq-contains?. Anyone
> looking at contains-val? should expect it to be O(n). The only
> effective difference would be that the test value for contains-val? is
> consistently a single value rather than a [key value] tuple for maps.

+1. I can't imagine any use case for looking up a whole [key, value] pair in a hash-map.

MarkSwanso