Question about sets

Mark Engelberg

Aug 4, 2012, 9:43:34 PM
to clojure
What is the reasoning behind the design decision that this generates an error:
(let [a :x b :x] #{a b})
rather than just returning #{:x} ?

I'm finding that in practice, it limits the utility of #{} notation if you have to somehow know in advance that all the things in the set are different.

Sean Corfield

Aug 4, 2012, 10:23:58 PM
to clo...@googlegroups.com
On Sat, Aug 4, 2012 at 6:43 PM, Mark Engelberg <mark.en...@gmail.com> wrote:
> What is the reasoning behind the design decision that this generates an
> error:
> (let [a :x b :x] #{a b})
> rather than just returning #{:x} ?

My first reaction was that literals have to obey the rules of the
underlying type or else they are not valid literals:

#{1 2 1} ;; error
{:x 1 :y 2 :x 3} ;; error

I hadn't even thought of using the set literal syntax with variables
that might not have unique values. I guess I'd ask: why not use the set
function?

(let [a :x b :x] (set [a b])) ;; #{:x}
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/

"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)

Mark Engelberg

Aug 5, 2012, 1:52:53 AM
to clo...@googlegroups.com
On Sat, Aug 4, 2012 at 7:23 PM, Sean Corfield <seanco...@gmail.com> wrote:
> I hadn't even thought of using the set literal syntax with variables
> that might not have unique values. I guess I'd ask: why not use the set
> function?
>
> (let [a :x b :x] (set [a b])) ;; #{:x}


Yes, my first thought was, "I can just work around this by changing every occurrence of something like #{a b} to (set [a b])."

My second thought was, "Oh crap.  I have to go through my entire codebase now, and inspect every use of set literal notation, and convert everything to something like (set [a b]) in order to guarantee I won't have a runtime error.  Furthermore, in the process of doing so, I'll hurt the readability of my code.  Not only is it longer, but the whole point of Clojure providing convenient syntax for hashes and sets is to fight the visual sea-of-sameness and make different data types pop out.  Now my sets become just another function call.  Ugh."

Mark Engelberg

Aug 5, 2012, 1:59:25 AM
to clo...@googlegroups.com
Perhaps you don't use sets as much as I do, so to help you put it in perspective, imagine how you'd feel if I told you:

"Oh by the way, that nice vector notation you use to write things like [1 3 5 7 2].  Yeah, well, it's only reliable if there are constants inside.  That thing you've been doing where you write [a b c] -- that usually works, so you've probably been doing that a lot in your code.  But some small percentage of the time, that will generate a run-time error.  From now on, the only way to be safe is to replace every instance of a vector with non-constants with a call to vector, i.e., (vector a b c)."

You *could* fix your code and develop new habits, but how would you feel about it?

--Mark

Baishampayan Ghose

Aug 5, 2012, 2:05:00 AM
to clo...@googlegroups.com
To add my two cents, IMHO the literal notation for sets, vectors &
maps are just that---they are the "literal" representations of those
data-structures and are not semantically equivalent to constructor
functions. As such, the literal notation should really be used with
constants; in every other case, you should use the constructor
functions.

I understand that you will have to probably change a lot of code, but
I think that's one (semantic) mistake that needs correcting.

Regards,
BG

--
Baishampayan Ghose
b.ghose at gmail.com

Sean Corfield

Aug 5, 2012, 2:14:15 AM
to clo...@googlegroups.com
On Sat, Aug 4, 2012 at 10:52 PM, Mark Engelberg
<mark.en...@gmail.com> wrote:
> Yes, my first thought was, "I can just work around this by changing every
> occurrence of something like #{a b} to (set [a b])."

I guess my thinking is a literal doesn't contain variables :)

I'd have #{1 2 3} and #{:a :b :c} and #{'x 'y 'z} but I would never
have thought of #{var-a var-b}... I'm trying to think whether I've
even done that for regular maps or whether I've used the hash-map
function or into {} to create maps from "variables"...

> My second thought was, "Oh crap. I have to go through my entire codebase
> now, and inspect every use of set literal notation

This and your follow-up made me go back to my comment above and take a
look at my code... I pretty much only use the literal notation for
constants. I have just a couple of places where I use [ ] around
expressions, and even then it tends to be just pairs. Everywhere else
I already use vector - or vec - and where I have maps, the keys are
constants and only the values are expressions.

And while I was writing this, BG's post just came in which kinda fits
with my thinking too... Definitely interesting to see another way of
looking at those constructs tho'...

Mark Engelberg

Aug 5, 2012, 2:45:55 AM
to clo...@googlegroups.com
On Sat, Aug 4, 2012 at 11:05 PM, Baishampayan Ghose <b.g...@gmail.com> wrote:
> To add my two cents, IMHO the literal notation for sets, vectors &
> maps are just that---they are the "literal" representations of those
> data-structures and are not semantically equivalent to constructor
> functions. As such, the literal notation should really be used with
> constants; in every other case, you should use the constructor
> functions.

My jaw is dropping here.  I can't believe anyone would seriously propose that the only things that go between [], {}, or #{} should be constants.

BTW, I'm well aware that literal representations are not equivalent to the constructor function.  This is precisely why it is *better* to use the literal representation -- it chooses intelligently among the appropriate constructor functions.

For example, compare:
(type {1 2})
(type (hash-map 1 2))

Note that the {} notation intelligently chose array map as the underlying representation because it is more appropriate for small maps.
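
A minimal sketch of what those two expressions evaluate to, assuming a reasonably recent Clojure on the JVM (the class names are the ones in `clojure.lang`; the exact size threshold at which literals switch representation is not shown here):

```clojure
;; Literal syntax for a small map picks an array map.
(type {1 2})              ; clojure.lang.PersistentArrayMap

;; The hash-map constructor, given arguments, builds a hash map.
(type (hash-map 1 2))     ; clojure.lang.PersistentHashMap

;; The two are still equal as values; only the representation differs.
(= {1 2} (hash-map 1 2))  ; true
```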

In any case, Clojure is already able to detect when x and y are equal in something like #{x y} and report it as an error.  I fail to see how it's better for Clojure to crash my program in such an event rather than to do the intuitive thing and just make the intended set.

And if for some reason, the behavior absolutely has to be the way it is, I'd much rather the *compiler* warn for any use of a non-constant in a set literal or in the key position of a map.  Why allow variables if such use is prone to error?

Sean Corfield

Aug 5, 2012, 2:56:49 AM
to clo...@googlegroups.com
On Sat, Aug 4, 2012 at 11:45 PM, Mark Engelberg
<mark.en...@gmail.com> wrote:
> In any case, Clojure is already able to detect when x and y are equal in
> something like #{x y} and report it as an error.

So do you think #{1 1} should not be an error? And {:a 1 :a 2}? These
seem like "obvious" programmer errors that I'd want the compiler to
catch.

> And if for some reason, the behavior absolutely has to be the way it is, I'd
> much rather the *compiler* warn for any use of a non-constant in a set
> literal or in the key position of a map.

That sounds like a good enhancement to me.

Mark Engelberg

Aug 5, 2012, 3:17:56 AM
to clo...@googlegroups.com
On Sat, Aug 4, 2012 at 11:56 PM, Sean Corfield <seanco...@gmail.com> wrote:
> On Sat, Aug 4, 2012 at 11:45 PM, Mark Engelberg
> <mark.en...@gmail.com> wrote:
>> In any case, Clojure is already able to detect when x and y are equal in
>> something like #{x y} and report it as an error.
>
> So do you think #{1 1} should not be an error? And {:a 1 :a 2}? These
> seem like "obvious" programmer errors that I'd want the compiler to
> catch.


I can see the argument for the compiler flagging {:a 1 :a 2} as a likely programmer error and reporting a warning.  But IMO such a thing shouldn't cause a program crash at runtime.  Runtime crashes are bad.  If such a thing were to occur as the result of some sort of dynamic runtime behavior, it is perfectly reasonable to assume that the standard map semantics should hold where the right-most occurrence of a key takes precedence.

My opinion is that #{1 1} should not be an error.  In math notation, with hundreds of years of precedent, {1, 1} = {1}.  Nevertheless, the same argument as above about warning versus error could hold here as well.

What do you feel is gained by making it a fatal error?  In my view, in the case where you stick to only using constants in your set and map notation, the benefit is roughly the same either way -- such use is likely to be flagged on your first run of the code and can be fixed quickly and easily regardless of whether it is a warning or error.  But when your set and map notation depends on some variable, a warning is much better than an error.

Alex Baranosky

Aug 5, 2012, 3:44:59 AM
to clo...@googlegroups.com
I'm on Mark's side.  #{a b} potentially throwing a runtime exception is one of those quirky Clojure-isms that would be considered bugs to anyone not deeply entrenched in the Clojure world.  Let's remove the Clojure-colored glasses.

Vinzent

Aug 5, 2012, 6:33:33 AM
to clo...@googlegroups.com
I agree with puzzler and Alex.

Also, I have a huge number of literal vectors with expressions inside (typical example: (let [[x y] coll] ...) is equivalent to [(coll 0) (coll 1)]).

Chas Emerick

Aug 5, 2012, 10:33:12 AM
to clo...@googlegroups.com
On Aug 5, 2012, at 2:56 AM, Sean Corfield wrote:

> On Sat, Aug 4, 2012 at 11:45 PM, Mark Engelberg
> <mark.en...@gmail.com> wrote:
>> In any case, Clojure is already able to detect when x and y are equal in
>> something like #{x y} and report it as an error.
>
> So do you think #{1 1} should not be an error? And {:a 1 :a 2}? These
> seem like "obvious" programmer errors that I'd want the compiler to
> catch.

I hit exactly the problem the OP raised this past week.  I knew of the (good) restriction on e.g. #{1 1} being an error, and remember some of the conversations leading up to the changes that made it so, but had either forgotten or never quite internalized that #{a b} is also an error, where a and b are equivalent values.

First, some history:
Note that the .createWithCheck variations of all of the collections in question are used by their "constructor" functions as well, e.g. hash-set, hash-map, and array-map:

=> (hash-set 1 2 2)
IllegalArgumentException Duplicate key: 2  clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:80)
=> (hash-map 1 2 1 3)
IllegalArgumentException Duplicate key: 1  clojure.lang.PersistentHashMap.createWithCheck (PersistentHashMap.java:92)
=> (array-map 1 2 1 3)
IllegalArgumentException Duplicate key: 1  clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)

The only way to get around the checks here is to use `set` or `into`; note that there is no "constructor" function for an unsorted map that does not check that the provided keys are unique.
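
A short sketch of the two unchecked routes for sets mentioned above. It only uses `set`, `into`, and literals with distinct constants, so it should not depend on which Clojure version's constructor checks are in effect:

```clojure
;; `set` takes a collection and silently drops duplicates.
(set [1 2 2])        ; #{1 2}

;; `into` conj's each element, and conj on a set ignores repeats.
(into #{} [1 2 2])   ; #{1 2}

;; The same holds when the target set already contains an element.
(into #{1} [1 2])    ; #{1 2}
```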

Interestingly, sorted maps and sets do *not* have the same restriction:

=> (sorted-map 1 2 1 3)
{1 3}
=> (sorted-set 1 2 1)
#{1 2}

Quoting Rich from the mailing list thread linked above:

> These are bugs in user code. Map literals are in fact read as maps, so
> a literal map with duplicate keys isn't going to produce an evaluated
> map with distinct keys. If you create an array map with duplicate
> keys, bad things will happen.

In the end, even though I've been recently "bitten" by the checked creation of sets from a literal, I think it's a reasonable approach.  In #{a b}, you are specifying the creation of a set containing two and exactly two values, those named by a and b.  There's an explicit invariant being specified in that code.  Thinking over my various uses of #{} syntax, I remember times where I've expected it to enforce that invariant (and throw an error if dupes were provided) and times where I've expected it to implicitly apply `distinct` to the values provided; that indicates sloppiness on my part, not a place where the language should become psychic.

In contrast, `set` and `into` each accept a seqable collection of data, and are explicit about their support for sifting out duplicates (right in the docstring of `set`, and transitively so for `into` due to its use of `conj` and its semantics on sets).

Finally, for the sake of consistency, it seems like the same checks should be applied by the sorted map and set "constructor" functions, and that there should be a map corollary to `set` (i.e. a function that is the equivalent of #(into {} %), just as `set` is the equivalent of #(into #{} %)).  This last one is problematic in terms of naming, though.
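
A minimal sketch of that map counterpart to `set`. The name `to-map` is purely hypothetical (the naming problem is left open above); the body is just the `#(into {} %)` idea written out:

```clojure
;; Hypothetical name; not part of clojure.core.
(defn to-map
  "Builds a map from a collection of [key value] pairs.
  Later pairs overwrite earlier ones with the same key."
  [pairs]
  (into {} pairs))

(to-map [[:a 1] [:b 2] [:a 3]]) ; {:a 3, :b 2}

;; For comparison, the existing set analogue:
(set [:a :b :a])                ; #{:a :b}
```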

Cheers,

- Chas

--
http://cemerick.com
[Clojure Programming from O'Reilly](http://www.clojurebook.com)

Sean Corfield

Aug 5, 2012, 12:28:09 PM
to clo...@googlegroups.com
On Sun, Aug 5, 2012 at 3:33 AM, Vinzent <ru.vi...@gmail.com> wrote:
> Also, I have a huge number of literal vectors with expressions inside
> (typical example: (let [[x y] coll] ...) is equivalent to [(coll 0) (coll
> 1)]).

A destructuring vector is not the same thing as is being discussed
here: (let [[x y] coll] ...) is equivalent to (let [x (coll 0) y (coll
1)] ...) which is not even a vector containing expressions.

Mark Engelberg

Aug 5, 2012, 2:50:08 PM
to clo...@googlegroups.com
On Sun, Aug 5, 2012 at 7:33 AM, Chas Emerick <ch...@cemerick.com> wrote:
> Note that the .createWithCheck variations of all of the collections in question are used by their "constructor" functions as well, e.g. hash-set, hash-map, and array-map:

I hadn't noticed that, but I think that is good evidence that this createWithCheck concept has been taken to an unintended extreme.  This is no longer about "literal" versions of maps/sets and "constructed" versions of maps/sets, so all the initial comments about how literal vectors/maps/sets should only have constants are no longer relevant here (and the decision to make constructors call the same builder as the literal syntax is actually further evidence in support of my claim that literals and constructed versions aren't intended to be hugely different in their semantics).  Instead, this is about how maps and sets are intended to be used, overall.

As I recall, the way constructed maps used to work was that when there were duplicate keys, the right-most version took precedence.  To the best of my knowledge, no one ever complained about this.  It was intuitive, and consistent with the behavior of into.  The only complaint was that very small maps, built with the literal syntax, didn't do the intuitive thing and behave like larger maps, because behind the scenes they used array maps which didn't check for duplicate keys.  If you said something like {:a 1 :a 2}, Clojure would happily go ahead and build something that wouldn't behave the way you'd expect (e.g., it would return the wrong count and possibly the first key-value pair would take precedence over the last).  But the solution isn't to go ahead and alter the semantics of all forms of maps and sets!  All people really wanted was to bring ArrayMaps into accordance with the other forms of maps so that the behavior would be more predictable.

Furthermore, there's a *long* history of sets being used to reduce a collection with duplicates into something that has no duplicates.  You're creating a huge stumbling block for people if you create some arbitrary rule that with one particular set of syntaxes (i.e., (set [1 2 1]) (sorted-set 1 2 1)), sets do what you expect and reduce collections with duplicates to one that has no duplicates, but that with another set of syntaxes (i.e., (hash-set 1 2 1) or #{1 2 1}), it just breaks.  That's seriously confusing, and counter to the spirit of what sets are supposed to do, in my opinion.

It sounds to me like when the dev team thought through the issue about how to fix the problem with array maps, they were thinking specifically about the cases where maps and sets are comprised entirely of constants, and thought it would be a convenient place to try to catch human errors of the form {:apple 1, :apple 2} where someone unintentionally duplicated a constant key.  If somehow, the checking could only be limited to constants, I'd be perfectly happy with the error check.  But I suspect there is no easy way to make Clojure behave this way.

Therefore, it's important to be realistic and acknowledge that sets and maps are created in a variety of ways, both with constants and variables, and we want consistent semantics across these constructions.  Run-time errors for duplicate keys in certain kinds of constructions of maps and sets strikes me as running counter to Clojure's mantra of simplicity.

You talk about whether it's fair to expect Clojure to be a mind-reader.  Of course not.  And there are all sorts of things that Clojure can't conceivably check for me when I'm typing in data literals, for example, if I type {:apple 1, :banano 2}, it's not going to warn me that I spelled banana wrong, nor should it (if I wanted that kind of protection, I'd use a static-typed language where each type of data was hardcoded to only allowing certain fields).  The reality is that any sort of constant data entered into Clojure has to be carefully double-checked for typos, because most things are not checkable or mind-readable by Clojure.  Going to such error-checking extremes and creating complex semantics in order to protect the user from errors like {:apple 1 :apple 2} is unwarranted.

If my code doesn't work properly because I typed {:apple 1 :apple 2}, I blame only myself.  If my code breaks because somewhere in my code, I legitimately built a dynamic set that had duplicate entries and I expected the duplicates to be removed rather than an error thrown, I blame Clojure.

Vinzent

Aug 5, 2012, 3:36:18 PM
to clo...@googlegroups.com
Oh, right; I've said something silly. What I had in mind is something like (let [[x y] (if cond [(coll 0) (coll 1)] [default (first coll)])] ...), which is quite common in e.g. macros with optional arguments. Also, vectors containing expressions can be used by functions returning multiple values, or the application data itself can be represented by a vector (consider a function "rand-color", which returns a vector of 3 random numbers between 0 and 255 (though I'd rather use repeatedly in this case)).

On Sunday, August 5, 2012 at 22:28:09 UTC+6, Sean Corfield wrote:

Mark Engelberg

Aug 5, 2012, 3:38:17 PM
to clo...@googlegroups.com
On Sun, Aug 5, 2012 at 7:33 AM, Chas Emerick <ch...@cemerick.com> wrote:
> Quoting Rich from the mailing list thread linked above:
>
>> These are bugs in user code. Map literals are in fact read as maps, so
>> a literal map with duplicate keys isn't going to produce an evaluated
>> map with distinct keys. If you create an array map with duplicate
>> keys, bad things will happen.


One clarification about this quote is in order though.  If you read through the thread, you'll see that Rich was *not* saying this in the sense of, "{1 2, 1 3} is an error, and we need to throw an exception."  He was actually calling it an error in the sense of defending his desire to do no check at all.  In many places, Clojure exhibits a garbage-in-garbage-out philosophy, and this is actually what he was advocating.  He was just saying that if someone types {1 2, 1 3}, it's their fault, and they deserve whatever bad things come out of it.

Mark Engelberg

Aug 5, 2012, 4:14:26 PM
to clo...@googlegroups.com
After reading through the links that Chas provided, here's a summary of the main points as I see them:

1.  Looking through the history of this issue, no one was actually asking to be protected from accidentally typing duplicate keys in a map or set.

2.  People *were* asking to be protected from the fact that ArrayMap's behavior was inconsistent with other maps, because this led to unpredictability (since map literal notation would choose different underlying implementations in a way that wasn't entirely transparent).

3.  Rich expressed that he wasn't sure the cost of a duplicate-key check was worth it to fix this issue.

4.  Someone eventually decided it was worth the cost of a duplicate-key check to fix the semantic mismatch between ArrayMaps and other maps, but instead of using this check to bring ArrayMaps into concordance with other maps, used the duplicate-key check to throw a hard error for a wide variety (but not all) of maps and sets.

5.  This adds semantic complexity and breaks code, in surprising ways, at run-time.  Not good.

6.  Solution: Put hash maps and hash sets back to the way they were -- they worked perfectly fine.  Use the duplicate key check in ArrayMap to make ArrayMaps behave like all the other maps, i.e., last instance of a key wins.

Stuart Halloway

Aug 5, 2012, 4:18:17 PM
to clo...@googlegroups.com
My 2c:

1. Simplicity is partially about having orthogonal primitives. A duplicate-removing collection factory cannot be sensibly used to implement a throw-on-duplicates collection factory, nor vice versa, so both seem equally primitive to me.

2. Given that both flavors are useful, they should both be provided, with explicit docs distinguishing them. This could involve new variants of the sorted collections for completeness.

3. People may differ about which flavor constructor the literals should use, but I don't see any arguments here warranting a breaking change.

In short: can't this be fixed by fixing the docstrings?

Stu

P.S. I pre-disagree with Mark's recommendation that appeared as I was writing this. :-)

Mark Engelberg

Aug 5, 2012, 4:42:52 PM
to clo...@googlegroups.com
On Sun, Aug 5, 2012 at 1:18 PM, Stuart Halloway <stuart....@gmail.com> wrote:
> 1. Simplicity is partially about having orthogonal primitives. A duplicate-removing collection factory cannot be sensibly used to implement a throw-on-duplicates collection factory, nor vice versa, so both seem equally primitive to me.

I think we're both in agreement that duplicates-cause-the-creation-of-nonsensical-garbage (i.e., ArrayMap's behavior in 1.2) is the least useful primitive :)  This is the thing that really needed to be changed.


> 2. Given that both flavors are useful, they should both be provided, with explicit docs distinguishing them. This could involve new variants of the sorted collections for completeness.

I don't see a lot of evidence that people were clamoring for throw-on-duplicates.  It strikes me as a misguided attempt to fix ArrayMap's problem, rather than a desire to fill an actual need for this specific kind of primitive.  Aren't changes to Clojure meant to be inspired by providing solutions for actual problems that people are having, rather than hypothetical ones?


> 3. People may differ about which flavor constructor the literals should use, but I don't see any arguments here warranting a breaking change.

Well, the problem is that the breaking change already happened.  And it did, in fact, break my code.  I want to *undo* the breaking change.
 

> In short: can't this be fixed by fixing the docstrings?

I doubt it.  So much of the way maps and sets work in so many contexts (e.g., assoc) are built around the idea that duplicate keys overwrite, rather than triggering an error.  To provide equal support for throw-on-duplicate would require vast changes to all the collection functions, changes that are hardly warranted for the only use-case proposed so far, namely to protect a user from shooting himself in the foot with a very specific kind of typo.  And without that kind of consistency, throw-on-duplicate semantics will remain a rare, quirky thing, that bites users in very specific contexts.  Therefore, the only sensible solution is to undo the breaking change and fix ArrayMaps properly.
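
To make the "duplicate keys overwrite" point concrete, here is a small sketch using only core functions (`assoc`, `conj`, `merge`); nothing in it is specific to any one Clojure release as far as I know:

```clojure
;; assoc on an existing key replaces the value instead of throwing.
(assoc {:a 1} :a 2)          ; {:a 2}

;; conj of an element already in a set is a no-op.
(conj #{1 2} 2)              ; #{1 2}

;; merge resolves key collisions in favor of the right-most map.
(merge {:a 1} {:a 2 :b 3})   ; {:a 2, :b 3}
```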

Mark Engelberg

Aug 5, 2012, 5:18:35 PM
to clo...@googlegroups.com
On Sun, Aug 5, 2012 at 1:42 PM, Mark Engelberg <mark.en...@gmail.com> wrote:
> 3. People may differ about which flavor constructor the literals should use, but I don't see any arguments here warranting a breaking change.

Also, although it was a breaking change to add throw-on-duplicate behavior to many types of maps and sets, reverting back to 1.2 behavior could not possibly be a "breaking change" in the literal sense.  Anyone whose code works right now on 1.4, by definition, has no duplicate keys.  Therefore, relaxing the restriction on duplicate keys can't possibly affect their existing code and cause it to break.  So it's not really fair to use the label "breaking change" to resist rollback and punt on making an informed decision as to which flavor is more appropriate for literals and/or the most common constructors.  If your desire is conservatism, the safest thing is to rollback this change, because it will protect people upgrading from 1.2, without affecting 1.4 users at all.

Phil Hagelberg

Aug 5, 2012, 5:18:53 PM
to clo...@googlegroups.com
On Sat, Aug 4, 2012 at 11:45 PM, Mark Engelberg
<mark.en...@gmail.com> wrote:
> My jaw is dropping here. I can't believe anyone would seriously propose
> that the only things that go between [], {}, or #{} should be constants.

For what it's worth, vectors behave this way in Emacs Lisp.

I couldn't believe it either when I first discovered it; it basically
renders literal vector notation 90% useless. Vectors are essentially
ignored in all but the most performance-sensitive Emacs Lisp. It's
horrid.

-Phil

Michael Gardner

Aug 5, 2012, 6:11:49 PM
to clo...@googlegroups.com
On Aug 5, 2012, at 4:18 PM, Mark Engelberg wrote:

> Also, although it was a breaking change to add throw-on-duplicate behavior to many types of maps and sets, reverting back to 1.2 behavior could not possibly be a "breaking change" in the literal sense. Anyone whose code works right now on 1.4, by definition, has no duplicate keys. Therefore, relaxing the restriction on duplicate keys can't possibly affect their existing code and cause it to break.

Not quite true; imagine some code that tried to construct a set literal from some variables, catching IllegalArgumentException to deal with duplicate values.

But I doubt there's any code in the wild that relies on this behavior yet, and I agree that set literals should be reverted to the 1.2 behavior.
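
A sketch of the kind of code that would change behavior under a rollback, assuming a Clojure version where a set literal with equal runtime values throws IllegalArgumentException (as described earlier in the thread). The function name `distinct-pair?` is made up for illustration:

```clojure
;; Hypothetical: decides whether two values are distinct by relying on
;; the duplicate check performed when the set literal is constructed.
(defn distinct-pair? [a b]
  (try
    (boolean #{a b})                     ; a non-nil set is truthy
    (catch IllegalArgumentException _
      false)))                           ; duplicate key was rejected

(distinct-pair? 1 2) ; true
(distinct-pair? 1 1) ; false today; would become true if literals deduplicated
```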

Jim - FooBar();

Aug 5, 2012, 6:38:40 PM
to clo...@googlegroups.com
I don't think any of the 5 Clojure books out there mentions that
data-structure literals should not be used to generate vectors/maps/sets
dynamically at runtime... Personally I've been doing this a lot,
especially with vectors and maps! I never even suspected that it was
wrong, simply because I don't recall getting any weird behaviour, and so I
assumed this worked with sets as well...

Jim

Evan Gamble

Aug 5, 2012, 10:31:27 PM
to clo...@googlegroups.com
Throwing a runtime error for duplicates in set literals is, to me, shockingly counterintuitive, regardless of whether constants or non-constants are in the literal. Mathematical set notation has a long history of admitting duplicates, for clarity in exposition, which are understood to collapse to single values. Perhaps the most common usage of sets in a programming language is reducing duplicates, so not allowing such duplicates is just weird to me.

One might argue that such an error should be raised only when the duplicates are constants, but that sort of special-casing of behavior would likely lead to irritating ambiguities. For example, should (read-string "#{1 1}") throw an error? What if the string being read were constructed by some other function? What about a macro that expands into #{1 1}, but doesn't actually have that literal in its definition? Should that be an error or not? If not, how would the compiler know not to throw an error?
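
For reference, a sketch of how the `read-string` case can be probed. The helper `reads-ok?` is hypothetical, and the second result reflects my understanding that the reader itself applies the duplicate check in the versions discussed here; treat that as an assumption to verify on your own version:

```clojure
;; Hypothetical helper: true if the string can be read, false if the
;; reader throws for any reason.
(defn reads-ok? [s]
  (try
    (read-string s)
    true
    (catch Exception _
      false)))

(reads-ok? "#{1 2}") ; true
(reads-ok? "#{1 1}") ; false where the reader rejects duplicate elements
```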

The case for throwing an error on duplicate keys in literal maps (other than sets) is slightly stronger, but strikes me as a bit nanny-state-ish, protecting ourselves from ourselves, while removing a potentially useful capability. Allowing the last duplicate key to override strikes me as often quite useful, especially when the keys and values are non-constants.

...oh, and regarding the people who argue we should not use literal syntax with non-constants, I can only shake my head in wonder. Why on Earth would anyone seriously propose not using this elegant, concise syntax?

Sean Corfield

Aug 6, 2012, 12:10:41 AM
to clo...@googlegroups.com
On Sun, Aug 5, 2012 at 7:31 PM, Evan Gamble <solar...@gmail.com> wrote:
> ...oh, and regarding the people who argue we should not use literal syntax
> with non-constants, I can only shake my head in wonder. Why on Earth would
> anyone seriously propose not using this elegant, concise syntax?

Just to be clear, I'm not arguing the status quo is right - I'd just
never thought to use literal syntax for variable stuff so I'd never
run into this and was just surprised that folks weren't using 'set' or
'into' to construct sets and maps from variables... This whole
discussion is quite fascinating to me, to see the different ways
people think about these constructs!

Peter Taoussanis

Aug 6, 2012, 4:39:06 AM
to clo...@googlegroups.com
Just throwing in my vote here (assuming anyone's keeping count): I agree with Mark that the current behavior is surprising and disagreeable.

And while there's clearly a practical argument to be made from both perspectives, my own feeling is that throwing a hard runtime exception here is excessive and therefore un-Clojure-like behavior. Granted, that's obviously highly subjective ;)

Anyway, I think he's done a good job of teasing apart some of the apparent history behind the current approach - and it seems to me that there are clearer advantages in removing this error than in keeping it. I appreciate that Clojure generally gives developers the benefit of the doubt, and I guess this feels to me like a place where that is possibly being given up for apparently no major benefit.

So barring any further revelations, yes to this:

> Put hash maps and hash sets back to the way they were -- they worked perfectly fine.  Use the duplicate key check in ArrayMap to make ArrayMaps behave like all the other maps, i.e., last instance of a key wins.

Cheers :) 

 - Peter Taoussanis (@ptaoussanis)

Laurent PETIT

Aug 6, 2012, 8:24:31 AM
to clo...@googlegroups.com
FWIW, inc on what Mark said.

abp

Aug 7, 2012, 3:20:44 AM
to clo...@googlegroups.com
I use literals for collection-construction from arbitrary values too. Just haven't run into that issue.

So, please:

Gary Trakhman

Aug 14, 2012, 10:26:42 AM
to clo...@googlegroups.com
+1

Herwig Hochleitner

Aug 14, 2012, 11:05:08 AM
to clo...@googlegroups.com
I agree that the current throw-on-duplicates behavior is broken, because it renders literals with variable keys useless.

> 1. Simplicity is partially about having orthogonal primitives. A duplicate-removing collection factory cannot be sensibly used to implement a throw-on-duplicates collection factory, nor vice versa, so both seem equally primitive to me.

Yes, but duplicate-removing is far more common. Also it's far more useful in terms of runtime behavior. Looking at the original thread, the array-map constructor should have been fixed, if anything.

The part where throw-on-dupes has value, are configuration maps, which don't tend to have embedded variables.

So my suggestion is:

- Let the reader do duplicate checking
- If people feel the need, make separate constructor with dupes check
- Have our cake and eat it too

vemv

Sep 2, 2012, 2:18:30 PM9/2/12
to clo...@googlegroups.com
This issue best illustrates how imperative and functional thinking differ.

When I write code such as

(map not [true false])

, I implicitly think imperatively: "compiler, please traverse this collection, applying 'not to each element...".
I could also word my thoughts functionally: "I desire the filtering of mapping of reducing..." but my mind just doesn't work like that. I suspect this applies to many people: it's not unusual to encounter the argument that "our view of the world is essentially imperative".

While imperativeness typically gets some things wrong -e.g. time/change-, I believe we shouldn't deny our imperative nature.

Just as in my previous example, when I encounter a set:

#{a b}

My first impression is to think about it as code, rather than data. It sure is data, and Clojure would remain correct under its current approach of throwing-on-duplicates, but this is a very unforgiving attitude towards what I deem our natural way of thinking.

Mark Engelberg

Sep 3, 2012, 9:06:26 PM9/3/12
to clo...@googlegroups.com
In the early days of Clojure, it was clear that Rich was reading every post on the Clojure mailing list.  He didn't respond to every single thread, of course, but when new issues were raised, he would frequently chime in, "That's a good point, please create a patch for that" or "That's something that's never going to change."

This created a clear path for bug reports, feature requests, and improvement suggestions.  Basically, the path was to post on the mailing list.  If it was something that had been already discussed in the past, one could count on the community to point to the relevant thread.  If it was something new, one could count on it eventually being evaluated by Rich and an official judgment made.  The community was instructed not to submit any kind of patch without a go-ahead from Rich.

I don't know what the path is now.  I feel that in the past year, there have been several times where people have raised meaningful issues about Clojure and received no official response.  It's hard to know whether this is an intentional "rejection through ignoring", or whether it's just that those messages happened to slip beneath the radar.  Maybe Rich didn't see them, and without his go-ahead, no one moved forward with them.

As a recent example, consider the issue I raised last month about sets, which in 1.3 were changed so that via several methods of construction (either literal notation or the hash-set constructor), they now throw an error, breaking code that previously worked, reducing the utility of set notation, and imposing on users the need to remember the idiosyncrasies of which methods of set construction impose this constraint and which don't.  The majority of those who weighed in on the issue agreed with my complaint.

The set issue was even discussed on the Mostly Lazy podcast as an example of how, even though Clojure gets a lot of the "big ideas" right, there seem to be a lot of "little things" that Clojure still hasn't nailed. 

In any case, there was a great deal of useful discussion about the set issue, and then... silence.

There are a couple of points here:

1.  I use Clojure regularly.  The "little things" may be little, but when you use Clojure regularly, those little things do start to grate after a while.  I would very much like to see Clojure on a path to resolve the little things, so that the language becomes increasingly pleasurable to use.  To do this, the community would benefit from a very clear mechanism for raising, discussing, evaluating, and resolving these issues.  The "hope that Rich reads the thread" approach doesn't appear to be working any more.  For example, on whitehouse.gov, you can start a petition and if enough people sign the petition within a given length of time, the president's office will issue an official statement about it.  That's the kind of thing I'm thinking about.  Rich's time is valuable, but it would be nice to know that any issue that reaches a certain level of visibility will receive an official "yea" or "nay" rather than languish in silence.

2.  There was significant support for my suggestion to revert set behavior back to 1.2 and solve the problem which motivated the change by bringing array-maps into accord with the behavior of the other maps and sets.  This email is also my way of bumping the thread and bringing it again to everyone's attention.  This is something I'd very much like to see resolved.

--Mark

Sean Corfield

Sep 3, 2012, 10:49:21 PM9/3/12
to clo...@googlegroups.com
On Mon, Sep 3, 2012 at 6:06 PM, Mark Engelberg <mark.en...@gmail.com> wrote:
> I don't know what the path is now. I feel that in the past year, there have
> been several times where people have raised meaningful issues about Clojure
> and received no official response. It's hard to know whether this is an
> intentional "rejection through ignoring", or whether it's just that those
> messages happened to slip beneath the radar. Maybe Rich didn't see them,
> and without his go-ahead, no one moved forward with them.

My understanding is the sort of discussion you are referring to has
moved to clojure-dev by necessity because of the volume of posts on
this list. http://clojure.org/contributing hints as much.

My understanding is also that anyone can open an issue in JIRA for
something they believe is a bug.

> In any case, there was a great deal of useful discussion about the set
> issue, and then... silence.

Open an issue in JIRA. Ask the folks here who agreed with your point
of view to "vote" on the issue. All issues get raised on clojure-dev
one way or another (esp. if they have a patch attached).

> example, on whitehouse.gov, you can start a petition and if enough people
> sign the petition within a given length of time, the president's office will
> issue an official statement about it. That's the kind of thing I'm thinking

That would seem to match the "voting on JIRA issues" point above.

> 2. There was significant support for my suggestion to revert set behavior
> back to 1.2 and solve the problem which motivated the change by bringing
> array-maps into accord with the behavior of the other maps and sets. This
> email is also my way of bumping the thread and bringing it again to
> everyone's attention. This is something I'd very much like to see resolved.

Again, open an issue in JIRA with a patch (you have a signed CA on
file so there's no obstacle). That will guarantee the issue gets
reviewed.

Jim - FooBar();

Sep 4, 2012, 6:52:25 AM9/4/12
to clo...@googlegroups.com
On 04/09/12 02:06, Mark Engelberg wrote:
> This email is also my way of bumping the thread and bringing it again
> to everyone's attention. This is something I'd very much like to see
> resolved.

+1 ... this thread should not die!

Jim

Andy Fingerhut

Sep 4, 2012, 12:30:33 PM9/4/12
to clo...@googlegroups.com
I have created a dev page for this issue. It isn't a JIRA ticket because it isn't clear to me yet exactly what the changes should be.

http://dev.clojure.org/display/design/Allow+duplicate+map+keys+and+set+elements

A couple of questions there for people that dislike the current behavior.

You can always construct sets that quietly allow duplicates as follows. Is that good enough? Or perhaps the issue is that you prefer to use #{} notation for constructing sets, and do not want to have to use a different method if you want the silent-duplicate-elimination behavior? If so, I can understand that. I'm just trying to get the argument for change as clearly as possible.
(set [a b])

The story for creating maps that quietly use the later duplicate keys in preference to the earlier ones isn't as clean: both hash-map and array-map throw an exception on duplicate keys, although sorted-map does not for some reason (probably an oversight when the duplicate key checks were added?). The following works, but is a bit clunky:

(assoc {} a 5 b 7)
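
A minimal sketch of the duplicate-tolerant constructions mentioned above, using only core functions. The names a and b are just the placeholders used in this thread, both bound to the same value here; this is an illustration, not a recommendation from the core team.

```clojure
;; Both names hold the same value, as in the original question.
(def a :x)
(def b :x)

;; Sets: build from a collection, so the duplicate is absorbed quietly.
(set [a b])              ;=> #{:x}
(into #{} [a b])         ;=> #{:x}

;; Maps: a later key replaces an earlier one.
(assoc {} a 5 b 7)       ;=> {:x 7}
(into {} [[a 5] [b 7]])  ;=> {:x 7}
```

The into forms take a collection, so they may read a little more naturally than assoc when the keys and values are already in a sequence.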

Thanks,
Andy

Jim - FooBar();

Sep 4, 2012, 1:08:07 PM9/4/12
to clo...@googlegroups.com
the issue here is that behaviour should be *consistent* across all forms
of ctor functions, so programmers don't have to remember which one
allows what and which doesn't, thus limiting code breakage...the literal
syntax is just too elegant to give up! I don't think anyone is against consistency...

Jim

ps: IMO sets should always remove duplicates quietly...that is the whole
point of using them programmatically!

Andy Fingerhut

Sep 4, 2012, 4:02:52 PM9/4/12
to clo...@googlegroups.com
But what if they all consistently throw exceptions when encountering duplicates, including (set [5 5])? That doesn't sound like what you want.

Also, it seems from this discussion that at least some people like the error-catching aspects of the current behavior.

Stuart Halloway mentioned the idea of having two kinds of set/map constructor functions, one kind which quietly eliminates duplicates, another which throws an exception on duplicates.

That still leaves open the question of what the set and map literals should do with duplicates.

What if the default behavior for set & map literals were to quietly allow duplicates, with a new compiler option like the following that would give the error, for those that like it?

(set! *error-on-duplicates* true)

Perhaps it is starting to look like feature creep, but I thought I'd throw out the idea to see what happens.
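
To make the knob idea concrete, here is a rough sketch at the library level rather than in the compiler. Both *error-on-duplicates* (as a var defined here) and checked-set are hypothetical names for illustration only; nothing like them exists in Clojure itself, and a real compiler option would work differently.

```clojure
;; Hypothetical knob, defined locally; not a real Clojure var.
(def ^:dynamic *error-on-duplicates* false)

(defn checked-set
  "Builds a set from xs. Throws on duplicates only when the knob is on."
  [& xs]
  (let [s (set xs)]
    (when (and *error-on-duplicates*
               (not= (count s) (count xs)))
      (throw (IllegalArgumentException. "Duplicate set element")))
    s))

;; Knob off (the default here): duplicates are absorbed.
(checked-set 5 5)                        ;=> #{5}

;; Knob on: distinct elements are still fine.
(binding [*error-on-duplicates* true]
  (checked-set 5 6))                     ;=> a set containing 5 and 6

;; With the knob on, (checked-set 5 5) throws IllegalArgumentException.
```

This also shows the "two kinds of constructor" idea in one function: the strict and lenient behaviors differ only in whether the size check runs.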

Andy

Maik Schünemann

Sep 4, 2012, 4:32:30 PM9/4/12
to clo...@googlegroups.com
+1 consistency, whether it throws an exception or not, removes complexity from the language
(because the programmer doesn't have to know the complex rules for which notation behaves which way).

Jim - FooBar();

Sep 4, 2012, 7:53:05 PM9/4/12
to clo...@googlegroups.com
On 04/09/12 21:02, Andy Fingerhut wrote:
> But what if they all consistently throw exceptions when encountering duplicates, including (set [5 5])? That doesn't sound like what you want.

of course not...this also goes against set/map semantics from a
mathematics point of view...the mathematical guarantees of set ('there
will be no duplicates') are imposed by the set itself and not by the
person/program/whatever using it! the same with map... since the ctor
fns are considered the correct way of initialising sets/maps then I
assume the dev team agrees with this simply because these versions do
behave like true sets and impose the guarantees themselves. why not the
literals?
> Stuart Halloway mentioned the idea of having two kinds of set/map constructor functions, one kind which quietly eliminates duplicates, another which throws an exception on duplicates.

Is this not a source of inconsistency? are there performance issues
involved? why not the correct thing on both/all cases?


> What if the default behavior for set & map literals were to quietly allow duplicates, with a new compiler option like the following that would give the error, for those that like it?

Now that sounds reasonable...of course the default I think should be the
correct (no runtime exceptions)...


> Perhaps it is starting to look like feature creep, but I thought I'd throw out the idea to see what happens.

well, yes you're not wrong on this, but I feel this is more
serious/important than other dynamic vars in the clojure runtime (there
are a lot!)...to be honest i don't think i would ever turn this on so
i would eventually forget that knob existed...again, assuming there are
no dramatic performance improvements from doing so...having said that, i
thought i would never use unchecked-math either but my board-game engine
has unchecked math in most namespaces! ok, the quest for performance
almost turned that project into a freakshow but at the end of the day i
did use that knob.

Jim



Jim - FooBar();

Sep 4, 2012, 8:03:19 PM9/4/12
to clo...@googlegroups.com
On 05/09/12 00:53, Jim - FooBar(); wrote:
> of course not...this also goes against set/map semantics from a
> mathematics point of view...the mathematical guarantees of set
> ('there will be no duplicates') are imposed by the set itself and not
> by the person/program/whatever using it! the same with map... since
> the ctor fns are considered the correct way of initialising sets/maps
> then I assume the dev team agrees with this simply because these
> versions do behave like true sets and impose the guarantees
> themselves. why not the literals?


in other words:

how useful are sets that cannot impose the guarantees of set? if you,
the programmer, have to know there are no duplicates, why not use a
vector? no, you want that feeling of security that *there will be no
duplicates* no matter what! and obviously a RTE is scary and doesn't
really give you that feeling of security does it? people use literals
all over the place (including me)...


Jim

Andy Fingerhut

Sep 4, 2012, 8:10:21 PM9/4/12
to clo...@googlegroups.com

On Sep 4, 2012, at 4:53 PM, Jim - FooBar(); wrote:

> On 04/09/12 21:02, Andy Fingerhut wrote:
>>
>> Stuart Halloway mentioned the idea of having two kinds of set/map constructor functions, one kind which quietly eliminates duplicates, another which throws an exception on duplicates.
>
> Is this not a source of inconsistency? are there performance issues involved? why not the correct thing on both/all cases?

I think the question arises more naturally for maps than for sets.

If someone types in the literal map {:a 5 :b 10 :c 13 :a -5}, what is the "correct thing"?

Some people might be thinking the correct thing is "I want the last key :a's value, -5, to win always, no matter if the key :a occurs more than once. I never want an error for code like this."

Others might be thinking "Oh, that is obviously a typo in my source code. I never intentionally want to specify the same key twice in any literal map. I want the compiler to flag that as an error so I don't have to spend lots of testing/debugging time to find that typo."

Personally, I can see both of those points of view as reasonable.

Either there needs to be a configurable knob to select the behavior, or one group of people is happy, and the other are not.

Andy

Herwig Hochleitner

Sep 4, 2012, 9:31:23 PM9/4/12
to clo...@googlegroups.com
2012/9/5 Andy Fingerhut <andy.fi...@gmail.com>
If someone types in the literal map {:a 5 :b 10 :c 13 :a -5}, what is the "correct thing"?

Some people might be thinking the correct thing is "I want the last key :a's value, -5, to win always, no matter if the key :a occurs more than once.  I never want an error for code like this."

Others might be thinking "Oh, that is obviously a typo in my source code.  I never intentionally want to specify the same key twice in any literal map.  I want the compiler to flag that as an error so I don't have to spend lots of testing/debugging time to find that typo."

Why not _only_ throw, if the keys are provably identical? The _reader_ can decide this on a purely syntactic basis.

So {:a 1 :a 2}, would throw. (let [a :a] {a 1 a 2}) would also throw.

(let [a1 :a a2 :a] {a1 1 a2 2}) would eliminate the duplicate key with last one in wins behavior, as it was before. Maybe it could print a warning, doing so.

As it happens, this catches all the obvious cases, like config-maps (where it has already saved me once), while leaving more involved semantics alone.

Can I cast a vote that this is the most desirable behavior?

I'll start with 

+1 for throwing on _syntactic_ duplicates
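
A rough sketch of what "throw only on syntactic duplicates" could look like, approximated with a macro instead of a reader change. The names syntactic-duplicates and syntax-checked-set are made up for this illustration and are not part of Clojure; the macro compares the unevaluated element forms and leaves run-time duplicates to set.

```clojure
(defn syntactic-duplicates
  "Returns the forms that occur more than once, compared as unevaluated data."
  [forms]
  (for [[form n] (frequencies forms)
        :when (> n 1)]
    form))

(defmacro syntax-checked-set
  "Throws at macroexpansion time if two element forms are textually
  identical; otherwise absorbs run-time duplicates like `set`."
  [& forms]
  (when-let [dups (seq (syntactic-duplicates forms))]
    (throw (IllegalArgumentException.
            (str "Duplicate forms: " (pr-str dups)))))
  `(set [~@forms]))

;; Different forms that happen to hold the same value: absorbed.
(let [a1 :a a2 :a]
  (syntax-checked-set a1 a2))   ;=> #{:a}

;; (syntax-checked-set :a :a) fails when the macro is expanded.
```

One consequence of a purely syntactic check: two identical call forms, such as two calls to the same random-number function, would still be rejected even though their values may differ.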

Mark Engelberg

Sep 4, 2012, 11:31:33 PM9/4/12
to clo...@googlegroups.com
On Tue, Sep 4, 2012 at 9:30 AM, Andy Fingerhut <andy.fi...@gmail.com> wrote:
I'm just trying to get the argument for change as clearly as possible.

The major bullet points:
1. "It's a bug that should be fixed."  The change to throw-on-duplicate behavior for sets in 1.3 was a breaking change that causes a runtime error in previously working, legitimate code. 

Looking through the history of the issue, one can see that no one was directly asking for throw-on-duplicate behavior.  The underlying problem was that array-maps with duplicate keys returned nonsensical objects; surely it would be more user-friendly to just block people from creating such nonsense by throwing an error.  This logic was extended to other types of maps and sets.

It's not entirely clear the degree to which the consequences of these changes were considered, but it seems likely that there was an implicit assumption that throw-on-duplicate behavior would only come into play in programs with some sort of syntactic error, when in fact it has semantic implications for working programs.  When a new "feature" causes unintentional breakage in working code, this is arguably a bug and needs to be reconsidered. 

2. "The current way of doing things is internally inconsistent and therefore complex."
(def a 1)
(def b 1)
(set [a b]) -> good
(hash-set a b) -> error
#{a b} -> error
(sorted-set a b) -> good
(into #{} [a b]) -> good

The cognitive load from having to remember which constructors do what is a bad thing. 
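
For anyone who wants to check the list above on their own Clojure version, here is a small probing sketch. The helper name outcome is made up for this example. What hash-set returns is version-dependent (the list above describes 1.3), so no result is claimed for it here.

```clojure
(def a 1)
(def b 1)

(defn outcome
  "Calls thunk; returns its value, or :error if it throws
  IllegalArgumentException."
  [thunk]
  (try
    (thunk)
    (catch IllegalArgumentException _ :error)))

(outcome (fn [] (set [a b])))       ; absorbs the duplicate
(outcome (fn [] (into #{} [a b])))  ; absorbs the duplicate
(outcome (fn [] (sorted-set a b)))  ; absorbs the duplicate
(outcome (fn [] (hash-set a b)))    ; reported above as an error on 1.3
(outcome (fn [] #{a b}))            ; literal: duplicate found at run time
```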

3. "Current behavior conflicts with the mathematical and intuitive notion of a set."
In math, {1, 1} = {1}.  In programming, sets are used as a means to eliminate duplicates.


Many people have +1'd and reiterated variations of the above arguments.  Now let's summarize the arguments that have been raised here in support of the status quo.

1. "Changing everything to throw-on-duplicate would be just as logically consistent as changing everything to use-last-in."

True, but that doesn't mean that both approaches would be equally useful.  It's readily apparent that an essential idea of sets is that they need to be able to gracefully absorb duplicates, so at least one such method of doing that is essential.  On the other hand, we can get along just fine without sets throwing errors in the event of a duplicate value.  So if you're looking for consistency, there's really only one practical option.

2.  "I like the idea that Clojure will protect me from accidentally making this kind of syntax error."

Clojure, as a dynamically typed language, is unable to protect you from the vast majority of data-entry syntax errors you're likely to make.

Let's say you want to type in {:apple 1, :banana 2}.  Even if Clojure can catch your mistake if you type {:apple 1, :apple 2}, there's no way it's ever going to catch you if you type {:apple 1, :banano 2}, and frankly, the latter error is one you're far more likely to make.

This is precisely why there's little evidence that anyone was asking for this kind of syntax error protection, and little evidence that anyone has benefited significantly from its addition -- its real-world utility is fairly minimal and dwarfed by the other kinds of errors one is likely to make.

3.  "Maybe we can do it both ways."

It's laudable to want to make everyone happy.  The danger, of course, is that such sentiment paints a picture that it would be a massive amount of work to please everyone, and therefore, we should do nothing.  Let's be practical about what is easily doable here with the greatest net benefit.  The current system has awkward and inconsistent semantics with little benefit.  Let's focus on fixing it. The easiest patch -- revert to 1.2 behavior, but bring array-map's semantics into alignment with the other associative collections.

Peter Taoussanis

Sep 5, 2012, 12:40:50 AM9/5/12
to clo...@googlegroups.com
+1 on Mark's most recent reply, that is:

* Revert to 1.2 behaviour.
* Consistency is good, but must be in favour of not throwing RTEs.
* No knobs.

It's clear that there's lots of directions that could be taken here, but getting caught up on trying to find a solution that pleases everyone 100% is, IMO, both a non-starter and inconsistent with Clojure's generally opinionated approach.

The pre-1.3 behaviour was sensible, consistent (both in terms of API and Clojure design idioms IMO) and didn't raise any complaints (as Mark has pointed out, the motivation behind the original change was actually for something unrelated). Mark's arguments on the relative value of RTE/no-RTE behaviour are also sound IMO.

My 2c: let's try not to over-analyse this thing. Revert the behaviour, and let's move on to more interesting ways of moving the language forward :)

abp

Sep 5, 2012, 4:18:28 AM9/5/12
to clo...@googlegroups.com
I too approve of Mark's reasoning and solution. Probably that should be moved into http://dev.clojure.org/display/design/Allow+duplicate+map+keys+and+set+elements 

Stuart Halloway

Sep 5, 2012, 9:41:45 AM9/5/12
to clo...@googlegroups.com
Hi Mark,

Thanks for extracting a summary of the conversation so far, and +1 for making sure this is on the wiki.

Stu

Andy Fingerhut

Sep 5, 2012, 1:57:43 PM9/5/12
to clo...@googlegroups.com
I've copied and pasted Mark's arguments to the Wiki page here:


Andy

Rich Hickey

Sep 7, 2012, 1:49:24 PM9/7/12
to clo...@googlegroups.com
Once again, thanks Andy!

I've added my feedback there (http://dev.clojure.org/display/design/Allow+duplicate+map+keys+and+set+elements)

Patches implementing that are welcome.

Rich

Sean Corfield

Sep 7, 2012, 3:35:59 PM9/7/12
to clo...@googlegroups.com
On Fri, Sep 7, 2012 at 10:49 AM, Rich Hickey <richh...@gmail.com> wrote:
> I've added my feedback there (http://dev.clojure.org/display/design/Allow+duplicate+map+keys+and+set+elements)

Thanx Rich! So the recommendation is:

* set/map literals with duplicates are invalid (status quo)

* hash-set/hash-map should change (to last key wins, as if conj'd/assoc'd)

* sorted-set/sorted-map should not change (last key wins, as if conj'd/assoc'd)

* array-map should not change (throws on dupes)?

Highlighting that last one since it's not mentioned on the wiki and
would then be the "odd one out" but perhaps there's a good reason?

Rich Hickey

Sep 7, 2012, 5:06:52 PM9/7/12
to clo...@googlegroups.com

On Sep 7, 2012, at 3:35 PM, Sean Corfield wrote:

> On Fri, Sep 7, 2012 at 10:49 AM, Rich Hickey <richh...@gmail.com> wrote:
>> I've added my feedback there (http://dev.clojure.org/display/design/Allow+duplicate+map+keys+and+set+elements)
>
> Thanx Rich! So the recommendation is:
>
> * set/map literals with duplicates are invalid (status quo)
>
> * hash-set/hash-map should change (to last key wins, as if conj'd/assoc'd)
>
> * sorted-set/sorted-map should not change (last key wins, as if conj'd/assoc'd)
>
> * array-map should not change (throws on dupes)?
>
> Highlighting that last one since it's not mentioned on the wiki and
> would then be the "odd one out" but perhaps there's a good reason?

No, array-map should be the same too.

Mark Engelberg

Sep 7, 2012, 7:13:34 PM9/7/12
to clo...@googlegroups.com
On the wiki page, Rich Hickey wrote:
* If you think a month is too long to get a response to your needs, from a bunch of very busy volunteers, you need to chill out
* just because you decided to bring it up doesn't mean everyone else needs to drop what they are doing

For the record, I don't really care how long it takes to get a response.  Unfortunately, there's no observable way to know whether an issue is in some queue to be considered at a future date, or whether an issue has merely gone unnoticed, or whether an issue has been deemed unworthy of further consideration.

In any case, thanks for the thoughtful, well-considered response to the issue on the wiki.

--Mark

Andy Fingerhut

Sep 8, 2012, 2:22:58 AM9/8/12
to clo...@googlegroups.com
The new ticket CLJ-1065 has a patch that I think implements the desired behavior on the dev wiki page.

i.e. set/map literals with duplicates are invalid (status quo)

All constructor functions for sets and maps allow duplicates, and for maps, always take the value associated with the last occurrence of the same key. All constructor functions explicitly say this in their doc strings.

Andy

Rich Hickey

Sep 8, 2012, 8:29:32 AM9/8/12
to clo...@googlegroups.com
Thanks!

I'm still interested in a patch for recommendation #3:

Restore the fastest path possible for those cases where the keys are compile-time detectable unique constants

I'd like to see all three recommendations go into a release as a set.

Andy Fingerhut

Sep 9, 2012, 4:13:34 AM9/9/12
to clojure
I think I may have figured it out. New patch attached to ticket CLJ-1065 that should eliminate run-time checks for duplicate map keys, for those maps whose keys are all compile-time constants.

Andy

On Sep 8, 2012, at 4:38 PM, Andy Fingerhut wrote:

> Rich:
>
> I'm not sure what you mean by the not-fastest-path possible that exists in today's Clojure code, so if you get a chance, see if the below is what you mean.
>
> As far as I can tell (i.e. putting debug println's in the Java code of RT.map), when someone enters a map literal in, say, a function definition, and all keys *and* values are compile time constants, it calls RT.map() while the function is being compiled, but never again when the function is called.
>
> If I make a similar function with run-time variable keys or values, RT.map() is called every time the function is invoked. Each of these calls repeats the check that the keys are unique.
>
> Do you mean that you want a new code path where if the keys are compile time constants, but the values are variables at compile-time, then at run time this map should be created with a method that avoids the unnecessary check for unique keys?
>
> And by the word "restore" do you mean to imply that it was this way at one time before?
>
> Thanks,
> Andy

Rich Hickey

Sep 11, 2012, 3:15:57 PM9/11/12
to clo...@googlegroups.com
I understand your frustration.

But it is important to note that timeliness and feedback are a two-way street. There was a time when changes to Clojure were tried immediately by users, and I'd know within hours if not minutes if I'd introduced something that caused problems for someone. That matters as much to me now as it did then. But now it can be months or, as in this case, years before someone issues a complaint. That's quite frustrating as well.

It is equally important that we continue to presume the best of one another. I've had to reconcile myself to the fact that Clojure users are busy using Clojure, and primarily from release artifacts. They have less time to spend evaluating the latest code. I, too, am busy using Clojure, something that is quite good for Clojure, even though it may give me less time for participating in the threads.

Rest assured that the threads are being seen and considered. If you've had someone from core chime in on a thread (as Stu did, the day after your first post), then it's been seen. He and I have spoken about it several times. That doesn't mean it's become an action item, even if there's a long thread (long threads can mean little more than everyone has an opinion), or that the majority seems to agree (people happy with how things are are least motivated to chime in). The fact is, this was very old code, and if someone was catching fire because of it, presumably it would have come up already.

That said, I am very interested in the idea of getting more leverage out of the JIRA voting system. As the volume on the lists and in JIRA etc continues to grow (as it does), triage becomes critical, and more difficult. Any effort to organize and prioritize the work and opinions of the community is much appreciated (thanks Andy, et al). Voting can be a self-serve, self-tuning option. I've already made a JIRA view that orders tickets by votes. It will be essential though, that people be limited (or, more likely, limit themselves) to e.g. 3 active votes (i.e. on open tickets) at a time, else it's not triage, it's just a venting exercise :)

The simple fact is that people's desire to work with a stable Clojure, and to continue to produce a solid one, coupled with a growing community, means that things will take longer, and not everything can be personally addressed.

Rich

Andy Fingerhut

Sep 8, 2012, 7:38:12 PM9/8/12
to clo...@googlegroups.com
Rich:

I'm not sure what you mean by the not-fastest-path possible that exists in today's Clojure code, so if you get a chance, see if the below is what you mean.

As far as I can tell (i.e. putting debug println's in the Java code of RT.map), when someone enters a map literal in, say, a function definition, and all keys *and* values are compile time constants, it calls RT.map() while the function is being compiled, but never again when the function is called.

If I make a similar function with run-time variable keys or values, RT.map() is called every time the function is invoked. Each of these calls repeats the check that the keys are unique.

Do you mean that you want a new code path where if the keys are compile time constants, but the values are variables at compile-time, then at run time this map should be created with a method that avoids the unnecessary check for unique keys?

And by the word "restore" do you mean to imply that it was this way at one time before?

Thanks,
Andy


On Sep 8, 2012, at 5:29 AM, Rich Hickey wrote:

Rich Hickey

Sep 12, 2012, 6:22:42 AM9/12/12
to clo...@googlegroups.com

On Sep 8, 2012, at 7:38 PM, Andy Fingerhut wrote:

> Rich:
>
> I'm not sure what you mean by the not-fastest-path possible that exists in today's Clojure code, so if you get a chance, see if the below is what you mean.
>
> As far as I can tell (i.e. putting debug println's in the Java code of RT.map), when someone enters a map literal in, say, a function definition, and all keys *and* values are compile time constants, it calls RT.map() while the function is being compiled, but never again when the function is called.
>
> If I make a similar function with run-time variable keys or values, RT.map() is called every time the function is invoked. Each of these calls repeats the check that the keys are unique.
>
> Do you mean that you want a new code path where if the keys are compile time constants, but the values are variables at compile-time, then at run time this map should be created with a method that avoids the unnecessary check for unique keys?
>

Exactly.

> And by the word "restore" do you mean to imply that it was this way at one time before?
>

Nope. It was that fast, but did no compile-time checks.

Thanks

Rich


Andy Fingerhut

Oct 4, 2012, 4:34:41 PM10/4/12
to clo...@googlegroups.com
I just wanted to mention to those interested in the issues raised by this thread that a patch for CLJ-1065 was committed to Clojure master today, and is part of release clojure-1.5.0-alpha5:

http://dev.clojure.org/jira/browse/CLJ-1065

All of the set and map constructor functions now explicitly allow duplicate set elements/map keys. They handle the duplicates as if by repeated assoc calls, and this is mentioned in the doc strings for those functions.

Set and map literals still throw errors if there are duplicate set elements/map keys, as was the case before the CLJ-1065 patch was committed.
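
As an illustration of the behavior described above, assuming a Clojure build that includes the CLJ-1065 change (1.5.0-alpha5 or later). The function name literal-pair is just an example name.

```clojure
;; Constructor functions absorb duplicates, as if by repeated conj/assoc.
(hash-set 1 1)            ;=> #{1}
(hash-map :a 1 :a 2)      ;=> {:a 2}
(array-map :a 1 :a 2)     ;=> {:a 2}
(sorted-map :a 1 :a 2)    ;=> {:a 2}

;; Literals are unchanged: a duplicate that only appears at run time
;; still throws IllegalArgumentException.
(defn literal-pair [x y] #{x y})

(literal-pair :x :y)      ;=> a set containing :x and :y
;; (literal-pair :x :x) throws.
```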

Andy

Jim foo.bar

Nov 12, 2012, 8:22:02 AM11/12/12
to clo...@googlegroups.com
Yes, this has been discussed extensively in the past....I think the
convention is to use the ctor functions if you're passing data
dynamically, otherwise if dealing with constants the literals should be
just fine...In this case just replace the set literal with (hash-set ...)
or (set ...).

hope that helps...

Jim

On 12/11/12 08:48, Antony Lee wrote:
> I may arrive at the party a little late but just to mention I got
> bitten by this too (while working on clojure-py, so I actually want to
> know about the weird edge cases...)
>
> user=> #{(rand-int 100) (rand-int 100)}
> IllegalArgumentException Duplicate key: (rand-int 100)
> clojure.lang.PersistentHashSet.createWithCheck
> (PersistentHashSet.java:68)
>
> Of course you can even replace rand-int by a function which is
> actually guaranteed to return different values on consecutive calls,
> e.g. a closure over an atom.

Jim foo.bar

Nov 12, 2012, 8:25:29 AM11/12/12
to clo...@googlegroups.com
sorry 'set' will convert from a coll to a set...use 'hash-set' ,
'sorted-set' etc etc...
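
A short sketch of the distinction, applied to the rand-int case quoted above. The names s and t are arbitrary. On 1.3 and 1.4, hash-set could still throw if the two values happened to collide; the lenient behavior assumes 1.5 or later.

```clojure
;; hash-set takes the elements themselves as arguments.
(def s (hash-set (rand-int 100) (rand-int 100)))

;; set takes a single collection and converts it.
(def t (set [(rand-int 100) (rand-int 100)]))

;; Each holds one element if the two random values collided, otherwise two.
(count s)
(count t)
```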

Jim