From: Chas Emerick <c...@cemerick.com>
Date: Sun, 5 Aug 2012 10:33:12 -0400
Subject: Re: Question about sets

On Aug 5, 2012, at 2:56 AM, Sean Corfield wrote:

> On Sat, Aug 4, 2012 at 11:45 PM, Mark Engelberg
> <mark.engelb...@gmail.com> wrote:
>> In any case, Clojure is already able to detect when x and y are equal in
>> something like #{x y} and report it as an error.

> So do you think #{1 1} should not be an error? And {:a 1 :a 2}? These
> seem like "obvious" programmer errors that I'd want the compiler to
> catch.

I hit exactly the problem the OP raised this past week.  I knew of the (good) restriction on e.g. #{1 1} being an error, and remember some of the conversations leading up to the changes that made it so, but had either forgotten or never quite internalized that #{a b} is also an error, where a and b are equivalent values.

First, some history:
It looks like http://groups.google.com/group/clojure/browse_thread/thread/5a38a6b61... was the original thread where duplicate map keys first came up as an issue, although the differences between array-maps and hash-maps were the original impetus.
This led to http://dev.clojure.org/jira/browse/CLJ-87 being filed.
Rich's first change was to make duplicate map keys an error, regardless of whether the values in question were literals themselves or evaluated from an expression: https://github.com/clojure/clojure/commit/e6e39d5931fbdf3dfa68cd2d059...
A brief discussion on IRC ensued (http://clojure-log.n01se.net/date/2010-04-05.html#10:56a), where Rich suggested that sets should probably be subject to the same rules as keys of maps (which, of course, they should be, whatever those rules may be, since a map's keys are always a set).
The final commit on the issue extended the error-checking of map keys to set values: https://github.com/clojure/clojure/commit/c733148ba0fb3ff7bbab133f537...
Note that the .createWithCheck variations of all of the collections in question are used by their "constructor" functions as well, e.g. hash-set, hash-map, and array-map:

=> (hash-set 1 2 2)
IllegalArgumentException Duplicate key: 2  clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:80)
=> (hash-map 1 2 1 3)
IllegalArgumentException Duplicate key: 1  clojure.lang.PersistentHashMap.createWithCheck (PersistentHashMap.java:92)
=> (array-map 1 2 1 3)
IllegalArgumentException Duplicate key: 1  clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)

The only way to get around the checks here is to use `set` or `into`; note that there is no "constructor" function for an unsorted map that does not check that the provided keys are unique.
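
As a minimal sketch of that workaround (the var names `deduped-set` and `deduped-map` are just for illustration), `set` and `into` accept input with repeated values or keys without complaint:

```clojure
;; `set` accepts a collection that contains repeated values
;; and returns a set of the distinct ones.
(def deduped-set (set [1 2 2]))            ; #{1 2}

;; `into` adds the entries one by one, so a repeated key is
;; simply overwritten; the last value for the key wins.
(def deduped-map (into {} [[1 2] [1 3]]))  ; {1 3}
```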

Interestingly, sorted maps and sets do *not* have the same restriction:

=> (sorted-map 1 2 1 3)
{1 3}
=> (sorted-set 1 2 1)
#{1 2}

Quoting Rich from the mailing list thread linked above:

> These are bugs in user code. Map literals are in fact read as maps, so
> a literal map with duplicate keys isn't going to produce an evaluated
> map with distinct keys. If you create an array map with duplicate
> keys, bad things will happen.

In the end, even though I've been recently "bitten" by the checked creation of sets from a literal, I think it's a reasonable approach.  In #{a b}, you are specifying the creation of a set containing two and exactly two values, those named by a and b.  There's an explicit invariant being specified in that code.  Thinking over my various uses of #{} syntax, I remember times where I've expected it to enforce that invariant (and throw an error if dupes were provided) and times where I've expected it to implicitly apply `distinct` to the values provided; that indicates sloppiness on my part, not a place where the language should become psychic.
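
To make the #{a b} case concrete, here is a small sketch; `pair-set` and `pair-set-or-error` are hypothetical names, and the `:duplicate` keyword is just a stand-in for whatever error handling you would actually want:

```clojure
(defn pair-set
  "Builds a set from exactly two values using literal syntax."
  [a b]
  #{a b})

(defn pair-set-or-error
  "Like pair-set, but returns :duplicate instead of letting the
  IllegalArgumentException from the checked creation propagate."
  [a b]
  (try
    (pair-set a b)
    (catch IllegalArgumentException _
      :duplicate)))

(pair-set-or-error 1 2) ; distinct values: a two-element set
(pair-set-or-error 1 1) ; equivalent values: the literal throws at runtime
```

The point is that the check happens when the literal is evaluated, not only when the reader sees two identical constants.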

In contrast, `set` and `into` each accept a seqable collection of data, and are explicit about their support for sifting out duplicates (right in the docstring of `set`, and transitively so for `into` due to its use of `conj` and its semantics on sets).

Finally, for the sake of consistency, it seems like the same checks should be applied by the sorted map and set "constructor" functions, and that there should be a map counterpart to `set` (i.e. a function that is the equivalent of #(into {} %), just as `set` is the equivalent of #(into #{} %)).  This last one is problematic in terms of naming, though.
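
For illustration only, such a function could look like the sketch below. The name `to-map` is made up here (no such function exists in clojure.core), which is exactly the naming problem:

```clojure
(defn to-map
  "Hypothetical map counterpart to `set`: builds a map from a seqable
  collection of key/value pairs, letting later pairs overwrite earlier
  ones instead of throwing on a repeated key."
  [coll]
  (into {} coll))

(to-map [[:a 1] [:b 2] [:a 3]]) ; {:a 3, :b 2}
```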

Cheers,

- Chas

--
http://cemerick.com
[Clojure Programming from O'Reilly](http://www.clojurebook.com)


 