Potential improvement to select-keys ?

444 views
Skip to first unread message

Don Jackson

unread,
Oct 31, 2013, 10:21:39 PM10/31/13
to clo...@googlegroups.com

select-keys currently returns a regular hash-map, regardless of the kind of map provided as the input argument.

Alternately, select-keys could call empty on it’s map argument to obtain the map it will return, thereby preserving the type of map provided.

FYI, Mark Engelberg recently pushed a change to data.priority-map that ensures that the specific flavor of priority-map is preserved through calls to empty…


Andy Fingerhut

unread,
Nov 1, 2013, 10:39:57 AM11/1/13
to clo...@googlegroups.com
Looks like a reasonable enhancement to me.  I have filed ticket CLJ-1287 with a proposed patch for it:

    http://dev.clojure.org/jira/browse/CLJ-1287

No guarantees when or if it will get into Clojure.  You can modify your own local copy of select-keys (or even alter-var-root its definition) if you would like the new behavior sooner, for yourself.

By the way, there is a funjible.set library published on Github that is like clojure.set except it has preconditions that its arguments are sets or maps, the functions have longer more explicit doc strings, and there are a few behavior changes like ensuring metadata or sortedness of return values.

    https://github.com/jafingerhut/funjible

Andy


--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Alex Miller

unread,
Nov 1, 2013, 2:20:05 PM11/1/13
to clo...@googlegroups.com
I'm not sure I agree with the intent of the patch (as I can certainly imagine times when I'd want a pure map from select-keys, not one that has the characteristics of the source map).

Also, due to empty not being supported on records (yet - separate ticket for that), I think this breaks select-keys for records, which would be bad as that is a common usage. [This would imply sequencing of changes.]

Mark Engelberg

unread,
Nov 1, 2013, 3:18:33 PM11/1/13
to clojure
On Fri, Nov 1, 2013 at 11:20 AM, Alex Miller <al...@puredanger.com> wrote:
I'm not sure I agree with the intent of the patch (as I can certainly imagine times when I'd want a pure map from select-keys, not one that has the characteristics of the source map).

I'm sure it's possible to imagine both needs, but if you have have a sorted-map, and you do a select-keys, don't you think the "principle of least surprise" is for it to stay a sorted-map?  I think intuitively, select-keys is about *restricting* the existing map in some way -- you expect things like the metadata and the type of the map to stay the same.  Similarly, my intuition tells me that if you want a new kind of map, you'd have to "pour" the contents of the map into the other type of map via into.

Cedric Greevey

unread,
Nov 1, 2013, 4:14:33 PM11/1/13
to clo...@googlegroups.com
I agree with Engelberg, except that records need to be treated specially. Select-keys (and empty) on a record preserving the type but omitting one of the record's predefined keys would make no sense, so I'd argue that records have to give rise to normal (unsorted) maps when these are used on them. So, sortedness and unsortedness (and any particular sort-key or sort-fn) would be preserved by these operations, but not exact type.

Besides, exact type isn't preserved by even conjing in the case of PersistentArrayMap/PersistentHashMap.


--

Michael Gardner

unread,
Nov 1, 2013, 5:02:08 PM11/1/13
to clo...@googlegroups.com
On Nov 1, 2013, at 14:18 , Mark Engelberg <mark.en...@gmail.com> wrote:

> I'm sure it's possible to imagine both needs, but if you have have a sorted-map, and you do a select-keys, don't you think the "principle of least surprise" is for it to stay a sorted-map?

I'm not sure about that, given how map and similar functions behave. And with the point Cedric raised about records, I think it's better to have consistent behavior (always return a regular map).

Alex Miller

unread,
Nov 1, 2013, 5:33:10 PM11/1/13
to clo...@googlegroups.com
I think about it as extracting a subset, and I don't particularly expect the subset container to have the properties of the original container - similar to keys or vals (and different from filter). Neither of these perspectives is right or wrong, but the presence of more than one leads me to believe that "intuitive" is dependent on the brain of the beholder in this case. :) 

From the perspective of "extraction" I feel like select-keys on a record returning a record feels even weirder to me. 

None of this is to say that I am correct, just sharing my opinion. :)  

Mark Engelberg

unread,
Nov 1, 2013, 6:57:08 PM11/1/13
to clojure
Having select-keys on records returning regular maps is *not* inconsistent with preserving the "type" of map in the sense of sorted-map, priority-map, etc.

Here's why:

If you think of select-keys as dissoc'ing all the irrelevant keys, that would preserve the type of sorted-map, priority-map, but it would also turn a record into a regular map (because as soon as you dissoc a required key from a record, it turns into a plain map).

If you think of select-keys as emptying out *all* the keys, and then building back up with the selected keys (a more accurate representation of what's going on under the covers), it still lends itself to exactly the same model.  Emptying out the keys preserves the type of a map, but converts records to regular maps (because that's what already happens when you get rid of required keys from a record).

So, I think the behavior that Cedric described is exactly what mental models would predict.

Alex Miller has said that he thinks of it as just nebulously selecting some sort of undefined subset of the map, but I think that's far too vague to base the design on.  I think the above two mental models are the ones that someone would settle on if they spent a few seconds trying to think about how it's probably implemented.  Both predict the same behavior, and that's the "principle of least surprise" we should be targeting.


Alex Miller

unread,
Nov 1, 2013, 7:59:26 PM11/1/13
to clo...@googlegroups.com


On Friday, November 1, 2013 5:57:08 PM UTC-5, puzzler wrote:
Having select-keys on records returning regular maps is *not* inconsistent with preserving the "type" of map in the sense of sorted-map, priority-map, etc.

Here's why:

If you think of select-keys as dissoc'ing all the irrelevant keys, that would preserve the type of sorted-map, priority-map, but it would also turn a record into a regular map (because as soon as you dissoc a required key from a record, it turns into a plain map).

The "irrelevant keys" would be the (infinite) complement of the set of specified keys, so this wouldn't work.

If you think of select-keys as emptying out *all* the keys, and then building back up with the selected keys (a more accurate representation of what's going on under the covers), it still lends itself to exactly the same model.  Emptying out the keys preserves the type of a map, but converts records to regular maps (because that's what already happens when you get rid of required keys from a record).

I think of it as creating a new map and filling it with entries from any keys in the specified set that exist in the map (which is essentially what it does now). My bias is purely that I'm used to and comfortable with that behavior.

Altering an existing function like select-keys is going to be harder to get accepted if it breaks existing code and that seems possible with the suggested alternative (certainly around records for now and possibly around performance). 

Instead, what if this was a new operation? Perhaps broader in scope would be a function that takes a map and a filter function (on key/value/both?) in that it filters a map based on a function of key/value/or both. This links back to a prior discussion (possibly on clojure-dev, can't remember now) about providing a richer set of operations that take and return maps. A similar function that does the inverse (removes based on a function) is another possibility - I see both of these re-implemented in many code bases in a variety of forms.

I personally would love to have a richer set of built-in map-oriented functions comparable to the seq functions but with awareness of key-value entries and optimized for performance in those cases. Would require some serious "hammock" time to determine the right set of fns.

Whatcha think?

Mark Engelberg

unread,
Nov 1, 2013, 7:59:36 PM11/1/13
to clojure
One other point:
Sometimes people use sorted maps and array maps specifically for scenarios in which the keys are not hashable and therefore hash maps would not apply.   Dumping the contents into a regular map in such cases doesn't make much sense.


Mark Engelberg

unread,
Nov 1, 2013, 8:04:24 PM11/1/13
to clojure
On Fri, Nov 1, 2013 at 4:59 PM, Alex Miller <al...@puredanger.com> wrote:
I think of it as creating a new map and filling it with entries from any keys in the specified set that exist in the map (which is essentially what it does now). My bias is purely that I'm used to and comfortable with that behavior.

Our emails crossed paths, timing-wise, but my response to this is that not all types of maps can safely be dumped into a regular map, so there's not much reason to think that the standard behavior would be to build up from scratch using a regular map.

Alex Miller

unread,
Nov 2, 2013, 9:00:01 AM11/2/13
to clo...@googlegroups.com

> Our emails crossed paths, timing-wise, but my response to this is that not all types of maps can safely be dumped into a regular map, so there's not much reason to think that the standard behavior would be to build up from scratch using a regular map.

Example? I can't think of one. Any key/value pair you retrieve from any map can be put in a regular map, right? Note that the current behavior is to do exactly this.

Alex Miller

unread,
Nov 2, 2013, 9:07:07 AM11/2/13
to clo...@googlegroups.com
> One other point:
> Sometimes people use sorted maps and array maps specifically for scenarios in which the keys are not hashable and therefore hash maps would not apply.  Dumping the contents into a regular map in such cases doesn't make much sense.

Everything is hashable, not sure what a non-hashable key means. Array maps use the hash of the key to determine the array bucket. If you get the hash code of a sorted map, it will get the hash of all keys and values.

Andy Fingerhut

unread,
Nov 2, 2013, 11:49:19 AM11/2/13
to clo...@googlegroups.com
I attached another patch to the ticket.  It builds up the answer from the empty map {} if the argument is a record (as the current select-keys does), but from (empty map) if it is not a record, so it will preserve sortedness of the argument.  Not sure if there are any other cases that are a problem, or if it does what everyone would expect, but it should be closer.

    http://dev.clojure.org/jira/browse/CLJ-1287

Andy


Mark Engelberg

unread,
Nov 2, 2013, 12:17:23 PM11/2/13
to clojure
Infinite sequences are not hashable.  They can be sorted lexicographically, provided you know in advance you're not working with two equal sequences.  I believe that array maps just store items in the order you put them in, never computing the hash.

Not sure that using infinite sequences as keys is a common use case, but when I wrote that, I was thinking that might be representative of a larger set of examples where it is impossible or impractical to compute the hashes of your keys.
 

Mark Engelberg

unread,
Nov 2, 2013, 12:27:47 PM11/2/13
to clojure
I seem to be a relatively lone voice thinking it makes sense to preserve the type of the map, and honestly, I'm not going to lose any sleep if this patch is never actually implemented, but here's my two cents on the "optimal" design for records.

Earlier in the thread, I mentioned two mental models about how select-keys works.  Even though the actual implementation involves starting from an empty map and building back up, I think the behavior of select-keys should mimic the mental model of discarding irrelevant keys.

The two mental models predict somewhat differently what would happen when working with records.  Records can hold additional keys.  So imagine someone dumps a whole lot of additional keys, and then wants to "recover" the original, true record, by calling select-keys with the original keys in the record.  I think this is a use case worth supporting.  This corresponds to the mental model of discarding unwanted keys.

To implement this for records:
If any of the original record keys do not appear in the selected keys, then, as with dissoc, we must revert back to a map as there is no valid way to represent this as a record.
Otherwise, maintain the record type -- the core record contents are preserved, and essentially you only need to call select-keys only on the extended map of extra keys stored in the record.

Don Jackson

unread,
Nov 3, 2013, 4:46:40 PM11/3/13
to clo...@googlegroups.com
On Nov 2, 2013, at 9:27 AM, Mark Engelberg <mark.en...@gmail.com> wrote:

I seem to be a relatively lone voice thinking it makes sense to preserve the type of the map, and honestly, I'm not going to lose any sleep if this patch is never actually implemented, but here's my two cents on the "optimal" design for records.

Earlier in the thread, I mentioned two mental models about how select-keys works.  Even though the actual implementation involves starting from an empty map and building back up, I think the behavior of select-keys should mimic the mental model of discarding irrelevant keys.

The two mental models predict somewhat differently what would happen when working with records.  Records can hold additional keys.  So imagine someone dumps a whole lot of additional keys, and then wants to "recover" the original, true record, by calling select-keys with the original keys in the record.  I think this is a use case worth supporting.  This corresponds to the mental model of discarding unwanted keys.

To implement this for records:
If any of the original record keys do not appear in the selected keys, then, as with dissoc, we must revert back to a map as there is no valid way to represent this as a record.
Otherwise, maintain the record type -- the core record contents are preserved, and essentially you only need to call select-keys only on the extended map of extra keys stored in the record.

I agree 100% with Mark’s comments above.
This is exactly the model I would expect select-keys to have.

Conceptually, this is how I imagine select-keys should work:

(defn select-keys++
  "Returns a new map of the same (hashed/sorted) type,
  that contains only those entries whose key is in keyseq"
  [map keyseq]
  (apply dissoc 
             map 
             (clojure.set/difference (set (keys map))
                                                 (set keyseq))))

I am not saying that this would be the best , or even a good, implementation.

Like Mark, I am not going to lose any sleep about this either, but I was using Mark’s data.priority-maps the other day,
did a select-keys to winnow down my priority-map, and was surprised by the result.
Perhaps I should not have been surprised….
In preparing this email, I carefully re-read the doc strings for both dissoc and select keys:

(defn dissoc
  "dissoc[iate]. Returns a new map of the same (hashed/sorted) type,
  that does not contain a mapping for key(s).”

(defn select-keys
  "Returns a map containing only those entries in map whose key is in keys"

Clearly dissoc makes the promise to return a map of the same type that select-keys doesn’t.
A potential non-code change to select-keys might be to emphasize that the return map will be
a hash-map, regardless of the kind/type of input map.

Don

Kendall Buchanan

unread,
Nov 7, 2014, 2:54:54 PM11/7/14
to clo...@googlegroups.com
I'll attempt an argument for returning the record's type in another way: I can always get a PersistentArrayMap from select-keys, if I want. Just add (into {} ...) to the chain. I cannot, however, retain the record type because select-keys *makes the decision for me*.

In other words, letting select-keys return the key/value pairs in the type I passed it, I retain control, rather than watching it be yanked from me.

(I also came across this thread because select-keys' return value surprised me. What's worse, though, there's no simple way to resolve it without re-introducing a constructor function in the middle of my flow.)
Reply all
Reply to author
Forward
0 new messages