Re: core/group-by with optional value-mapper function


Alex Baranosky

Dec 17, 2012, 3:21:44 AM
to clo...@googlegroups.com
I haven't run into this issue (yet).  My first devil's advocate thought was to suggest that you could map over the data after calling the group-by.

(->> (group-by :type animals)
     (map-vals #(map :name %)))

There are two problems with this.  One, it uses a custom util function `map-vals` so it is a bit of a cheat.  Two, even with that it still looks pretty clunky.  
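For readers following along, `map-vals` is not part of clojure.core. A minimal sketch of what such a helper might look like is below. The name and the argument order (function first, map last, so it fits in `->>`) are assumptions taken from the snippet above, and the two-animal data set is made up for illustration.

```clojure
;; Hypothetical helper, not in clojure.core: apply f to every value of map m.
(defn map-vals
  [f m]
  (into {} (for [[k v] m] [k (f v)])))

;; Small sample data, for illustration only.
(def animals [{:name "Betsy" :type :cow}
              {:name "Rex" :type :dog}
              {:name "Dingo" :type :dog}])

;; Group first, then map over each group in a second pass.
(def names-by-type
  (->> (group-by :type animals)
       (map-vals #(map :name %))))
```

Note that this walks the grouped map a second time, which is the extra pass the proposed value-mapper argument would avoid.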

How does the extra `identity` call affect performance?  I wouldn't think much.

Alex

On Fri, Dec 14, 2012 at 9:58 AM, Daniel Dinnyes <dinn...@gmail.com> wrote:
Hi,

I would like to suggest an enhancement to the clojure.core/group-by function. The idea came from using the Enumerable.GroupBy extension method in .NET quite a lot. It is really handy to have an optional value-mapper function which transforms the elements before adding them to the collection under the key. It is backward compatible, because the two-parameter overload can simply call the three-parameter one with clojure.core/identity as the value-mapper function.

The implementation is easy-peasy (almost the same as the original):

(defn group-by
  ([f g coll]
     (persistent!
      (reduce
       (fn [ret x]
         (let [k (f x)]
           (assoc! ret k (conj (get ret k []) (g x)))))
       (transient {}) coll)))
  ([f coll]
     (group-by f identity coll)))

Without the value-mapper argument it is very awkward to achieve the same structure after the group-by call. Also, doing the transformation before the group-by is often impossible, because the key function depends on some property of the source element, which would be removed after the transformation.
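To make the second point concrete, here is a small sketch of what goes wrong when the transformation is applied before grouping. The two-animal data set and the var names are made up for illustration.

```clojure
;; Small sample data, for illustration only.
(def animals [{:name "Betsy" :type :cow}
              {:name "Rex" :type :dog}])

;; Mapping to :name first throws the :type away, so the key function
;; has nothing to look at and every element ends up under the nil key.
(def grouped-too-late
  (group-by :type (map :name animals)))
;; => {nil ["Betsy" "Rex"]}

;; Grouping first keeps the keys, but the values then need a second pass.
(def grouped-then-mapped
  (into {} (for [[k vs] (group-by :type animals)]
             [k (mapv :name vs)])))
;; => {:cow ["Betsy"], :dog ["Rex"]}
```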

To demonstrate the usage, see the calls below:

(def animals [{:name "Betsy" :type :cow}
              {:name "Murmur" :type :cat}
              {:name "Lessie" :type :dog}
              {:name "Dingo" :type :dog}
              {:name "Rosie" :type :cat}
              {:name "Rex" :type :dog}
              {:name "Alf" :type :cat}])

(group-by :type animals) ; old usage
> ... ugly stuff

(group-by :type :name animals) ; new usage
> {:cow ["Betsy"], :cat ["Murmur" "Rosie" "Alf"], :dog ["Lessie" "Dingo" "Rex"]}

(group-by :type #(.toUpperCase (:name %)) animals) ; hell yeah!
> {:cow ["BETSY"], :cat ["MURMUR" "ROSIE" "ALF"], :dog ["LESSIE" "DINGO" "REX"]}


It would be so cool to have this in the core. What do you guys think?

Regards,
Daniel Dinnyes


Daniel Dinnyes

Dec 17, 2012, 5:13:17 AM
to clo...@googlegroups.com
Hi,

I expect the cost of calling `identity` to be negligible. I am not sure, but the JVM might even inline it at run time, or there might be optimizations for it in clojure.core during compilation; I cannot comment on that. But even with a full virtual call, it should be faster than iterating over the whole map again.

Also, the `map-vals` version is indeed still clunkier ;) Usage differs from person to person, but whenever I use `group-by` I very often find that I want to map the values too (to get a nice, streamlined data structure to pass around for further processing). Just my experience. It was very handy in .NET, and I think it was there for this reason.

Regards,
Daniel

Daniel Dinnyes

Dec 17, 2012, 7:09:27 AM
to clo...@googlegroups.com
Also note that I wrote in my first post that "Without the value-mapper argument it is very awkward to achieve the same structure after the group-by call". The `map-vals` function is about the closest you can get to mapping values after a group-by in a streamlined and clean manner. There is already `fmap` in contrib, though, which does a similar thing.

An even cleaner mapper would be something like a `map-multi-vals`, so that you can do something like this:

(->> (group-by :type animals)
     (map-multi-vals :name))

That's the cleanest one can get with a separate value-mapper. In my opinion that has little added benefit though, and the performance is possibly worse too. The only benefit would be separation of concerns: you can map the values of a multi-map without knowing how it was created. Now think about it: how often would you use `map-multi-vals` on its own, not right after a group-by? My impression is that whenever a multi-map is created, it almost always involves a `group-by` in some way, which itself is a special case of `reduce`. There is always a `reduce` somewhere, whether it is an `into`, a `for`, or some imperative iteration. `group-by` is simply the most direct tool for this specific purpose of creating a multi-map.
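For comparison, a `map-multi-vals` along these lines could be sketched as follows. It is a hypothetical helper, not something in clojure.core or contrib. The argument order is an assumption inferred from the `->>` usage above, and the sample data is made up for illustration.

```clojure
;; Hypothetical helper: apply f to every element of every value collection
;; in a multi-map (a map whose values are collections).
(defn map-multi-vals
  [f m]
  (into {} (for [[k vs] m] [k (mapv f vs)])))

;; Small sample data, for illustration only.
(def animals [{:name "Murmur" :type :cat}
              {:name "Lessie" :type :dog}
              {:name "Rosie" :type :cat}])

;; Second pass over the grouped map, as in the pipeline above.
(def names-by-type
  (->> (group-by :type animals)
       (map-multi-vals :name)))
```

This sketch always produces vectors via `mapv`; a version that preserved the original collection type of each value would need more care.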

My argument therefore is that whenever you need a multi-value mapping, it is always preceded by a group-by, and therefore I feel the right place for the value-mapper is as an optional parameter for `group-by` itself.

What do you think?

Cheers,
Daniel

László Török

Dec 17, 2012, 7:47:25 AM
to clo...@googlegroups.com
Hi,

I have come across use cases in the past where an additional transformation step was indeed very handy and I wrote my own version of group-by, one identical to Daniel's.

Maybe a function worthwhile for c.c.incubator.

Las


--
László Török

Alex Baranosky

Dec 17, 2012, 4:20:32 PM
to clo...@googlegroups.com
I think it sounds like a nice addition, after mulling it over a little.

Alex Walker

Dec 20, 2012, 12:19:01 PM
to clo...@googlegroups.com
I like the idea of it being built in and might prefer that approach; however, I wanted to share an alternative.


I needed to take a denormalized table of config data and create a nested lookup map, so that I wouldn't need to repeatedly filter the dataset while using it to process a stream of data.

Alex Walker

Dec 20, 2012, 8:42:47 PM
to clo...@googlegroups.com
Realized I forgot to include an example of how the code I linked to can be used to solve the original problem, which will make it clearer how it is used.

https://gist.github.com/4349972

But when writing that example, I remembered why I created factory-style functions.  Daniel's group-by assumes we want vectors as the result values, which is a case I didn't want to be limited to, even though the existing group-by produces values that are vectors of maps.

Merging the two implementations yields:  https://gist.github.com/4350075

Basically, it just gives more responsibility/power to the g fn, which makes producing the same result as the original look like this instead (a lot busier):

(x-group-by :type #(conj (or % []) (.toUpperCase (:name %2))) animals)

=> {:cow ["BETSY"], :cat ["MURMUR" "ROSIE" "ALF"], :dog ["LESSIE" "DINGO" "REX"]}
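
The linked gists are not reproduced here, so the following is only a guess at the shape of such a function, inferred from the call above: `g` receives the value accumulated so far for the key (nil on first sight) together with the current element, and returns the new value. The body and the sample data are assumptions, not the gist's actual code.

```clojure
;; Hypothetical sketch of an x-group-by in which g builds the value itself.
;; g is called as (g current-value-or-nil element).
(defn x-group-by
  [f g coll]
  (persistent!
   (reduce
    (fn [ret x]
      (let [k (f x)]
        (assoc! ret k (g (get ret k) x))))
    (transient {}) coll)))

;; Small sample data, for illustration only.
(def animals [{:name "Betsy" :type :cow}
              {:name "Lessie" :type :dog}
              {:name "Rex" :type :dog}])

;; Vectors of upper-cased names, as in the call above.
(def upper-names
  (x-group-by :type #(conj (or % []) (.toUpperCase (:name %2))) animals))

;; Because g owns the accumulation, other value types work too, e.g. sets.
(def name-sets
  (x-group-by :type #(conj (or %1 #{}) (:name %2)) animals))
```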