FOLLOW UP: map-utils & table-utils


Sean Devlin

Nov 2, 2009, 1:27:33 PM
to Clojure Dev
Hey everyone,
This is a follow up to two threads I posted a few months ago.

The first was a proposed extension to map-utils:
http://groups.google.com/group/clojure-dev/browse_thread/thread/4b20e40d83095c67#

The second was a proposal for a new table library, table-utils:
http://groups.google.com/group/clojure-dev/browse_thread/thread/b60015723d81aa34#

You can see the code for both here:
http://github.com/francoisdevlin/devlinsf-clojure-utils/tree/master

After seeing some recent activity on the contrib Assembla, I thought I'd
bring these up again. Is there any interest in adding anything from
these libraries to contrib now?

Thanks,
Sean

Timothy Pratley

Nov 3, 2009, 3:31:11 AM
to Clojure Dev
On Nov 3, 5:27 am, Sean Devlin <francoisdev...@gmail.com> wrote:
> Is there any interest in adding anything from
> these libraries to contrib now?

Hi Sean,

I find this very interesting and hope this makes it in. +1 from me.

One specific comment: I think pivot could be replaced, based on some
code posted by Alex Osborne:
http://groups.google.com/group/clojure/browse_thread/thread/7a17676521019237
He introduced the concept of reduce-by, which I feel is more general
and better named than 'pivot' (see Appendix A for why).

(pivot sale :product-id #(* (:product-price %) (:quantity %)) +)
translates to
(reduce-by :product-id
           (fn [x row] (+ x (* (:product-price row) (:quantity row))))
           0 sale)
which is still a little verbose and non-obvious, but can be simplified
further with a helper:
(select-sum #(* (:product-price %) (:quantity %)) sale :product-id)
and instead of freq:
(select-count sale :product-id)

These helpers really just stand in for being able to pass in + or
freq etc., but they are easier for me to get my head around because
they are direct SQL translations:
SELECT COUNT(1) FROM sale GROUP BY product-id

What do you think?


Appendix A: Pivot
=================

Pivot to me is about creating a table like
   X Y Z
A  1 0 4
B  2 3 1
C  1 1 0
Instead of
A X 1
A Y 0
A Z 4
B X 2
B Y 3
B Z 1
C X 1
C Y 1
C Z 0

Which has no real analogy to me in the collection of maps
representation
(pivot table :axis-1 :axis-2) -> ???
(def sale2 #{{:year 2003, :quarter 1, :amount 1}
             {:year 2003, :quarter 2, :amount 2}
             {:year 2003, :quarter 3, :amount 3}
             {:year 2003, :quarter 4, :amount 4}
             {:year 2004, :quarter 1, :amount 5}
             {:year 2004, :quarter 2, :amount 6}
             {:year 2004, :quarter 3, :amount 7}
             {:year 2004, :quarter 4, :amount 8}
             {:year 2005, :quarter 1, :amount 9}
             {:year 2005, :quarter 2, :amount 10}
             {:year 2005, :quarter 3, :amount 0}
             {:year 2005, :quarter 4, :amount 0}})
(pivot sale2 :year :quarter +)
2003 1 2 3 4
2004 5 6 7 8
2005 9 10 0 0
But I'm not sure how that should be represented!
Maybe:
#{{:year 2003, [:quarter 1] 1, [:quarter 2] 2, [:quarter 3] 3, [:quarter 4] 4}
  {:year 2004, [:quarter 1] 5, [:quarter 2] 6, ...}}
Yikes, ugly :(
The proposed definition of 'pivot' returns {2003 10, 2004 26, 2005 19},
which isn't really a pivot to me.
I feel that 'pivot' would be a useful operation but should be
different from the 'pivot' you describe.

Example taken from:
http://www.databasejournal.com/features/mssql/article.php/10894_3516331_2/SQL-Pivot-and-Cross-Tab.htm

I suspect reduce-by will be faster from past experience with merging
vs reducing.


Appendix B: Some code snippets
==============================

;; Alex Osborne
(defn reduce-by
  [grouper f val coll]
  (reduce
    (fn [m x]
      (let [group (grouper x)]
        (assoc m group (f (get m group val) x))))
    {} coll))

;; Alex Osborne
(defn group-by [f coll]
  (reduce-by f conj [] coll))

(defn select-count
  [coll grouper]
  (reduce-by grouper (fn [x _] (inc x)) 0 coll))

(defn select-sum
  [summer coll grouper]
  (reduce-by grouper (fn [x row] (+ x (summer row))) 0 coll))

;; select-min and select-max are similarly easy and might be useful
;; select-mean and select-mode are a bit trickier! could use a vector as
;; the value, with count in one column and value in the other, and
;; output only the result
;; happy to fill these out if general idea seems sound
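
As a hedged sketch of that vector idea (using the reduce-by above; the
name select-mean and the [count sum] accumulator shape are just my
guesses at what it would look like):

;; Sketch: accumulate [count sum] per group, then divide at the end.
(defn select-mean
  [summer coll grouper]
  (let [totals (reduce-by grouper
                          (fn [[n s] row] [(inc n) (+ s (summer row))])
                          [0 0]
                          coll)]
    (into {} (for [[group [n s]] totals]
               [group (/ s n)]))))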


Regards,
Tim.

Sean Devlin

Nov 3, 2009, 9:49:36 AM
to Clojure Dev
Tim,
Thanks for the feedback and support. Here are my thoughts:

SUMMARY BULLET POINTS:
1. Contrib is big enough for both pivot & reduce-by; let's consider
both functions.
2. I am open to discussion on the name of pivot, but I think it works.
3. I don't like select-count & select-sum, because of pivot's arity
overloading.

SUPPORTING PROSE:
reduce-by is certainly useful, and should be added. Contrib is a big
enough place for both functions; my only concern is that I'm not sure
whether reduce-by belongs in table-utils or seq-utils.

There is one thing I noticed that is different between reduce-by and
pivot: pivot additionally takes a mapping-fn, as you can see in the
signature below.

(defn pivot [coll grouping-fn mapping-fn reduce-fn] ...)

PIVOT
As far as the name goes, I can see your point. I would use my pivot
function in combination with juxt to produce a 2D table. Using your
sales data as an example, I would do the following:

(pivot sale2 (juxt :year :quarter) :amount +)

This returns the following map (admittedly doesn't do much, but that's
because of the source data):

{[2003 1] 1
 [2003 2] 2
 [2003 3] 3
 [2003 4] 4
 [2004 1] 5
 [2004 2] 6
 [2004 3] 7
 [2004 4] 8
 ...}

If we do the following, we can get some interesting information.

;; Returns a hash-map
(def quarter-intervals (pivot sale2 (juxt :year :quarter) :amount +))

Ironically, you can index the data set in a very Java-like way.
;;first quarter '03
(quarter-intervals [2003 1])

You could isolate information by year or by quarter like so:
(def info-2003 (filter-map-keys (comp #{2003} first) quarter-intervals))
(def info-q3 (filter-map-keys (comp #{3} second) quarter-intervals))

We can find annual and seasonal trends like so:

;; enumerate over entries
(def by-year (pivot quarter-intervals (comp first first) second +))
(def by-quarter (pivot quarter-intervals (comp second first) second +))

Since it is based on juxt, it is easy enough to add extra
information. Suppose our info included :sales-rep and :store. We
simply add the info to the juxt closure.

(pivot sale2 (juxt :year :quarter :sales-rep :store) :amount +)

You can generalize it further from here.

SPECIAL FNS
I'm not sure how I feel about the special functions select-count and
select-sum, specifically because of how the arity overloading works
with pivot. I've defined it as follows:

(defn pivot
  ([coll grouping-fn] (pivot coll grouping-fn freq +))
  ([coll grouping-fn mapping-fn] (pivot coll grouping-fn mapping-fn +))
  ([coll grouping-fn mapping-fn reduce-fn] ...)) ;; do the heavy lifting

Still, if you have an idea for other special functions, I'd like to
discuss it.
Sean

Allen Rohner

Nov 4, 2009, 11:46:38 PM
to Clojure Dev
I actually showed up on Clojure-dev tonight to suggest exactly map-keys
and map-vals, and saw this post. I like map-keys and map-vals as-is.
I don't really understand how filter-map is supposed to work. Is f
called on the seq pair? How does even? work on that?

In addition, I would also like filter-keys and filter-vals. In each
case, if f does not return true for a key/val, the pair is dropped
from the map.

So +1 for those functions. I have no opinion on the rest.

Allen

On Nov 2, 12:27 pm, Sean Devlin <francoisdev...@gmail.com> wrote:
> Hey everyone,
> This is a follow up to two threads I posted a few months ago.
>
> The first was a proposed extension to map-utils:http://groups.google.com/group/clojure-dev/browse_thread/thread/4b20e...
>
> The second was a proposal for a new table library, table-utils:http://groups.google.com/group/clojure-dev/browse_thread/thread/b6001...

Timothy Pratley

Nov 5, 2009, 3:09:19 AM
to Clojure Dev
Hi Sean,

Thanks for taking the time to explain.

On Nov 4, 1:49 am, Sean Devlin <francoisdev...@gmail.com> wrote:
> Still, if you have an idea for other special functions, I'd like to
> discuss it.

The main thrust of my argument is that the pivot arities could be more
easily understood (by me!) if named more like this:

;; general queries
group-by [coll grouping-fn]
count-by [coll grouping-fn]
sum-by [coll grouping-fn mapping-fn]
max-by [coll grouping-fn mapping-fn]
min-by [coll grouping-fn mapping-fn]
avg-by [coll grouping-fn mapping-fn]
stats-by [coll grouping-fn mapping-fn]

;; customizable queries
reduce-by [coll grouping-fn reduce-fn]
reduce-mapped-by [coll grouping-fn reduce-fn mapping-fn]

i.e., the general queries mimic the default aggregate SQL operators.
stats-by is not a default aggregate operator, but I list it because
getting [mean mode standard-deviation] is far more useful than avg. For
the customizable queries I'm suggesting pivot is really a reduce-by with
a per-item mapping. This name swap is based upon my notion that the
result is not a crosstab, but an aggregate (I'll expand on that later).
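
To make the mapping onto reduce-by concrete, here is a minimal sketch
of two of the general queries (using the reduce-by from Appendix B of
my earlier mail, with coll first as listed above):

;; Sketch: the proposed "general queries" as thin wrappers over reduce-by.
(defn count-by [coll grouping-fn]
  (reduce-by grouping-fn (fn [n _] (inc n)) 0 coll))

(defn sum-by [coll grouping-fn mapping-fn]
  (reduce-by grouping-fn (fn [s row] (+ s (mapping-fn row))) 0 coll))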


> 2. I am open to discussion on the name of pivot, but I think it
> works.

The above naming system is easier for me to understand but that is
because of my preconceptions. There is nothing wrong with 'pivot' if
that's what makes sense to you. Maybe it is indeed the perfectly
appropriate term. Clearly you are using it very adeptly and
successfully and I think you understand the term much better than I
do. As a consumer of the library I have the option of using my names
anyway with some trivial wrapping.


> 3.  I don't like select-count & select-sum, because of pivot's arity
> overloading.

I realize now that suggesting them as 'select' was not right at all -
that was just an artifact of trying to match the kinds of queries I
was thinking of - which are aggregates (or more correctly the
aggregation part of a query).


> (pivot sale2 (juxt :year :quarter) :amount +)

Yes, but that produces a list of aggregates, not a crosstab. This is
somewhat moot because it is easy to transform or display it as a
table... but in my mind pivot -> crosstab, aggregate -> list of
values. Not meaning to labor the point, just trying to be clear about
what I mean when I say the output is not a crosstab (or a 2D matrix).
Again those are just my preconceptions.


> (pivot sale2 (juxt :year :quarter :sales-rep :store) :amount +)

Would be equivalent to
(sum-by sale2 (juxt :year :quarter :sales-rep :store) :amount)


> Since it is based on juxt, it is easy enough to add extra
> information.  Suppose our info included :sales-rep and :store.  We
> simply add the info to the juxt closure.

Yup, I totally agree it is a really neat operator...
and 'pivot' is certainly a far more succinct name than 'reduce-mapped-by',
so in those terms it is a far better choice! :)



Regards,
Tim

Sean Devlin

Nov 5, 2009, 9:24:29 AM
to Clojure Dev
Allen,
Thanks for taking the time to look at this.

To answer your question, filter-map works just like filtering a map
does now. Filtering/mapping a hash map currently applies the function
to each entry, which can be treated like a two-element seq. It simply
wraps the resulting seq back up in a hash-map.
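
A quick hedged illustration (my paraphrase of the idea, not necessarily
the exact code in the library):

;; Sketch: f sees each map entry, which destructures like a
;; two-element [k v] seq; the result is poured back into a map.
(defn filter-map [f m]
  (into {} (filter f m)))

;; even? alone won't work on an entry; you apply it to the val:
(filter-map (fn [[_ v]] (even? v)) {:a 1, :b 2, :c 3})
;;=> {:b 2}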

Sean Devlin

Nov 5, 2009, 10:49:28 AM
to Clojure Dev
Tim,
I'm sold on specialized fns. Seeing them all in a list helped. Also,
I realized that while the overloading I chose may be good for a personal
library, it could be confusing to someone new to contrib.

After reading Alex's points, there seem to be five key inputs to these
operations

coll - a collection
m-fn - a mapping fn
r-fn - a reducing fn
g-fn - a grouping fn
init - the initial value for the reduction.

Of the five inputs, coll is the only one that will always be free.
I'll get back to this later.

As I see it, there are the following general things left to discuss:
1. Function name
Let's ignore this for now, and use the placeholder *gmr* (group map
reduce). Is that okay?

2. Function signature

Alex Osborne made the following point in a private email:

<blockquote>
Another question is whether the collection should go at the beginning
or the end of the argument list. I put it at the end as this is
consistent with map and reduce. This also makes it read clearer in
English:

(sum-by :type :cost products) ; "Sum by :type the :cost of products"

(sum-by products :type :cost) ; "Sum products by :type ... :cost ??"

It also means you can do things like (partial sum-by :type :cost).
</blockquote>

I love partial application. I think Alex's code could really be on to
something. His stuff is here:
http://github.com/ato/clojure-utils/blob/master/src/org/meshy/seq_utils.clj

I need some time to think about this, and really get a good feel for
the library. What variables should be fixed, what should be free, and
how does this change by arity?

I'll play with this over the weekend, and let you know what I come up
with.

3. What is 2D?

Just a fun thing to think about - what makes a data structure 2D? 3D?
nD? A 2D array in C is simply an array of arrays in contiguous
memory, so that indexing is fast. As a thought exercise, design a
sparse matrix class in C/Java. The wiki article is a good place to
start.

http://en.wikipedia.org/wiki/Sparse_matrix

I think the juxt approach qualifies as 2D. Just an opinion :)
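
To illustrate the connection (my example, not code from either
library): a sparse 2D structure in Clojure falls out naturally as a
map keyed by coordinate pairs, which is exactly the shape the
juxt-based pivot produces.

;; A sparse "2D" structure as a map keyed by [row col] pairs.
(def sparse {[2003 1] 1, [2003 2] 2, [2004 1] 5})
(get sparse [2003 2] 0) ;;=> 2, with 0 as the implicit default
(get sparse [2005 4] 0) ;;=> 0 (absent cells cost no storage)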

Oh, and I'm glad you like juxt. It's constantly surprising me with
its applications, too.

Timothy Pratley

Nov 6, 2009, 7:29:41 AM
to Clojure Dev
Hi Sean,

> 3. What is 2D?

Indeed! Let's explore some data transforms...

Back to a silly example:
(def sale2 #{{:year 2003, :quarter 1, :amount 1}
             {:year 2003, :quarter 2, :amount 2}
             {:year 2003, :quarter 3, :amount 3}
             {:year 2003, :quarter 4, :amount 4}
             {:year 2004, :quarter 1, :amount 5}
             {:year 2004, :quarter 2, :amount 6}
             {:year 2004, :quarter 3, :amount 7}
             {:year 2004, :quarter 4, :amount 8}
             {:year 2005, :quarter 1, :amount 9}
             {:year 2005, :quarter 2, :amount 10}
             {:year 2005, :quarter 3, :amount 0}
             {:year 2005, :quarter 4, :amount 0}})

;; Where we want to analyse some value by year and quarter:
(def ag (sum-by (juxt :year :quarter) :amount sale2))
(println "aggregate:" ag)
;; Great! But the format is not very readable...
;; Now we want to transform this to some sort of 2D table output.
;; We can make a table by collecting all possible values for each axis:
(def rows (reduce conj #{} (map first (keys ag))))
(println "rows:" rows)
(def cols (reduce conj #{} (map second (keys ag))))
(println "cols:" cols)
;; Then build the table from the aggregates:
(use 'clojure.contrib.prxml)
(prxml
  [:table
   [:tr [:td ":year\\:quarter"] (for [c cols] [:td c])]
   (for [r rows]
     [:tr [:td r] (for [c cols] [:td (ag [r c])])])])

;;; TADA!
:year\:quarter  1  2  3  4
2003            1  2  3  4
2004            5  6  7  8
2005            9 10  0  0

Obviously this simple example could be generalized to n dimensions,
represented as 3D heat-maps, or bitmaps over time, or 3D over time -
but I can't think of any real applications for >2D data mining (just
a lack of imagination). But I do think the 2D case has a lot of
applications, as people are familiar with it from Excel and SQL and it
is easy to visualize.

Now the HTML printing is very view-specific; we could easily rip that
out and replace it with a lazy sequence of lazy sequences, and it
would be, to my mind, a nice 2D abstraction of the results. Package
that up with a function that takes two axis inputs (or more if you
wanted to support n-dim) and a collection, and I could do this:
(def ss (mystery-function :year :quarter sale2))
(prxml [:table (for [r ss] [:tr (for [c r] [:td c])])])

Very easy - very powerful :)
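
For concreteness, a hedged sketch of what that mystery function might
look like (the sum-by is the one used above; hardcoding :amount and
sorting the axes are my assumptions):

;; Sketch: a lazy seq of rows, each row a lazy seq of cells.
;; Missing cells come out nil.
(defn mystery-function [axis1 axis2 coll]
  (let [ag   (sum-by (juxt axis1 axis2) :amount coll)
        rows (sort (distinct (map first (keys ag))))
        cols (sort (distinct (map second (keys ag))))]
    (for [r rows]
      (for [c cols] (ag [r c])))))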


Regards,
Tim.

Sean Devlin

Nov 10, 2009, 11:50:38 PM
to Clojure Dev
Well, this thread has generated lots of good discussion. I'd like to
thank everyone who's provided feedback so far.

At this point I'd like to do two things.
1. I've opened Assembla ticket 45 for c.c.map-utils. I'd like to work
on getting some of the previously mentioned functionality patched into
contrib. Personally, my first priority is putting this patch to bed.

2. Continue the discussion about table-utils. There's still some good
work to be done here, and I don't want to rush it. I'm surprised no
one has said anything about the join lib yet. In addition to that,
Rich still needs to decide if he wants this lib as part of contrib.

Thanks for everyone's help so far. I think this is moving in the
right direction.

Sean

Chouser

Nov 17, 2009, 11:15:46 PM
to cloju...@googlegroups.com
On Tue, Nov 10, 2009 at 11:50 PM, Sean Devlin <francoi...@gmail.com> wrote:
> Well, this thread has generated lots of good discussion.  I'd like to
> thank everyone who's provided feedback so far.
>
> At this point I'd like to do two things.
> 1. I've opened Assembla ticket 45 for c.c.map-utils.  I'd like to work
> on getting some of the previously mentioned functionality patched into
> contrib. Personally, my first priority is putting this patch to bed.

I apologize for not jumping into this conversation earlier.

Are there sufficient use cases for both trans and trans*? If so,
can we come up with names that convey a bit more meaning?
I guess I'm not exactly sure when they're meant to be used -- is
there a reason they return functions instead of simply doing the
work the way, for example, 'reduce' and 'into' do?

I don't like deftrans or deftrans* -- if there's some context
that has to do a dozen of such things, let them define the macro
there. Seems like it would be rarely needed.

Similarly I'm not sure the various foo-map are worth having.
To me they seem to save little typing and even less in mental
complexity. Is there some domain where these would be commonly
used? Perhaps they could be defined in some lib more related to
that domain space?

map-vals is something I see people ask for pretty often. It can be
written quite easily when needed:

(zipmap (keys coll) (map f (vals coll)))

...but it may be worth having in contrib anyway.

I'm a bit less certain about map-keys. Is it something people
need? To me it seems a bit unusual to be mapping across keys,
and again the non-merge case can be written easily with zipmap.
I guess I'm open to persuasion on this one.

Please don't take any of this to suggest I have any authority
that I don't. I'm simply talking about my willingness to be the
one to apply this patch to contrib. If some other committer
wants to take this patch as is or with different changes than I'm
suggesting, I'll stay out of the way.

And again I'm sorry for not having weighed in earlier in the
process. Thanks, Sean, for your work on this patch and for being
diligent and gracious in bringing it to my attention.

--Chouser

Sean Devlin

Nov 18, 2009, 12:55:46 AM
to Clojure Dev
Chouser,
After reading your comments, I can see how this patch is a little
function-heavy. I thought about it for a while, and I think the
following functions are core to my extension:

map-vals
map-keys
trans/trans*

map-vals
map-vals does come up on the list fairly frequently. I think this is
worth standardizing because it's a wheel people constantly re-invent,
and as such prone to error. To pick on your implementation in
particular (sorry):

(let [f identity]
  (zipmap (keys coll) (map f (vals coll))))

I suspect this is a bug waiting to happen, because I don't think that
(keys coll) is guaranteed to return the arguments in the same order as
(vals coll).

trans/trans*
Before I get too involved: the name of these closures is negotiable.
That being said, let's go...

trans is designed to be used for adapting maps. The idea was to
combine it with destructuring to increase re-use, amongst other
things. It's common to have some type of fn like this:

(defn my-common-fn [{a :a b :b}] (* a b))

However, NONE of the half dozen data sources you query mention :a
or :b. They call it :pizza :barbeque or :dog :cat. Write enough
business software and you actually see this.

So, why does trans return a closure, instead of doing the work
directly? This has to do with how I use trans. It's usually in a
situation like:

* function composition, e.g.
  (comp my-common-fn (trans :a :pizza :b :barbeque))
* an argument to map, e.g. (map (trans :a :pizza ...) ...)
* an argument in my join library

Also, you may have noticed trans is variadic. Since it returns a
closure, I don't have to worry about where to place arguments in
partial.
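
For readers without the repo handy, a hedged sketch of roughly what
trans does, inferred from the usage above (the real implementation is
in the library and may differ):

;; Sketch: trans takes key/fn pairs and returns a closure that assocs
;; each key with the result of applying its fn to the whole map.
;; Keywords work as the fns, so this renames/copies keys.
(defn trans [& kfs]
  (fn [m]
    (reduce (fn [acc [k f]] (assoc acc k (f m)))
            m
            (partition 2 kfs))))

;; ((trans :a :pizza :b :barbeque) {:pizza 3, :barbeque 4})
;;=> {:pizza 3, :barbeque 4, :a 3, :b 4}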

map-keys
This is the most unusual, but when it's useful, it's useful. Suppose
you have a database that has a column a_node. A query returns a list
of maps, and each entry looks something like

{:a_node "More Data"
 ...}

And you've also got an XML file, a dump from a database, and each node
is capitalized:

<A_NODE> Data </A_NODE>...

Using clojure.xml/parse, this will return a map with keys

{:A_NODE "Data"
 ...}

Oh, and you already have a lot of code that works on the database/list
of maps version. It should be easy enough to reuse everything, but
you can't, because some XML library accidentally capitalized
everything. Stupid XML.

map-keys to the rescue. For each hash map you simply do the following:

(map-keys (comp keyword #(.toLowerCase %) name) an-entry)

I also have a second use for map-keys, in my join library. In order
to explain it, I need to introduce a function into str-utils2. I'll
save that for another post... or two.

So, in summary
1. I agree that my patch can be reduced
2. I'd like to include map-vals, map-keys, and trans/trans*
3. trans' name is negotiable

Thanks for posting your concerns.
Sean


Timothy Pratley

Nov 18, 2009, 6:51:06 AM
to Clojure Dev


On Nov 18, 4:55 pm, Sean Devlin <francoisdev...@gmail.com> wrote:
> 3.  trans' name is negotiable

How about (as)?
Just brainstorming - I haven't thought it through much, but it might
work?

Chouser

Nov 18, 2009, 10:36:09 AM
to cloju...@googlegroups.com
On Wed, Nov 18, 2009 at 12:55 AM, Sean Devlin <francoi...@gmail.com> wrote:
>
> map-vals
> map-vals does come up on the list fairly frequently.  I think this is
> worth standardizing because it's a wheel people constantly re-invent,
> and as such prone to error.  To pick on your implementation in
> particular (sorry)

Heh, no need to apologize. Pick away.

> (let [f identity]
>        (zipmap (keys coll) (map f (vals coll))))
>
> I suspect this is a bug waiting to happen, because I don't think that
> (keys coll) is guaranteed to return the arguments in the same order as
> (vals coll).

They will both return items in the same order as 'seq' would,
which is always the same order. Hash maps make no guarantee about
what that order *is*, but it will be the same for the same
immutable hash-map object. So that particular potential bug
isn't there.

The benefits of factoring out common code are well known and
frequently discussed.

But there are a couple benefits of *not* having fns to wrap small
combinations of builtins like this:

1. People reading your code need to be familiar with fewer
   fns in order to make sense of what you're doing. zipmap,
   keys, map, and vals are all more commonly used and therefore
   more likely known to the reader than map-vals is likely to
   be.
2. If any of those builtins are not known to the reader, when
   they learn about them they will be learning more broadly
   useful functions. zipmap is useful in more different
   scenarios than map-vals, so taking the time to learn what it
   does will provide a bigger bang for the buck.
3. When you're familiar with the builtins, the variety of ways
   they can be used, and some common idioms for using them, you
   have the power to solve problems that are similar but slightly
   different. For example, say you want to map over both keys
   and values -- if you know the zipmap idiom above, the
   solution is obvious. If you only know about map-keys and
   map-vals you're likely to be driven to a clumsier and less
   efficient combination of the two.

So when writing my own helper functions, I try to weigh the
benefits of each. In general the larger and more complex the
repeated code, the more likely I am to favor factoring it out.
But honestly I'm frequently pretty ambivalent about my
conclusions, especially on these smallish functions.

> trans/trans*
> Before I get to involved, the name of these closures is negotiable.
> That being said, let's go...
>
> trans is designed to be used for adapting maps.  The idea was to
> combine it with destructuring to increase re-use, amongst other
> things.  It's common to have some type of fn like this:
>
> (defn my-common-fn [{a :a b :b}] (* a b))
>
> However, NONE of the half dozen data sources you query mention :a
> or :b.  They call it :pizza :barbeque or :dog :cat.  Write enough
> business software and you actually see this.

Ok, now we're talking. I was writing code like this just
a couple days ago. For business no less.

> So, why does tran return a closure, instead of doing the work
> directly?  This has to do with how I use trans.  It's usually in a
> situation like:
>
> * function composition, e.g. (comp my-common-fn
> (trans :a :pizza :b :barbeque))
> * an argument in map (map (trans :a :pizza ...) ...)
> * an argument in my join library
>
> Also, you may have noticed trans is variadic.  Since it returns a
> closure, I don't have to worry about where to place arguments in
> partial.

Hmmm... In my case I just wrote stuff like:

(use '[clojure.set :only (rename-keys)])

(-> input-map
    (rename-keys {:id :new-id, :person-name :name})
    (select-keys [:new-id :name :foo :bar]))

It seems to me the 'count' and 'inc' of key "a" from the examples
in your docs would fit into a -> form like this pretty nicely,
without generating extra closures.

On the other hand it would be wordier than your trans examples.
I guess if you find 'trans' compelling I'll quit whining about
it.

> map-keys
[snip]
> It should be easy enough to reuse everything, but
> you can't, because some XML library accidentally capitalized
> everything.  Stupid XML.

ok ok, you've convinced me. stupid xml.

> map-keys to the rescue.  For each hashmap you simply do the following
>
> (map-keys (comp keyword #(.toLowerCase %) name) an-entry)

To me, the merge-with case is the most compelling. If you don't
need merge-with, zipmap will again work nicely and pretty
succinctly:

(zipmap (map (comp keyword #(.toLowerCase %) name) (keys x)) (vals x))

But if you're going to have key collisions, zipmap's not going to
cut it and you need to rework your whole expression to use
merge-with instead. ...which is also a good reason to have one
fn (map-keys) that can handle both cases without disruption.
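
For illustration, a hedged sketch of a map-keys covering both cases
(my reconstruction; Sean's actual implementation and argument order
may differ):

;; Sketch: apply f to every key of m. With merge-fn, values whose new
;; keys collide are combined via merge-with; without it, one entry
;; silently wins.
(defn map-keys
  ([f m]
     (into {} (map (fn [[k v]] [(f k) v]) m)))
  ([f merge-fn m]
     (apply merge-with merge-fn
            (map (fn [[k v]] {(f k) v}) m))))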

It might be worth calling out explicitly in the docstring that
f applies to keys but merge-fn applies to values.

--Chouser

Sean Devlin

Nov 24, 2009, 10:04:09 AM
to Clojure Dev
Why have an idiom when you can have a function instead? Isn't that
the whole point of Lisp?

I've been thinking for a week. I think my previous map-utils proposal
is exactly the WRONG way to go. I'd like to propose an entirely new
technique, based on some of the work I've done recently.

I posted this in the main thread yesterday:

http://groups.google.com/group/clojure/browse_thread/thread/7bb01c23596f6636

Let's use a visitor pattern instead. Way less code, way more
flexible. Way more future-proof.
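
For context, here is a hedged sketch of the visitor constructor the
examples below assume (my reconstruction; Sean's actual definition
lives in his repo):

;; Sketch: visitor takes a decorator for the user's fn and a finisher
;; for the overall result, and returns a fn of [seq-fn f coll].
(defn visitor [decorate finish]
  (fn [seq-fn f coll]
    (finish (seq-fn (decorate f) coll))))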

I posted an example of predicate visitors yesterday. Here they are
again in terms of my visitor function:

;; Works w/ predicate functions
(def keys-pred (visitor #(comp % key) (partial into {})))
(def vals-pred (visitor #(comp % val) (partial into {})))

Here's how the set of map functions would look.

(def keys-entry
(visitor
#(juxt (comp % key) val)
(partial into {})))

(def vals-entry
(visitor
#(juxt key (comp % val))
(partial into {})))

The merge case is a little more complicated:

(defn keys-entry-merge
  "Like the keys entry visitor, but takes a merge function to resolve
  key collisions."
  [merge-fn & args]
  (apply (visitor
           #(juxt (comp % key) val)
           (comp
             (partial apply merge-with merge-fn)
             (partial map (partial apply hash-map))))
         args))

So, what's this look like in use? Let's start w/ predicate fns.

user=>(def test-map {"a" 1 "b" 2 "c" 3})

user=>(vals-pred filter odd? test-map)
{"a" 1 "c" 3}

user=>(vals-pred remove odd? test-map)
{"b" 2}

user=>(keys-pred filter #{"a"} test-map)
{"a" 1}

user=>(keys-pred remove #{"a"} test-map)
{"b" 2 "c" 3}

This also works with take-while/drop-while; the visitor pattern lets
you do that. I don't know WHY one would use them, but you could.

Here's how the entry visitors look

user=>(vals-entry map inc test-map)
{"a" 2 "b" 3 "c" 4}

user=>(keys-entry map #(.toUpperCase %) test-map)
{"A" 1 "B" 2 "C" 3}

And here's the merge case

user=>(keys-entry-merge + map (constantly "Example") test-map)
{"Example" 6}

I like that this doesn't actually produce any new sequence functions;
rather, it decorates ones that already exist. I think it's a clear win
for the predicate versions. However, I'm not 100% sure about the entry
versions.

Can anyone think of a use case besides mapping operations?

Thanks,
Sean
