algo.generic.functor/fmap


Andy L

Aug 5, 2015, 11:07:00 PM
to PigPen Support
This time a quick question: is there an equivalent of algo.generic.functor/fmap in PigPen?

Andy

Matt Bossenbroek

Aug 6, 2015, 12:39:39 PM
to Andy L, PigPen Support
Since the return type of pig/map is ultimately determined by the platform (but it's really a multi-set of sorts), I'm not sure this would be possible in PigPen.

Do you have an example of what you'd like to see it do?

-Matt


Andy L

Aug 7, 2015, 9:06:14 PM
to PigPen Support, core....@gmail.com

 
> Do you have an example of what you'd like to see it do?

Here is an example (it should run in any Clojure REPL without extra dependencies):

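(require 'clojure.string '[clojure.pprint :refer [pprint]])
(def content (slurp "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv"))
;; split the file into rows on \r, drop the header row, split each row on commas
(def sales (->> (clojure.string/split content #"\r") (rest) (map #(clojure.string/split % #","))))
;; parse column 10 as a double (check the header: the sums below look like latitudes, so price may be column 9)
(defn price [t] (Double/parseDouble (t 10)))
;; apply f to every value of map m, keeping the keys and the map type
(defn fmap* [f m] (into (empty m) (for [[k v] m] [k (f v)])))
;; group rows by city (column 1), then sum the price of each group
(->> sales (group-by second) (fmap* #(reduce + (map price %))) pprint)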

where `fmap*` is essentially the map case of `fmap` from algo.generic: https://github.com/clojure/algo.generic/blob/master/src/main/clojure/clojure/algo/generic/functor.clj#L33
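To see what `fmap*` does on its own, without downloading the CSV, here is a small sketch on made-up maps. Because it starts from `(empty m)` rather than `{}`, the result keeps the input's map type, so a sorted map stays sorted.

```clojure
;; fmap* as defined in the example above: start from an empty collection of
;; the same type as m, then add [k (f v)] for every entry.
(defn fmap* [f m]
  (into (empty m) (for [[k v] m] [k (f v)])))

;; Made-up input: a plain map; values change, keys stay the same.
(def plain-result (fmap* inc {:a 1 :b 2}))

;; Made-up input: a sorted map; (empty m) keeps the result a sorted map.
(def sorted-result (fmap* #(* 10 %) (sorted-map :b 2 :a 1)))

(prn plain-result)            ; {:a 2, :b 3}
(prn sorted-result)           ; {:a 10, :b 20}
(prn (sorted? sorted-result)) ; true
```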

I find myself using the above pattern quite often: grouping by a key and then reducing over a specific field of the grouped elements.
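For in-memory data, the group-then-reduce pattern can be written in two ways. This is a sketch with made-up rows; `rows`, `amount`, `sum-by-two-step` and `sum-by-one-pass` are hypothetical names. The two-step version mirrors the `fmap*` approach, while the one-pass version accumulates a sum per key directly and never builds the intermediate groups.

```clojure
;; Hypothetical rows shaped like `sales`: vectors of strings,
;; with the group key at index 1 and a numeric field at index 2.
(def rows [["r1" "ROCKLIN" "10.5"]
           ["r2" "ROSEVILLE" "3.0"]
           ["r3" "ROCKLIN" "4.5"]])

;; Hypothetical field accessor, analogous to `price`.
(defn amount [row] (Double/parseDouble (row 2)))

;; Two steps: group first, then reduce each group (the fmap* pattern).
(defn sum-by-two-step [key-fn val-fn coll]
  (into {} (for [[k vs] (group-by key-fn coll)]
             [k (reduce + (map val-fn vs))])))

;; One pass: add each value to its key's running total.
;; (fnil + 0) treats a key that is not yet present as 0.
(defn sum-by-one-pass [key-fn val-fn coll]
  (reduce (fn [acc row] (update acc (key-fn row) (fnil + 0) (val-fn row)))
          {}
          coll))

(prn (sum-by-two-step second amount rows)) ; {"ROCKLIN" 15.0, "ROSEVILLE" 3.0}
(prn (sum-by-one-pass second amount rows)) ; {"ROCKLIN" 15.0, "ROSEVILLE" 3.0}
```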

There is probably a more idiomatic way of doing that in Pig/PigPen though. Will keep studying.

Thanks,
Andy
 

Matt Bossenbroek

Aug 10, 2015, 12:28:16 PM
to Andy L, PigPen Support
Gotcha. The pigpen.core/group-by is slightly different than clojure.core/group-by. The pigpen version is effectively (comp seq group-by) because in most cases you don't need the whole map at once and being able to compute it in parallel is an advantage. Thus the result of calling pig/group-by on a relation is a sequence of [k v] tuples. You could write your query like this:

=> (require '[pigpen.core :as pig])
nil
=> (->>
     (pig/return sales)
     (pig/group-by second)
     (pig/map (fn [[k v]] [k (reduce + (map price v))]))
     (pig/dump)
     pprint)

(["ROCKLIN" 659.6588399999999]
 ["ROSEVILLE" 1860.739053]
 ["SLOUGHHOUSE" 38.490447]
 ["WEST SACRAMENTO" 115.71755300000001]
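To illustrate the shape of that result locally, here is a plain-Clojure sketch with made-up rows that does not use PigPen. `(comp seq group-by)` turns the grouping into a sequence of [k v] tuples, and the same `(fn [[k v]] ...)` from the `pig/map` step then runs once per tuple. `seq-group-by` and `rows` are hypothetical names.

```clojure
;; Local model of the shape pigpen's group-by returns: (comp seq group-by)
;; yields a sequence of [k v] tuples rather than a single map.
;; This models the data shape only; it is not PigPen.
(def seq-group-by (comp seq group-by))

;; Hypothetical sample rows: [id city amount]
(def rows [["r1" "ROCKLIN" 10.5]
           ["r2" "ROSEVILLE" 3.0]
           ["r3" "ROCKLIN" 4.5]])

;; Same per-tuple function as the pig/map step: destructure [k v] and sum.
(def grouped-sums
  (->> rows
       (seq-group-by second)
       (map (fn [[k v]] [k (reduce + (map #(nth % 2) v))]))))

(prn grouped-sums) ; (["ROCKLIN" 15.0] ["ROSEVILLE" 3.0])
```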



And you'd get a sequence of group->sum tuples instead of a map. If you really want a single row that's a map, you can use pig/into to achieve that, but I wouldn't recommend this for large datasets since it consolidates all of the data to a single machine:


=> (->>
     (pig/return sales)
     (pig/group-by second)
     (pig/map (fn [[k v]] [k (reduce + (map price v))]))
     (pig/into {})
     (pig/dump)
     pprint)

({"ROCKLIN" 659.6588399999999,
  "ROSEVILLE" 1860.739053,
  "SLOUGHHOUSE" 38.490447,
  "WEST SACRAMENTO" 115.71755300000001,

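Locally, gathering [k v] tuples into one map is just `(into {} ...)`, which is why the dump above is a one-element sequence whose only row is that map. Like `pig/into {}`, it needs every tuple in one place. This is a minimal plain-Clojure sketch, and the tuples are made up to stand in for the output of the `pig/map` step.

```clojure
;; Hypothetical [k v] tuples standing in for the output of the pig/map step.
(def tuples [["ROCKLIN" 15.0] ["ROSEVILLE" 3.0]])

;; Collect all tuples into a single map; every tuple must be available here.
(def as-map (into {} tuples))

(prn as-map)                 ; {"ROCKLIN" 15.0, "ROSEVILLE" 3.0}
(prn (get as-map "ROCKLIN")) ; 15.0
```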


You can also use the pigpen.fold namespace to reduce the groupings. Here's your query written as a fold:


=> (require '[pigpen.fold :as fold])
nil
=> (->>
     (pig/return sales)
     (pig/group-by second {:fold (fold/sum (fold/map price))})
     (pig/dump)
     pprint)

(["ROCKLIN" 659.6588399999999]
 ["ROSEVILLE" 1860.739053]
 ["SLOUGHHOUSE" 38.490447]
 ["WEST SACRAMENTO" 115.71755300000001]


Not everything can be written as a fold, but most interesting things can. Be cautious with fold, however: if the number of values in each group is small, the overhead of doing a fold outweighs the benefit.
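The `:fold` option above combines a per-value step (`fold/map price`) with a reduction (`fold/sum`). As a rough local analogy only, clojure.core transducers express the same composition: `(map f)` transforms each value and `+` combines the results. This sketch uses made-up rows and plain `transduce`, not the pigpen.fold API. `sum-of` and `rows` are hypothetical names.

```clojure
;; Rough local analogue of (fold/sum (fold/map price)) using clojure.core
;; transducers: a (map f) step feeding + as the reducing function.
;; Plain Clojure, not the pigpen.fold API.
(defn sum-of [f vs]
  (transduce (map f) + 0 vs))

;; Hypothetical rows: [id city amount]
(def rows [["r1" "ROCKLIN" 10.5]
           ["r2" "ROSEVILLE" 3.0]
           ["r3" "ROCKLIN" 4.5]])

;; Apply the composed map+sum to each group, loosely like :fold in group-by.
(def folded
  (for [[k vs] (group-by second rows)]
    [k (sum-of #(nth % 2) vs)]))

(prn folded) ; (["ROCKLIN" 15.0] ["ROSEVILLE" 3.0])
```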


See more here: https://github.com/Netflix/PigPen/wiki/Folding-Data



Let me know if that's closer to what you were looking for.



-Matt

Andy L

Aug 11, 2015, 12:13:21 AM
to PigPen Support, core....@gmail.com


On Monday, August 10, 2015 at 9:28:16 AM UTC-7, Matt Bossenbroek wrote:
> Gotcha. The pigpen.core/group-by is slightly different than clojure.core/group-by.
 
Thanks. I think I have all the building blocks now. Time to write some code.

Best regards,
Andy