Nested 'Group-By's


Punit Naik

Mar 18, 2016, 5:48:20 AM
to PigPen Support
I have some data on which I want to perform multiple nested 'group-by's. Can anyone please provide a one-liner, or anything else, for nested group-bys?

Matt Bossenbroek

Mar 18, 2016, 11:34:22 AM
to Punit Naik, PigPen Support
That's a pretty vague request. Do you have an example (maybe in regular Clojure) of what you'd like it to do?

-Matt


Punit Naik

Mar 19, 2016, 7:28:19 AM
to PigPen Support, naik.p...@gmail.com
Okay. So I did the first group-by on userType like this:

(def data (pig/load-json "/path"))

(def grouped-by-type (pig/group-by :ga_userType data))

And then inside it I wanted to do another group-by so I did this:

(def grouped-by-type-id (pig/map (fn [[type d]] (->> d (pig/group-by :ga_userId))) grouped-by-type))

But when I dumped 'grouped-by-type-id', it gave me this error:

AssertionError Assert failed: (map? relation)  pigpen.raw/eval3191/command--3192/fn--3195 (raw.clj:42)

So how do I do this?

Matt Bossenbroek

Mar 21, 2016, 12:37:17 AM
to Punit Naik, PigPen Support
When you’re inside any user function, it’s just normal Clojure. The `pig/` versions are only for top-level script generation.

Just drop the pig/ inside the pig/map fn to use clojure.core/group-by:

(def grouped-by-type-id (pig/map (fn [[type d]] (->> d (group-by :ga_userId))) grouped-by-type))

-Matt
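
For readers following along, here is a minimal sketch of the nested grouping in plain Clojure, using only clojure.core. The sample records and the helper name `nested-group-by` are hypothetical and only for illustration; the field names are the ones used in the thread. Something like this is what would run inside a user function, where the `pig/` operators do not apply.

```clojure
;; Hypothetical sample records; only the field names come from the thread.
(def records
  [{:ga_userType "F" :ga_userId "902785"  :modId "9"}
   {:ga_userType "F" :ga_userId "1179688" :modId "9"}
   {:ga_userType "P" :ga_userId "902785"  :modId "3"}])

(defn nested-group-by
  "Groups coll by outer-key, then groups each resulting bucket by inner-key.
  Hypothetical helper, not part of PigPen or clojure.core."
  [outer-key inner-key coll]
  (into {}
        (map (fn [[k vs]] [k (group-by inner-key vs)]))
        (group-by outer-key coll)))

;; A map of user type -> (map of user id -> vector of records).
(def nested (nested-group-by :ga_userType :ga_userId records))
```

Looking up `(get-in nested ["F" "902785"])` returns the vector of records for that user type and user id.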

Punit Naik

Mar 21, 2016, 12:58:54 AM
to PigPen Support, naik.p...@gmail.com
Thanks a lot Matt.

Punit Naik

Mar 22, 2016, 8:05:13 AM
to PigPen Support, naik.p...@gmail.com
Hi Matt

So your fix worked. But since I am using group-bys inside group-bys, my final output is a single line containing a list of lists, like this:
[[{"userType":"F","userId":"902785","modId":"9","total_time_spent":16}],[{"userType":"F","userId":"1179688","modId":"9","total_time_spent":207}]]

I wanted the program to write a single map per line.

How do I do this?

Matt Bossenbroek

Mar 22, 2016, 10:36:20 AM
to Punit Naik, PigPen Support
You either need to combine those maps using some aggregation, or use mapcat to flatten the results. If you have a regular Clojure example to work with, that would help a lot.

-Matt
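
A rough sketch of the two options in plain Clojure may help here. The input shape, the `:time` field, and the function name `summarize-group` are assumptions made up for this example, not taken from the original data. The point is the difference between map, which yields one result per outer group, and mapcat, which emits every element of each result on its own.

```clojure
;; Hypothetical grouped input: one [user-type records] pair per outer group.
(def grouped
  [["F" [{:userId "902785"  :time 10}
         {:userId "902785"  :time 6}
         {:userId "1179688" :time 207}]]])

(defn summarize-group
  "Aggregates one outer group into one summary map per userId."
  [[user-type records]]
  (for [[user-id rs] (group-by :userId records)]
    {:userType user-type
     :userId user-id
     :total_time_spent (reduce + (map :time rs))}))

;; map: one output per outer group, so each output is a list of maps.
(def per-group (map summarize-group grouped))

;; mapcat: every summary map becomes its own output element.
(def per-row (mapcat summarize-group grouped))
```

With map, the whole group is written as a single nested value; with mapcat, each summary map ends up as a separate element, which is the shape needed for one map per line.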

Punit Naik

Mar 23, 2016, 1:12:46 AM
to PigPen Support, naik.p...@gmail.com
Thanks again Matt. As my data structure was a list of lists, I did:

(->> ...
     (pig/mapcat concat)
     (pig/map #(into {} %)))

And it properly denormalised it.
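
To see why those two steps flatten the output, here is a small plain-Clojure sketch run on the record quoted earlier in the thread. It uses clojure.core's mapcat and map as stand-ins for `pig/mapcat` and `pig/map`, on the assumption that the PigPen versions apply the same function to each record; the names `record` and `rows` are made up for the example.

```clojure
;; One output record, shaped like the single line quoted in the thread.
(def record
  [[{"userType" "F" "userId" "902785" "modId" "9" "total_time_spent" 16}]
   [{"userType" "F" "userId" "1179688" "modId" "9" "total_time_spent" 207}]])

(def rows
  (->> [record]              ; a collection holding that one record
       (mapcat concat)       ; emit each inner one-element vector separately
       (map #(into {} %))))  ; merge each vector's single map into an empty map
```

Each element of `rows` is then a single map, so writing the result produces one map per line instead of one list of lists.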