Denormalizing cascading flow


Ron Gonzalez

Nov 16, 2016, 3:31:15 PM
to cascading-user
Hi,
  If I have two Avro files:

  Table X:

  Id, Name

  a, Value1
  b, Value2
  c, Value3

  Table Y:
  
  Id, Name, X_Id
  d, Value4, a
  e, Value5, a
  f, Value6, b
  g, Value7, b
  h, Value8, b


  Then I want to get

  a, Value1, [ { d, Value4 }, { e, Value5 }]
  b, Value2, [ { f, Value6 }, { g, Value7 }, { h, Value8 }]

  How could I do this with Cascading? The schema of Y is fixed across all rows, so creating a target Avro schema is acceptable.

Thanks,
Ron

Prabodh Mhalgi

Nov 17, 2016, 5:51:28 AM
to cascading-user
How about using a GroupBy with X.Id as the key field? A custom aggregator would be needed to build the third field required in the output.

Thanks,
Prabodh

Ron Gonzalez

Nov 19, 2016, 8:04:19 PM
to cascading-user
Thanks Prabodh.
I was able to solve it with a GroupBy and a custom Buffer implementation.
For the cascading avro piece: if a field is a list, you have to make it a list of Tuple instances for cascading avro to support it.

--Ron
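
For readers who want to see the shape of the result, here is a minimal plain-Java sketch of the denormalization described in this thread. It is not Ron's Buffer and it does not use the Cascading API; the class name DenormalizeSketch, the method denormalize, and the list-of-strings row layout are all hypothetical, chosen only for illustration. In a Cascading flow the grouping would be done by the pipe assembly and the nested list would be built inside the custom Buffer, with each nested item presumably a Tuple, as noted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain collections stand in for Avro records.
public class DenormalizeSketch {

    // Each X row is [id, name]; each Y row is [id, name, xId].
    // Returns rows shaped like [xId, xName, [[yId, yName], ...]].
    static List<List<Object>> denormalize(List<List<String>> xRows,
                                          List<List<String>> yRows) {
        // Group Y rows by their X_Id, keeping input order within each group.
        Map<String, List<List<String>>> childrenByXId = new LinkedHashMap<>();
        for (List<String> y : yRows) {
            childrenByXId.computeIfAbsent(y.get(2), k -> new ArrayList<>())
                         .add(Arrays.asList(y.get(0), y.get(1)));
        }

        // Emit one output row per X row that has at least one child.
        // X rows with no matching Y rows (such as "c") are dropped,
        // which matches the desired output shown in the question.
        List<List<Object>> out = new ArrayList<>();
        for (List<String> x : xRows) {
            List<List<String>> children = childrenByXId.get(x.get(0));
            if (children != null) {
                out.add(Arrays.<Object>asList(x.get(0), x.get(1), children));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> x = Arrays.asList(
            Arrays.asList("a", "Value1"),
            Arrays.asList("b", "Value2"),
            Arrays.asList("c", "Value3"));
        List<List<String>> y = Arrays.asList(
            Arrays.asList("d", "Value4", "a"),
            Arrays.asList("e", "Value5", "a"),
            Arrays.asList("f", "Value6", "b"),
            Arrays.asList("g", "Value7", "b"),
            Arrays.asList("h", "Value8", "b"));
        for (List<Object> row : denormalize(x, y)) {
            System.out.println(row);
        }
    }
}
```

The sketch drops X rows that have no Y rows; if those should be kept with an empty list, the null check would emit an empty list instead.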