Nested Fields and Hierarchical Data


Chris K Wensel

Oct 20, 2016, 12:07:47 AM
to cascadi...@googlegroups.com
Hey all

So there have been a few on and off discussions of adding direct support for data structures like JSON or nested domain objects. 

There has always been support for custom objects within a Tuple, but if a Tuple held a Person object, there was no simple means to pop the ‘name’ field off of the Person instance into a custom Function as an argument.

new Each( previous, new Fields( "father.name" ), new CustomFunction( ) );

Or to join two streams on a nested value

new CoGroup( lhs, new Fields( "father.name.last" ), rhs, new Fields( "mother.name.last" ) );
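
To make the idea concrete, here is a minimal sketch of what resolving a dotted field name against a nested value could look like. It is plain Java over nested Maps, not Cascading code: the class `NestedFieldSketch`, the method `resolve`, and the sample names are all hypothetical and only illustrate the lookup semantics under discussion.

```java
import java.util.Map;

/** Hypothetical helper, not part of Cascading: resolves a dotted path against nested Maps. */
final class NestedFieldSketch {

    /** Descends one Map level per path segment; returns null if any segment is missing. */
    static Object resolve(Map<String, ?> root, String dottedPath) {
        Object current = root;
        for (String segment : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a non-map value
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        Map<String, Object> tuple = Map.of(
            "father", Map.of("name", Map.of("first", "John", "last", "Smith")));
        System.out.println(resolve(tuple, "father.name.last")); // prints "Smith"
    }
}
```

A real implementation would also have to decide how to treat lists, domain objects with getters, and field names that themselves contain dots.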

I’ve started a draft of a document open for comment.


Feel free to reply to this thread asking for clarity or proposing alternate use-cases we should support. Or comment in the doc.

After the draft represents what we think can be implemented effectively, I will open a request for a show of hands from those who would benefit from the feature as proposed, to see if it's worth the effort to implement.

ckw


Dusty OBrien

Dec 2, 2019, 2:52:16 PM
to cascading-user
Hey Chris, this sounds pretty interesting - I've been searching the forum.

We have some data, e.g. from 2 sources with 2 different schemas, but with at least a bare minimum common set of columns that we want to GroupBy (A, B).  All the other columns are different.  I'm trying to imagine how I could use this in:

1. Convert data into a common/merged result set - e.g. maybe with a flexible schema like (type, A, B, originalRecord) -- where type might be a String "someType1" or "someOtherType2" and originalRecord is perhaps JSON representing the whole record of data with all columns in it.  Essentially I'm imagining extracting the common columns A, B into a structured schema, and using type as a static field to describe how I can access the arbitrary data in originalRecord and deal with the different schemas there. E.g.
[ "mother", A1, B1, "{'A':'A1','B':'B1','fieldX':value, 'fieldY':value}" ]
[ "father", A1, B1, "{'A':'A1', 'B':'B1', 'fieldZ':'somethingElse', 'fieldAAAAA':{object}}" ]

2. During my Every() (now walking in groups of A,B) -- based on the "type" of a record, use various type-specific accessors into the originalRecord data to accomplish what I need. E.g.
if type==mother then extract fieldX
if type==father then extract fieldAAAAA.subtype
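
The type-based dispatch in step 2 could be sketched as a lookup table of accessors keyed by the "type" value. This is plain Java, not a Cascading Aggregator or Buffer; it assumes originalRecord has already been parsed from JSON into nested Maps, and the names `TypeAccessorSketch`, `ACCESSORS` and `extract` are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: choose an accessor into originalRecord based on the record's type. */
final class TypeAccessorSketch {

    // One accessor per record type; field names follow the example records above.
    static final Map<String, Function<Map<String, Object>, Object>> ACCESSORS = Map.of(
        "mother", record -> record.get("fieldX"),
        "father", record -> {
            Object nested = record.get("fieldAAAAA");
            // fieldAAAAA is expected to be a nested object with a "subtype" entry
            return nested instanceof Map ? ((Map<?, ?>) nested).get("subtype") : null;
        });

    /** Applies the accessor registered for the given type to the parsed originalRecord. */
    static Object extract(String type, Map<String, Object> originalRecord) {
        Function<Map<String, Object>, Object> accessor = ACCESSORS.get(type);
        if (accessor == null) {
            throw new IllegalArgumentException("unknown type: " + type);
        }
        return accessor.apply(originalRecord);
    }
}
```

Inside an Every, a function like `extract` would be called once per tuple in the group, with the type and originalRecord values taken from the tuple's arguments.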

#2 sounds like it's addressed in the document.  Did it ever get implemented?  Also, do you have any tips on how to convert a whole record (of Avro) into a single column value, à la originalRecord, that would be accessible in an Every?

Thanks,
Dusty

Chris K Wensel

Dec 2, 2019, 4:54:18 PM
to cascadi...@googlegroups.com
I didn’t implement this proposal fully.

Cascading 4 does have native support for JSON now. So you can use the JSON operators to extract from and build new JSON objects. 

what wasn’t implemented was the ability to use a hierarchical field name to reach inside the json, but that shouldn’t be necessary here (there are operators for that).

the Cascading operations are based on https://github.com/Heretical/pointer-path

so you should be able to merge parts of multiple json docs into a single normalized doc for grouping. 
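
As a rough illustration of that normalization step, the sketch below copies values out of differently shaped documents into one common shape that could then be grouped on. It uses plain nested Maps and slash-separated pointer strings in the style of JSON Pointer; it does not use the pointer-path or Cascading JSON APIs, and `NormalizeSketch`, `at` and `normalize` are hypothetical names. Escape sequences and array indexes are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: pull values at pointer-style paths into one normalized document. */
final class NormalizeSketch {

    /** Resolves a pointer such as "/father/name/last" against nested Maps; null if absent. */
    static Object at(Object doc, String pointer) {
        if (pointer.isEmpty()) {
            return doc; // the empty pointer refers to the whole document
        }
        Object current = doc;
        for (String token : pointer.substring(1).split("/")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(token);
        }
        return current;
    }

    /** Builds a flat document whose keys are the mapping's keys and whose values come from doc. */
    static Map<String, Object> normalize(Object doc, Map<String, String> targetKeyToPointer) {
        Map<String, Object> out = new LinkedHashMap<>();
        targetKeyToPointer.forEach((targetKey, pointer) -> out.put(targetKey, at(doc, pointer)));
        return out;
    }

    public static void main(String[] args) {
        // Two documents with different schemas but a comparable nested value (made-up data).
        Object lhs = Map.of("father", Map.of("name", Map.of("last", "Smith")));
        Object rhs = Map.of("mother", Map.of("name", Map.of("last", "Smith")));
        Map<String, Object> a = normalize(lhs, Map.of("last", "/father/name/last"));
        Map<String, Object> b = normalize(rhs, Map.of("last", "/mother/name/last"));
        System.out.println(a.equals(b)); // both normalize to the same grouping document
    }
}
```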

unsure of the status of grouping on json objects. serialization is done with the Amazon Ion library.
 
does that help?

ckw


-- 
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/cb207f01-34b8-4645-9467-35aef1199f8a%40googlegroups.com.

Dusty OBrien

Dec 3, 2019, 10:32:45 AM
to cascading-user
Yeah that does help.  I figure I could reach into the JSON myself if I use the whole originalRecord field in the Every.  And now that I think about it more, maybe I can even build the originalRecord field by stuffing every field/value from the Avro object into a new field, and just access it with the existing accessors.  I found a link to the 4.0 WIP release including JSON support - and I'll look at Heretical.  Thanks for the pointers!

Dusty