Nested Fields and Hierarchical Data

Chris K Wensel

Oct 20, 2016, 12:07:47 AM

to cascadi...@googlegroups.com

Hey all

So there have been a few on and off discussions of adding direct support for data structures like JSON or nested domain objects.

There has always been support for custom objects within a Tuple, but if a Tuple held a Person object, there was no simple means to pop the ‘name’ field off of the Person instance and pass it into a custom Function as an argument. With nested field names, that could look like:

new Each( previous, new Fields( "father.name" ), new CustomFunction( ) );

Or to join two streams on a nested value

new CoGroup( lhs, new Fields( "father.name.last" ), rhs, new Fields( "mother.name.last" ) );
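To make the idea concrete, here is a minimal sketch in plain Java of what resolving a dotted name like father.name.last against nested data could look like. NestedPath and resolve are hypothetical names for illustration, the nested values are assumed to be java.util.Map instances, and none of this is Cascading API.

```java
import java.util.Map;

// Hypothetical helper, not part of Cascading: resolves a dotted path
// such as "father.name.last" against nested Maps.
final class NestedPath {
    private NestedPath() {}

    static Object resolve(Object root, String path) {
        Object current = root;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // path runs past a leaf or a missing branch
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Sample data is made up for the demonstration.
        Map<String, Object> person = Map.of(
            "father", Map.of("name", Map.of("first", "John", "last", "Smith")));
        System.out.println(resolve(person, "father.name.last")); // prints "Smith"
    }
}
```

A missing or too-long path returns null here rather than throwing; a real implementation would have to decide how that interacts with grouping and joining.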

I’ve started a draft of a document open for comment.

https://docs.google.com/document/d/1eQLboFoz2URL9jYvU7KC4wBETCowzdErRMS78Adc380/edit?usp=sharing

Feel free to reply to this thread asking for clarity or proposing alternate use-cases we should support. Or comment in the doc.

After the draft represents what we think can be implemented effectively, I will open a request for a show of hands from those who would benefit from the feature as proposed, to see if it's worth the effort to implement.

ckw

—

Chris K Wensel

415-203-5022

ch...@wensel.net

https://www.linkedin.com/in/cwensel

Dusty OBrien

Dec 2, 2019, 2:52:16 PM

to cascading-user

Hey Chris, this sounds pretty interesting - I've been searching the forum.

We have some data, e.g. from 2 sources with 2 different schemas, but with at least a bare minimum common set of columns that we want to GroupBy (A, B). All the other columns are different. I'm trying to imagine how I could use this to:

1. Convert data into a common/merged result set - e.g. maybe with a flexible schema like (type, A, B, originalRecord) -- where type might be a String "someType1" or "someOtherType2" and originalRecord is perhaps JSON representing the whole record of data with all columns in it. Essentially I'm imagining extracting the common columns A, B into a structured schema, and using type as a static field to describe how I can access the arbitrary data in originalRecord and deal with the different schemas there. e.g.

[ "mother", A1, B1, "{'A':'A1','B':'B1','fieldX':value, 'fieldY':value}" ]

[ "father", A1, B1, "{'A':'A1', 'B':'B1', 'fieldZ':'somethingElse', 'fieldAAAAA':{object}}" ]

2. During my Every() (now walking in groups of A,B) -- based on the "type" of a record, use various type-specific accessors into the originalRecord data to accomplish what I need. E.g.

if type==mother then extract fieldX

if type==father then extract fieldAAAAA.subtype
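The type-based dispatch in step 2 could be sketched in plain Java roughly as below. This assumes originalRecord has already been parsed into nested java.util.Map values; TypedAccessors and extract are hypothetical names, and the field names are simply the ones from the example rows above.

```java
import java.util.Map;

// Hypothetical sketch: picks a type-specific accessor into the nested record.
final class TypedAccessors {
    private TypedAccessors() {}

    static Object extract(String type, Map<String, Object> originalRecord) {
        switch (type) {
            case "mother":
                // "mother" records carry the value of interest at the top level
                return originalRecord.get("fieldX");
            case "father": {
                // "father" records nest it one level down, under fieldAAAAA
                Object nested = originalRecord.get("fieldAAAAA");
                return nested instanceof Map ? ((Map<?, ?>) nested).get("subtype") : null;
            }
            default:
                throw new IllegalArgumentException("unknown record type: " + type);
        }
    }
}
```

Inside an aggregator, each row of the (A, B) group would pass its type and originalRecord values through something like extract.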

#2 sounds like it's addressed in the document. Did it ever get implemented? Also do you have any tips on how to convert a whole record (of Avro) into a single column value ala originalRecord that would be accessible in an Every?

Thanks,
Dusty

Chris K Wensel

Dec 2, 2019, 4:54:18 PM

to cascadi...@googlegroups.com

I didn’t implement this proposal fully.

Cascading 4 does have native support for JSON now. So you can use the JSON operators to extract from and build new JSON objects.

What wasn’t implemented was the ability to use a hierarchical field name to reach inside the JSON, but that shouldn’t be necessary here (there are operators for that).

the Cascading operations are based on https://github.com/Heretical/pointer-path

So you should be able to merge parts of multiple JSON docs into a single normalized doc for grouping.
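As a rough illustration of that normalize-then-group idea, independent of Cascading's JSON operators, here is a plain Java sketch. It assumes each source record is a java.util.Map that contains the shared keys A and B; NormalizeForGrouping, normalize, and groupByAB are hypothetical names, and the row shape is the (type, A, B, originalRecord) layout suggested earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Cascading API.
final class NormalizeForGrouping {
    private NormalizeForGrouping() {}

    // Lifts the shared columns out of a source record and keeps the rest nested.
    static Map<String, Object> normalize(String type, Map<String, Object> original) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("type", type);
        row.put("A", original.get("A"));
        row.put("B", original.get("B"));
        row.put("originalRecord", original);
        return row;
    }

    // Groups normalized rows by the (A, B) key, preserving arrival order.
    static Map<List<Object>, List<Map<String, Object>>> groupByAB(List<Map<String, Object>> rows) {
        Map<List<Object>, List<Map<String, Object>>> groups = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            // Arrays.asList tolerates null key parts and has value-based equals/hashCode
            List<Object> key = Arrays.asList(row.get("A"), row.get("B"));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

Grouping on the extracted scalar columns sidesteps the question of whether whole JSON objects can serve as grouping keys.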

I'm unsure of the status of grouping on JSON objects; serialization is done with the Amazon Ion library.

does that help?

ckw

Dusty OBrien

Dec 3, 2019, 10:32:45 AM

to cascading-user

Yeah that does help. I figure I could reach into the JSON myself if I use the whole originalRecord field in the Every. And now that I think about it more, maybe I can even build the originalRecord field by stuffing every field/value from the Avro object into a new field, and just access it with the existing accessors. I found a link to the 4.0 WIP release including JSON support - and I'll look at Heretical. Thanks for the pointers!

Dusty
