Mapping output of Hourglass jobs to Hive tables


Abhishek Gayakwad

Feb 12, 2014, 5:06:50 AM
to dat...@googlegroups.com
Hello,

After running a partition-collapsing or partition-preserving job, the generated container file has the schema PartitionPreservingIncrementalJobOutput or PartitionCollapsingIncrementalJobOutput, which in turn contains the key and value record types. When I create Hive tables from this data, each table has two columns, key and value, both of struct type. This hurts readability and is not what I want: I want to store only the value object in the output file. Is there any way to get rid of the Partition*JobOutput schema and avoid writing the keys as well?

Thanks 
Abhishek

Matthew Hayes

Feb 12, 2014, 12:20:44 PM
to dat...@googlegroups.com, DataFu
The jobs have methods getOutputSchemaName() and getOutputSchemaNamespace() that can be overridden. By default, the strings are derived from the class and its package. Just extend PartitionCollapsingIncrementalJob, for example, and override them. I just filed DATAFU-32 to make it easier to override the defaults.
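
A minimal sketch of the override pattern described above, not taken from the DataFu sources. To keep it runnable without Hadoop or DataFu on the classpath, it uses a hypothetical stand-in base class, StandInIncrementalJob, in place of PartitionCollapsingIncrementalJob. The stand-in's default behavior (class name plus "Output", namespace from the package) and the protected visibility are assumptions based on this thread; the names PageViewCountJob, "PageViewCounts", and "com.example.reports" are made up for illustration. Check the actual method signatures in your DataFu version before copying this.

```java
// Hypothetical stand-in for the Hourglass base job, for illustration only.
// Assumption: the real job derives its default schema name and namespace
// from the job class and its package, as described in the reply above.
class StandInIncrementalJob {
    protected String getOutputSchemaName() {
        return getClass().getSimpleName() + "Output";
    }

    protected String getOutputSchemaNamespace() {
        Package pkg = getClass().getPackage();
        return pkg == null ? "" : pkg.getName();
    }

    // Helper for this demo only: the full schema name the job would write.
    String fullOutputSchemaName() {
        String namespace = getOutputSchemaNamespace();
        String name = getOutputSchemaName();
        return namespace.isEmpty() ? name : namespace + "." + name;
    }
}

// In a real project this would extend PartitionCollapsingIncrementalJob instead.
class PageViewCountJob extends StandInIncrementalJob {
    @Override
    protected String getOutputSchemaName() {
        return "PageViewCounts"; // made-up record name for the output schema
    }

    @Override
    protected String getOutputSchemaNamespace() {
        return "com.example.reports"; // made-up namespace
    }
}

class SchemaNameOverrideDemo {
    public static void main(String[] args) {
        // Default naming, derived from the class itself.
        System.out.println(new StandInIncrementalJob().fullOutputSchemaName());
        // Overridden naming.
        System.out.println(new PageViewCountJob().fullOutputSchemaName());
    }
}
```

Note that this only changes the name and namespace of the output record; the output still contains both the key and the value fields.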

Regarding your other question about the key: when you construct the Hive table, can you not simply ignore the key?

