Partitioning based on Date field in Cascading Hive

abhishek korpe

Jun 17, 2016, 3:01:03 AM
to cascading-user

Hello,

 

While working on a Cascading Hive project, I tried to create partitions based on a date field. The partitions are being created from the long value of the date rather than from the formatted date value.

 

Cascading job:

public static void main(String[] args) {
    final String INPUT_PATH = "C:\\AllHiveParquetScenarios\\Input\\HiveParquetScenarios.txt";

    String[] colNames = { "f1", "f2", "f3" };
    String[] partitionKey = { "f3" };
    String[] colTypes = { "string", "boolean", "date" };

    DateType dt = new DateType("yyyy-MM-dd");
    Type[] types = { String.class, Boolean.class, dt };
    Fields inputFields = new Fields(colNames, types);

    HiveTableDescriptor hiveTableDescriptor = new HiveTableDescriptor(
            "bitwise", "DatePartition", colNames, colTypes, partitionKey, "|");

    Tap inputTap = new Hfs(new TextDelimited(inputFields, false, ","), INPUT_PATH);
    HiveTap hiveTap = new HiveTap(hiveTableDescriptor,
            hiveTableDescriptor.toScheme(), SinkMode.REPLACE, false);

    HivePartitionTap hivePartitionTap = new HivePartitionTap(hiveTap);
    Pipe pipe = new Pipe("pipe");

    Flow flow = new Hadoop2MR1FlowConnector().connect(inputTap, hivePartitionTap, pipe);
    flow.complete();
}

 

The input records look like this:

aaaa,true,1996-05-04
,false,1996-05-01
cccc,false,1996-04-02
dddd,true,1996-06-02
aaaa,,1996-05-07

 

The output folders are created as follows:

f3=828403200000
f3=830908800000
f3=831168000000
f3=831427200000
f3=833673600000
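
As a quick sanity check, the folder names above can be read as milliseconds since the Unix epoch. The sketch below uses only java.time (no Cascading classes) and assumes the values are UTC epoch milliseconds; the class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class EpochMillisCheck {

    // Hypothetical helper: interprets a value as UTC epoch milliseconds
    // and returns the calendar date it falls on.
    static LocalDate toUtcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        // Partition directory values copied from the output listing above.
        long[] partitionValues = {
            828403200000L, 830908800000L, 831168000000L, 831427200000L, 833673600000L
        };
        for (long value : partitionValues) {
            System.out.println("f3=" + value + " -> " + toUtcDate(value));
        }
    }
}
```

Under that assumption the five values map to 1996-04-02, 1996-05-01, 1996-05-04, 1996-05-07 and 1996-06-02, which are exactly the five dates in the input records, so the data is partitioned correctly and only the directory naming is off.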

 

Here is what I found while debugging the source code:

The call tupleEntry.getString( fieldName ) in the toPartition() method of the cascading.tap.hive.HivePartition class returns the long value as a string. Ideally it would return the formatted date value; this is what causes the issue.
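
To make the difference concrete, here is a minimal standalone sketch of the two behaviours. It does not use or modify any Cascading class. It only contrasts stringifying the raw long with formatting it through the same "yyyy-MM-dd" pattern that was passed to DateType. The helper name formatPartitionValue and the choice of the UTC time zone are assumptions made for illustration, not the library's actual code.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class PartitionValueSketch {

    // Hypothetical helper: formats epoch milliseconds with the given pattern.
    // UTC is an assumption here; it matches the folder names seen in the output.
    static String formatPartitionValue(long epochMillis, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long rawValue = 831168000000L; // value observed for the record dated 1996-05-04

        // What the partition directory currently looks like: the long as a string.
        System.out.println("f3=" + String.valueOf(rawValue));

        // What one would expect instead: the value rendered with the date pattern.
        System.out.println("f3=" + formatPartitionValue(rawValue, "yyyy-MM-dd"));
    }
}
```

A real fix would presumably go through the field's declared type inside toPartition() rather than a separate formatter; this sketch only shows the expected result of such a coercion.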


Can someone please validate this issue?

 

Thanks,

Abhishek Korpe

Andre Kelpe

Jun 17, 2016, 6:21:53 AM
to cascading-user
Hi,

It looks like we have to coerce the types before returning the partition. Could you open a GitHub issue so that I don't forget it? Or better, if you have some bandwidth, could you send me a patch?

https://github.com/cascading/cascading-hive


Thanks!

- André
