I am using HiveTap/HivePartitionTap (from the cascading-hive project) in my use case. I am trying to read a partitioned Hive table stored in Parquet format, but the read fails with an exception. Below is the stack trace:
2015-09-16 16:07:41,054 ERROR [main] cascading.flow.stream.element.SourceStage: caught throwable
cascading.tap.TapException: unable to parse partition given parent: hdfs://Ubuntu:8020/user/hive/warehouse/parhivenew and child: null
at cascading.tap.partition.PartitionTupleEntryIterator.<init>(PartitionTupleEntryIterator.java:53)
at cascading.tap.partition.BasePartitionTap$PartitionIterator.createPartitionEntryIterator(BasePartitionTap.java:90)
at cascading.tap.partition.BasePartitionTap$PartitionIterator.<init>(BasePartitionTap.java:73)
at cascading.tap.partition.BasePartitionTap.openForRead(BasePartitionTap.java:343)
at cascading.tap.hadoop.PartitionTap.openForRead(PartitionTap.java:214)
at cascading.tap.hadoop.PartitionTap.openForRead(PartitionTap.java:79)
at cascading.flow.stream.element.SourceStage.map(SourceStage.java:82)
at cascading.flow.stream.element.SourceStage.run(SourceStage.java:66)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:142)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2015-09-16 16:07:41,065 INFO [main] cascading.flow.hadoop.FlowMapper: flow node id: A3AB50DBB4564950AD35906C6F824DCF, mem on close (mb), free: 89, total: 216
From this it seems that HivePartitionTap is not able to read the child directory names under the table directory. I looked into the source code of PartitionTupleEntryIterator.java and found that childIdentifier is null, which is why the exception is thrown.
I then ran the same code against a partitioned table stored in text format. That works fine and I don't get the exception.
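For reference, the text-format variant that works for me looks roughly like this. This is a sketch from memory, not my exact code: the table name "partition_hive_text" and the TextDelimited scheme arguments are illustrative, and I am assuming HiveTap accepts any Cascading Scheme here.

```java
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hive.HivePartitionTap;
import cascading.tap.hive.HiveTableDescriptor;
import cascading.tap.hive.HiveTap;

// Same table layout as the Parquet case: columns b, c, a; partitioned on a.
HiveTableDescriptor textDescriptor = new HiveTableDescriptor(
        "partition_hive_text",                              // illustrative table name
        new String[] { "b", "c", "a" },
        new String[] { "string", "string", "string" },
        new String[] { "a" });

// Only the scheme differs: a delimited-text scheme instead of ParquetTupleScheme.
HiveTap textHiveTap = new HiveTap(textDescriptor,
        new TextDelimited(false, ","));

Tap textSource = new HivePartitionTap(textHiveTap);
```

With this source, the same flow reads the partition subdirectories without the "unable to parse partition" exception, which is why I suspect the problem is specific to the Parquet scheme.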
Is there anything wrong in my code? Can someone please help me resolve this issue? Below is my code:
Fields fields = new Fields("b", "c", "a").applyTypes(new Type[] {
String.class, String.class, String.class });
HiveTableDescriptor hiveTableDescriptor = new HiveTableDescriptor(
"partition_hive", new String[] { "b", "c", "a" }, new String[] {
"string", "string", "string" }, new String[] { "a" });
HiveTap hiveTap = new HiveTap(hiveTableDescriptor,
        new ParquetTupleScheme(hiveTableDescriptor));
Tap source = new HivePartitionTap(hiveTap);
Tap sink = new Hfs(new TextDelimited(false, ","), "data/file_out");
Pipe pipe = new Pipe("pipe");
FlowDef def = FlowDef.flowDef().addSource(pipe, source)
.addTailSink(pipe, sink);
new Hadoop2MR1FlowConnector().connect(def).complete();
Thanks.