I am using HiveTap/HivePartitionTap (from the cascading-hive project) in my use case. I am trying to read a partitioned Hive table stored in Parquet format, but the read fails with an exception. Below is the stack trace:
2015-09-16 16:07:41,054 ERROR [main] cascading.flow.stream.element.SourceStage: caught throwable
cascading.tap.TapException: unable to parse partition given parent: hdfs://Ubuntu:8020/user/hive/warehouse/parhivenew and child: null
at cascading.tap.partition.PartitionTupleEntryIterator.<init>(PartitionTupleEntryIterator.java:53)
at cascading.tap.partition.BasePartitionTap$PartitionIterator.createPartitionEntryIterator(BasePartitionTap.java:90)
at cascading.tap.partition.BasePartitionTap$PartitionIterator.<init>(BasePartitionTap.java:73)
at cascading.tap.partition.BasePartitionTap.openForRead(BasePartitionTap.java:343)
at cascading.tap.hadoop.PartitionTap.openForRead(PartitionTap.java:214)
at cascading.tap.hadoop.PartitionTap.openForRead(PartitionTap.java:79)
at cascading.flow.stream.element.SourceStage.map(SourceStage.java:82)
at cascading.flow.stream.element.SourceStage.run(SourceStage.java:66)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:142)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2015-09-16 16:07:41,065 INFO [main] cascading.flow.hadoop.FlowMapper: flow node id: A3AB50DBB4564950AD35906C6F824DCF, mem on close (mb), free: 89, total: 216
From this it seems that HivePartitionTap is not able to read the child directory names under the table directory. I looked into the source code of PartitionTupleEntryIterator.java and found that childIdentifier is null, which is why the exception is thrown.
I then ran the same code against a partitioned table stored in text format. That works fine and I don't get the exception.
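For reference, the text-format variant that works for me looks roughly like this. This is a sketch from memory, not my exact code: the table name "partition_hive_text" and the TextDelimited scheme arguments are illustrative, and I am assuming HiveTap accepts any Cascading Scheme here.

```java
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hive.HivePartitionTap;
import cascading.tap.hive.HiveTableDescriptor;
import cascading.tap.hive.HiveTap;

// Same table layout as the Parquet case: columns b, c, a; partitioned on a.
HiveTableDescriptor textDescriptor = new HiveTableDescriptor(
        "partition_hive_text",                              // illustrative table name
        new String[] { "b", "c", "a" },
        new String[] { "string", "string", "string" },
        new String[] { "a" });

// Only the scheme differs: a delimited-text scheme instead of ParquetTupleScheme.
HiveTap textHiveTap = new HiveTap(textDescriptor,
        new TextDelimited(false, ","));

Tap textSource = new HivePartitionTap(textHiveTap);
```

With this source, the same flow reads the partition subdirectories without the "unable to parse partition" exception, which is why I suspect the problem is specific to the Parquet scheme.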
Is there anything wrong in my code? Can someone please help me resolve this issue? Below is my code:
Fields fields = new Fields("b", "c", "a").applyTypes(new Type[] {
String.class, String.class, String.class });
HiveTableDescriptor hiveTableDescriptor = new HiveTableDescriptor(
"partition_hive", new String[] { "b", "c", "a" }, new String[] {
"string", "string", "string" }, new String[] { "a" });
HiveTap hiveTap = new HiveTap(hiveTableDescriptor,
        new ParquetTupleScheme(hiveTableDescriptor));
Tap source = new HivePartitionTap(hiveTap);
Tap sink = new Hfs(new TextDelimited(false, ","), "data/file_out");
Pipe pipe = new Pipe("pipe");
FlowDef def = FlowDef.flowDef().addSource(pipe, source)
.addTailSink(pipe, sink);
new Hadoop2MR1FlowConnector().connect(def).complete();
Thanks.