cascading.tap.TapException: unable to parse partition given parent: hdfs://Ubuntu:8020/user/hive/warehouse/parhivenew and child: null


gurdit singh

Sep 17, 2015, 9:55:10 AM
to cascading-user
Hi,
I am using HiveTap and HivePartitionTap (from the cascading-hive project) to read a partitioned Hive table stored in Parquet format. The read fails with an exception. Here is the stack trace:

2015-09-16 16:07:41,054 ERROR [main] cascading.flow.stream.element.SourceStage: caught throwable
cascading.tap.TapException: unable to parse partition given parent: hdfs://Ubuntu:8020/user/hive/warehouse/parhivenew and child: null
        at cascading.tap.partition.PartitionTupleEntryIterator.<init>(PartitionTupleEntryIterator.java:53)
        at cascading.tap.partition.BasePartitionTap$PartitionIterator.createPartitionEntryIterator(BasePartitionTap.java:90)
        at cascading.tap.partition.BasePartitionTap$PartitionIterator.<init>(BasePartitionTap.java:73)
        at cascading.tap.partition.BasePartitionTap.openForRead(BasePartitionTap.java:343)
        at cascading.tap.hadoop.PartitionTap.openForRead(PartitionTap.java:214)
        at cascading.tap.hadoop.PartitionTap.openForRead(PartitionTap.java:79)
        at cascading.flow.stream.element.SourceStage.map(SourceStage.java:82)
        at cascading.flow.stream.element.SourceStage.run(SourceStage.java:66)
        at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2015-09-16 16:07:41,065 INFO [main] cascading.flow.hadoop.FlowMapper: flow node id: A3AB50DBB4564950AD35906C6F824DCF, mem on close (mb), free: 89, total: 216, max: 729

As far as I can tell, HivePartitionTap is unable to read the child directory name from the table directory. I looked at the source of PartitionTupleEntryIterator.java and found that childIdentifier is null, which is what triggers the exception.
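
To make the failure mode concrete, here is a minimal, self-contained sketch in plain Java. It is not Cascading's actual code: the class PartitionPathSketch, the method relativePartition, and the file name part-00000 are hypothetical. It assumes that a partition is derived by stripping the parent (table) path from the child (file) path, and shows that a null child leaves nothing to parse.

```java
public class PartitionPathSketch {

    // Hypothetical helper: returns the partition directories of `child`
    // below `parent`, e.g. "a=india" for a file under <parent>/a=india/.
    static String relativePartition(String parent, String child) {
        if (child == null) {
            // Mirrors the wording of the reported error; not the library's code.
            throw new IllegalStateException(
                "unable to parse partition given parent: " + parent + " and child: null");
        }
        if (!child.startsWith(parent + "/")) {
            throw new IllegalArgumentException(child + " is not under " + parent);
        }
        String relative = child.substring(parent.length() + 1);
        int lastSlash = relative.lastIndexOf('/');
        // Drop the file name and keep only the partition directories.
        return lastSlash < 0 ? "" : relative.substring(0, lastSlash);
    }

    public static void main(String[] args) {
        String parent = "hdfs://Ubuntu:8020/user/hive/warehouse/parhivenew";
        // A non-null child can be resolved relative to the table directory.
        System.out.println(relativePartition(parent, parent + "/a=india/part-00000"));
        try {
            // A null child cannot be resolved and is rejected.
            relativePartition(parent, null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real iterator works along these lines, the question becomes why no child path is available to the mapper when the table is stored as Parquet.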

I then ran the same code against a partitioned table stored in text format. That works fine and I don't get the exception.

Is there anything wrong in my code? Can someone please help me resolve this issue? Below is my code:

// Field names and types of the table; "a" is the partition column.
Fields fields = new Fields("b", "c", "a").applyTypes(new Type[] {
    String.class, String.class, String.class });

HiveTableDescriptor hiveTableDescriptor = new HiveTableDescriptor(
    "partition_hive", new String[] { "b", "c", "a" },
    new String[] { "string", "string", "string" }, new String[] { "a" });

// Parquet-backed Hive table, wrapped in a partition tap for reading.
HiveTap hiveTap = new HiveTap(hiveTableDescriptor,
    new ParquetTupleScheme(hiveTableDescriptor),
    SinkMode.REPLACE, true);
Tap source = new HivePartitionTap(hiveTap);
Tap sink = new Hfs(new TextDelimited(false, ","), "data/file_out");

Pipe pipe = new Pipe("pipe");
FlowDef def = FlowDef.flowDef().addSource(pipe, source)
    .addTailSink(pipe, sink);

new Hadoop2MR1FlowConnector().connect(def).complete();
 

Thanks.

gurdit singh

Sep 17, 2015, 9:59:37 AM
to cascading-user
I forgot to add some details about the issue above:
Cascading version: 3.0
Hive version: 1.2
Hadoop version: Hadoop 2.6.0-cdh5.4.2

Andre Kelpe

Sep 17, 2015, 11:08:51 AM
to cascading-user
Can you share a listing of the hdfs://Ubuntu:8020/user/hive/warehouse/parhivenew directory?

- André

gurdit singh

Sep 17, 2015, 12:10:37 PM
to cascading-user
Below is my directory structure:
user@Ubuntu:~$ hadoop fs -ls /user/hive/warehouse/parhivenew
Found 3 items
drwxr-xr-x   - user hadoop          0 2015-09-16 14:58 /user/hive/warehouse/parhivenew/a=india
drwxr-xr-x   - user hadoop          0 2015-09-16 14:58 /user/hive/warehouse/parhivenew/a=la
drwxr-xr-x   - user hadoop          0 2015-09-16 14:58 /user/hive/warehouse/parhivenew/a=us
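
The directory names in this listing follow Hive's column=value layout. As a rough illustration only, the plain-Java sketch below shows how such names map to values of the partition column a. The class HivePartitionNameSketch and its parse method are hypothetical and are not part of cascading-hive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HivePartitionNameSketch {

    // Hypothetical helper: turns "a=india" (or "a=india/b=x") into an
    // ordered map of partition column name to partition value.
    static Map<String, String> parse(String relativePath) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : relativePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException(
                    "not a column=value segment: " + segment);
            }
            values.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Directory names taken from the listing above.
        for (String dir : new String[] { "a=india", "a=la", "a=us" }) {
            System.out.println(dir + " -> " + parse(dir));
        }
    }
}
```

By this reading, the three directories correspond to the partition values india, la and us, which matches the last column of the query output.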

Below is the output of a query on the same table:
hive> select * from  parhivenew;
OK
2001-03-10      3       india
2001-05-15      1       india
2001-01-12      4       la
2001-02-23      2       us
Time taken: 0.686 seconds,

hive> describe parhivenew;
OK
b                       string                  created by Cascading
c                       string                  created by Cascading
a                       string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
a                       string                                      
Time taken: 0.119 seconds, Fetched: 8 row(s) 

gurdit singh

Sep 18, 2015, 6:53:07 AM
to cascading-user
Hi,

Can someone please suggest a solution for this?