Partition directory is created twice when the partition key value contains a space


santlal gupta

Aug 29, 2016, 10:30:21 AM
to cascading-user
Hi,

I am trying to create a partitioned table in Hive through Cascading. Everything works fine, except when a partition key value contains a space. Here is an example:

Input : 
f1,f2,f3
1,2,mumbai pune
2,3,pune IND

partition directory :  (Cascading Hive)
C:\user\hive\warehouse\partitiontest\f3=mumbai pune
C:\user\hive\warehouse\partitiontest\f3=mumbai%20pune
C:\user\hive\warehouse\partitiontest\f3=pune IND
C:\user\hive\warehouse\partitiontest\f3=pune%20IND

Here, two directories are created for each partition key value: one with a literal space and one with the space encoded as %20.

This issue occurs only when I run the job on my local (Windows) machine. It works fine on the cluster.

I am attaching the sample Cascading source code, a snapshot of the created partition directories, and the input file.

I also found that when I create the same partitioned table through the Hive CLI (on the cluster), only one directory is created for each partition.

Partition Directory: (Hive cli)
C:\user\hive\warehouse\partitiontest\f3=mumbai pune
C:\user\hive\warehouse\partitiontest\f3=pune IND

Can someone please help me resolve this?

I am using the following jar versions:
cascading-hadoop2-mr1-3.1.0.jar
cascading-hive-2.0.0.jar
cascading-local-3.1.0.jar
hive-exec-1.2.0.jar
hive-metastore-1.2.0.jar
hive-shims-1.2.0.jar
hive-serde-1.2.0.jar
hadoop-mapreduce-client-common-2.6.0.jar


Thanks
Santlal Gupta

HivePartitionsnapshot.png
PartitionTest.java
partitionInput.txt

Andre Kelpe

Aug 30, 2016, 6:36:23 AM
to cascading-user
This must be related to a Windows problem... Since you mention that
it works on the cluster side, there is no real problem there. How many
mappers/reducers do you get in your local run? What confuses me is
that you get the directories twice even with only 2 lines of input.
That would indicate a double execution.

- André



--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

santlal gupta

Oct 5, 2016, 5:48:26 AM
to cascading-user
Hi Andre,

I found that one mapper is used for the local run. When I run the job locally and on the remote cluster, I see the following difference in the logs.

On Windows:
....
INFO io.TapOutputCollector: creating path: f_string2=mumbai pune//part-00000-00000
...
INFO common.FileUtils: Creating directory if it doesn't exist: file:/user/hive/warehouse/bitwise.db/countrywisehivetext3/f_string2=mumbai%20pune
....
INFO io.TapOutputCollector: closing tap collector for: file:/user/hive/warehouse/bitwise.db/countrywisehivetext3/f_string2=mumbai pune/part-00000-00000

On remote:
.....
INFO [main] cascading.tap.hadoop.io.TapOutputCollector: creating path: f_string2=mumbai pune//part-00000-00000
.....
INFO [main] cascading.tap.hadoop.io.TapOutputCollector: closing tap collector for: /user/hive/warehouse/bitwise.db/countrywisehivetext3/f_string2=mumbai pune/part-00000-00000
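
The local log shows the same partition under two names: "f_string2=mumbai pune" and "f_string2=mumbai%20pune". A minimal sketch of how such a pair of names can arise, using only the JDK's java.net.URI: this is an assumption about the cause, not something confirmed from the Cascading or Hive sources. The class and method names (PartitionPathEncoding, toFileUri) are hypothetical and exist only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PartitionPathEncoding {

    // Hypothetical helper: builds a file: URI from a plain path.
    // The multi-argument URI constructor percent-encodes characters
    // that are illegal in a URI path, such as the space.
    static URI toFileUri(String path) {
        try {
            return new URI("file", null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = toFileUri("/user/hive/warehouse/partitiontest/f3=mumbai pune");

        // Encoded forms: the space appears as %20.
        System.out.println(uri.toString());
        System.out.println(uri.getRawPath());

        // Decoded form: the space is preserved.
        System.out.println(uri.getPath());
    }
}
```

If one component creates the directory from the encoded form (toString() or getRawPath()) and another writes the part file using the decoded form (getPath()), the file system ends up with two directories that differ only in how the space is written. Whether this is what happens in the local run here is a guess based on the log lines above.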

Thanks
Santlal Gupta 

 