HivePartitionTap is not working with cascading flink

santlal gupta

Oct 12, 2016, 10:36:50 AM
to cascading-user
Hi,

I am new to cascading-flink.
I am trying to use HivePartitionTap with FlinkFlowConnector locally, but it fails with SQLIntegrityConstraintViolationException and AlreadyExistsException.
When I tried the same example with HiveTap instead of HivePartitionTap, it worked fine.

Also, when I tried the same example using HivePartitionTap with Hadoop2MR1FlowConnector, it ran successfully.

For your reference, I am attaching the source example and the input.

Exception:

java.sql.SQLIntegrityConstraintViolationException: The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index identified by 'UNIQUETABLE' defined on 'TBLS'.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
.....
Caused by: ERROR 23505: The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index identified by 'UNIQUETABLE' defined on 'TBLS'.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
.....
[DataSink (/user/hive/warehouse/partitiontest_flink) (2/2)] ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler  - AlreadyExistsException(message:Table partitiontest_flink already exists)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1370)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1447)
.....
ERROR org.apache.flink.runtime.operators.DataSinkTask  - Error in user code: java.io.IOException: AlreadyExistsException(message:Table partitiontest_flink already exists):  DataSink (/user/hive/warehouse/partitiontest_flink) (2/2)
cascading.CascadingException: java.io.IOException: AlreadyExistsException(message:Table partitiontest_flink already exists)
at cascading.tap.hive.HivePartitionTap$HivePartitionCollector.closeCollector(HivePartitionTap.java:156)
at cascading.tap.partition.BasePartitionTap$PartitionCollector.close(BasePartitionTap.java:188)

I am using the following jar versions:

cascading: 3.1.0
cascading-flink: 0.1
hadoop: 2.6.0
hive-exec: 1.2.0

Can you help me debug this issue?

Thanks
Santlal Gupta
PartitionTest.java
partitionInput.txt

Ken Krugler

Oct 12, 2016, 10:51:31 AM
to cascadi...@googlegroups.com
Hi Santlal,

I haven’t tried the HivePartitionTap with cascading-flink, but based on issues I’d run into with the cascading-flink planner previously, I’d guess there’s some implicit contract between Cascading and a sink tap that’s being violated by the cascading-flink planner.

When I debugged similar issues, I had to run locally with breakpoints set in the tap to figure out how the calling patterns differed between the planners.
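
Besides breakpoints, one way to compare calling patterns is to record every call made on the sink and diff the traces from the two planners. The following is only a sketch of that idea: `SinkLifecycle`, `NoOpSink` and `CallRecorder` are hypothetical names made up for illustration, not the Cascading tap API, and the two patterns replayed in `trace` are assumptions rather than what either planner is known to do.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the sink-side methods a planner might call.
interface SinkLifecycle {
    void createResource();
    void write(String record);
    void close();
}

// A do-nothing sink; in a real debugging session this would be the tap under test.
class NoOpSink implements SinkLifecycle {
    public void createResource() { }
    public void write(String record) { }
    public void close() { }
}

class CallRecorder {
    // Wraps a sink so that every call is appended to `log` as "<thread name>:<method name>".
    static SinkLifecycle record(SinkLifecycle target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            synchronized (log) {
                log.add(Thread.currentThread().getName() + ":" + method.getName());
            }
            return method.invoke(target, args);
        };
        return (SinkLifecycle) Proxy.newProxyInstance(
                SinkLifecycle.class.getClassLoader(),
                new Class<?>[] { SinkLifecycle.class },
                handler);
    }

    // Replays one assumed calling pattern and returns the recorded method names in order.
    static List<String> trace(boolean createPerTask, int tasks) {
        List<String> log = new ArrayList<>();
        SinkLifecycle sink = record(new NoOpSink(), log);
        if (!createPerTask) {
            sink.createResource();      // resource created once, up front
        }
        for (int i = 0; i < tasks; i++) {
            if (createPerTask) {
                sink.createResource();  // resource created again by every task
            }
            sink.write("record-" + i);
            sink.close();
        }
        List<String> names = new ArrayList<>();
        for (String entry : log) {
            // Method names contain no ':', so the last ':' separates thread name and method.
            names.add(entry.substring(entry.lastIndexOf(':') + 1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("once:     " + trace(false, 2));
        System.out.println("per task: " + trace(true, 2));
    }
}
```

With two tasks, the first trace contains a single `createResource` call and the second contains two. A difference of that kind between planners is what would point at a violated contract.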

— Ken


--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/545b0662-d29c-454e-af56-ff1a78d17aaa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
<PartitionTest.java><partitionInput.txt>

--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr



santlal gupta

Oct 13, 2016, 1:01:00 AM
to cascading-user
Hi Ken,
 
That's right. 

If you look at the input file, there are only 6 records, and 3 partitions should be created from them.

For your reference, I am attaching the log file of the job.

In the log I saw that create table is called three times. I think that, because Flink processes data in memory, calling create table multiple times might be the reason for the SQLIntegrityConstraintViolationException and AlreadyExistsException.
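
To make the hypothesis concrete, here is a toy model in plain Java. It is not Hive or Cascading code: `ToyMetastore` and both of its create methods are hypothetical names, and the only thing it assumes is that the store enforces unique table names, as the 'UNIQUETABLE' constraint in the trace suggests. It shows that repeating a strict create fails on every call after the first, while a create-if-absent style call tolerates repeats.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for a metastore that enforces unique table names.
class ToyMetastore {
    private final Set<String> tables = ConcurrentHashMap.newKeySet();

    // Strict create: fails if the table is already registered.
    void createTable(String name) {
        if (!tables.add(name)) {
            throw new IllegalStateException("Table " + name + " already exists");
        }
    }

    // Idempotent create: returns true only for the call that actually created the table.
    boolean createTableIfAbsent(String name) {
        return tables.add(name);
    }
}

class CreateTableDemo {
    // Calls the strict create `calls` times and counts how many of those calls fail.
    static int failuresWithStrictCreate(int calls) {
        ToyMetastore store = new ToyMetastore();
        int failures = 0;
        for (int i = 0; i < calls; i++) {
            try {
                store.createTable("partitiontest_flink");
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        return failures;
    }

    // Calls the idempotent create `calls` times and counts how many calls created the table.
    static int creationsWithIdempotentCreate(int calls) {
        ToyMetastore store = new ToyMetastore();
        int created = 0;
        for (int i = 0; i < calls; i++) {
            if (store.createTableIfAbsent("partitiontest_flink")) {
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        System.out.println("strict create failures: " + failuresWithStrictCreate(3));
        System.out.println("idempotent creations: " + creationsWithIdempotentCreate(3));
    }
}
```

With three calls, the strict variant fails twice and the idempotent variant creates the table exactly once. Whether the tap could safely use an idempotent create is a separate question that this model does not answer.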

Thanks
Santlal Gupta 
log.txt

Ken Krugler

Oct 13, 2016, 11:35:15 AM
to cascadi...@googlegroups.com
Well, the first error I see is:

25512 [flink-akka.actor.default-dispatcher-5] ERROR org.apache.flink.runtime.instance.Hardware  - Cannot determine the size of the physical memory for Windows host (using 'wmic memorychip'): Cannot run program "wmic": CreateProcess error=2, The system cannot find the file specified
java.io.IOException: Cannot run program "wmic": CreateProcess error=2, The system cannot find the file specified

So it looks like you’re trying to run this on a Windows machine. If so, I’d suggest trying on Linux.

After that the next error is:

Caused by: ERROR 23505: The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index identified by 'UNIQUETABLE' defined on 'TBLS'.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)

This is where Hive is trying to create the metadata entry for one of the partitions.
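
If it is useful to recognize this failure in code while debugging, a minimal sketch is to walk the exception's cause chain and look for SQLState 23505, the code shown in the trace. `DuplicateKeyCheck` is a hypothetical helper, not part of Hive, Derby or Cascading, and it only follows `getCause()`, not `SQLException.getNextException()`.

```java
import java.io.IOException;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

class DuplicateKeyCheck {
    // SQLState reported in the trace ("ERROR 23505").
    static final String DUPLICATE_KEY_STATE = "23505";

    // Returns true if any SQLException in the cause chain carries SQLState 23505.
    static boolean isDuplicateKey(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SQLException
                    && DUPLICATE_KEY_STATE.equals(((SQLException) t).getSQLState())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Build a wrapped exception by hand; no database is involved here.
        SQLException duplicate = new SQLIntegrityConstraintViolationException(
                "duplicate key value in 'UNIQUETABLE' defined on 'TBLS'", DUPLICATE_KEY_STATE);
        Exception wrapped = new RuntimeException(new IOException(duplicate));

        System.out.println(isDuplicateKey(wrapped));
        System.out.println(isDuplicateKey(new RuntimeException("unrelated failure")));
    }
}
```

A check like this only classifies the error. It does not explain why the duplicate insert was attempted in the first place.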

But someone with more background in Hive would need to help debug this; I haven’t used the HivePartitionTap.

— Ken


