Insert rows into an existing table (append)


Runjhun Gaur

Nov 8, 2017, 4:28:10 AM
to cascading-user
Hi,

I want to insert rows into an existing Hive table using a HiveFlow in Cascading.
It should not overwrite the existing data in the Hive table.

I ran the query below in the Hive terminal, and it works:
insert into targetTableDb.targetTable select * from srcTableDb.srcTable
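A minimal sketch of building this statement in Java, assuming plain unquoted Hive identifiers (letters, digits, underscores). The class `HiveInsertQuery` and method `insertSelectAll` are hypothetical names, not part of Cascading or Hive. It contrasts `INSERT INTO`, which appends to the target table, with `INSERT OVERWRITE`, which replaces its contents, and logs the statement array with `Arrays.toString` so the query text is printed rather than the array reference:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class HiveInsertQuery {

    // Assumption: only plain Hive identifiers are accepted (no backticks, no spaces).
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Validates both parts and joins them as db.table.
    private static String qualified(String db, String table) {
        for (String part : new String[] {db, table}) {
            if (part == null || !IDENTIFIER.matcher(part).matches()) {
                throw new IllegalArgumentException("Not a plain Hive identifier: " + part);
            }
        }
        return db + "." + table;
    }

    // INSERT INTO appends rows; INSERT OVERWRITE replaces the table's existing data.
    // This sketch always includes the TABLE keyword.
    public static String insertSelectAll(String targetDb, String targetTable,
                                         String srcDb, String srcTable, boolean overwrite) {
        String verb = overwrite ? "INSERT OVERWRITE TABLE " : "INSERT INTO TABLE ";
        return verb + qualified(targetDb, targetTable)
                + " SELECT * FROM " + qualified(srcDb, srcTable);
    }

    public static void main(String[] args) {
        String[] queries = {
            insertSelectAll("targetTableDb", "targetTable", "srcTableDb", "srcTable", false)
        };
        // Concatenating the array directly would log something like [Ljava.lang.String;@...
        System.out.println("Insert query: " + Arrays.toString(queries));
    }
}
```

Running `main` prints `Insert query: [INSERT INTO TABLE targetTableDb.targetTable SELECT * FROM srcTableDb.srcTable]`. Rejecting non-identifier input keeps arbitrary text from being concatenated into the HiveQL string.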

However, when I run the same query using a HiveFlow,
I get the error "org.apache.hadoop.mapred.FileAlreadyExistsException" for targetTable.
targetTable is supposed to exist already; I need to insert rows into it.

I have attached the code snippet.
Please help!!
 

InsertIntoHiveTable.java

Ken Krugler

Nov 8, 2017, 9:25:20 AM
to cascadi...@googlegroups.com
Stack trace?

> I have attached the code snippet.

You’re using SinkMode.UPDATE, so the typical problem isn’t there. I assume you tried KEEP as well, and had the same problem?

With the stack trace, Chris Wensel could probably help.

— Ken


public static void insertIntoTable(RunEnvironment env, Properties properties, String targetTableDb,
        String targetTable, String srcTableDb, String srcTable) throws Exception {

    // HiveQL statement run by the HiveFlow; INSERT INTO appends instead of overwriting.
    final String[] hiveFlowQuery = {"insert into " + targetTableDb + "." + targetTable
            + " select * from " + srcTableDb + "." + srcTable};

    // Arrays.toString logs the statement text rather than the array reference.
    LOGGER.info("Insert query: " + Arrays.toString(hiveFlowQuery));

    // Source table schema, reused as the sink schema.
    final ETLTableSchema etlTableSchemaSink = ETLSchemaHelper.getInstance().getTableSchema(srcTableDb, srcTable);
    final Hadoop2MR1FlowConnector flowConnector = new Hadoop2MR1FlowConnector(properties);
    final HiveTap tapTempsource = TapFactory.getHiveTap(srcTableDb, srcTable);

    final Pipe pipe = new Pipe("pipe_partition");
    // have tried using both UPDATE & KEEP mode
    final HiveTap sinkTap = TapFactory.getHiveTapSink(targetTableDb, targetTable,
            etlTableSchemaSink.getFields(), etlTableSchemaSink.getTypes(), SinkMode.UPDATE);
    /*final HiveTap sinkTap = TapFactory.getHiveTapSink(targetTableDb, targetTable,
            etlTableSchemaSink.getFields(), etlTableSchemaSink.getTypes(), SinkMode.KEEP);*/

    // First flow: a HiveFlow that runs the INSERT INTO query (hiveflow is defined outside this snippet).
    final HiveFlow flow2 =
            new HiveFlow(hiveflow, hiveFlowQuery, Arrays.<Tap>asList(tapTempsource), sinkTap);

    // Second flow: a plain Cascading flow that pipes the same source into the same sinkTap.
    final FlowDef flowdef2 = FlowDef.flowDef();
    flowdef2.addSource(pipe, tapTempsource).addTailSink(pipe, sinkTap);

    final Flow<?> flow3 = flowConnector.connect(flowdef2);
    flow2.complete();
    flow3.complete();
}



--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378

Runjhun Gaur

Nov 9, 2017, 12:24:40 AM
to cascadi...@googlegroups.com
Hi Ken,

Thanks for the prompt response.

I have figured out the solution: I was using two flows in the code.
Only one flow is needed, flow2, which runs the HiveFlow.
It works using SinkMode.UPDATE.

Thanks,
Runjhun


