Incorrect data when using HiveTap with combinedInput = true


Ajit Mehra

Nov 17, 2017, 2:05:40 AM
to cascading-user
Hi

I am using HiveTap as both the source and the sink in a Cascading flow, with no pipe-level operations; the flow simply copies data from one table to another. In the sink table, I have observed that a few records are missing. I have tested this against two different tables.

Table 1.
source count: 2032472
sink count : 2032472

The table has a unique column. Two records are missing from the sink; the source and sink counts are still equal only because two other records were duplicated.

Table 2.
source count: 20607550
sink count: 20607550

38 records were missing in this case.
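Because both tables show identical row counts, a count comparison alone cannot catch this; the unique column has to be compared key by key. A minimal sketch in Java, assuming the unique-column values from each table have already been exported into memory as strings (the class `KeyDiff` and its methods are hypothetical, not part of Cascading or Hive):

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical helper: compares the unique-key column of a source table
 * and a sink table. Equal row counts can hide problems: if k rows are
 * dropped and k other rows are written twice, the totals still match.
 */
public class KeyDiff {

    /** Keys absent from the sink, and sink keys seen more than once (with counts). */
    public record Result(Set<String> missing, Map<String, Integer> duplicated) {
        public boolean isClean() {
            return missing.isEmpty() && duplicated.isEmpty();
        }
    }

    public static Result diff(Collection<String> sourceKeys, Collection<String> sinkKeys) {
        // Count how many times each key was written to the sink.
        Map<String, Integer> sinkCounts = new HashMap<>();
        for (String key : sinkKeys) {
            sinkCounts.merge(key, 1, Integer::sum);
        }

        // A source key with no occurrence in the sink was lost.
        Set<String> missing = new TreeSet<>();
        for (String key : sourceKeys) {
            if (!sinkCounts.containsKey(key)) {
                missing.add(key);
            }
        }

        // A key that occurs more than once in the sink was duplicated.
        Map<String, Integer> duplicated = new TreeMap<>();
        sinkCounts.forEach((key, count) -> {
            if (count > 1) {
                duplicated.put(key, count);
            }
        });

        return new Result(missing, duplicated);
    }

    public static void main(String[] args) {
        // Made-up data mirroring the Table 1 symptom: both sides have 5 rows,
        // but "r4" and "r5" are lost while "r1" and "r2" are written twice.
        List<String> source = List.of("r1", "r2", "r3", "r4", "r5");
        List<String> sink = List.of("r1", "r1", "r2", "r2", "r3");
        Result result = diff(source, sink);
        System.out.println("missing=" + result.missing() + " duplicated=" + result.duplicated());
    }
}
```

Running `main` prints `missing=[r4, r5] duplicated={r1=2, r2=2}`: the counts agree, yet the key-level check exposes both problems. For the 20-million-row table, holding every key in memory may be impractical; the same missing/duplicated check could instead be expressed as a query against the two Hive tables.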


Everything works correctly without combinedInput.

Is there anything I need to add when using combinedInput?

Chris K Wensel

Nov 17, 2017, 11:55:49 AM
to cascadi...@googlegroups.com
If you can provide a test case, that would be great.


The combined input format code/feature was contributed by Twitter, so I’m surprised this is an issue. Hopefully they can chime in after seeing a test case.

ckw


