Kite Dataset sink to handle updates


Buntu Dev

Mar 20, 2015, 3:19:21 PM
to cdk...@cloudera.org
I'm using the Kite Dataset sink with Flume (CDH 5.3.2). Is there any way to update an event that has already been written to the dataset via Flume?

I know Hive introduced ACID support in v0.14, but I don't know how this can be done via morphlines or the Kite Dataset sink.

Thanks!

Joey Echeverria

Mar 20, 2015, 4:02:27 PM
to Buntu Dev, cdk...@cloudera.org
Unfortunately this isn't available for HDFS-based datasets (including
Hive). Hive's update support only works with ORC files and, as far as I
know, it requires that you issue an UPDATE SQL query to work.

In general, how do you see your process working? How do you know a
Flume event is an update versus a new record?

--
Joey Echeverria
Senior Infrastructure Engineer

Buntu Dev

Mar 20, 2015, 4:08:17 PM
to Joey Echeverria, cdk...@cloudera.org
These are not live events as such; they are generated daily or weekly for a set of new or existing users and streamed through Kafka -> Flume -> Kite Dataset sink. Based on a user's registration date, we can tell whether the user is new or existing.
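
A minimal sketch of that check, for illustration only. It assumes each event carries a hypothetical `registration_date` field and that a user counts as new when that date falls on or after the start of the current daily/weekly batch window (`batch_start`, also hypothetical). Neither name comes from the actual schema or from Kite/Flume.

```python
from datetime import date

def is_new_user(registration_date: date, batch_start: date) -> bool:
    """Assumption: a user is 'new' if they registered on or after
    the start of the current batch window; otherwise 'existing'."""
    return registration_date >= batch_start

def split_events(events, batch_start):
    """Partition events into (new, updates) by registration date.

    `events` is an iterable of dicts with a hypothetical
    'registration_date' key holding a datetime.date.
    """
    new, updates = [], []
    for event in events:
        if is_new_user(event["registration_date"], batch_start):
            new.append(event)
        else:
            updates.append(event)
    return new, updates
```

This only tags a record as new or as an update; it does not change how the sink writes, so both kinds of records would still be appended to an HDFS-based dataset.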