Implementing RandomBatchAccessDataset to support batch operations in Kite

Prasanna Rajaperumal

Apr 24, 2015, 7:11:19 PM
to cdk...@cloudera.org, Adam Warrington
Hello,

My team needs to do batch puts to HBase. Much of the code for this already exists in the HBase implementation, but the relevant interfaces are missing. I was planning to implement this in Kite.
Here is a proposal (an interface with documentation) of what I have in mind. I would really appreciate it if someone here looked at it and gave their opinion.

https://gist.github.com/prazanna/b5d5d7e248b2076dfe30

I am particularly unsure about the failure semantics of a batch put operation. With the current semantics, records written before the error would already be flushed to the dataset. Is it acceptable not to provide all-or-nothing transaction semantics for a batch operation like this?

-Prasanna
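To make the non-atomic failure semantics described above concrete, here is a minimal Java sketch that uses an in-memory map as a stand-in for HBase. The class `InMemoryStore` and its `putAll` method are hypothetical, not Kite or HBase API. It assumes the batch is applied in order and stops at the first failing record (a null record simulates a write error), so every record written before the failure stays in the store.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for an HBase-backed dataset; not Kite API.
class InMemoryStore<K, E> {
  private final Map<K, E> rows = new LinkedHashMap<>();
  private final Function<E, K> keyOf;

  InMemoryStore(Function<E, K> keyOf) {
    this.keyOf = keyOf;
  }

  // Writes one record; a null record simulates a write error.
  void put(E entity) {
    if (entity == null) {
      throw new IllegalArgumentException("cannot write a null record");
    }
    rows.put(keyOf.apply(entity), entity);
  }

  // Non-atomic batch put: applies records in order and stops at the first
  // failure. Records written before the failure are NOT rolled back.
  // Returns how many records were written.
  int putAll(List<E> batch) {
    int written = 0;
    for (E entity : batch) {
      try {
        put(entity);
      } catch (IllegalArgumentException e) {
        break; // stop here; earlier writes remain visible
      }
      written++;
    }
    return written;
  }

  int size() {
    return rows.size();
  }

  boolean contains(K key) {
    return rows.containsKey(key);
  }
}
```

For example, `putAll(Arrays.asList("a", "b", null, "d"))` on a store keyed by the string itself writes "a" and "b", fails on the null, and never attempts "d". The caller is left with a partially applied batch, which is exactly the case the question above is about.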

Prasanna Rajaperumal

Apr 30, 2015, 5:28:16 PM
to cdk...@cloudera.org
Ping. Has anyone had a chance to look at this?
Thanks.

- Prasanna

Ryan Blue

May 11, 2015, 2:07:34 PM
to Prasanna Rajaperumal, cdk...@cloudera.org, Adam Warrington
Hey Prasanna,

I've been out for the last week and a half; sorry about that. I'll take
a look shortly.

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.

Ryan Blue

May 11, 2015, 2:14:38 PM
to Prasanna Rajaperumal, cdk...@cloudera.org, Adam Warrington
On 04/24/2015 04:11 PM, Prasanna Rajaperumal wrote:
> Hello,
>
> My team needs to do batch puts to HBase. Much of the code for
> this already exists in the HBase implementation, but the relevant
> interfaces are missing. I was planning to implement this in Kite.
> Here is a proposal (an interface with documentation) of what I have
> in mind. I would really appreciate it if someone here looked at it
> and gave their opinion.
>
> https://gist.github.com/prazanna/b5d5d7e248b2076dfe30

I like the put and get methods, but I think delete could be a little
clearer. What about delete(Iterable<E> entities) and
deleteByKey(Iterable<Key> keys)?
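The suggested naming can be sketched as a Java interface. This is a hypothetical shape, not the interface in the gist or Kite's actual API: the type parameter `K` stands in for a key type (Kite has its own `Key` class, which is not used here so the sketch stays self-contained), and `MapBatchDataset` is an illustrative in-memory implementation. It shows the distinction being proposed: `delete` derives each key from an entity, while `deleteByKey` takes keys directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical batch interface; method names follow the suggestion above.
interface RandomBatchAccessDataset<K, E> {
  void put(Iterable<E> entities);

  List<E> get(Iterable<K> keys);      // null for keys that are absent

  void delete(Iterable<E> entities);  // keys derived from the entities

  void deleteByKey(Iterable<K> keys); // keys supplied directly
}

// In-memory implementation for illustration only (not HBase-backed).
class MapBatchDataset<K, E> implements RandomBatchAccessDataset<K, E> {
  private final Map<K, E> rows = new HashMap<>();
  private final Function<E, K> keyOf;

  MapBatchDataset(Function<E, K> keyOf) {
    this.keyOf = keyOf;
  }

  @Override
  public void put(Iterable<E> entities) {
    for (E e : entities) {
      rows.put(keyOf.apply(e), e);
    }
  }

  @Override
  public List<E> get(Iterable<K> keys) {
    List<E> result = new ArrayList<>();
    for (K k : keys) {
      result.add(rows.get(k)); // preserves input order
    }
    return result;
  }

  @Override
  public void delete(Iterable<E> entities) {
    for (E e : entities) {
      rows.remove(keyOf.apply(e));
    }
  }

  @Override
  public void deleteByKey(Iterable<K> keys) {
    for (K k : keys) {
      rows.remove(k);
    }
  }
}
```

Keeping the two delete methods under different names avoids overloading `delete` on two `Iterable` parameter types, which Java could not distinguish after erasure anyway.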

> I am particularly unsure about the failure semantics of a batch put
> operation. With the current semantics, records written before the
> error would already be flushed to the dataset. Is it acceptable not to
> provide all-or-nothing transaction semantics for a batch operation like this?
>
> -Prasanna

I like how your `put` operation returns an iterable of booleans
indicating whether each individual put succeeded. As long as we can
satisfy that API, I think partial success makes sense. We might need
to do the same thing for the delete variants.
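One way to honor a per-record result is sketched below, again as hypothetical Java over an in-memory map rather than HBase. Instead of stopping at the first error, every record is attempted and the returned list holds one `Boolean` per input, in input order; the same pattern is applied to a `deleteByKey` variant. The class name `PartialSuccessDataset` and the failure rules (a null entity fails a put; an absent key makes a delete report `false`) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dataset whose batch operations report per-record success.
class PartialSuccessDataset<K, E> {
  private final Map<K, E> rows = new HashMap<>();
  private final Function<E, K> keyOf;

  PartialSuccessDataset(Function<E, K> keyOf) {
    this.keyOf = keyOf;
  }

  // Attempts every record; results.get(i) says whether the i-th entity was written.
  List<Boolean> put(Iterable<E> entities) {
    List<Boolean> results = new ArrayList<>();
    for (E e : entities) {
      if (e == null) { // simulated per-record failure; keep going
        results.add(false);
        continue;
      }
      rows.put(keyOf.apply(e), e);
      results.add(true);
    }
    return results;
  }

  // results.get(i) is true if the i-th key existed and was removed.
  List<Boolean> deleteByKey(Iterable<K> keys) {
    List<Boolean> results = new ArrayList<>();
    for (K k : keys) {
      results.add(rows.remove(k) != null); // nulls are never stored
    }
    return results;
  }

  boolean contains(K key) {
    return rows.containsKey(key);
  }
}
```

Returning per-record results lets a caller retry only the failed records; the trade-off is that callers must remember to inspect the list, since no exception signals the partial failure.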

rb