Query abstraction.

John Cohen

Nov 25, 2011, 6:05:56 PM
to HBaseHUT - HBase High Update Throughput
It would be nice to have a query abstraction on top, using a simple
XML structure. A way to define:

(1) the event for which a counter increment would get triggered
(2) the tables/fields/counters that you are going to increment

Use case: scan log files (or other sources: Scribe, REST, etc.), find the
information of interest, and increment a counter in HBase.
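
For example (just a sketch with made-up table/column names, and using the
plain HBase client API rather than anything HBaseHUT-specific), a rule like
that could boil down to something like:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterRuleExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // "metrics" table, "counters" family and "error_count" qualifier are
    // invented here; the XML rule would supply them.
    HTable table = new HTable(conf, "metrics");

    String logLine = "2011-11-25 18:05:56 ERROR something failed";
    // (1) the event definition from the rule: the line matches a pattern.
    if (logLine.contains("ERROR")) {
      // (2) the counter definition from the rule: which row/family/qualifier
      // to increment, and by how much.
      table.incrementColumnValue(Bytes.toBytes("2011-11-25"),
                                 Bytes.toBytes("counters"),
                                 Bytes.toBytes("error_count"),
                                 1L);
    }
    table.close();
  }
}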

Alex Baranau

Nov 30, 2011, 1:46:32 PM
to HBaseHUT - HBase High Update Throughput
Hello John,

First, thanks for your interest in the project!

Not sure I fully follow your suggestion. Are you talking about a better
way of defining the record-updating logic? Or maybe you are talking about
a higher-level wrapper which could be built on top of HBaseHUT and used
for processing data (incl. streaming)? Could you please "draw" (i.e.
describe) a higher-level picture (with data flow), with the HBaseHUT
tool/lib in its particular place in it?

Alex Baranau

John Cohen

Nov 30, 2011, 2:26:50 PM
to hbas...@googlegroups.com
Hi, I'm working on a few different things, trying to research and implement real-time analytics.
I was looking at your project as a way to defer writes to HBase, and to see whether you are also thinking about streaming. The use case could be like this: you have data coming into HDFS as a big file, so the time spent on network transfer is big too. This means the data will not be available for processing until the file is completely transferred. Another way of doing this is to use HDFS's API to stream the data blocks as soon as they arrive, sending them straight to HBaseHUT for aggregation (counters for each flag/token found in the file that you are after).
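Roughly, something like the sketch below (just an illustration; the table
and token names are made up, and it uses the plain HDFS and HBase client
APIs instead of HBaseHUT's own, only to show the intended data flow):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class StreamAndCount {
  // Made-up tokens we are after in the incoming file.
  private static final String[] TOKENS = {"ERROR", "WARN", "LOGIN"};

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Start reading the blocks that have already arrived instead of waiting
    // for the whole file transfer to complete.
    FSDataInputStream in = fs.open(new Path(args[0]));
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));

    Map<String, Long> tally = new HashMap<String, Long>();
    String line;
    while ((line = reader.readLine()) != null) {
      for (String token : TOKENS) {
        if (line.contains(token)) {
          Long prev = tally.get(token);
          tally.put(token, prev == null ? 1L : prev + 1L);
        }
      }
    }
    reader.close();

    // Push the aggregated counters to HBase ("metrics"/"counters" are made
    // up); this is the spot where HBaseHUT would take over the update handling.
    HTable table = new HTable(HBaseConfiguration.create(), "metrics");
    for (Map.Entry<String, Long> e : tally.entrySet()) {
      table.incrementColumnValue(Bytes.toBytes(e.getKey()),
                                 Bytes.toBytes("counters"),
                                 Bytes.toBytes("count"),
                                 e.getValue());
    }
    table.close();
  }
}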
Does this sound reasonable?

thanks
--john