Query abstraction.

John Cohen

Nov 25, 2011, 6:05:56 PM
to HBaseHUT - HBase High Update Throughput
It would be nice to have a query abstraction on top, using a simple
XML structure. A way to define:

(1) the event for which a counter increment would get triggered
(2) the tables/fields/counters that you are going to increment

Use case: scan log files (or other sources: Scribe, REST, etc.), find the
information of interest, and increment a counter in HBase.
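
For example (just a sketch with made-up table/column names, and using the
plain HBase client API rather than anything HBaseHUT-specific), a rule like
that could boil down to something like:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterRuleExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // "metrics" table, "counters" family and "error_count" qualifier are
    // invented here; the XML rule would supply them.
    HTable table = new HTable(conf, "metrics");

    String logLine = "2011-11-25 18:05:56 ERROR something failed";
    // (1) the event definition from the rule: the line matches a pattern.
    if (logLine.contains("ERROR")) {
      // (2) the counter definition from the rule: which row/family/qualifier
      // to increment, and by how much.
      table.incrementColumnValue(Bytes.toBytes("2011-11-25"),
                                 Bytes.toBytes("counters"),
                                 Bytes.toBytes("error_count"),
                                 1L);
    }
    table.close();
  }
}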

Alex Baranau

Nov 30, 2011, 1:46:32 PM
to HBaseHUT - HBase High Update Throughput
Hello John,

First, thanks for your interest in the project!

Not sure I fully follow your suggestion. Are you talking about a better
way of defining the record-updating logic? Or maybe you are talking about
a higher-level wrapper which could be built on top of HBaseHUT and used
for processing data (incl. streaming)? Could you please "draw" (i.e.
describe) a higher-level picture (with data flow), with the HBaseHUT
tool/lib in its particular place in it?

Alex Baranau

John Cohen

Nov 30, 2011, 2:26:50 PM
to hbas...@googlegroups.com
Hi, I'm working on a few different things, trying to research and implement real-time analytics.
I was looking at your project as a way to defer writes to HBase, and to see whether you are also thinking about streaming. The use case could be like this: you have data coming into HDFS as a big file, so the time spent on network transfer is big too. This means the data will not be available for processing until the file is completely transferred. Another way of doing this is to use HDFS's API to stream the data blocks as soon as they arrive, sending them straight to HBaseHUT for aggregation (counters for each flag/token found in the file that you are after).
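Roughly, something like the sketch below (just an illustration; the table
and token names are made up, and it uses the plain HDFS and HBase client
APIs instead of HBaseHUT's own, only to show the intended data flow):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class StreamAndCount {
  // Made-up tokens we are after in the incoming file.
  private static final String[] TOKENS = {"ERROR", "WARN", "LOGIN"};

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Start reading the blocks that have already arrived instead of waiting
    // for the whole file transfer to complete.
    FSDataInputStream in = fs.open(new Path(args[0]));
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));

    Map<String, Long> tally = new HashMap<String, Long>();
    String line;
    while ((line = reader.readLine()) != null) {
      for (String token : TOKENS) {
        if (line.contains(token)) {
          Long prev = tally.get(token);
          tally.put(token, prev == null ? 1L : prev + 1L);
        }
      }
    }
    reader.close();

    // Push the aggregated counters to HBase ("metrics"/"counters" are made
    // up); this is the spot where HBaseHUT would take over the update handling.
    HTable table = new HTable(HBaseConfiguration.create(), "metrics");
    for (Map.Entry<String, Long> e : tally.entrySet()) {
      table.incrementColumnValue(Bytes.toBytes(e.getKey()),
                                 Bytes.toBytes("counters"),
                                 Bytes.toBytes("count"),
                                 e.getValue());
    }
    table.close();
  }
}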
Does this sound reasonable?

thanks
--john