Otis
On Dec 12, 6:06 pm, John Cohen <john.java.w...@gmail.com> wrote:
> Hi Alex,
> Looking into this subject now. Nothing in code yet. Will keep you posted.
>
> On Mon, Dec 12, 2011 at 6:03 PM, Alex Baranau <alex.barano...@gmail.com> wrote:
>
> > Hi John,
>
> > Yeah, adding standard "update processors" like MIN, MAX, etc. is the
> > natural next step. It may require some API changes to make that really
> > flexible & usable. I have in mind (and in my notes) some ideas of how to do
> > that in the best way. But for the sake of not pushing my way of thinking, I'd
> > love to hear your ideas first: in what form it is best to have them in the
> > API. Any use-case (with the pseudo-code/code involved) would be great to
> > look at.
>
> > Alex.
>
Hi Mike!

Yeah, not much was done since then, as this part of HBaseHUT wasn't my main focus. It would be really great if you could participate. Please find some thoughts below and let me know what you think.

1. How?

What we need to do with aggregating functions boils down, I think, to the following:
* Implement an UpdateProcessor that can take a list of agg functions and apply them to columns
* Implement the aggregation functions. I'd start with the basic ones (max/min/avg/sum/count) and then add more complex ones: percentile, distinct #, etc.
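The two steps above could fit together roughly as follows. This is a minimal, self-contained Java sketch, assuming every record for a row key has already been read into a plain column-to-value map. `ColumnAggregation`, `AggFunction` and `process` are hypothetical names for illustration only; they deliberately do not reproduce HBaseHUT's actual `UpdateProcessor` signature or any HBase client types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not HBaseHUT's API): aggregation functions plus a
 * processor that applies a per-column function to all records sharing one row key.
 */
public class ColumnAggregation {

    /** Hypothetical contract: folds all values of one column into a single value. */
    interface AggFunction {
        double apply(List<Double> values);
    }

    // Basic functions; each assumes a non-empty list (process() guarantees this).
    static final AggFunction MAX = v -> v.stream().mapToDouble(Double::doubleValue).max().getAsDouble();
    static final AggFunction MIN = v -> v.stream().mapToDouble(Double::doubleValue).min().getAsDouble();
    static final AggFunction SUM = v -> v.stream().mapToDouble(Double::doubleValue).sum();
    static final AggFunction COUNT = v -> v.size();
    static final AggFunction AVG = v -> SUM.apply(v) / v.size();

    /**
     * Applies the configured function to each column across all records for one key.
     * Columns that appear in none of the records are left out of the result.
     */
    static Map<String, Double> process(List<Map<String, Double>> records,
                                       Map<String, AggFunction> functions) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, AggFunction> entry : functions.entrySet()) {
            List<Double> values = new ArrayList<>();
            for (Map<String, Double> record : records) {
                Double value = record.get(entry.getKey());
                if (value != null) {
                    values.add(value); // a record may not carry every column
                }
            }
            if (!values.isEmpty()) {
                result.put(entry.getKey(), entry.getValue().apply(values));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two appended records for the same row key.
        List<Map<String, Double>> records = List.of(
                Map.of("column1", 5.5, "column2", 14.5),
                Map.of("column1", 7.0, "column2", 2.0));

        Map<String, AggFunction> functions = new LinkedHashMap<>();
        functions.put("column1", MAX);
        functions.put("column2", SUM);

        System.out.println(process(records, functions)); // prints {column1=7.0, column2=16.5}
    }
}
```

Keeping one function per column in a map is just one way to let a single processor be configured with different aggregates for different columns; percentile or distinct counts would then be further `AggFunction` implementations.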
Basically, the use-case is the following (unless you have something different/more specific in mind). There is a bunch of input records with columns that have, e.g., numeric values, say row_key1=>{column1=5.5, column2=14.3}, row_key2=>{column1=5.5, column2=14.3}. There might be multiple input records with the same key, and the stored data should be updated based on them. For each key we need to keep the aggregated values. So:
1) with HBaseHUT we replace updates with appends (simple puts), which makes writing much faster;
2) with the help of the agg functions library (implementing it is the focus here), the needed aggregates are calculated.

2. Where to start?

There is an olap-agg branch that was created specifically for this work. Feel free to check it out and start from there. There's a MaxFunction implementation there and a unit test for it. It isn't quite what we are looking for (we need to change it according to the points above), but it is a good place to start looking. It will give you an idea of HBaseHUT usage.
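Since the stored aggregates for each key have to absorb new appends over time, one option worth discussing is to store a mergeable partial aggregate rather than final values. The Java sketch below is an assumption, not code from the olap-agg branch: `PartialAggregate` and its methods are hypothetical. It keeps min, max, sum and count per row key/column and derives avg on read.

```java
/**
 * Hypothetical sketch (not HBaseHUT's API): a mergeable partial aggregate for one
 * row key/column. A stored value can absorb newly appended records without
 * rereading the raw history, because min, max, sum and count all merge
 * associatively and commutatively (up to floating-point rounding in sum).
 * avg is derived from sum and count rather than stored, since two averages
 * cannot be combined without their counts.
 */
public final class PartialAggregate {
    private final double min;
    private final double max;
    private final double sum;
    private final long count;

    private PartialAggregate(double min, double max, double sum, long count) {
        this.min = min;
        this.max = max;
        this.sum = sum;
        this.count = count;
    }

    /** Partial aggregate for a single appended value. */
    static PartialAggregate of(double value) {
        return new PartialAggregate(value, value, value, 1);
    }

    /** Combines two partials into one covering both sets of values. */
    PartialAggregate merge(PartialAggregate other) {
        return new PartialAggregate(
                Math.min(min, other.min),
                Math.max(max, other.max),
                sum + other.sum,
                count + other.count);
    }

    double min() { return min; }
    double max() { return max; }
    double sum() { return sum; }
    long count() { return count; }
    double avg() { return sum / count; }

    public static void main(String[] args) {
        // Aggregate already stored for row_key1/column1 from earlier processing.
        PartialAggregate stored = PartialAggregate.of(5.5).merge(PartialAggregate.of(4.0));

        // New appends for the same key are folded into the stored aggregate.
        for (double value : new double[] {6.5, 2.0}) {
            stored = stored.merge(PartialAggregate.of(value));
        }

        System.out.println("min=" + stored.min() + " max=" + stored.max() + " sum=" + stored.sum()
                + " count=" + stored.count() + " avg=" + stored.avg());
        // prints min=2.0 max=6.5 sum=18.0 count=4 avg=4.5
    }
}
```

Not every function on the list merges this cheaply: exact percentiles and distinct counts need more state than a few numbers (the raw values, or an approximate summary structure).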
After a short discussion we will add issue(s) to the GitHub issue tracker.