Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

69 views

Skip to first unread message

Alex Baranau

unread,

May 19, 2011, 9:14:33 AM5/19/11

to us...@hbase.apache.org, d...@hbase.apache.org, hba...@googlegroups.com

Implemented RowKeyDistributorByHashPrefix. From README:

Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix. Please see
example below. It creates "distributed key" based on original key value
so that later when you have original key and want to update the record you can
calculate distributed key without roundtrip to HBase.

AbstractRowKeyDistributor keyDistributor =
        new RowKeyDistributorByHashPrefix(
                  new RowKeyDistributorByHashPrefix.OneByteSimpleHash(15));

You can use your own hashing logic here by implementing simple interface:

public static interface Hasher extends Parametrizable {
byte[] getHashPrefix(byte[] originalKey);
byte[][] getAllPossiblePrefixes();
}

OneByteSimpleHash implements very simple hash algorythm: simple sum of all bytes in row key % maxBuckets. In example above 15 is maxBuckets count. You can use buckets count # up to 255. Please, use wisely, as (the same thing as with byOneByte prefix) Disctributed scanner will instantiate this number of scans under the hood.

With this row key hash-based distributor, you can find out the distributed key (and use it to update the record) without roundtrip to HBase. From unit-test:

    // Testing simple get
    byte[] originalKey = new byte[] {123, 124, 122};
    Put put = new Put(keyDistributor.getDistributedKey(originalKey));
    put.add(CF, QUAL, Bytes.toBytes("some"));
    hTable.put(put);

    byte[] distributedKey = keyDistributor.getDistributedKey(originalKey);
    Result result = hTable.get(new Get(distributedKey));
    Assert.assertArrayEquals(originalKey, keyDistributor.getOriginalKey(result.getRow()));
    Assert.assertArrayEquals(Bytes.toBytes("some"), result.getValue(CF, QUAL));

NOTE: This feature is included in hbasewd-0.1.0-SNAPSHOT-2011.05.19.jar (downloadable from https://github.com/sematext/HBaseWD)

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

P.S.
> Can you summarize HBaseWD in your blog
That is on my todo list! You pushed it higher to the top priority items ;)

On Thu, May 19, 2011 at 6:50 AM, Weishung Chung <weis...@gmail.com> wrote:

I have another question about option 2. It seems like I need to handle the
distributed scan differently to read from start row to end row, assuming 1
byte hash of the original key is used as prefix since the order of the
original key range is different from the resulting distributed key range.

On Wed, May 18, 2011 at 6:18 PM, Ted Yu <yuzh...@gmail.com> wrote:

> Alex:
> Can you summarize HBaseWD in your blog, including points 1 and 2 below ?
>
> Thanks
>
> On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <alex.ba...@gmail.com
> >wrote:
>
> > There are several options here. E.g.:
> >
> > 1) Given that you have "original key" of the record, you can fetch the
> > stored record key from HBase and use it to create Put with updated (or
> new)
> > cells.
> >
> > Currently you'll need to use distributes scan for that, there's not
> > analogue
> > for Get operation yet (see https://github.com/sematext/HBaseWD/issues/1
> ).
> >
> > Note: you need to first find out the real key of stored record by
> fetching
> > data from HBase in case you use included in current lib
> > RowKeyDistributorByOneBytePrefix. Alternatively, see next option:
> >
> > 2) You can create your own RowKeyDistributor implementation which will
> > create "distributed key" based on original key value so that later when
> you
> > have original key and want to update the record you can calculate
> > distributed key without roundtrip to HBase.
> >
> > E.g. your RowKeyDistributor implementation you can calculate 1-byte hash
> of
> > original key (https://github.com/sematext/HBaseWD/issues/2).
> >
> >
> >
> > In either way you don't need to delete record to update some cells of it
> or
> > add new cells.
> >
> > Please let me know if you have more Qs!
> >
> > Alex Baranau
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> HBase
> >
> > On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <weis...@gmail.com>
> > wrote:
> >
> > > I have another question. For overwriting, do I need to delete the
> > existing
> > > one before re-writing it?
> > >
> > > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <weis...@gmail.com>
> > > wrote:
> > >
> > > > Yes, it's simple yet useful. I am integrating it. Thanks alot :)
> > > >
> > > >
> > > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
> > alex.ba...@gmail.com
> > > >wrote:
> > > >
> > > >> Thanks for the interest!
> > > >>
> > > >> We are using it in production. It is simple and hence quite stable.
> > > Though
> > > >> some minor pieces are missing (like
> > > >> https://github.com/sematext/HBaseWD/issues/1) this doesn't affect
> > > >> stability
> > > >> and/or major functionality.
> > > >>
> > > >> Alex Baranau
> > > >> ----
> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop
> -
> > > >> HBase
> > > >>
> > > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <
> weis...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > What's the status on this package? Is it mature enough?
> > > >> > I am using it in my project, tried out the write method yesterday
> > and
> > > >> > going
> > > >> > to incorporate into read method tomorrow.
> > > >> >
> > > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
> > > alex.ba...@gmail.com
> > > >> > >wrote:
> > > >> >
> > > >> > > > The start/end rows may be written twice.
> > > >> > >
> > > >> > > Yeah, I know. I meant that size of startRow+stopRow data is
> > > "bearable"
> > > >> in
> > > >> > > attribute value no matter how long are they (keys), since we
> > already
> > > >> OK
> > > >> > > with
> > > >> > > transferring them initially (i.e. we should be OK with
> > transferring
> > > 2x
> > > >> > > times
> > > >> > > more).
> > > >> > >
> > > >> > > So, what about the suggestion of sourceScan attribute value I
> > > >> mentioned?
> > > >> > If
> > > >> > > you can tell why it isn't sufficient in your case, I'd have more
> > > info
> > > >> to
> > > >> > > think about better suggestion ;)
> > > >> > >
> > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > > >> > > > Maybe the second should be named HBASE-3811-v2.patch<
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > > >> > > >?
> > > >> > >
> > > >> > > np. Can do that. Just thought that they (patches) can be sorted
> by
> > > >> date
> > > >> > to
> > > >> > > find out the final one (aka "convention over naming-rules").
> > > >> > >
> > > >> > > Alex.
> > > >> > >
> > > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yuzh...@gmail.com>
> > > wrote:
> > > >> > >
> > > >> > > > >> Though it might be ok, since we anyways "transfer"
> start/stop
> > > >> rows
> > > >> > > with
> > > >> > > > Scan object.
> > > >> > > > In write() method, we now have:
> > > >> > > > Bytes.writeByteArray(out, this.startRow);
> > > >> > > > Bytes.writeByteArray(out, this.stopRow);
> > > >> > > > ...
> > > >> > > > for (Map.Entry<String, byte[]> attr :
> > > >> this.attributes.entrySet())
> > > >> > {
> > > >> > > > WritableUtils.writeString(out, attr.getKey());
> > > >> > > > Bytes.writeByteArray(out, attr.getValue());
> > > >> > > > }
> > > >> > > > The start/end rows may be written twice.
> > > >> > > >
> > > >> > > > Of course, you have full control over how to generate the
> unique
> > > ID
> > > >> for
> > > >> > > > "sourceScan" attribute.
> > > >> > > >
> > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> Maybe
> > > the
> > > >> > > second
> > > >> > > > should be named HBASE-3811-v2.patch<
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > > >> > > >?
> > > >> > > >
> > > >> > > > Thanks
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> > > >> > alex.ba...@gmail.com
> > > >> > > >wrote:
> > > >> > > >
> > > >> > > >> > Can you remove the first version ?
> > > >> > > >> Isn't it ok to keep it in JIRA issue?
> > > >> > > >>
> > > >> > > >>
> > > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
> > > >> supports
> > > >> > > >> setAttribute() ?
> > > >> > > >> > If it does, can you encode start row and end row as
> > > "sourceScan"
> > > >> > > >> attribute ?
> > > >> > > >>
> > > >> > > >> Yeah, smth like this is going to be implemented. Though I'd
> > still
> > > >> want
> > > >> > > to
> > > >> > > >> hear from the devs the story about Scan version.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >> > One consideration is that start row or end row may be quite
> > > long.
> > > >> > > >>
> > > >> > > >> Yeah, that is was my though too at first. Though it might be
> > ok,
> > > >> since
> > > >> > > we
> > > >> > > >> anyways "transfer" start/stop rows with Scan object.
> > > >> > > >>
> > > >> > > >> > What do you think ?
> > > >> > > >>
> > > >> > > >> I'd love to hear from you is this variant I mentioned is what
> > we
> > > >> are
> > > >> > > >> looking at here:
> > > >> > > >>
> > > >> > > >>
> > > >> > > >> > From what I understand, you want to distinguish scans fired
> > by
> > > >> the
> > > >> > > same
> > > >> > > >> distributed scan.
> > > >> > > >> > I.e. group scans which were fired by single distributed
> scan.
> > > If
> > > >> > > that's
> > > >> > > >> what you want, distributed
> > > >> > > >> > scan can generate unique ID and set, say "sourceScan"
> > attribute
> > > >> to
> > > >> > its
> > > >> > > >> value. This way we'll
> > > >> > > >> > have <# of distinct "sourceScan" attribute values> =
> <number
> > of
> > > >> > > >> distributed scans invoked by
> > > >> > > >> > client side> and two scans on server side will have the
> same
> > > >> > > >> "sourceScan" attribute iff they
> > > >> > > >> > "belong" to same distributed scan.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >> Alex Baranau
> > > >> > > >> ----
> > > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > > Hadoop
> > > >> -
> > > >> > > >> HBase
> > > >> > > >>
> > > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yuzh...@gmail.com
> >
> > > >> wrote:
> > > >> > > >>
> > > >> > > >>> Alex:
> > > >> > > >>> Your second patch looks good.
> > > >> > > >>> Can you remove the first version ?
> > > >> > > >>>
> > > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
> > > supports
> > > >> > > >>> setAttribute() ?
> > > >> > > >>> If it does, can you encode start row and end row as
> > "sourceScan"
> > > >> > > >>> attribute ?
> > > >> > > >>>
> > > >> > > >>> One consideration is that start row or end row may be quite
> > > long.
> > > >> > > >>> Ideally we should store hash code of source Scan object as
> > > >> > "sourceScan"
> > > >> > > >>> attribute. But Scan doesn't implement hashCode(). We can add
> > it,
> > > >> that
> > > >> > > would
> > > >> > > >>> require running all Scan related tests.
> > > >> > > >>>
> > > >> > > >>> What do you think ?
> > > >> > > >>>
> > > >> > > >>> Thanks
> > > >> > > >>>
> > > >> > > >>>
> > > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > > >> > > alex.ba...@gmail.com>wrote:
> > > >> > > >>>
> > > >> > > >>>> Sorry for the delay in response (public holidays here).
> > > >> > > >>>>
> > > >> > > >>>> This depends on what info you are looking for on server
> side.
> > > >> > > >>>>
> > > >> > > >>>> From what I understand, you want to distinguish scans fired
> > by
> > > >> the
> > > >> > > same
> > > >> > > >>>> distributed scan. I.e. group scans which were fired by
> single
> > > >> > > distributed
> > > >> > > >>>> scan. If that's what you want, distributed scan can
> generate
> > > >> unique
> > > >> > ID
> > > >> > > and
> > > >> > > >>>> set, say "sourceScan" attribute to its value. This way
> we'll
> > > have
> > > >> <#
> > > >> > > of
> > > >> > > >>>> distinct "sourceScan" attribute values> = <number of
> > > distributed
> > > >> > scans
> > > >> > > >>>> invoked by client side> and two scans on server side will
> > have
> > > >> the
> > > >> > > same
> > > >> > > >>>> "sourceScan" attribute iff they "belong" to same
> distributed
> > > >> scan.
> > > >> > > >>>>
> > > >> > > >>>> Is this what are you looking for?
> > > >> > > >>>>
> > > >> > > >>>> Alex Baranau
> > > >> > > >>>>
> > > >> > > >>>> P.S. attached patch for HBASE-3811<
> > > >> > > https://issues.apache.org/jira/browse/HBASE-3811>
> > > >> > > >>>> .
> > > >> > > >>>> P.S-2. should this conversation be moved to dev list?
> > > >> > > >>>>
> > > >> > > >>>> ----
> > > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> -
> > > >> Hadoop
> > > >> > -
> > > >> > > >>>> HBase
> > > >> > > >>>>
> > > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <
> yuzh...@gmail.com
> > >
> > > >> > wrote:
> > > >> > > >>>>
> > > >> > > >>>>> Alex:
> > > >> > > >>>>> What type of identification should we put in the map of
> the
> > > Scan
> > > >> > > object
> > > >> > > >>>>> ?
> > > >> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But
> the
> > > user
> > > >> > can
> > > >> > > >>>>> use same distributor on multiple scans.
> > > >> > > >>>>>
> > > >> > > >>>>> Please share your thought.
> > > >> > > >>>>>
> > > >> > > >>>>>
> > > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > > >> > > >>>>> alex.ba...@gmail.com> wrote:
> > > >> > > >>>>>
> > > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > > >> > > >>>>>>
> > > >> > > >>>>>> Alex Baranau
> > > >> > > >>>>>> ----
> > > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene -
> Nutch
> > -
> > > >> > Hadoop
> > > >> > > -
> > > >> > > >>>>>> HBase
> > > >> > > >>>>>>
> > > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <
> > yuzh...@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > >>>>>>
> > > >> > > >>>>>> > My plan was to make regions that have active scanners
> > more
> > > >> > stable
> > > >> > > -
> > > >> > > >>>>>> trying
> > > >> > > >>>>>> > not to move them when balancing.
> > > >> > > >>>>>> > I prefer second approach - adding custom attribute(s)
> to
> > > Scan
> > > >> so
> > > >> > > >>>>>> that the
> > > >> > > >>>>>> > Scans created by the method below can be 'grouped'.
> > > >> > > >>>>>> >
> > > >> > > >>>>>> > If you can file a JIRA, that would be great.
> > > >> > > >>>>>> >
> > > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > > >> > > >>>>>> alex.ba...@gmail.com
> > > >> > > >>>>>> > >wrote:
> > > >> > > >>>>>> >
> > > >> > > >>>>>> > > Aha, so you want to "count" it as single scan (or
> just
> > > >> > > >>>>>> differently) when
> > > >> > > >>>>>> > > determining the load?
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > The current code looks like this:
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > class DistributedScanner:
> > > >> > > >>>>>> > > public static DistributedScanner create(HTable
> hTable,
> > > >> Scan
> > > >> > > >>>>>> original,
> > > >> > > >>>>>> > > AbstractRowKeyDistributor keyDistributor) throws
> > > >> IOException {
> > > >> > > >>>>>> > > byte[][] startKeys =
> > > >> > > >>>>>> > >
> > > >> keyDistributor.getAllDistributedKeys(original.getStartRow());
> > > >> > > >>>>>> > > byte[][] stopKeys =
> > > >> > > >>>>>> > >
> > > >> keyDistributor.getAllDistributedKeys(original.getStopRow());
> > > >> > > >>>>>> > > Scan[] scans = new Scan[startKeys.length];
> > > >> > > >>>>>> > > for (byte i = 0; i < startKeys.length; i++) {
> > > >> > > >>>>>> > > scans[i] = new Scan(original);
> > > >> > > >>>>>> > > scans[i].setStartRow(startKeys[i]);
> > > >> > > >>>>>> > > scans[i].setStopRow(stopKeys[i]);
> > > >> > > >>>>>> > > }
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > ResultScanner[] rss = new
> > > >> ResultScanner[startKeys.length];
> > > >> > > >>>>>> > > for (byte i = 0; i < scans.length; i++) {
> > > >> > > >>>>>> > > rss[i] = hTable.getScanner(scans[i]);
> > > >> > > >>>>>> > > }
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > return new DistributedScanner(rss);
> > > >> > > >>>>>> > > }
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > This is client code. To make these scans
> "identifiable"
> > > we
> > > >> > need
> > > >> > > to
> > > >> > > >>>>>> either
> > > >> > > >>>>>> > > use some different (derived from Scan) class or add
> > some
> > > >> > > attribute
> > > >> > > >>>>>> to
> > > >> > > >>>>>> > them.
> > > >> > > >>>>>> > > There's no API for doing the latter. But we can do
> the
> > > >> former,
> > > >> > > but
> > > >> > > >>>>>> I
> > > >> > > >>>>>> > don't
> > > >> > > >>>>>> > > really like the idea of creating extra class (with no
> > > extra
> > > >> > > >>>>>> > functionality)
> > > >> > > >>>>>> > > just to distinguish it from the base one.
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > If you can share why/how do you want to treat them
> > > >> differently
> > > >> > > on
> > > >> > > >>>>>> server
> > > >> > > >>>>>> > > side, that would be helpful.
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > Alex Baranau
> > > >> > > >>>>>> > > ----
> > > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> > > Nutch
> > > >> -
> > > >> > > >>>>>> Hadoop -
> > > >> > > >>>>>> > HBase
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
> > > >> yuzh...@gmail.com>
> > > >> > > >>>>>> wrote:
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > > My request would be to make the distributed scan
> > > >> > identifiable
> > > >> > > >>>>>> from
> > > >> > > >>>>>> > server
> > > >> > > >>>>>> > > > side.
> > > >> > > >>>>>> > > > :-)
> > > >> > > >>>>>> > > >
> > > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> > > >> > > >>>>>> > alex.ba...@gmail.com
> > > >> > > >>>>>> > > > >wrote:
> > > >> > > >>>>>> > > >
> > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> > > >> regions
> > > >> > for
> > > >> > > >>>>>> the
> > > >> > > >>>>>> > > > underlying
> > > >> > > >>>>>> > > > > > table.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > True: e.g. when there's only one region that
> holds
> > > data
> > > >> > for
> > > >> > > >>>>>> the whole
> > > >> > > >>>>>> > > > table
> > > >> > > >>>>>> > > > > (not many records in table yet), distributed scan
> > > will
> > > >> > fire
> > > >> > > N
> > > >> > > >>>>>> scans
> > > >> > > >>>>>> > > > against
> > > >> > > >>>>>> > > > > the same region.
> > > >> > > >>>>>> > > > > On the other hand, in case there are huge number
> of
> > > >> > regions
> > > >> > > >>>>>> for
> > > >> > > >>>>>> > single
> > > >> > > >>>>>> > > > > table, each scan can span over multiple regions.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > > I need to deal with normal scan and
> "distributed
> > > >> scan"
> > > >> > at
> > > >> > > >>>>>> server
> > > >> > > >>>>>> > > side.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > With current implementation "distributed" scan
> > won't
> > > be
> > > >> > > >>>>>> recognized as
> > > >> > > >>>>>> > > > > something special on the server side. It will be
> an
> > > >> > ordinary
> > > >> > > >>>>>> scan.
> > > >> > > >>>>>> > > Though
> > > >> > > >>>>>> > > > > the number of scan will increase, given that the
> > > >> typical
> > > >> > > >>>>>> situation is
> > > >> > > >>>>>> > > > "many
> > > >> > > >>>>>> > > > > regions for single table", the scans of the same
> > > >> > > "distributed
> > > >> > > >>>>>> scan"
> > > >> > > >>>>>> > are
> > > >> > > >>>>>> > > > > likely not to hit the same region.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > Not sure if I answered your questions here. Feel
> > free
> > > >> to
> > > >> > ask
> > > >> > > >>>>>> more ;)
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > Alex Baranau
> > > >> > > >>>>>> > > > > ----
> > > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr -
> Lucene
> > -
> > > >> Nutch
> > > >> > -
> > > >> > > >>>>>> Hadoop -
> > > >> > > >>>>>> > > > HBase
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> > > >> > > yuzh...@gmail.com>
> > > >> > > >>>>>> wrote:
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > > Alex:
> > > >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > > >> > > >>>>>> > > > > >
> https://issues.apache.org/jira/browse/HBASE-3679
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > I need to deal with normal scan and
> "distributed
> > > >> scan"
> > > >> > at
> > > >> > > >>>>>> server
> > > >> > > >>>>>> > > side.
> > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> > > >> regions
> > > >> > for
> > > >> > > >>>>>> the
> > > >> > > >>>>>> > > > underlying
> > > >> > > >>>>>> > > > > > table.
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > Cheers
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau
> <
> > > >> > > >>>>>> > > > alex.ba...@gmail.com
> > > >> > > >>>>>> > > > > > >wrote:
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > > Hi Ted,
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > We currently use this tool in the scenario
> > where
> > > >> data
> > > >> > is
> > > >> > > >>>>>> consumed
> > > >> > > >>>>>> > > by
> > > >> > > >>>>>> > > > > > > MapReduce jobs, so we haven't tested the
> > > >> performance
> > > >> > of
> > > >> > > >>>>>> pure
> > > >> > > >>>>>> > > > > "distributed
> > > >> > > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I
> > expect
> > > >> it
> > > >> > to
> > > >> > > be
> > > >> > > >>>>>> close
> > > >> > > >>>>>> > to
> > > >> > > >>>>>> > > > > > simple
> > > >> > > >>>>>> > > > > > > scan performance, or may be sometimes even
> > faster
> > > >> > > >>>>>> depending on
> > > >> > > >>>>>> > your
> > > >> > > >>>>>> > > > > data
> > > >> > > >>>>>> > > > > > > access patterns. E.g. in case you write
> > > timeseries
> > > >> > data
> > > >> > > >>>>>> > > (sequential)
> > > >> > > >>>>>> > > > > > which
> > > >> > > >>>>>> > > > > > > is written into the single region at a time,
> > then
> > > >> e.g.
> > > >> > > if
> > > >> > > >>>>>> you
> > > >> > > >>>>>> > > access
> > > >> > > >>>>>> > > > > > delta
> > > >> > > >>>>>> > > > > > > for further processing/analysis (esp. if from
> > not
> > > >> > single
> > > >> > > >>>>>> client)
> > > >> > > >>>>>> > > > these
> > > >> > > >>>>>> > > > > > > scans
> > > >> > > >>>>>> > > > > > > are likely to hit the same region or couple
> of
> > > >> regions
> > > >> > > at
> > > >> > > >>>>>> a time,
> > > >> > > >>>>>> > > > which
> > > >> > > >>>>>> > > > > > may
> > > >> > > >>>>>> > > > > > > perform worse comparing to many scans hitting
> > > data
> > > >> > that
> > > >> > > is
> > > >> > > >>>>>> much
> > > >> > > >>>>>> > > > better
> > > >> > > >>>>>> > > > > > > spread over region servers.
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > As for map-reduce job the approach should not
> > > >> affect
> > > >> > > >>>>>> reading
> > > >> > > >>>>>> > > > > performance
> > > >> > > >>>>>> > > > > > at
> > > >> > > >>>>>> > > > > > > all: it's just that there are bucketsCount
> > times
> > > >> more
> > > >> > > >>>>>> splits and
> > > >> > > >>>>>> > > > hence
> > > >> > > >>>>>> > > > > > > bucketsCount times more Map tasks. In many
> > cases
> > > >> this
> > > >> > > even
> > > >> > > >>>>>> > improves
> > > >> > > >>>>>> > > > > > overall
> > > >> > > >>>>>> > > > > > > performance of the MR job since work is
> better
> > > >> > > distributed
> > > >> > > >>>>>> over
> > > >> > > >>>>>> > > > cluster
> > > >> > > >>>>>> > > > > > > (esp. in situation when the aim is to
> > constantly
> > > >> > process
> > > >> > > >>>>>> the
> > > >> > > >>>>>> > coming
> > > >> > > >>>>>> > > > > delta
> > > >> > > >>>>>> > > > > > > which usually resides in one or just couple
> of
> > > >> regions
> > > >> > > >>>>>> depending
> > > >> > > >>>>>> > on
> > > >> > > >>>>>> > > > > > > processing frequency).
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > If you can share details on your case, that
> > will
> > > >> help
> > > >> > to
> > > >> > > >>>>>> > understand
> > > >> > > >>>>>> > > > > what
> > > >> > > >>>>>> > > > > > > effect(s) to expect from using this approach.
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > Alex Baranau
> > > >> > > >>>>>> > > > > > > ----
> > > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr -
> > > Lucene
> > > >> -
> > > >> > > Nutch
> > > >> > > >>>>>> -
> > > >> > > >>>>>> > Hadoop
> > > >> > > >>>>>> > > -
> > > >> > > >>>>>> > > > > > HBase
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> > > >> > > >>>>>> yuzh...@gmail.com>
> > > >> > > >>>>>> > > wrote:
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > > Interesting project, Alex.
> > > >> > > >>>>>> > > > > > > > Since there're bucketsCount scanners
> compared
> > > to
> > > >> one
> > > >> > > >>>>>> scanner
> > > >> > > >>>>>> > > > > > originally,
> > > >> > > >>>>>> > > > > > > > have you performed load testing to see the
> > > impact
> > > >> ?
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > > > Thanks
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex
> > Baranau
> > > <
> > > >> > > >>>>>> > > > > > alex.ba...@gmail.com
> > > >> > > >>>>>> > > > > > > > >wrote:
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Hello guys,
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > I'd like to introduce a new small java
> > > >> project/lib
> > > >> > > >>>>>> around
> > > >> > > >>>>>> > > HBase:
> > > >> > > >>>>>> > > > > > > HBaseWD.
> > > >> > > >>>>>> > > > > > > > > It
> > > >> > > >>>>>> > > > > > > > > is aimed to help with distribution of the
> > > load
> > > >> > > (across
> > > >> > > >>>>>> > > > > regionservers)
> > > >> > > >>>>>> > > > > > > > when
> > > >> > > >>>>>> > > > > > > > > writing sequential (becasue of the row
> key
> > > >> nature)
> > > >> > > >>>>>> records.
> > > >> > > >>>>>> > It
> > > >> > > >>>>>> > > > > > > implements
> > > >> > > >>>>>> > > > > > > > > the solution which was discussed several
> > > times
> > > >> on
> > > >> > > this
> > > >> > > >>>>>> > mailing
> > > >> > > >>>>>> > > > list
> > > >> > > >>>>>> > > > > > > (e.g.
> > > >> > > >>>>>> > > > > > > > > here:
> > http://search-hadoop.com/m/gNRA82No5Wk
> > > ).
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Please find the sources at
> > > >> > > >>>>>> > > > > > https://github.com/sematext/HBaseWD(there's
> > > >> > > >>>>>> > > > > > > > > also
> > > >> > > >>>>>> > > > > > > > > a jar of current version for
> convenience).
> > It
> > > >> is
> > > >> > > very
> > > >> > > >>>>>> easy to
> > > >> > > >>>>>> > > > make
> > > >> > > >>>>>> > > > > > use
> > > >> > > >>>>>> > > > > > > of
> > > >> > > >>>>>> > > > > > > > > it: e.g. I added it to one existing
> project
> > > >> with
> > > >> > 1+2
> > > >> > > >>>>>> lines of
> > > >> > > >>>>>> > > > code
> > > >> > > >>>>>> > > > > > (one
> > > >> > > >>>>>> > > > > > > > > where I write to HBase and 2 for
> > configuring
> > > >> > > MapReduce
> > > >> > > >>>>>> job).
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Please find below the short intro to the
> > lib
> > > >> [1].
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Alex Baranau
> > > >> > > >>>>>> > > > > > > > > ----
> > > >> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr
> -
> > > >> Lucene
> > > >> > -
> > > >> > > >>>>>> Nutch -
> > > >> > > >>>>>> > > > Hadoop
> > > >> > > >>>>>> > > > > -
> > > >> > > >>>>>> > > > > > > > HBase
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > [1]
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Description:
> > > >> > > >>>>>> > > > > > > > > ------------
> > > >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing
> > (sequential)
> > > >> > Writes.
> > > >> > > >>>>>> It was
> > > >> > > >>>>>> > > > > inspired
> > > >> > > >>>>>> > > > > > by
> > > >> > > >>>>>> > > > > > > > > discussions on HBase mailing lists around
> > the
> > > >> > > problem
> > > >> > > >>>>>> of
> > > >> > > >>>>>> > > choosing
> > > >> > > >>>>>> > > > > > > > between:
> > > >> > > >>>>>> > > > > > > > > * writing records with sequential row
> keys
> > > >> (e.g.
> > > >> > > >>>>>> time-series
> > > >> > > >>>>>> > > data
> > > >> > > >>>>>> > > > > > with
> > > >> > > >>>>>> > > > > > > > row
> > > >> > > >>>>>> > > > > > > > > key
> > > >> > > >>>>>> > > > > > > > > built based on ts)
> > > >> > > >>>>>> > > > > > > > > * using random unique IDs for records
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > First approach makes possible to perform
> > fast
> > > >> > range
> > > >> > > >>>>>> scans
> > > >> > > >>>>>> > with
> > > >> > > >>>>>> > > > help
> > > >> > > >>>>>> > > > > > of
> > > >> > > >>>>>> > > > > > > > > setting
> > > >> > > >>>>>> > > > > > > > > start/stop keys on Scanner, but creates
> > > single
> > > >> > > region
> > > >> > > >>>>>> server
> > > >> > > >>>>>> > > > > > > hot-spotting
> > > >> > > >>>>>> > > > > > > > > problem upon writing data (as row keys go
> > in
> > > >> > > sequence
> > > >> > > >>>>>> all
> > > >> > > >>>>>> > > records
> > > >> > > >>>>>> > > > > end
> > > >> > > >>>>>> > > > > > > up
> > > >> > > >>>>>> > > > > > > > > written into a single region at a time).
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Second approach aims for fastest writing
> > > >> > performance
> > > >> > > >>>>>> by
> > > >> > > >>>>>> > > > > distributing
> > > >> > > >>>>>> > > > > > > new
> > > >> > > >>>>>> > > > > > > > > records over random regions but makes not
> > > >> possible
> > > >> > > >>>>>> doing fast
> > > >> > > >>>>>> > > > range
> > > >> > > >>>>>> > > > > > > scans
> > > >> > > >>>>>> > > > > > > > > against written data.
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > The suggested approach stays in the
> middle
> > of
> > > >> the
> > > >> > > two
> > > >> > > >>>>>> above
> > > >> > > >>>>>> > and
> > > >> > > >>>>>> > > > > > proved
> > > >> > > >>>>>> > > > > > > to
> > > >> > > >>>>>> > > > > > > > > perform well by distributing records over
> > the
> > > >> > > cluster
> > > >> > > >>>>>> during
> > > >> > > >>>>>> > > > > writing
> > > >> > > >>>>>> > > > > > > data
> > > >> > > >>>>>> > > > > > > > > while allowing range scans over it.
> HBaseWD
> > > >> > provides
> > > >> > > >>>>>> very
> > > >> > > >>>>>> > > simple
> > > >> > > >>>>>> > > > > API
> > > >> > > >>>>>> > > > > > to
> > > >> > > >>>>>> > > > > > > > > work with which makes it perfect to use
> > with
> > > >> > > existing
> > > >> > > >>>>>> code.
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Please refer to unit-tests for lib usage
> > info
> > > >> as
> > > >> > > they
> > > >> > > >>>>>> aimed
> > > >> > > >>>>>> > to
> > > >> > > >>>>>> > > > act
> > > >> > > >>>>>> > > > > as
> > > >> > > >>>>>> > > > > > > > > example.
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
> > > >> > > >>>>>> > > > > > > > > ----------------------------
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Distributing records with sequential keys
> > > which
> > > >> > are
> > > >> > > >>>>>> being
> > > >> > > >>>>>> > > written
> > > >> > > >>>>>> > > > > in
> > > >> > > >>>>>> > > > > > up
> > > >> > > >>>>>> > > > > > > > to
> > > >> > > >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > byte bucketsCount = (byte) 32; //
> > > >> distributing
> > > >> > > into
> > > >> > > >>>>>> 32
> > > >> > > >>>>>> > > buckets
> > > >> > > >>>>>> > > > > > > > > RowKeyDistributor keyDistributor =
> > > >> > > >>>>>> > > > > > > > > new
> > > >> > > >>>>>> > > > > > > > >
> > > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > > >> > > >>>>>> > > > > > > > > for (int i = 0; i < 100; i++) {
> > > >> > > >>>>>> > > > > > > > > Put put = new
> > > >> > > >>>>>> > > > > >
> > Put(keyDistributor.getDistributedKey(originalKey));
> > > >> > > >>>>>> > > > > > > > > ... // add values
> > > >> > > >>>>>> > > > > > > > > hTable.put(put);
> > > >> > > >>>>>> > > > > > > > > }
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Performing a range scan over written data
> > > >> > > (internally
> > > >> > > >>>>>> > > > > <bucketsCount>
> > > >> > > >>>>>> > > > > > > > > scanners
> > > >> > > >>>>>> > > > > > > > > executed):
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Scan scan = new Scan(startKey,
> stopKey);
> > > >> > > >>>>>> > > > > > > > > ResultScanner rs =
> > > >> > > >>>>>> DistributedScanner.create(hTable, scan,
> > > >> > > >>>>>> > > > > > > > > keyDistributor);
> > > >> > > >>>>>> > > > > > > > > for (Result current : rs) {
> > > >> > > >>>>>> > > > > > > > > ...
> > > >> > > >>>>>> > > > > > > > > }
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Performing mapreduce job over written
> data
> > > >> chunk
> > > >> > > >>>>>> specified by
> > > >> > > >>>>>> > > > Scan:
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Configuration conf =
> > > >> > HBaseConfiguration.create();
> > > >> > > >>>>>> > > > > > > > > Job job = new Job(conf,
> > > "testMapreduceJob");
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Scan scan = new Scan(startKey,
> stopKey);
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >
> > > >> TableMapReduceUtil.initTableMapperJob("table",
> > > >> > > >>>>>> scan,
> > > >> > > >>>>>> > > > > > > > > RowCounterMapper.class,
> > > >> > > >>>>>> ImmutableBytesWritable.class,
> > > >> > > >>>>>> > > > > > > Result.class,
> > > >> > > >>>>>> > > > > > > > > job);
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > // Substituting standard
> > TableInputFormat
> > > >> which
> > > >> > > was
> > > >> > > >>>>>> set in
> > > >> > > >>>>>> > > > > > > > > //
> > > >> TableMapReduceUtil.initTableMapperJob(...)
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > job.setInputFormatClass(WdTableInputFormat.class);
> > > >> > > >>>>>> > > > > > > > >
> > > >> keyDistributor.addInfo(job.getConfiguration());
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> > > >> > > >>>>>> > > > > > > > > -----------------------------------------
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to
> > > >> support
> > > >> > > >>>>>> custom row
> > > >> > > >>>>>> > > key
> > > >> > > >>>>>> > > > > > > > > distribution
> > > >> > > >>>>>> > > > > > > > > approaches. To define custom row key
> > > >> distributing
> > > >> > > >>>>>> logic just
> > > >> > > >>>>>> > > > > > implement
> > > >> > > >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class
> > > which
> > > >> is
> > > >> > > >>>>>> really very
> > > >> > > >>>>>> > > > > simple:
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > public abstract class
> > > >> AbstractRowKeyDistributor
> > > >> > > >>>>>> implements
> > > >> > > >>>>>> > > > > > > > > Parametrizable {
> > > >> > > >>>>>> > > > > > > > > public abstract byte[]
> > > >> > getDistributedKey(byte[]
> > > >> > > >>>>>> > > > originalKey);
> > > >> > > >>>>>> > > > > > > > > public abstract byte[]
> > > >> getOriginalKey(byte[]
> > > >> > > >>>>>> > adjustedKey);
> > > >> > > >>>>>> > > > > > > > > public abstract byte[][]
> > > >> > > >>>>>> getAllDistributedKeys(byte[]
> > > >> > > >>>>>> > > > > > > originalKey);
> > > >> > > >>>>>> > > > > > > > > ... // some utility methods
> > > >> > > >>>>>> > > > > > > > > }
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > >
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> >
> > > >> > > >>>>>>
> > > >> > > >>>>>
> > > >> > > >>>>>
> > > >> > > >>>>
> > > >> > > >>>
> > > >> > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Wei Shung Chung

unread,

May 19, 2011, 9:23:42 AM5/19/11

to hba...@googlegroups.com

Awesome I am going to use it today.

Thank you so much!

Sent from my iPhone

Reply all

Reply to author

Forward

0 new messages