Pre-splitting table when writing to it with HBaseWD


Alex Baranau

Aug 3, 2012, 9:38:02 PM
to hba...@googlegroups.com
RE pre-splitting the table (changed subject for this part of the thread to "Pre-splitting table when writing to it with HBaseWD"):

There's no helper for that in the HBaseWD lib at the moment (but there is going to be one: [1]). Until then, you can create the pre-splits yourself:
    // creating pre-splits
    RowKeyDistributorByOneBytePrefix distributor = new RowKeyDistributorByOneBytePrefix((byte) 32);
    byte[][] allDistributedKeys = distributor.getAllDistributedKeys(new byte[]{});
    byte[][] splitKeys = new byte[allDistributedKeys.length][];
    for (int i = 0; i < allDistributedKeys.length; i++) {
      splitKeys[i] = allDistributedKeys[i];
    }
    admin.createTable(tableDescriptor, splitKeys);

Please, let me know if this helps.

Alex Baranau
------
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

[1] https://github.com/sematext/HBaseWD/issues/15, feel free to contribute it if you end up having something reusable ;)

On Fri, Aug 3, 2012 at 12:44 PM, syed kather <in.a...@gmail.com> wrote:
Team,
   How do I pre-split an HBase table? As I mentioned in the earlier mail, my row key is followed by an incremental sequence.

When I created a table as shown below, I noticed that the start and end keys look like this:

bin/hbase org.apache.hadoop.hbase.util.RegionSplitter ObjectSequence4 -c 3 -f SequenceFamily:ObjectArray:UserArray

Name | Region Server | Start Key | End Key | Requests
ObjectSequence4,,1344011081570.941ae89b0c729c924b80d8cf8b8fa4b8. | slave1:60030 | (empty) | 2aaaaaaa | 31670
ObjectSequence4,2aaaaaaa,1344011081570.f4c5647be69af33116fc214f913bb312. | slave2:60030 | 2aaaaaaa | 55555554 | 0
ObjectSequence4,55555554,1344011081570.e5bc298b6915f2b9c42bb28e31c8e640. | master:60030 | 55555554 | (empty) | 0
Sorry if the question is too simple. I searched the net but couldn't find the right answer.

Other than that, I have another question.
I am calling incrementColumnValue from the Thrift API. How can I call this custom incrementColumnValue function from the Thrift API? As far as I know, we need to initialize this object only once, at the beginning:
RowKeyDistributorByOneBytePrefix keyDistributor = new RowKeyDistributorByOneBytePrefix(bucketsCount);
I don't know how to do that.

Now my incrementColumnValue function looks like this:

public byte[] incrementColumnValue(final byte[] row,
        RowKeyDistributorByOneBytePrefix keyDistributor,
        final byte[] family, final byte[] qualifier, final long amount,
        boolean writeToWAL, byte bucketsCount) throws IOException {
    // Increment the counter stored under the original (un-prefixed) key,
    // then return the new counter value as a distributed (prefixed) key.
    long newValue = incrementColumnValue(keyDistributor.getOriginalKey(row), family, qualifier, amount, true);
    return keyDistributor.getDistributedKey(Bytes.toBytes(newValue));
}

            Thanks and Regards,
        S SYED ABDUL KATHER 

syed kather

Aug 6, 2012, 2:14:38 PM
to hba...@googlegroups.com

Alex 

After implementing the snippet of code you gave, I found there is still a problem with the distribution.

In the region table below, the request count for the \x00 region is 288727 (highlighted in red in the original mail), while the count for the first region is 6 (highlighted in green).

HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
RowKeyDistributorByOneBytePrefix distributor = new RowKeyDistributorByOneBytePrefix((byte) 32);
byte[][] allDistributedKeys = distributor.getAllDistributedKeys(new byte[]{});
byte[][] splitKeys = new byte[allDistributedKeys.length][];
for (int i = 0; i < allDistributedKeys.length; i++) {
  splitKeys[i] = allDistributedKeys[i];
}
HTableDescriptor tableDescriptor = new HTableDescriptor("ObjectSequence5");
tableDescriptor.addFamily(new HColumnDescriptor("SequenceFamily"));
tableDescriptor.addFamily(new HColumnDescriptor("ObjectArray"));
tableDescriptor.addFamily(new HColumnDescriptor("UserArray"));
admin.createTable(tableDescriptor, splitKeys);

Table Regions

Name | Region Server | Start Key | End Key | Requests
ObjectSequence5,,1344248361455.4c6333b7b4657547478fbfb78b8bf89b. | slave1:60030 | (empty) | \x00 | 6
ObjectSequence5,\x00,1344248361455.74e9d501dfdfbfb18ae2bd1f9ae57a17. | slave2:60030 | \x00 | \x01 | 288727
ObjectSequence5,\x01,1344248361455.79060ed8f8c7e8d698b7aacd1b86cbd5. | master:60030 | \x01 | \x02 | 6571
ObjectSequence5,\x02,1344248361455.bb646858fb17ca9ada58be40a6f9fb4e. | slave1:60030 | \x02 | \x03 | 6569
ObjectSequence5,\x03,1344248361456.a87447d1d3d4e9be4f65f7fbc6fc9dcc. | slave2:60030 | \x03 | \x04 | 6569
ObjectSequence5,\x04,1344248361456.d7358a997b3ee7d416c72d679a742cfa. | master:60030 | \x04 | \x05 | 6569
ObjectSequence5,\x05,1344248361456.d8e18b13018ee5d4aa319147ea7aeb2c. | slave1:60030 | \x05 | \x06 | 6569
ObjectSequence5,\x06,1344248361456.c2d82870a2cbe638ffd6b263b3759bf3. | slave2:60030 | \x06 | \x07 | 6569
ObjectSequence5,\x07,1344248361456.f05be4f78114e057fc4c0cabede60b4f. | master:60030 | \x07 | \x08 | 6569
ObjectSequence5,\x08,1344248361456.ec6e9e20c5d7bb1557e2b1db6bc4e271. | slave1:60030 | \x08 | \x09 | 6569
ObjectSequence5,\x09,1344248361456.07556a06eea1c2ea0b69590e485eeedf. | slave2:60030 | \x09 | \x0A | 6569
ObjectSequence5,\x0A,1344248361456.c4f698492b4a5b4923402bc7c15dc620. | master:60030 | \x0A | \x0B | 6569
ObjectSequence5,\x0B,1344248361456.5f7c168b0ec36be3768c7401d6da6fab. | slave1:60030 | \x0B | \x0C | 6571
ObjectSequence5,\x0C,1344248361457.234fa7868ad6771132edf2bf9beb30e8. | slave2:60030 | \x0C | \x0D | 6569
ObjectSequence5,\x0D,1344248361457.9af814f64793d2df793624c77579e8d4. | master:60030 | \x0D | \x0E | 6569
ObjectSequence5,\x0E,1344248361457.386d791a2a7ee76eddbc60f2f638ae76. | slave1:60030 | \x0E | \x0F | 6569
ObjectSequence5,\x0F,1344248361457.ece3acbc9573cce205d4b25bf0367cdc. | slave2:60030 | \x0F | \x10 | 6569
ObjectSequence5,\x10,1344248361457.57a83ac7d2948b4c9871a1a5e6c7ba64. | master:60030 | \x10 | \x11 | 6569
ObjectSequence5,\x11,1344248361457.5f9953bc9c0c1d88b9d0294f126a6e21. | slave1:60030 | \x11 | \x12 | 6569
ObjectSequence5,\x12,1344248361457.070fac08eed8eceb409bc93bdf09c541. | slave2:60030 | \x12 | \x13 | 6569
ObjectSequence5,\x13,1344248361457.2c26761286b3ff6223929dad10a8d608. | master:60030 | \x13 | \x14 | 6569
ObjectSequence5,\x14,1344248361457.fcf1757cc8d4672b49eacdf061a75d23. | slave1:60030 | \x14 | \x15 | 6569
ObjectSequence5,\x15,1344248361458.035cb5ff1640aa36e3da3a376feb541f. | slave2:60030 | \x15 | \x16 | 6566
ObjectSequence5,\x16,1344248361458.1295dd00e70fa20529d95e841f604748. | master:60030 | \x16 | \x17 | 6566
ObjectSequence5,\x17,1344248361458.d51f8dfdc227790b43af7d87c7c1ebba. | slave1:60030 | \x17 | \x18 | 6566
ObjectSequence5,\x18,1344248361458.4e13a01e6061750330b657c196be4e7d. | slave2:60030 | \x18 | \x19 | 6566
ObjectSequence5,\x19,1344248361458.ef230e4d6787072aa522b00b145f696f. | master:60030 | \x19 | \x1A | 6566
ObjectSequence5,\x1A,1344248361458.4935daceea53ff6388443833e4afd89f. | slave1:60030 | \x1A | \x1B | 6566
ObjectSequence5,\x1B,1344248361458.7d5fccd5e2fdec98ea0e01c827eb5e43. | slave2:60030 | \x1B | \x1C | 6566
ObjectSequence5,\x1C,1344248361458.7a2d3acd733fc96fa703b0515b7f1295. | master:60030 | \x1C | \x1D | 6566
ObjectSequence5,\x1D,1344248361458.660879324d1b878f4aef83f5077b73ff. | slave1:60030 | \x1D | \x1E | 6566
ObjectSequence5,\x1E,1344248361458.454d6b700d6cf6e4902f45d045cfd48f. | slave2:60030 | \x1E | \x1F | 6566
ObjectSequence5,\x1F,1344248361458.f17247ba8a1d216061424c4f94ae795c. | master:60030 | \x1F | (empty) | 6566

Regions by Region Server

Region Server | Region Count
http://slave2:60030/ | 11
http://master:60030/ | 11
http://slave1:60030/ | 11

What may be the problem? Please let me know if you have any suggestions to solve this issue.

Alex Baranau

Aug 6, 2012, 9:06:52 PM
to hba...@googlegroups.com
This is weird.

Well, I can explain why the first region doesn't get any attention: it literally holds nothing (the start key of the second region is the smallest possible key). To fix that, just don't provide the first key from the distributor as a split [1]. HBase will create the first region anyway by default.
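
A minimal sketch of why dropping the first split key helps, written in plain Java with no HBase or HBaseWD dependency. The helper allPrefixKeys is a hypothetical stand-in for distributor.getAllDistributedKeys(new byte[]{}); it assumes that call returns one single-byte key per bucket (\x00 to \x1F for 32 buckets), which matches the region boundaries in the table above but is not taken from the library's source.

```java
import java.util.Arrays;

class PreSplitSketch {
    // Hypothetical stand-in for distributor.getAllDistributedKeys(new byte[]{}):
    // assumes one single-byte key per bucket, 0x00 .. bucketsCount - 1.
    static byte[][] allPrefixKeys(int bucketsCount) {
        byte[][] keys = new byte[bucketsCount][];
        for (int i = 0; i < bucketsCount; i++) {
            keys[i] = new byte[] {(byte) i};
        }
        return keys;
    }

    // Drop the first key: a split at 0x00 would leave the first region
    // (empty start key up to 0x00) with no prefixed rows in it.
    static byte[][] splitKeysSkippingFirst(byte[][] allKeys) {
        return Arrays.copyOfRange(allKeys, 1, allKeys.length);
    }

    public static void main(String[] args) {
        byte[][] all = allPrefixKeys(32);
        byte[][] splits = splitKeysSkippingFirst(all);
        // N split keys produce N + 1 regions, so 31 splits give 32 regions,
        // one per bucket.
        System.out.println("split keys: " + splits.length
                + ", regions: " + (splits.length + 1));
        System.out.println("first split: " + splits[0][0]
                + ", last split: " + splits[splits.length - 1][0]);
    }
}
```

With all 32 keys used as splits the table gets 33 regions (as in the listing above, 11 per server), and the extra one can never receive a prefixed row.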

But at the same time I can't understand why so many requests go to the second region (\x00 - \x01). There's a unit test that checks the distribution. Could it be that the metric you are using for the number of region requests is not accurate? Can you please check the number of rows in that region and compare it with some other region? (If there aren't a lot of rows, just use scan in the hbase shell with the STARTROW/STOPROW params; otherwise you need to count in a different way: on the client or in an MR job.) This would help to understand why this happens.
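
As a rough illustration of the client-side counting idea, here is a sketch in plain Java that tallies row keys by their one-byte prefix. The class and method names are hypothetical, and the sample keys in main are made up (they simulate sequential keys prefixed round-robin); in practice the keys would come from scanning the table.

```java
import java.util.ArrayList;
import java.util.List;

class BucketCountSketch {
    // Count how many row keys fall into each one-byte-prefix bucket.
    // Keys that are empty or whose prefix is outside the bucket range are ignored.
    static long[] countPerBucket(Iterable<byte[]> distributedKeys, int bucketsCount) {
        long[] counts = new long[bucketsCount];
        for (byte[] key : distributedKeys) {
            if (key.length == 0) {
                continue;
            }
            int bucket = key[0] & 0xFF; // read the prefix byte as unsigned
            if (bucket < bucketsCount) {
                counts[bucket]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data: 64 sequential keys, prefixed round-robin
        // over 32 buckets (an assumption for illustration only).
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < 64; i++) {
            keys.add(new byte[] {(byte) (i % 32), (byte) i});
        }
        long[] counts = countPerBucket(keys, 32);
        for (int b = 0; b < counts.length; b++) {
            System.out.printf("bucket 0x%02X: %d%n", b, counts[b]);
        }
    }
}
```

If the per-bucket row counts come out roughly even while the request metric does not, that would point at the metric (or at reads) rather than at the key distribution.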

Also (a clue): what are you doing while you take these measurements? Do you ONLY write data? If not, could it be that your reads go to only one region and skew the metrics? Please measure reading and writing separately.

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

[1]
byte[][] splitKeys = new byte[allDistributedKeys.length - 1][];
for (int i = 1; i < allDistributedKeys.length; i++) {
  splitKeys[i - 1] = allDistributedKeys[i];
}
