On Sun, Sep 23, 2012 at 9:50 AM, tsuna <tsun...@gmail.com> wrote:
> On Thu, Sep 13, 2012 at 11:33 PM, Andrey Stepachev <oct...@gmail.com> wrote:
>> HBase row locks are totally broken for external usage (especially in a
>> highly concurrent environment).
>
> Can you explain exactly what is broken and what is wrong with the way
> that OpenTSDB uses them to assign unique IDs in a highly concurrent
> environment?
OpenTSDB sends all RPC requests over one channel. HBase, on the other
hand, is designed for blocking I/O, so when HRegion#internalObtainRowLock
tries to lock a row, it blocks the whole RPC handler.
UniqueId can request locks for the same row simultaneously (from
different threads).
Both requests are sent over the same channel. The first RPC acquires the
lock, but the second blocks its RPC handler forever.
By default asynchbase uses a smaller timeout (20ms) than the lock timeout
(60ms). So asynchbase will reconnect and send all outgoing RPCs once more
(including the lock requests).
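
To illustrate the starvation described above, here is a minimal sketch in
plain Java. It is not HBase code: the fixed thread pool stands in for the
RPC handlers, the Semaphore stands in for one row lock, and the class and
method names (HandlerStarvation, unrelatedRequestStarved) are made up for
this example. Once every handler is parked on the same lock, an unrelated
request has to wait until the lock waits time out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not HBase code: pool threads play the RPC handlers.
final class HandlerStarvation {

    /** Returns true if an unrelated request got stuck behind blocked lock requests. */
    static boolean unrelatedRequestStarved(int handlers, long lockWaitMs) {
        ExecutorService rpcHandlers = Executors.newFixedThreadPool(handlers);
        Semaphore rowLock = new Semaphore(1); // stands in for one row lock
        try {
            // First lock request: takes the lock and returns, freeing its handler.
            rpcHandlers.submit(() -> { rowLock.acquireUninterruptibly(); }).get();

            // Repeated lock requests for the same row: each one parks a handler
            // until its lock wait times out.
            CountDownLatch allStarted = new CountDownLatch(handlers);
            for (int i = 0; i < handlers; i++) {
                rpcHandlers.submit(() -> {
                    allStarted.countDown();
                    return rowLock.tryAcquire(lockWaitMs, TimeUnit.MILLISECONDS);
                });
            }
            allStarted.await(); // every handler is now busy with a lock request

            // An unrelated request (say, a read) can only sit in the queue.
            Future<String> unrelated = rpcHandlers.submit(() -> "ok");
            boolean starved = !unrelated.isDone();
            unrelated.get(); // completes only after a lock wait times out
            return starved;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            rpcHandlers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // 10 handlers and a 500 ms lock wait are illustrative values only.
        System.out.println("starved = " + unrelatedRequestStarved(10, 500));
    }
}
```

If the client gives up and re-sends before the lock wait expires, the
parked handlers are replaced as fast as they time out, so the pool never
drains.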
Why does this kill the region server?
The 'tsdb-uid' table always resides on one region server (because of its
size). In my case the RPC handler count was set to 10, and I have 3
OpenTSDB servers. If all servers try to acquire the same lock (which
happens easily with a common tag), then with the default timeouts those
3 servers easily exhaust all 10 RPC handlers.
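
A rough back-of-the-envelope version of that claim, as a sketch. The
model is my assumption, not something taken from the HBase or asynchbase
sources: every client timeout re-sends the lock request while the
previous one still occupies a handler for the full lock wait. The class
name and the threads-per-client parameter are hypothetical.

```java
// Hypothetical estimate; the retry model is an assumption, not HBase behavior
// read from the sources.
final class HandlerEstimate {

    /**
     * Upper bound on handlers parked on one row lock when each contending
     * thread re-sends its lock request after every client timeout.
     */
    static long blockedHandlers(int clients, int threadsPerClient,
                                long lockWaitMs, long clientTimeoutMs) {
        // Number of re-sent requests that can overlap within one lock wait (ceiling).
        long overlapping = (lockWaitMs + clientTimeoutMs - 1) / clientTimeoutMs;
        return (long) clients * threadsPerClient * overlapping;
    }

    public static void main(String[] args) {
        // 3 clients, 60 ms lock wait, 20 ms client timeout, as in this mail.
        System.out.println(blockedHandlers(3, 1, 60, 20));
        System.out.println(blockedHandlers(3, 2, 60, 20));
    }
}
```

Under this model one contending thread per server already parks 9
handlers, and two threads per server park 18, which is more than the 10
handlers available.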
Of course, my first attempt was to increase the number of RPC handlers,
but it did not help. After that I dug into the sources and found this
weird behavior.
Then I simply replaced the row locks with ZooKeeper locks.
(I'm lazy, so I simply took
https://github.com/mairbek/simple-zoo.)
Since then everything works like a charm and OpenTSDB never blocks.
Why is this possible? Because class
> Unique IDs aren't expected to be assigned too
> frequently, so the row lock is normally not used much.
When a lot of data arrives with the same tag from many sources (imagine
a tag like 'cluster_name'), it is very likely to end up in the situation
described above. In my case 80 hosts send points simultaneously, and
OpenTSDB very often locks up forever (and the HBase region server too).
So I think that HBase row locks are totally broken and unusable with
asynchbase. ZooKeeper is more suitable here because of its async nature.
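
To show what I mean by "async nature", here is a minimal in-process
sketch. It is only an analogy: it uses neither the ZooKeeper API nor
simple-zoo, and the AsyncLock class is made up for this example. The
point is that a waiter registers interest and is notified later (much as
a ZooKeeper client waits on a watch), so no thread is parked while the
lock is held by someone else.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical in-process analogy; not the ZooKeeper or simple-zoo API.
final class AsyncLock {
    private final Queue<CompletableFuture<Void>> waiters = new ArrayDeque<>();
    private boolean held = false;

    /** Returns a future that completes once the caller owns the lock; never blocks. */
    synchronized CompletableFuture<Void> acquire() {
        if (!held) {
            held = true;
            return CompletableFuture.completedFuture(null);
        }
        CompletableFuture<Void> pending = new CompletableFuture<>();
        waiters.add(pending); // the waiter is queued, no thread is parked
        return pending;
    }

    /** Hands the lock to the next waiter, or marks it free if nobody is waiting. */
    void release() {
        CompletableFuture<Void> next;
        synchronized (this) {
            next = waiters.poll();
            if (next == null) {
                held = false;
                return;
            }
        }
        next.complete(null); // run the waiter's callbacks outside the monitor
    }

    public static void main(String[] args) {
        AsyncLock lock = new AsyncLock();
        lock.acquire().thenRun(() -> System.out.println("first owner"));
        lock.acquire().thenRun(() -> System.out.println("second owner"));
        lock.release(); // the second waiter is notified here
    }
}
```

With this shape a duplicate lock request costs one queued future instead
of one blocked handler thread.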
>
> --
> Benoit "tsuna" Sigoure
--
Andrey.