On Tue, Feb 5, 2013 at 11:07 PM, Andrey Stepachev <
oct...@gmail.com> wrote:
> I think, that usage of hbase locks is a very bad idea, such locks they don't
> scale and easily kill hbase region server in case of storm of new metrics.
Yes, it was a design mistake to store all the MAX_ID counters in the same row,
and to use an explicit row lock. In addition to this, there was a
limitation in HBase that made this whole dance more complicated than
necessary.
I believe it's possible to rewrite the code using atomicIncrement()
and compareAndSet() to have a lock-free, ZooKeeper-free version of the
code.
The algorithm would be roughly as follows:
0. Check whether the name already has a UID.
1. Do an atomicIncrement() on the MAX_ID to get a new UID.
2. compareAndSet() the reverse mapping (UID => name).
3. compareAndSet() the forward mapping (name => UID).
If we die between step 1 and 2: we just waste a UID.
If we die between step 2 and 3: we just waste a UID and have an orphaned UID.
Step 1 is atomic, so no race condition is possible here.
Step 2 technically doesn't need to be a compareAndSet() because the ID
we're trying to assign is guaranteed unique by the atomicIncrement().
However, it will help us be absolutely sure we don't unintentionally
overwrite an existing cell (which would only happen in really
weird / data corruption cases).
Step 3 is racy, because if two TSDs attempt to assign a UID to the
same name at the same time, only one of them will have its
compareAndSet() succeed. The other one will have to retry from step 0
and find that the UID has been assigned, so the one it attempted to
assign will just be wasted.
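
To make the steps above concrete, here is a minimal sketch in Python. It
is not the real UniqueId code: FakeTable is a hypothetical in-memory
stand-in for the HBase table, and its atomic_increment() and
compare_and_set() methods only imitate the semantics described above. The
mutex inside FakeTable models the atomicity the server would provide; the
client-side function takes no lock of its own.

```python
import threading


class FakeTable:
    """Hypothetical in-memory stand-in for the UID table (illustration only)."""

    def __init__(self):
        self._cells = {}
        # Models server-side atomicity of each single operation only.
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._cells.get(key)

    def atomic_increment(self, key):
        """Add 1 to the counter at `key` and return the new value."""
        with self._mutex:
            value = self._cells.get(key, 0) + 1
            self._cells[key] = value
            return value

    def compare_and_set(self, key, expected, new):
        """Write `new` only if the cell currently holds `expected`."""
        with self._mutex:
            if self._cells.get(key) != expected:
                return False
            self._cells[key] = new
            return True


def get_or_create_uid(table, name):
    """Return the UID for `name`, assigning one without any client-side lock."""
    while True:
        # Step 0: does the name already have a UID?
        uid = table.get(("name", name))
        if uid is not None:
            return uid
        # Step 1: reserve a fresh UID.
        uid = table.atomic_increment("MAX_ID")
        # Step 2: UID => name. Should never fail, since the UID is fresh.
        if not table.compare_and_set(("uid", uid), None, name):
            raise RuntimeError("UID %d already mapped; table corrupted?" % uid)
        # Step 3: name => UID. Only one concurrent caller can win this.
        if table.compare_and_set(("name", name), None, uid):
            return uid
        # Lost the race: this UID is wasted, retry from step 0.
```

A caller that loses the race at step 3 simply loops, finds the winner's
UID at step 0, and returns it.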
We can always have the "tsdb uid fsck" command find the unused and
orphaned UIDs and put them in some kind of a free list for later use,
so that no UID is "wasted".
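
As a rough illustration of what such a scan could compute (this is not
what "tsdb uid fsck" does today; the function name and the plain dicts
standing in for the two mappings are made up for the example):

```python
def find_reusable_uids(max_id, uid_to_name, name_to_uid):
    """Return (unused, orphaned) UIDs among 1..max_id.

    unused:   UIDs that were handed out by the counter but never got a
              UID => name cell (died between step 1 and 2).
    orphaned: UIDs that have a UID => name cell, but whose name does not
              map back to them (died between step 2 and 3, or lost the
              race at step 3).
    """
    unused = [uid for uid in range(1, max_id + 1) if uid not in uid_to_name]
    orphaned = [
        uid
        for uid, name in sorted(uid_to_name.items())
        if name_to_uid.get(name) != uid
    ]
    return unused, orphaned


# Example: UID 2 lost the race for "b" to UID 4; UIDs 3 and 5 were never used.
unused, orphaned = find_reusable_uids(
    max_id=5,
    uid_to_name={1: "a", 2: "b", 4: "b"},
    name_to_uid={"a": 1, "b": 4},
)
free_list = sorted(unused + orphaned)
```

The orphaned UID => name cells would have to be deleted before those UIDs
go on the free list, otherwise the compareAndSet() in step 2 would fail
when they are reused.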
Does that sound reasonable? It would be a good excuse to rewrite the
UniqueId code to be fully asynchronous / non-blocking.
On Thu, Feb 7, 2013 at 5:24 PM, ManOLamancha <
clars...@gmail.com> wrote:
> I like this feature too but if 2.0 includes a backend abstraction layer,
> you'll want to port it for that. These ZK locks certainly helped our prod
> instance when we first turned it up, though even with a heap of 4GB we were
> killing our ZK instances. Thanks!
Uhh, how on earth were you able to drive ZK out of memory by assigning
UIDs with TSD? That sounds just wrong.
--
Benoit "tsuna" Sigoure