Failed to lock the `MAXID_ROW` row


Ivan Pouzyrevsky

Jul 11, 2012, 7:01:39 AM
to open...@googlegroups.com
Hi there.

I am experiencing weird behaviour of OpenTSDB. On a fresh start (with an empty HBase), after new metrics arrive, the TSD emits the following error:

"2012-07-11 00:18:04,659 WARN  [New I/O  worker #22] UniqueId: Failed to lock the `MAXID_ROW' row"

Afterwards the TSD completely stops responding. Any ideas on how to debug and fix this behaviour? And what could be the cause?

Thanks, Ivan.

Ivan Pouzyrevsky

Jul 11, 2012, 7:13:41 AM
to open...@googlegroups.com
The full exception stack trace is:
2012-07-11 13:44:50,050 WARN  [New I/O  worker #1] UniqueId: Failed to lock the `MAXID_ROW' row
org.hbase.async.RemoteException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1090)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1079)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2317)
        at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
Caused by: java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2327)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2313)
        ... 5 more

        at org.hbase.async.RegionClient.deserializeException(RegionClient.java:1186) ~[asynchbase-1.3.0.jar:d93ce96]
        at org.hbase.async.RegionClient.deserialize(RegionClient.java:1161) ~[asynchbase-1.3.0.jar:d93ce96]
        at org.hbase.async.RegionClient.decode(RegionClient.java:1082) ~[asynchbase-1.3.0.jar:d93ce96]
        at org.hbase.async.RegionClient.decode(RegionClient.java:82) ~[asynchbase-1.3.0.jar:d93ce96]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:548) ~[netty-3.4.3.Final.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:509) ~[netty-3.4.3.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75) ~[netty-3.4.3.Final.jar:na]
        at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:938) ~[asynchbase-1.3.0.jar:d93ce96]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.4.3.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) ~[netty-3.4.3.Final.jar:na]
        at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:2406) ~[asynchbase-1.3.0.jar:d93ce96]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.4.3.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.4.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94) [netty-3.4.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:372) [netty-3.4.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:246) [netty-3.4.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38) [netty-3.4.3.Final.jar:na]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102) [netty-3.4.3.Final.jar:na]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.4.3.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_26]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]


I found the following tickets in various trackers:

https://issues.apache.org/jira/browse/HBASE-5249

On Wednesday, July 11, 2012 at 15:01:39 UTC+4, Ivan Pouzyrevsky wrote:

tsuna

Jul 19, 2012, 5:33:22 PM
to Ivan Pouzyrevsky, open...@googlegroups.com
On Wed, Jul 11, 2012 at 4:13 AM, Ivan Pouzyrevsky <sand...@gmail.com> wrote:
> https://issues.apache.org/jira/browse/HBASE-5249

You're probably running into the issue above. This could be because a
long GC pause or something else caused the row lock to expire. You can
probably confirm this by looking at the RegionServer's log and checking
whether you see messages like "Row Lock <some big number> lease expired".

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Ivan Pouzyrevsky

Jul 24, 2012, 9:11:13 AM
to tsuna, open...@googlegroups.com
But how can I work around this issue? It still occurs from time to time at my installation.

tsuna

Jul 24, 2012, 11:54:32 AM
to Ivan Pouzyrevsky, open...@googlegroups.com
On Tue, Jul 24, 2012 at 6:11 AM, Ivan Pouzyrevsky <sand...@gmail.com> wrote:
> But how can I work around this issue? It still occurs from time to time at
> my installation.

You must not work around the issue; you must identify why it's
happening so you can fix the problem. Did you look at HBase's
RegionServer logs and GC logs? Do you see long GC pauses? What about
the TSD? Any long GC pauses there? Does HBase complain about row lock
leases expiring?
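
If GC logging isn't enabled yet, the standard HotSpot flags on both the TSD and the RegionServer JVMs will show the pause times (the log path below is just an example):

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc.log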

Ivan Pouzyrevsky

Jul 25, 2012, 2:35:57 PM
to tsuna, open...@googlegroups.com
Well, I am not a Java developer, so I cannot reliably answer your question about GC. :) Even worse, unfortunately, I have only limited access to the server logs.
To my knowledge, HBase is indeed complaining about leases expiring:

2012-07-24 16:20:16,350 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 1071837979268941134 lease expired
2012-07-24 16:25:56,462 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 3200870587636345397 lease expired
2012-07-24 16:27:56,881 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -86637772990266246 lease expired
2012-07-24 16:29:22,186 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 1672322412060762991 lease expired
2012-07-24 16:29:26,362 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 3279963418495146062 lease expired
2012-07-24 16:36:08,315 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -419374655710481373 lease expired
2012-07-24 16:50:53,042 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Row Lock -5023787810774735759 lease expired
2012-07-24 16:51:53,042 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Row Lock -3101224459624214192 lease expired
2012-07-24 16:54:23,053 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Row Lock 774450990562519588 lease expired
2012-07-24 16:56:23,091 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Row Lock -377045105103389087 lease expired

My observations are the following: when OTSDB has an empty database, a spike of dozens of new metrics and tags (specifically, 250 hosts with ~400 metrics per host and ~16 tags in total) knocks the system over. I've increased the heap size from 1G to 5G and increased the lock expiration timeout (as suggested in the Jira ticket mentioned above) to two minutes.
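
For reference, in 0.90/0.92-era HBase the scanner and row-lock leases appear to share a single setting, so a two-minute timeout would go into hbase-site.xml roughly like this (verify the property name against your release):

  <!-- lease period (ms) shared by scanner and row-lock leases; default 60000 -->
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>120000</value>
  </property>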

Afterwards, I managed to survive the traffic spike by limiting the number of new hosts to 10 per minute. This lets me keep using OTSDB, but it does not really explain the cause of this HBase/OTSDB behaviour.

Thanks anyway for a hint about GC and memory heap size!

tdm

Jul 25, 2012, 2:49:16 PM
to open...@googlegroups.com, tsuna
I'm curious about your hardware setup, Ivan. How many boxes are in your cluster? (I've been running into a similar issue.)

Ivan Pouzyrevsky

Jul 25, 2012, 4:48:37 PM
to tdm, open...@googlegroups.com, tsuna
I have a single box with HBase & OTSDB on it and 250 boxes sending metrics.
I can give you more details tomorrow: I do not have direct access to the box, so I need to ask the administrator for the relevant information.

So far, my hypothesis is that this issue occurs only when a lot of new metrics / tag keys / tag values are created. If this is a recurrent issue for you, consider spreading the workload out over time.

tdm

Jul 25, 2012, 5:16:50 PM
to open...@googlegroups.com, tdm, tsuna, ivan.pou...@gmail.com
Hmm, interesting. I'm limited to running HBase and OTSDB on a single VM currently, and I'm wondering how much of the issue for me is related to virtualization. Some more details when you get a chance would be appreciated. Thanks, Ivan.

Andrey Stepachev

Sep 14, 2012, 2:33:51 AM
to open...@googlegroups.com
Hi Ivan,

HBase row locks are totally broken for external usage (especially in a highly concurrent environment).
I've managed to replace the HBase row locks with ZooKeeper locks. That solves the UniqueId problems.

On Wednesday, July 11, 2012 at 15:01:39 UTC+4, Ivan Pouzyrevsky wrote:
Hi there.

tsuna

Sep 23, 2012, 1:50:02 AM
to Andrey Stepachev, open...@googlegroups.com
On Thu, Sep 13, 2012 at 11:33 PM, Andrey Stepachev <oct...@gmail.com> wrote:
> HBase row locks totaly broken, for external usage (especially in highly
> concurrent environment).

Can you explain exactly what is broken and what is wrong with the way
that OpenTSDB uses them to assign unique IDs in a highly concurrent
environment? Unique IDs aren't expected to be assigned too
frequently, so the row lock is normally not used much.

--
Benoit "tsuna" Sigoure

Ivan Pouzyrevsky

Sep 23, 2012, 3:29:35 AM
to tsuna, Andrey Stepachev, open...@googlegroups.com
As far as I understood, the main issue is that the row lock completely
blocks the region server. Therefore HBase stops responding to queries.

Your assumption that "Unique IDs aren't expected to be assigned too
frequently" does not hold in cases like a cold start and the initial insert
of a batch of new metrics via the "put" command.
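
For example, a cold-start batch over the telnet interface looks like the lines below; every previously unseen metric name, tag key and tag value in it forces a fresh UID assignment (the names here are just an illustration):

  put sys.cpu.user 1343020800 42 host=web001 cluster=prod
  put sys.cpu.user 1343020800 37 host=web002 cluster=prod
  put sys.mem.free 1343020800 1024 host=web001 cluster=prod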

Andrey managed to reimplement unique ID assignment on top of
ZooKeeper. This completely eliminated the row locking issues we
were seeing.

В. (via iPhone)

tsuna

Sep 23, 2012, 4:30:37 AM
to Ivan Pouzyrevsky, Andrey Stepachev, open...@googlegroups.com
On Sun, Sep 23, 2012 at 12:29 AM, Ivan Pouzyrevsky <sand...@gmail.com> wrote:
> As far as I understood, the main issue is that the row lock completely
> blocks the region server. Therefore HBase stops responding to queries.

This isn't how HBase behaves; I believe you are mistaken. When you
take a row lock, you lock just that row. It doesn't affect the rest
of the region, let alone the entire region server. You must have seen
another problem.

> Your assumption that "Unique IDs aren't expected to be assigned too
> frequently" does not hold in cases like a cold start and the initial insert
> of a batch of new metrics via the "put" command.

Cold start vs hot start isn't related to UID assignment. Only the
creation of new metrics or tags can cause UID assignments, regardless
of whether or not the TSD is cold or hot. Even if you import millions
of data points with new tags, throughput will quickly reach cruising
speed even if you have thousands of new UIDs to create.

> Andrey managed to reimplement unique ID assignment on top of
> ZooKeeper. This completely eliminated the row locking issues we
> were seeing.

This is nice but I don't think it's necessary.

--
Benoit "tsuna" Sigoure

Ivan Pouzyrevsky

Sep 23, 2012, 4:50:47 AM
to tsuna, Andrey Stepachev, open...@googlegroups.com
В. (via iPhone)

On 23.09.2012, at 12:30, tsuna <tsun...@gmail.com> wrote:

> On Sun, Sep 23, 2012 at 12:29 AM, Ivan Pouzyrevsky <sand...@gmail.com> wrote:
>> As far as I understood, the main issue is that the row lock completely
>> blocks the region server. Therefore HBase stops responding to queries.
>
> This isn't how HBase behaves, I believe you are mistaken. When you
> take a row lock, you lock just that row. It doesn't affect the rest
> of the region, let alone the entire region server. You must have seen
> another problem.
>

Andrey can provide more details on the subject. I may be mistaken
since I have only a high-level understanding of HBase.

>> Your assumption that "Unique IDs aren't expected to be assigned too
>> frequently" does not hold in cases like a cold start and the initial insert
>> of a batch of new metrics via the "put" command.
>
> Cold start vs hot start isn't related to UID assignment. Only the
> creation of new metrics or tags can cause UID assignments, regardless
> of whether or not the TSD is cold or hot. Even if you import millions
> of data points with new tags, throughput will quickly reach cruising
> speed even if you have thousands of new UIDs to create.

By cold start I mean the case when the database is empty. In this case
the rate of new metrics is especially high.

>
>> Andrey managed to reimplement unique ID assignment on top of
>> ZooKeeper. This completely eliminated the row locking issues we
>> were seeing.
>
> This is nice but I don't think it's necessary.

It's quite convenient, actually. :)

>
> --
> Benoit "tsuna" Sigoure

Andrey Stepachev

Sep 23, 2012, 10:41:02 AM
to tsuna, open...@googlegroups.com
On Sun, Sep 23, 2012 at 9:50 AM, tsuna <tsun...@gmail.com> wrote:
> On Thu, Sep 13, 2012 at 11:33 PM, Andrey Stepachev <oct...@gmail.com> wrote:
>> HBase row locks are totally broken for external usage (especially in a highly
>> concurrent environment).
>
> Can you explain exactly what is broken and what is wrong with the way
> that OpenTSDB uses them to assign unique IDs in a highly concurrent
> environment?

OpenTSDB sends all RPC requests via one channel. On the other side,
HBase is designed around blocking I/O, so when HRegion#internalObtainRowLock
tries to lock a row, it blocks a whole RPC handler.

UniqueId can request locks for the same row simultaneously (from
different threads).
Both requests will be sent via one channel; the first RPC will acquire the
lock, but the second will block an RPC handler forever.

By default asynchbase uses smaller timeouts (20ms) than the lock timeouts (60ms).
So asynchbase will reconnect and send all outgoing RPCs once more (and the
locks once more).
Why does this kill the region server?
The 'tsdb-uid' table always resides on one region server (because of its size).
In my case the RPC handler count was set to 10, and I had 3 OpenTSDB servers.
If all servers try to acquire the same lock (which is easy with some common tag),
with the default timeouts those 3 servers easily exhaust all 10 RPC handlers.

Of course, my first attempt was to increase the number of RPC handlers, but
that did not help. After that I drilled down into the sources and found this
weird behaviour.
Then I simply replaced the row locks with ZooKeeper locks
(I'm lazy, so I simply took https://github.com/mairbek/simple-zoo).

After that everything works like a charm and OpenTSDB never blocks.
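
To illustrate the idea (this is not the simple-zoo code I actually used; the sketch below uses Apache Curator's InterProcessMutex, which at the time still lived under the com.netflix.curator packages, and the znode path and the readIncrementAndStoreMaxId() helper are made up):

  import org.apache.curator.framework.CuratorFramework;
  import org.apache.curator.framework.CuratorFrameworkFactory;
  import org.apache.curator.framework.recipes.locks.InterProcessMutex;
  import org.apache.curator.retry.ExponentialBackoffRetry;

  public final class ZkUidLock {
    private final CuratorFramework zk;

    public ZkUidLock(final String zkQuorum) {
      // Reuse the ZooKeeper quorum HBase already depends on.
      zk = CuratorFrameworkFactory.newClient(zkQuorum, new ExponentialBackoffRetry(1000, 3));
      zk.start();
    }

    /** Runs the UID allocation critical section under a ZooKeeper mutex
        instead of an HBase row lock on MAXID_ROW. */
    public long allocateUid(final String kind) throws Exception {
      final InterProcessMutex lock = new InterProcessMutex(zk, "/opentsdb/uid-lock/" + kind);
      lock.acquire();  // waits in the client, not inside an HBase RPC handler
      try {
        // Hypothetical helper: read MAXID_ROW, add 1, write it back, return the new ID.
        return readIncrementAndStoreMaxId(kind);
      } finally {
        lock.release();
      }
    }

    private long readIncrementAndStoreMaxId(final String kind) {
      throw new UnsupportedOperationException("not part of this sketch");
    }
  }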


Why is this possible at all, given that:
> Unique IDs aren't expected to be assigned too
> frequently, so the row lock is normally not used much.

When a lot of data comes in with the same tag and from many sources
(imagine some tag like 'cluster_name'), it is very likely that you fall into the
situation described before. In my case, 80 hosts send points simultaneously,
and OpenTSDB very often locks up forever (and the HBase region server with it).

And I think that HBase row locks are totally broken and unusable with
asynchbase. ZooKeeper is more suitable here due to its asynchronous nature.

>
> --
> Benoit "tsuna" Sigoure



--
Andrey.

ManOLamancha

Oct 12, 2012, 10:46:15 AM
to open...@googlegroups.com, tsuna
On Sunday, September 23, 2012 10:41:23 AM UTC-4, Andrey Stepachev wrote:

Then I simply replaced the row locks with ZooKeeper locks
(I'm lazy, so I simply took https://github.com/mairbek/simple-zoo).

Hi Andrey, I'm facing the same issue with auto-creation enabled, where TSDs are colliding and corrupting the mappings in the UID table. Do you have code for your implementation of the ZooKeeper locks? Thanks!

ManOLamancha

Oct 16, 2012, 5:01:50 PM
to open...@googlegroups.com, tsuna
Hi All,
The ZooKeeper locking, while much faster than the HBase row locks, didn't solve the issue for me. Instead, it looks like the issue is with HBase itself. What seems to be happening is that even though we're locking the row, performing a get, increment, set, then unlocking, a very fast subsequent lock and get on the row will return the previous value, not the incremented value. Maybe this is because the update hasn't propagated to all of the regions, or there's a cache issue somewhere. To solve it, I added a local counter to the UniqueId class, and before writing to HBase it checks this counter's value and uses the greater of the two.

13:54:22.039 WARN  [UniqueId.getOrCreateId] - Storage UID [4810] was less than local [4811]
13:55:15.232 WARN  [UniqueId.getOrCreateId] - Storage UID [5952] was less than local [5953]
13:55:15.240 WARN  [UniqueId.getOrCreateId] - Storage UID [5952] was less than local [5954]
13:58:25.760 WARN  [UniqueId.getOrCreateId] - Storage UID [10122] was less than local [10123]
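
The guard behind those warnings can be as simple as a local high-water mark kept next to the stored counter; a rough sketch (not the actual patch, and the class and method names are made up):

  import java.util.concurrent.atomic.AtomicLong;

  /** Local high-water mark kept alongside the MAXID counter in storage. */
  final class LocalUidHighWaterMark {
    private final AtomicLong localMax = new AtomicLong();

    /** Called with the value read back from storage after lock/get/increment;
        returns the value that should actually be used and written. */
    long reconcile(final long storageUid) {
      final long local = localMax.incrementAndGet();
      if (storageUid < local) {
        // Storage handed back a stale, pre-increment value: trust the local counter.
        System.err.println("Storage UID [" + storageUid + "] was less than local [" + local + "]");
        return local;
      }
      localMax.set(storageUid);  // storage is ahead (another TSD assigned IDs): catch up
      return storageUid;
    }
  }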

Andrey Stepachev

Oct 17, 2012, 2:14:02 AM
to ManOLamancha, open...@googlegroups.com, tsuna
Hi,

Looks like you have unsynchronized clocks; in that case this is possible.
Actually, the sequence should be kept in ZooKeeper, and at startup it should be
synchronized with the HBase storage.

I think a safe algorithm should look like this:
Suppose we have two OpenTSDB servers, O1 and O2, and they start simultaneously.
O1 locks the sequence generator, scans the tsdb-uid table and
finds the max assigned sequence number. Then it updates the generator with max+1
and unlocks.
After that everything works as usual: lock, increment the value, unlock.
Since few RPCs are required, no other optimizations are needed.
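
In rough code (purely illustrative; the DistributedLock and SequenceGenerator types and the scanMaxAssignedUid() helper are stand-ins for whatever ZooKeeper recipe and tsdb-uid scan you use):

  /** Startup synchronization of a ZooKeeper-held UID sequence with the tsdb-uid table. */
  final class UidSequenceBootstrap {
    interface DistributedLock { void acquire() throws Exception; void release() throws Exception; }
    interface SequenceGenerator { long current(); void set(long value); long incrementAndGet(); }

    private final DistributedLock lock;        // ZooKeeper mutex guarding the generator
    private final SequenceGenerator generator; // sequence value stored in ZooKeeper

    UidSequenceBootstrap(final DistributedLock lock, final SequenceGenerator generator) {
      this.lock = lock;
      this.generator = generator;
    }

    /** Run once at TSD startup, before any UID assignment is served. */
    void bootstrap() throws Exception {
      lock.acquire();
      try {
        final long maxInHBase = scanMaxAssignedUid();  // scan tsdb-uid, find the max assigned UID
        if (generator.current() <= maxInHBase) {
          generator.set(maxInHBase + 1);               // keep the generator ahead of storage
        }
      } finally {
        lock.release();
      }
    }

    /** Steady state: lock, take the next value, unlock. */
    long nextUid() throws Exception {
      lock.acquire();
      try {
        return generator.incrementAndGet();
      } finally {
        lock.release();
      }
    }

    private long scanMaxAssignedUid() { /* scan the tsdb-uid table */ return 0L; }
  }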
--
Andrey.

ManOLamancha

Oct 17, 2012, 9:19:45 AM
to open...@googlegroups.com, ManOLamancha, tsuna
On Wednesday, October 17, 2012 2:14:23 AM UTC-4, Andrey Stepachev wrote:
Looks like you have unsynchronized clocks; in that case this is possible.

My dev cluster is synchronized over NTP, though I'm not sure how accurate it is.
 
Actually, the sequence should be kept in ZooKeeper, and at startup it should be
synchronized with the HBase storage.

I think a safe algorithm should look like this:
Suppose we have two OpenTSDB servers, O1 and O2, and they start simultaneously.
O1 locks the sequence generator, scans the tsdb-uid table and
finds the max assigned sequence number. Then it updates the generator with max+1
and unlocks.
After that everything works as usual: lock, increment the value, unlock.
Since few RPCs are required, no other optimizations are needed.

 Sounds good to me. I may need to use Zookeeper for leadership tracking in some other OpenTSDB modules I'm going to be adding, so I'll investigate using atomic counters as the main UID source in ZK and flushing that periodically to storage.

Andrey Stepachev

Oct 17, 2012, 10:10:00 AM
to ManOLamancha, open...@googlegroups.com
On Wed, Oct 17, 2012 at 5:19 PM, ManOLamancha <clars...@gmail.com> wrote:
> On Wednesday, October 17, 2012 2:14:23 AM UTC-4, Andrey Stepachev wrote:
>>
>> Looks like you have unsynchronized clocks; in that case this is possible.
>
>
> My dev cluster is synchronized over NTP, though I'm not sure how accurate it
> is.

Hm. PutRequest uses TIMESTAMP_NOW, so the timestamp is assigned by the region
server; it is not an NTP problem. Can you reproduce the situation? Maybe change the
code and, instead of a Get, use a Scan to fetch all versions of the sequence value,
to see what is going on.
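
Something along these lines with the plain HBase 0.92 client would dump every stored version of the counter cells (assuming the max-ID row is the single byte 0x00 in family "id", as in the tsdb-uid schema; how far back it goes depends on the family's VERSIONS setting and compactions):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public final class DumpMaxIdVersions {
    public static void main(final String[] args) throws Exception {
      final Configuration conf = HBaseConfiguration.create();
      final HTable table = new HTable(conf, "tsdb-uid");
      try {
        final Scan scan = new Scan(new byte[] { 0 }, new byte[] { 1 });  // just the 0x00 row
        scan.addFamily(Bytes.toBytes("id"));
        scan.setMaxVersions();  // fetch every version, not just the latest
        final ResultScanner scanner = table.getScanner(scan);
        try {
          for (final Result result : scanner) {
            for (final KeyValue kv : result.raw()) {
              System.out.println(Bytes.toString(kv.getQualifier())
                  + " ts=" + kv.getTimestamp()
                  + " value=" + Bytes.toLong(kv.getValue()));
            }
          }
        } finally {
          scanner.close();
        }
      } finally {
        table.close();
      }
    }
  }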

>
>>
>> Actually, the sequence should be kept in ZooKeeper, and at startup it should be
>> synchronized with the HBase storage.
>>
>> I think a safe algorithm should look like this:
>> Suppose we have two OpenTSDB servers, O1 and O2, and they start simultaneously.
>> O1 locks the sequence generator, scans the tsdb-uid table and
>> finds the max assigned sequence number. Then it updates the generator with max+1
>> and unlocks.
>> After that everything works as usual: lock, increment the value, unlock.
>> Since few RPCs are required, no other optimizations are needed.
>
>
> Sounds good to me. I may need to use Zookeeper for leadership tracking in
> some other OpenTSDB modules I'm going to be adding, so I'll investigate
> using atomic counters as the main UID source in ZK and flushing that
> periodically to storage.



--
Andrey.