OpenTSDB error when using a replicated HBase instance


kurien....@gmail.com

Mar 28, 2016, 6:06:35 PM
to OpenTSDB
Using OpenTSDB 2.2

==
I set up HBase to replicate from a primary cluster to a peer cluster, and I am trying to use OpenTSDB on the replica cluster in read-only mode.

However, I get the following exception:
==
ERROR [HttpQuery.logError] - [id: 0x3d782f27, /10.0.0.137:53400 => /10.0.191.194:4200] Internal Server Error on /q?start=2016/03/28-14:53:00&end=2016/03/28-15:03:52&m=sum:node.cpu_load.count&o=&yrange=%5B0:%5D&wxh=1847x759&json
com.stumbleupon.async.DeferredGroupException: At least one of the Deferreds failed, first exception:
at com.stumbleupon.async.DeferredGroup.done(DeferredGroup.java:169) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.DeferredGroup.recordCompletion(DeferredGroup.java:158) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.DeferredGroup.access$200(DeferredGroup.java:36) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.DeferredGroup$1NotifyOrdered.call(DeferredGroup.java:97) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.callback(Deferred.java:1005) ~[async-1.4.0.jar:na]
at org.hbase.async.HBaseRpc.callback(HBaseRpc.java:506) ~[asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.decode(RegionClient.java:1351) ~[asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.decode(RegionClient.java:89) ~[asynchbase-1.6.0.jar:na]
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) ~[netty-3.9.4.Final.jar:na]
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) ~[netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[netty-3.9.4.Final.jar:na]
at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:1082) ~[asynchbase-1.6.0.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) ~[netty-3.9.4.Final.jar:na]
at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:2677) ~[asynchbase-1.6.0.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.9.4.Final.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
Caused by: net.opentsdb.uid.NoSuchUniqueId: No such unique ID for 'tagk': [0, 0, 1]
at net.opentsdb.uid.UniqueId$1GetNameCB.call(UniqueId.java:226) ~[tsdb-2.1.3.jar:]
at net.opentsdb.uid.UniqueId$1GetNameCB.call(UniqueId.java:223) ~[tsdb-2.1.3.jar:]

Any clues?
thanks
V

Jonathan Creasy

Mar 28, 2016, 6:15:18 PM
to kurien....@gmail.com, OpenTSDB
It looks like your data wasn't copied over properly. Do the OpenTSDB UID and data tables look correct in the replica cluster?

The key bit, of course, is "Caused by: net.opentsdb.uid.NoSuchUniqueId: No such unique ID for 'tagk': [0, 0, 1]", which says that the tagk UID [0, 0, 1] cannot be resolved to a name.

V Kurien

Mar 28, 2016, 6:53:59 PM
to OpenTSDB, kurien....@gmail.com
Thanks Jonathan,

Here are the checks I ran:

  1. Checked whether the tables (especially tsdb-uid) had the right schema, using describe 'tsdb-uid'. They looked correct. There were also no errors in the source cluster's region server log.
  2. Clearly, however, something is wrong on the destination, since tsdb fsck outputs:
==
2016-03-28 15:44:40,729 ERROR [Fsck #4] Fsck: Unable to resolve the metric from the row key.
    Key: 00000156F9A96000000100000100000200030B000003000003000004000004000005000005
    No such unique ID for 'metrics': [0, 0, 1]
2016-03-28 15:44:40,729 ERROR [Fsck #0] Fsck: Unable to resolve the metric from the row key.
    Key: 00000156F9A960000001000001000002000307000003000003000004000004000005000005
    No such unique ID for 'metrics': [0, 0, 1]
2016-03-28 15:44:40,730 ERROR [Fsck #6] Fsck: Unable to resolve the metric from the row key.
    Key: 00000156F9A96000000100000100000200030F000003000003000004000004000005000005
    No such unique ID for 'metrics': [0, 0, 1]
2016-03-28 15:44:40,730 ERROR [Fsck #7] Fsck: Unable to resolve the metric from the row key.
    Key: 00000156F9A960000001000001000002000307000003000003000004000004000005000005
    No such unique ID for 'metrics': [0, 0, 1]
2016-03-28 15:44:40,730 ERROR [Fsck #1] Fsck: Unable to resolve the metric from the row key.
    Key: 00000156F9A960000001000001000002000303000003000003000004000004000005000005
.....
fsck then refuses to clean up anything.

Are there other diagnostics that I can run?
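
One further check is to decode the row keys from the fsck output by hand. A minimal sketch in Python, assuming the default OpenTSDB key layout (3-byte metric UID, 4-byte base timestamp, then 3-byte tagk/tagv pairs) with salting disabled; `decode_row_key` is a hypothetical helper, not part of OpenTSDB:

```python
from datetime import datetime, timezone

UID_WIDTH = 3        # assumption: default OpenTSDB UID width in bytes
TIMESTAMP_WIDTH = 4  # base timestamp in seconds since the epoch


def decode_row_key(hex_key, uid_width=UID_WIDTH):
    """Split a hex-encoded tsdb row key into metric UID, base time and tag UID pairs."""
    raw = bytes.fromhex(hex_key)
    tags_start = uid_width + TIMESTAMP_WIDTH
    # The remainder after metric + timestamp must be whole tagk/tagv pairs.
    if len(raw) < tags_start or (len(raw) - tags_start) % (2 * uid_width) != 0:
        raise ValueError("key length does not match the assumed layout")
    metric = list(raw[:uid_width])
    base_time = int.from_bytes(raw[uid_width:tags_start], "big")
    tags = []
    for i in range(tags_start, len(raw), 2 * uid_width):
        tagk = list(raw[i:i + uid_width])
        tagv = list(raw[i + uid_width:i + 2 * uid_width])
        tags.append((tagk, tagv))
    return metric, base_time, tags


if __name__ == "__main__":
    # First key from the fsck output above.
    key = "00000156F9A96000000100000100000200030B000003000003000004000004000005000005"
    metric, base_time, tags = decode_row_key(key)
    print("metric UID:", metric)
    print("base time: ", datetime.fromtimestamp(base_time, tz=timezone.utc).isoformat())
    for tagk, tagv in tags:
        print("tagk", tagk, "tagv", tagv)
```

For the first key in the fsck output this gives metric UID [0, 0, 1], the same ID that fsck reports as missing, and a first tagk UID of [0, 0, 1], the one the query failed on. Each UID can then be looked up in tsdb-uid on both clusters to see which rows did not replicate.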

V Kurien

Mar 28, 2016, 8:10:29 PM
to OpenTSDB, kurien....@gmail.com
I figured out the problem and am posting the fix to help others who may run into the same issue.

  1. Replication wasn't working correctly, even though the HBase log did not show any errors. Running the verifier, hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication '545022062' 'tsdb-uid', found errors.
  2. I had to drop the tsdb tables on both sides, stop replication by deleting the peers, recreate the tables ON BOTH SIDES (which Hortonworks told me I didn't need to do), and then enable replication and enable table replication.
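
Since the HBase logs stayed clean, the verifier result is the signal to watch after re-enabling replication. A minimal sketch in Python for checking it, assuming the VerifyReplication job output contains counter lines of the form `GOODROWS=<n>` and `BADROWS=<n>` (the exact output format may differ between HBase versions); `parse_verify_counters` and `replication_looks_healthy` are hypothetical helpers:

```python
import re

# Assumption: the MapReduce job prints its counters as NAME=value.
COUNTER_RE = re.compile(r"\b(GOODROWS|BADROWS)=(\d+)")


def parse_verify_counters(job_output):
    """Extract the GOODROWS/BADROWS counters from captured verifier output."""
    counters = {"GOODROWS": 0, "BADROWS": 0}
    for name, value in COUNTER_RE.findall(job_output):
        counters[name] = int(value)
    return counters


def replication_looks_healthy(job_output):
    """True only if some rows were compared and none of them mismatched."""
    counters = parse_verify_counters(job_output)
    return counters["BADROWS"] == 0 and counters["GOODROWS"] > 0


if __name__ == "__main__":
    # Illustrative input only, not real verifier output.
    sample = "\t\tGOODROWS=10\n\t\tBADROWS=2\n"
    print(parse_verify_counters(sample))
    print("healthy:", replication_looks_healthy(sample))
```

The output of the verifier command from step 1 can be captured to a file and passed to `replication_looks_healthy` for each replicated table (tsdb-uid first, since missing UID rows are what break queries on the replica).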