OpenTSDB compaction issues


Arpit Mittal

Nov 11, 2015, 2:33:22 PM
to OpenTSDB

While running `tsdb fsck --full-scan --threads=8 --fix --resolve-duplicates --compact` on devs3, we get the error below. It keeps failing even after restarting the HBase region servers on devsl3.

Can anybody tell us how to fix it? We want to compact the data externally using some script or commands. A daily compaction will take too long, so we need to disable it.
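[Editor's note: for the "disable compaction" part, OpenTSDB 2.x exposes a configuration property for switching off the TSD's background row compaction. A minimal sketch of opentsdb.conf, assuming a 2.x TSD; verify the property name against the configuration docs for your version:]

```
# opentsdb.conf -- disable the TSD's background row compaction
# (property name per the OpenTSDB 2.x configuration docs; verify
#  against your installed version before relying on it)
tsd.storage.enable_compaction = false
```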



2015-11-10 18:22:43,621 WARN New I/O worker #3 Scanner: RegionInfo(table="tsdb", region_name="tsdb,\x00\x060\xCAV<\xB2`\x00\x00\x02\x00\x00\x04\x00\x00\x03\x00\x00\x0C,1447106827193.413bf58b2b7e7b021e860c94bdada84e.", stop_key="") pretends to not know Scanner(table="tsdb", start_key=[0, 25, -33, 62, 86, 64, 54, 96, 0, 0, 2, 0, 0, 4, 0, 0, 3, 0, 0, 12], stop_key="\x00 V\xD5", columns={"t"}, populate_blockcache=true, max_num_rows=128, max_num_kvs=4096, region=null, filter=null, scanner_id=0x000000000000000E). I will retry to open a scanner but this is typically because you've been holding the scanner open and idle for too long (possibly due to a long GC pause on your side or in the RegionServer)
org.hbase.async.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 254, already closed?
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2221)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

Caused by RPC: GetNextRowsRequest(scanner_id=0x000000000000000E, max_num_rows=128, region=null, attempt=0)
at org.hbase.async.UnknownScannerException.make(UnknownScannerException.java:60) ~[asynchbase-1.6.0.jar:na]
at org.hbase.async.UnknownScannerException.make(UnknownScannerException.java:32) ~[asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.makeException(RegionClient.java:1448) [asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.decodeException(RegionClient.java:1468) [asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.decode(RegionClient.java:1299) [asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.decode(RegionClient.java:89) [asynchbase-1.6.0.jar:na]
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.9.4.Final.jar:na]
at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:1082) [asynchbase-1.6.0.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.9.4.Final.jar:na]
at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:2677) [asynchbase-1.6.0.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.9.4.Final.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]

2015-11-10 18:28:21,418 WARN New I/O worker #3 Scanner: RegionInfo(table="tsdb", region_name="tsdb,\x00\x060\xCAV<\xB2`\x00\x00\x02\x00\x00\x04\x00\x00\x03\x00\x00\x0C,1447106827193.413bf58b2b7e7b021e860c94bdada84e.", stop_key="") pretends to not know Scanner(table="tsdb", start_key=[0, 38, -50, -39, 86, 64, 40, 80, 0, 0, 2, 0, 0, 4, 0, 0, 3, 0, 0, 7], stop_key="\x00-F]", columns={"t"}, populate_blockcache=true, max_num_rows=128, max_num_kvs=4096, region=null, filter=null, scanner_id=0x0000000000000009). I will retry to open a scanner but this is typically because you've been holding the scanner open and idle for too long (possibly due to a long GC pause on your side or in the RegionServer)
org.hbase.async.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 265, already closed?
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2221)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

Caused by RPC: GetNextRowsRequest(scanner_id=0x0000000000000009, max_num_rows=128, region=null, attempt=0)
at org.hbase.async.UnknownScannerException.make(UnknownScannerException.java:60) ~[asynchbase-1.6.0.jar:na]
at org.hbase.async.UnknownScannerException.make(UnknownScannerException.java:32) ~[asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.makeException(RegionClient.java:1448) [asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.decodeException(RegionClient.java:1468) [asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.decode(RegionClient.java:1299) [asynchbase-1.6.0.jar:na]
at org.hbase.async.RegionClient.decode(RegionClient.java:89) [asynchbase-1.6.0.jar:na]
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.9.4.Final.jar:na]
at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:1082) [asynchbase-1.6.0.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.9.4.Final.jar:na]
at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:2677) [asynchbase-1.6.0.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.9.4.Final.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
2015-11-10 18:28:21,420 INFO Fsck #6 Fsck: More than one column had a value for the same timestamp: (1447047000000 - Mon Nov 09 00:30:00 EST 2015)
row key: (0026CED956402850000002000004000003000007)
write time: (1447109463607) compacted: (false) qualifier: [112, -117] <--- keep oldest
write time: (1447109463607 - Mon Nov 09 17:51:03 EST 2015) compacted: (false) qualifier: [112, -117] value: 39.375
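[Editor's note: the UnknownScannerException in these logs usually means the RegionServer expired the scanner lease before the client asked for the next batch of rows. A hedged workaround, assuming HBase 0.98 or later (older releases used `hbase.regionserver.lease.period` instead), is to raise the scanner timeout in hbase-site.xml on the RegionServers; the value below is only illustrative:]

```xml
<!-- hbase-site.xml: raise the scanner lease timeout (milliseconds)
     so that long fsck scans are not expired mid-run.
     Property name per HBase 0.98+ docs; 300000 ms is illustrative. -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>300000</value>
</property>
```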

Arpit Mittal

Nov 11, 2015, 2:47:13 PM
to OpenTSDB
If there is some way to compact the data externally, please let us know.

On Wednesday, November 11, 2015 at 11:33:22 AM UTC-8, Arpit Mittal wrote:

While running `tsdb fsck --full-scan --threads=8 --fix --resolve-duplicates --compact` on OpenTSDB, we get the error below. It keeps failing even after restarting the HBase region servers.

Can anybody tell us how to fix it? We want to compact the data externally using some script or commands. A daily compaction takes too long and slows down inserts, so we need to disable the compaction flag. We are stuck on this; any help would be appreciated.

ManOLamancha

Nov 11, 2015, 3:00:21 PM
to OpenTSDB

On Wednesday, November 11, 2015 at 11:47:13 AM UTC-8, Arpit Mittal wrote:
If there is some way to compact the data externally, please let us know.

On Wednesday, November 11, 2015 at 11:33:22 AM UTC-8, Arpit Mittal wrote:

While running `tsdb fsck --full-scan --threads=8 --fix --resolve-duplicates --compact` on OpenTSDB, we get the error below. It keeps failing even after restarting the HBase region servers.

Can anybody tell us how to fix it? We want to compact the data externally using some script or commands. A daily compaction takes too long and slows down inserts, so we need to disable the compaction flag. We are stuck on this; any help would be appreciated.

It looks like the TSD may be in GC or just running slowly. Try logging GCs for the process while it's running and see whether it enters full collections frequently. If so, try reducing the number of threads. Also, a full scan will take longer and longer as the data grows. Instead, look at enabling appends with 2.2, as it will compact the data in real time.
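[Editor's note: a sketch of both suggestions above. The JVM flags assume a HotSpot Java 7 runtime, matching the `1.7.0_79` in the stack traces (Java 9+ replaced them with unified `-Xlog:gc` logging), and the append property assumes OpenTSDB 2.2+; check both names against your versions. The log path is only an example:]

```
# Environment for the tsdb process: log GCs so long full-collection
# pauses (which can expire HBase scanner leases) become visible.
# HotSpot Java 7 flags; /var/log/opentsdb/gc.log is an example path.
JVMARGS="-Xloggc:/var/log/opentsdb/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# opentsdb.conf (OpenTSDB 2.2+): write appends instead of puts, so
# rows are effectively compacted at write time and no nightly
# compaction pass over the whole table is needed.
tsd.storage.enable_appends = true
```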