Restarting HBase

shubham agarwal

Nov 24, 2015, 6:44:33 AM
to Async HBase
Hi,
I accidentally deleted some DataNodes from the HBase cluster. Now whenever I try to restart the HBase master and RegionServers, they keep trying to split WALs and then fail after some time. Can someone help me with how to restart them?
The RegionServer logs:


2015-11-24 11:26:00,548 INFO  [RS_LOG_REPLAY_OPS-] util.FSHDFSUtils: recoverLease=false, attempt=14 on file=hdfs://localhost/hbase/WALs/,1448343272089-splitting/%2C60020%2C1448343272089..meta.1448343280481.meta after 837759ms
2015-11-24 11:26:58,137 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=3.29 MB, freeSize=3.13 GB, max=3.13 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=89, evicted=0, evictedPerRun=0.0
2015-11-24 11:27:04,654 INFO  [RS_LOG_REPLAY_OPS-:60020-0] util.FSHDFSUtils: recoverLease=false, attempt=15 on file=hdfs://localhost/hbase/WALs/,60020,1448343272089-splitting/%2C60020%2C1448343272089..meta.1448343280481.meta after 901865ms
2015-11-24 11:27:04,654 WARN  [RS_LOG_REPLAY_OPS-:60020-0] util.FSHDFSUtils: Cannot recoverLease after trying for 900000ms (hbase.lease.recovery.timeout); continuing, but may be DATALOSS!!!; attempt=15 on file=hdfs://localhost/hbase/WALs/,60020,1448343272089-splitting/.visa.com%2C60020%2C1448343272089..meta.1448343280481.meta after 901865ms
2015-11-24 11:27:04,922 WARN  [RS_LOG_REPLAY_OPS-:60020-0] wal.WALFactory: Lease should have recovered. This is not expected. Will retry
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1837467880-10.211.26.203-1439511762417:blk_1073763333_22515; getBlockSize()=83; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:50010,DS-b74ae2a5-b2d6-42df-848d-c1aee5cfc112,DISK]]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:386)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:329)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:257)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1492)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:302)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:298)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:298)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:290)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:266)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:839)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:763)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:304)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:242)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104)
at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2015-11-24 11:31:58,137 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=3.29 MB, freeSize=3.13 GB, max=3.13 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=119, evicted=0, evictedPerRun=0.0
2015-11-24 11:32:05,451 ERROR [RS_LOG_REPLAY_OPS-:60020-0] wal.WALFactory: Can't open after 300 attempts and 300797ms  for hdfs://localhost/hbase/WALs/,60020,1448343272089-splitting/%2C60020%2C1448343272089..meta.1448343280481.meta
2015-11-24 11:32:05,453 INFO  [RS_LOG_REPLAY_OPS--0] wal.WALSplitter: Processed 0 edits across 0 regions; edits skipped=0; log file=hdfs://localhost/hbase/WALs/,60020,1448343272089-splitting/%2C60020%2C1448343272089..meta.1448343280481.meta, length=83, corrupted=false, progress failed=false
2015-11-24 11:32:05,453 WARN  [RS_LOG_REPLAY_OPS:60020-0] regionserver.SplitLogWorker: log splitting of WALs/,60020,1448343272089-splitting/%2C60020%2C1448343272089..meta.1448343280481.meta failed, returning error
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1837467880-10.211.26.203-1439511762417:blk_1073763333_22515; getBlockSize()=83; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:50010,DS-b74ae2a5-b2d6-42df-848d-c1aee5cfc112,DISK]]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:386)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:329)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:257)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1492)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:302)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:298)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:298)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:290)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:266)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:839)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:763)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:304)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:242)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104)
at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2015-11-24 11:32:05,458 INFO  [RS_LOG_REPLAY_OPS-:60020-0] coordination.ZkSplitLogWorkerCoordination: successfully transitioned task /hbase/splitWAL/WALs%2F%2C60020%2C1448343272089-splitting%2Fsl73operadbd001.visa.com%252C60020%252C1448343272089..meta.1448343280481.meta to final state ERR ,60020,1448363517961
2015-11-24 11:32:05,464 INFO  [RS_LOG_REPLAY_OPS-:60020-0] handler.WALSplitterHandler: worker ,60020,1448363517961 done with task org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@a61bf66 in 1202744ms
2015-11-24 11:36:58,137 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=3.29 MB, freeSize=3.13 GB, max=3.13 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=149, evicted=0, evictedPerRun=0.0
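
For context on the failure above: the repeated "Cannot obtain block length for LocatedBlock" errors usually mean the old WAL file under the -splitting directory is still open for write in HDFS, so the reader cannot determine its last block length until the NameNode recovers the file's lease. Below is a minimal sketch, not from this thread, of triggering that lease recovery through the standard HDFS client API; the class name RecoverWalLease and the argument handling are hypothetical, and it assumes the cluster's core-site.xml/hdfs-site.xml are on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Hypothetical helper: asks the NameNode to recover the lease on a stuck
// -splitting WAL file so that its last block length becomes readable again.
public class RecoverWalLease {
  public static void main(String[] args) throws Exception {
    // e.g. hdfs://localhost/hbase/WALs/<server>-splitting/<wal file>
    Path wal = new Path(args[0]);
    Configuration conf = new Configuration();   // picks up the cluster config from the classpath
    FileSystem fs = FileSystem.get(wal.toUri(), conf);
    if (!(fs instanceof DistributedFileSystem)) {
      throw new IllegalStateException("Not an HDFS path: " + wal);
    }
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    boolean closed = dfs.recoverLease(wal);      // true if the file is already closed/recovered
    System.out.println("recoverLease(" + wal + ") returned " + closed);
  }
}

On recent Hadoop versions (2.7+) the same recovery can be triggered from the shell with hdfs debug recoverLease -path <file> -retries <n>. If the WAL holds nothing recoverable anyway (the split above reports length=83 and 0 edits), sidelining the -splitting directory out of /hbase/WALs before restarting is another option, at the cost of losing whatever edits it contained, which is the DATALOSS risk the log itself warns about.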

stand...@gmail.com

Jan 22, 2016, 5:05:29 AM
to Async HBase
Hi,

I am having the same issue. Did you manage to resolve this at all, and if so how?

Many thanks.