2016-04-14 11:50:34,495 ERROR [RS_OPEN_REGION-sea-badger:60020-2] handler.OpenRegionHandler: Failed open of region=tsdb,\x00\x07\xD8V\xF7\xAF \x00\x00\x01\x00\x03\x05\x00\x00\x02\x00\x02\xF3\x00\x00\x04\x00\x00\x10\x00\x00\x0C\x00\x03\x08\x00\x00&\x00\x00\x8A\x00\x00'\x00\x00\x8A\x00\x00(\x00\x03\x09\x00\x00=\x00\x00\xBC,1459245586846.909c9397f7105f3141ce8a5dcea6b8c4., starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file file:/data/hbase/hbase/data/default/tsdb/909c9397f7105f3141ce8a5dcea6b8c4/t/746cdacd07844815af8a46e1bf9dd19aa
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:832)
    ...
We run a single-node OpenTSDB with HBase writing to a local file system (RAID backed) instead of HDFS, as recommended for our scale. OpenTSDB easily handles the ingestion rate (about 7000 dps). However, we have had repeated file-level corruption problems. Over the last few months our two test systems have five times had an HBase 'tsdb' region stuck in a FAILED_OPEN state. The only way I could recover from this was to delete the region file from the disk.

Is there something we can improve in our setup to avoid these errors? I am thinking about moving to HDFS. Is it possible/worthwhile to run a single-node HDFS (with multiple JBOD disks for reliability)?
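For reference, switching a standalone HBase to a single-node (pseudo-distributed) HDFS mainly means repointing `hbase.rootdir` at the local NameNode. A minimal sketch, assuming HDFS is already running on the same host (the hostname, port, and path below are placeholders for your environment, not values from this thread):

```xml
<!-- hbase-site.xml fragment (hypothetical example):
     point HBase at a local single-node HDFS instead of file:// -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- placeholder NameNode address and path -->
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <property>
    <!-- pseudo-distributed mode rather than standalone -->
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```

Note that on a single node HDFS can only replicate across local disks, so you would typically set `dfs.replication` to 1 in hdfs-site.xml; this buys you HDFS checksumming on reads, but not the multi-machine redundancy a real cluster provides.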
Feel free to open an issue, something like
"OpenTSDB should reliably operate on a single node"
It is something we discussed while looking at roadmap items and should be supportable once we have abstracted the storage layer better.