(I realize this thread is a few months old, but this may help somebody)
I ran into this exact scenario a few weeks ago. I followed the quick how-to to set up a single-node instance, and TSD worked well. After a reboot, TSD was suddenly unhappy because -ROOT- was gone, and -ROOT- was gone because HBase was failing with the very same ChecksumException. I forget how gracefully I shut the system down: whether I stopped HBase first, or did something silly and had to reset. hbck was no help either, since the region wasn't online ("root region is null [...] fatal").
Anyway, from what I can tell, 0.92 introduced a distributed log splitting feature (HBASE-1364). Googling the Java exception led me to put this in my base config file:
<property>
  <name>hbase.master.distributed.log.splitting</name>
  <value>false</value>
</property>
After this, the region came online and TSD was happy. I just spent a few minutes reading over the source and log files, and it's not obvious to me what this actually did. I'm completely new to HBase, so I don't know how things work under the hood. Without digging in far enough to say for sure, my guess is that the reboot left a truncated log file behind, and when HBase started up and tried to rotate that file (which now failed its checksum), it couldn't. Possibly turning off distributed splitting short-circuited something and let HBase move the file out of the way. Having said that, this is apparently an important new performance feature and, according to the HBase docs, should really be left enabled. Or it could all be a red herring!
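If anyone wants to double-check that the override was actually picked up from the file before restarting HBase, here is a rough Python sketch that reads a Hadoop-style config file and reports the value of the property above. The path conf/hbase-site.xml and the helper names are my own assumptions for illustration, not anything from the HBase docs; point it at whatever config file you edited.

```python
import xml.etree.ElementTree as ET

# Property name taken from the snippet earlier in this post.
PROP = "hbase.master.distributed.log.splitting"


def find_property(root, name):
    """Return the <value> text of the named <property> under root, or None if absent."""
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def read_property(path, name=PROP):
    """Parse the config file at `path` (hypothetical default: conf/hbase-site.xml)."""
    return find_property(ET.parse(path).getroot(), name)
```

Calling read_property("conf/hbase-site.xml") should give back "false" if the override is in place, and None if the property is not set in that file at all. It only inspects the file on disk, so it says nothing about what a running master has loaded.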
On Tuesday, March 27, 2012 12:09:29 AM UTC-7, Toni Moreno wrote:
I have a working HBase 0.92.0 and OpenTSDB 1.1.0 installation, and I've been collecting about 500 metrics per minute for the past week. Suddenly, in the middle of the week, my HBase and TSD processes appeared to freeze, and I restarted them all by killing them.
2012-03-27 08:51:34,815 ERROR [main-EventThread] HBaseClient: The znode for the -ROOT- region doesn't exist!