OpenTSDB hangs during region split

Mike Kobyakov

Apr 14, 2015, 3:44:23 PM

to open...@googlegroups.com

I suspect this happens during a region split, although I am not sure. My current repro is to telnet to an instance, run 'help', and then run 'stats'. 'help' works almost all of the time, but 'stats' always hangs, and after that the socket does not respond to any other commands. The logs captured during the repro are in this gist: https://gist.github.com/mxk1235/6bec63717bb09b40f82f
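
For anyone who wants to script the repro described above instead of typing it into telnet, here is a minimal Python sketch. It is not part of OpenTSDB; the function name `probe_commands`, the timeouts, and the host and port you pass in are all assumptions chosen for illustration. It sends each command over a single connection, as in the manual repro, and records `None` for any command that gets no reply before the timeout.

```python
import socket


def probe_commands(host, port, commands, timeout=5.0, idle=0.5):
    """Send telnet-style commands over one connection.

    Returns a dict mapping each command to the text received, or None if
    nothing arrived within `timeout` seconds. Hypothetical helper, not an
    OpenTSDB API.
    """
    results = {}
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for command in commands:
            sock.settimeout(timeout)  # full timeout while waiting for the first byte
            sock.sendall(command.encode("ascii") + b"\n")
            chunks = []
            try:
                while True:
                    data = sock.recv(4096)
                    if not data:  # server closed the connection
                        break
                    chunks.append(data)
                    # Once data is flowing, stop after a short silence,
                    # because the server keeps the connection open.
                    sock.settimeout(idle)
            except (socket.timeout, ConnectionError):
                pass  # no (more) data: treat as end of reply or as a hang
            text = b"".join(chunks).decode("utf-8", "replace")
            results[command] = text if chunks else None
    return results
```

Calling `probe_commands("localhost", 4242, ["help", "stats"])` (host and port are placeholders for your TSD) and getting text for 'help' but `None` for 'stats' would match the symptom described here.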

One of the most troubling lines in the gist is the following:

INFO [ClientCnxn.run] - EventThread shut down

Does this affect any requests that come after it? Is the thread restarted if it is needed?

Here is how it happened. A lot of metrics got piped into OpenTSDB via the socket interface at the same time, and OpenTSDB appeared to hang: /api/version wasn't responding and metrics were not being recorded. Some messages from this group indicated that this can happen during compaction, so we turned compaction off and restarted. /api/version returned 200 for up to an hour, but then started having issues again. Metrics are now not recorded at all, and the UI is not responsive either. We think the region is either in the process of being split or the split never completed, and OpenTSDB is having trouble simply talking to HBase.
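
To pin down when /api/version stops answering after a restart, a small poller can help. The following is a rough Python sketch under stated assumptions: `check_version` is a hypothetical helper, and the host, port, and timeout are placeholders for your own TSD. It only issues a plain GET to the /api/version endpoint mentioned above and reports the HTTP status (or `None` on a timeout or connection failure) together with the elapsed time.

```python
import http.client
import time


def check_version(host, port, timeout=5.0):
    """GET /api/version and return (status_or_None, seconds_elapsed).

    status is None when the request times out or the connection fails.
    Hypothetical helper for diagnostics, not an OpenTSDB API.
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/api/version")
        resp = conn.getresponse()
        resp.read()  # drain the body so a stalled response also counts as a failure
        status = resp.status
    except (OSError, http.client.HTTPException):
        status = None
    finally:
        conn.close()
    return status, time.monotonic() - start
```

Running this once a minute and logging the result with a timestamp would show whether responses slow down gradually before failing or stop abruptly, which can then be lined up against the HBase region server logs.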

Do you have any advice on how to run diagnostics or repairs on the HBase side?

Any help is greatly appreciated. Thanks in advance.

-mike

ManOLamancha

Apr 23, 2015, 9:39:42 PM

to open...@googlegroups.com

Most likely you ran into a GC issue as the RPCs queued up. Did you have GC logging enabled? The "EventThread shut down" message is related to ZooKeeper, and the thread will restart when needed, so I don't think that one is anything to worry about.

For the inflight queue, we're going to make that configurable and support plugins to handle data points properly when HBase is behind. The interface is already in the "put" branch; I just need to document it and start implementing plugins.

伍照坤

May 20, 2015, 10:18:57 AM

to open...@googlegroups.com

Check your CPU usage and GC activity.

Monitor both the client and the HBase region servers.
