HBase2 compatibility and performance feedback


ezjrbfjm ezjrbfjm

Jun 16, 2020, 10:30:00 AM
to OpenTSDB
After upgrading from HBase version 1.x to HBase version 2.x, we have noticed a significant performance drop in OpenTSDB (reads only), especially with high-cardinality metrics. RegionServers also started to hang and die intermittently, more often than before. Previously, we applied the PR from https://github.com/OpenTSDB/asynchbase/issues/198 to avoid the endless RPC retry storm; with HBase2 this issue came back (could it be related to the new way of getting data from hbase:meta?). To mitigate it, we tried to reduce region movement as much as possible.
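
For readers unfamiliar with the term, a retry storm happens when a client re-issues a failed lookup immediately and without limit. The following is a minimal Python sketch of the general countermeasure (a capped, jittered backoff with a bounded number of retries). It is not the asynchbase patch, which is Java; `RegionUnavailable`, `backoff_ceilings`, and `call_with_bounded_retries` are hypothetical names used only for illustration.

```python
import random
import time


class RegionUnavailable(Exception):
    """Hypothetical stand-in for a 'region not being served' (NSRE) error."""


def backoff_ceilings(max_retries, base_s=0.1, cap_s=10.0):
    """Upper bound of the wait before each retry: doubles, then stays at cap_s."""
    return [min(cap_s, base_s * 2 ** i) for i in range(max_retries)]


def call_with_bounded_retries(op, max_retries=5, base_s=0.1, cap_s=10.0,
                              sleep=time.sleep, rng=random.random):
    """Run op(); on RegionUnavailable wait a jittered, capped delay and retry.

    After max_retries failed retries the last error propagates, so a region
    that stays offline cannot produce an unbounded stream of lookups.
    """
    for ceiling in backoff_ceilings(max_retries, base_s, cap_s):
        try:
            return op()
        except RegionUnavailable:
            # "Full jitter": wait a random time in [0, ceiling) so that many
            # clients do not all retry at the same instant.
            sleep(ceiling * rng())
    return op()  # final attempt; any error goes to the caller
```

Whether the client should ever give up, or keep retrying at the capped interval, is a deployment decision; the sketch only shows how the retry rate can be bounded.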

Another thing we have found is that queries using OpenTSDB literal_or filters were taking ages to complete, and CPU utilization on the HBase nodes was higher than before (when running only this type of query), much higher even than with regexp or wildcard filters alone. We raised this issue here: https://github.com/OpenTSDB/opentsdb/issues/1968
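
To make the comparison concrete, here is a small Python sketch that builds three request bodies for the OpenTSDB 2.x `/api/query` endpoint, differing only in the filter type. The metric name `sys.cpu.user`, the tag key `host`, and the host values are placeholders, not taken from our setup, and `build_query` is a hypothetical helper.

```python
import json


def build_query(metric, tagk, filter_type, filter_value,
                start="1h-ago", aggregator="sum"):
    """Return a request body for POST /api/query with a single tag filter."""
    return {
        "start": start,
        "queries": [{
            "aggregator": aggregator,
            "metric": metric,
            "filters": [{
                "type": filter_type,      # e.g. literal_or, wildcard, regexp
                "tagk": tagk,
                "filter": filter_value,
                "groupBy": False,
            }],
        }],
    }


# Placeholder host names; literal_or takes a pipe-separated list of values.
hosts = ["web01", "web02", "web03"]
literal = build_query("sys.cpu.user", "host", "literal_or", "|".join(hosts))
wildcard = build_query("sys.cpu.user", "host", "wildcard", "web*")
regexp = build_query("sys.cpu.user", "host", "regexp", "web0[1-3]")

print(json.dumps(literal, indent=2))
```

Posting each body to a TSD and timing the responses, while watching RegionServer CPU, is one way to reproduce the difference between the filter types.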

Have you experienced similar performance issues with OpenTSDB (HBase2)?
Are there any settings we should consider applying for OpenTSDB with HBase2, other than hbase.meta.scan?

OpenTSDB: 2.4
AsyncHBase: 1.8.2 (with PR202, minus the NonRecoverableException change, so as not to give up on the HBase meta scans)

Thanks

Josh Norell

Jun 16, 2020, 3:42:55 PM
to ezjrbfjm ezjrbfjm, OpenTSDB
I'm using HBase 2.0.2.3.1.2.2-1. It seems reasonably stable, but I have never used any other version to compare it to. I also see region servers restart, seemingly unexpectedly, on occasion.

https://github.com/OpenTSDB/asynchbase/issues/198 hit me too. I applied the suggested patch, and it seems to have mostly fixed the problem. I still get the occasional storm of "regionserver.RSRpcServices: Client tried to access missing scanner" messages, but they're not nearly as voluminous, and they are short-lived.

Have you gone through the OpenTSDB tuning guide?

--
You received this message because you are subscribed to the Google Groups "OpenTSDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opentsdb+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/opentsdb/5ae7a248-5aba-4721-bd2f-4d0e42671593o%40googlegroups.com.

Fred Peng

Jun 17, 2020, 12:15:32 AM
to OpenTSDB
+1, I applied the suggested patch (https://github.com/OpenTSDB/asynchbase/issues/198) too. It seems to reduce how often RegionServers run out of memory, and our OpenTSDB and HBase cluster has become more stable, but it does not completely solve the problem. (The RPC storm against hbase:meta also seems to cause RegionServers to die, especially when a RegionServer restarts or when tsdb table regions move, split, reopen, or hit other NSRE cases.)
I will try to read the source code in depth and find out why.

On Wednesday, June 17, 2020 at 3:42:55 AM UTC+8, Josh wrote:

ezjrbfjm ezjrbfjm

Jun 22, 2020, 11:52:25 AM
to OpenTSDB
Thanks for the feedback. The backend is a reasonably sized cluster for the use case (10+ physical nodes), which handles about 1M requests per second on average (70% reads and 30% writes). The settings are aligned with the recommendations on the tuning guide page.