OpenTSDB Selective Queries


stan.p....@gmail.com

Mar 6, 2018, 5:49:35 AM
to OpenTSDB
Hello all,

We run OpenTSDB (2.3) in my company with workloads of ~250k new datapoints per second and over 2000 queries per minute on a fleet of OpenTSDB read-only instances. Salting is enabled to spread load more evenly across our HBase cluster which runs on d2.xl instances.

Overall we’re quite satisfied with OpenTSDB and appreciate the efforts put towards its development!

Our biggest concern at the moment is service performance degradation caused by queries on high-cardinality metric paths run over wide time spans (weeks or months). From a backend perspective such queries may return one row for every million rows scanned.

We are aware that these issues can be addressed by de-normalizing the metrics, doing rollups or using features in OpenTSDB 2.4 which replace those scans with gets, etc. In the meantime however, we need to protect the service in some way before offending metric paths are detected (e.g. by a heavy query).

An interesting observation was that when these queries happen, OpenTSDB sends RPCs to HBase that can take an unlimited amount of time to return. IIRC, by default a minimum of 128 rows is required before the OpenTSDB scanner callback is even called.

So if the query is very selective about which rows it needs (e.g. by filtering on the highest cardinality tag, etc.), this could take many seconds or even minutes, keeping an HBase RPC Handler busy. Running a few of these simultaneously can immediately affect all request handling in HBase (we don’t split HBase RPC queues between write and read requests), grinding the service to a halt.

As per https://issues.apache.org/jira/browse/HBASE-13090, there is a way to limit long running scanners but our attempts to utilize it have been unsuccessful so far. We suspect the Async HBase client sends RPCs in a way that disables this functionality (e.g. heartbeats are disabled).

1. How can a limit be enforced on the running time of a scan RPC so that HBase RPC Handler queues do not get congested?
2. What’s the suggested approach to limit the negative effect of such workloads, i.e. queries on high-cardinality metrics for wide time spans, with a very selective filter?

Thanks!
Stan

ManOLamancha

May 22, 2018, 1:58:14 PM
to OpenTSDB
1. How can a limit be enforced on the running time of a scan RPC so that HBase RPC Handler queues do not get congested?

I'll have to look into heartbeats for AsyncHBase. One way to limit is to lower the inflight queue limit http://opentsdb.github.io/asynchbase/docs/build/html/configuration.html so that there are fewer outstanding RPCs per region server. Also, definitely split the RPC queues as it's a huge improvement for this use case. There can still be bottlenecks upstream of the split queues, but for the most part this helped our situation.
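For anyone wanting to try this, the knobs live in opentsdb.conf. A minimal sketch, assuming the region-client limit keys as I recall them from the AsyncHBase configuration page linked above (verify the exact names against that page; the values are illustrative, not recommendations):

```properties
# opentsdb.conf fragment -- illustrative values only.
# Cap the number of RPCs outstanding per region client; once the cap is hit,
# further RPCs fail fast instead of piling up behind a slow RegionServer.
hbase.region_client.inflight_limit = 5000
# Cap the number of RPCs buffered while a region client is still connecting.
hbase.region_client.pending_limit = 1000
```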
 
2. What’s the suggested approach to limit the negative effect of such workloads, i.e. queries on high-cardinality metrics for wide time spans, with a very selective filter?

The best thing to do is implement a byte limit with 2.4 RC2. That's what we've done, and it keeps the TSDs and HBase from falling over; it turns out that the majority of these high-cardinality queries return a ton of data to the TSD and cause OOMs there.
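A minimal sketch of what that looks like in opentsdb.conf, assuming the tsd.query.limits.* key names from the 2.4 documentation (double-check them there; the values are placeholders to tune for your own workload):

```properties
# opentsdb.conf fragment -- placeholder values, assumed key names.
# Abort a query once its scanners have pulled this many bytes out of HBase,
# protecting the TSD heap from high-cardinality queries.
tsd.query.limits.bytes.default = 536870912
# Optionally cap the number of data points fetched as well.
tsd.query.limits.data_points.default = 50000000
```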

The main query issue we see is that HBase handles the scan requests properly, but due to the single-threaded consumer nature of AsyncHBase, the thread is tied up performing TSD processing (group-by, serialization, etc. after the data is fetched), so other queries end up waiting on data sitting in the TSD network receive queue. One solution I'd like to implement is dynamically scaling a client pool per region server in AsyncHBase to improve threading across queries hitting the same servers.

Thibault Godouet

May 25, 2018, 1:01:53 PM
to ManOLamancha, OpenTSDB
1. How can a limit be enforced on the running time of a scan RPC so that HBase RPC Handler queues do not get congested?

I'll have to look into heartbeats for AsyncHBase. One way to limit is to lower the inflight queue limit http://opentsdb.github.io/asynchbase/docs/build/html/configuration.html so that there are fewer outstanding RPCs per region server. Also, definitely split the RPC queues as it's a huge improvement for this use case. There can still be bottlenecks upstream of the split queues, but for the most part this helped our situation.

Could you please clarify what you mean by 'split the RPC queues'? Is it a setting in TSDB? HBase?
Could you point to the relevant doc, if any?

Thanks,
Thibault



ManOLamancha

May 25, 2018, 8:25:08 PM
to OpenTSDB
On Friday, May 25, 2018 at 10:01:53 AM UTC-7, Thibault Godouet wrote:

Could you please clarify what you mean by 'split the RPC queues'? Is it a setting in TSDB? HBase?
Could you point to the relevant doc, if any?

Thibault Godouet

May 26, 2018, 9:49:56 AM
to OpenTSDB
Interesting, thanks!
What value of hbase.ipc.server.callqueue.scan.ratio do you use then? 0.5?
Do you increase hbase.ipc.server.callqueue.handler.factor too?

If it can have a significant impact, could be a good one to add to http://opentsdb.net/docs/build/html/user_guide/tuning.html perhaps?

ManOLamancha

May 29, 2018, 4:16:13 PM
to OpenTSDB
On Saturday, May 26, 2018 at 6:49:56 AM UTC-7, Thibault Godouet wrote:
Interesting, thanks!
What value of hbase.ipc.server.callqueue.scan.ratio do you use then? 0.5?
Do you increase hbase.ipc.server.callqueue.handler.factor too?

If it can have a significant impact, could be a good one to add to http://opentsdb.net/docs/build/html/user_guide/tuning.html perhaps?

Updated. We set it to 0.6, just the scan ratio.
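For anyone following along, these are RegionServer-side settings in hbase-site.xml. Note that per the HBase reference guide, the scan ratio subdivides the read queues, so the read ratio generally needs to be non-zero for it to take effect:

```xml
<!-- hbase-site.xml on the RegionServers. 0.6 is the value discussed above;
     the read ratio value here is only an example. -->
<property>
  <!-- Split the call queues between reads and writes (0 disables the split). -->
  <name>hbase.ipc.server.callqueue.read.ratio</name>
  <value>0.5</value>
</property>
<property>
  <!-- Of the read queues, the fraction reserved for long scans vs. short gets. -->
  <name>hbase.ipc.server.callqueue.scan.ratio</name>
  <value>0.6</value>
</property>
```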

stan.p....@gmail.com

Jun 5, 2018, 10:50:09 AM
to OpenTSDB
Many thanks for getting back to us Chris! Splitting the queues is definitely worth considering, thanks for pointing that out.

There are a few more details we have uncovered in the meantime.

One of our approaches to alleviate the impact of highly selective queries that slowed down HBase request handling was to impose a time limit on RPCs to HBase (suggested as a best practice at https://issues.apache.org/jira/browse/HBASE-13090).

It turns out our attempts had failed because the HBase filtering functionality had a bug in the version we used, which prevented the time limit from being enforced when filtering - https://issues.apache.org/jira/browse/HBASE-19818.

This essentially means that when a row filter is applied, RPC requests to HBase are only limited by the number of rows returned (128 per round trip by default) or the returned result size (2 MB by default). For selective queries, neither limit would be hit for a long time, causing request queue congestion at the RegionServers.

The suggested byte limit available in 2.4 RC2 sounds exciting! In some edge cases, however, it would still take a very long time for the byte limit to be hit, e.g. a high-cardinality metric queried with a very selective filter, returning only a very small percentage of rows.

As a side note, in order to test the time limit on HBase calls we had to enable the heartbeats and handle-partials flags in the AsyncHBase client. Both are required for the time limit setting (hbase.client.scanner.timeout.period) to take effect. Could you advise on whether enabling partial results is unsupported and should be avoided? Will 2.4 or 3.x take advantage of that functionality?

We are also looking into implementing a caching layer in front of our fleet of OpenTSDB read-only nodes in order to have more control on data access patterns. Some inspiration has been drawn from projects like https://github.com/turn/splicer. We would ideally like to collaborate on that if there are already similar plans for the next versions of OpenTSDB.

Thanks,
Stan

ManOLamancha

Jun 5, 2018, 6:03:18 PM
to OpenTSDB
On Tuesday, June 5, 2018 at 7:50:09 AM UTC-7, stan.p....@gmail.com wrote:
One of our approaches to alleviate the impact of highly selective queries that slowed down HBase request handling was to impose a time limit on RPCs to HBase (suggested as a best practice at https://issues.apache.org/jira/browse/HBASE-13090).

It turns out our attempts had failed because the HBase filtering functionality had a bug in the version we used, which prevented the time limit from being enforced when filtering - https://issues.apache.org/jira/browse/HBASE-19818.


Ugh. There are also RPC timeouts and an inflight queue limitation in AsyncHBase (the inflight limit only applies to mutations right now), so those are also worth looking at.
 
This essentially means that when a row filter is applied, RPC requests to HBase are only limited by the number of rows returned (128 per round trip by default) or the returned result size (2 MB by default). For selective queries, neither limit would be hit for a long time, causing request queue congestion at the RegionServers.


True. For these cases you could try lowering the row limit; it would result in more fetches for lower-cardinality data, but that may be a reasonable trade-off.
 
The suggested byte limit available in 2.4 RC2 sounds exciting! In some edge cases, however, it would still take a very long time for the byte limit to be hit, e.g. a high-cardinality metric queried with a very selective filter, returning only a very small percentage of rows.


2.4 also has overrides for byte limits that you can set with a regex on the metric string or pass in at query time. I still need to document that.
 
As a side note, in order to test the time limit on HBase calls we had to enable the heartbeats and handle-partials flags in the AsyncHBase client. Both are required for the time limit setting (hbase.client.scanner.timeout.period) to take effect. Could you advise on whether enabling partial results is unsupported and should be avoided? Will 2.4 or 3.x take advantage of that functionality?


Partials should be OK, so if you want to submit PRs I'd be happy to pull them in.
 
We are also looking into implementing a caching layer in front of our fleet of OpenTSDB read-only nodes in order to have more control on data access patterns. Some inspiration has been drawn from projects like https://github.com/turn/splicer. We would ideally like to collaborate on that if there are already similar plans for the next versions of OpenTSDB.

That'd be great. I already have a dumb read-cache working with a built-in LRU or Redis in 3.0. I have code that needs cleanup for a smarter time-sliced cache, so I could use help getting that ready for 3.0. And adding Splicer-like behavior would be great in 3.0. My next step is a query throttling/routing layer in 3.x that I'll get to work on later this week.