When running OpenTSDB 2.1 (6 instances, 16 GB RAM each) and publishing data as fast as we can generate it (goal: 4 million data points/sec), we see that OpenTSDB will accept put requests FAR faster than it can flush them to HBase. This results in OpenTSDB hitting a lot of GC pauses, and in extreme cases hanging completely.
We end up getting a lot of GC traces like:
2016-02-18T15:04:12.935-0500: 584.046: [Full GC (Allocation Failure) 15G->14G(16G), 36.6974305 secs]
[Eden: 0.0B(816.0M)->0.0B(816.0M) Survivors: 0.0B->0.0B Heap: 16.0G(16.0G)->14.1G(16.0G)], [Metaspace: 17737K->17737K(1064960K)]
[Times: user=41.64 sys=9.42, real=36.70 secs]
What I see is that when publishing as fast as possible, we start out at 2-2.5 million data points per second, quickly fall to ~500k/sec, and eventually trail off to 0-5k/sec. However, if we rate limit to 1 million/sec, we can sustain that without hitting these GC cycles.
Is this just a symptom of the asynchronous processing in OpenTSDB? Is there a setting to apply backpressure to the publishers? (A consistent 1 million/sec is better than TSD instances falling over.)
I notice that 2.2 has a synchronous call that can be made, which may be a bit of overkill but would likely prevent these issues.
Are there any recommendations for alleviating this problem (aside from artificial rate limiting)? More TSD instances?
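In the meantime, the "artificial rate limiting" that sustains 1 million/sec can live entirely in the publishers. A minimal sketch of a client-side limiter, assuming nothing about OpenTSDB itself — the class and method names are illustrative, not any OpenTSDB API:

```java
// Illustrative fixed-window rate limiter a publisher could call before each
// put, to hold a steady rate instead of overrunning the TSDs. Not an
// OpenTSDB API; just a plain-Java sketch.
class PutRateLimiter {
    private final long permitsPerSecond;
    private long windowStart = System.nanoTime();
    private long usedInWindow = 0;

    PutRateLimiter(long permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
    }

    /** Blocks until one more put may be sent under the configured rate. */
    synchronized void acquire() throws InterruptedException {
        long now = System.nanoTime();
        if (now - windowStart >= 1_000_000_000L) {
            // A new one-second window has started; reset the budget.
            windowStart = now;
            usedInWindow = 0;
        }
        if (usedInWindow >= permitsPerSecond) {
            // Budget exhausted: sleep out the remainder of this window.
            long sleepNanos = 1_000_000_000L - (now - windowStart);
            Thread.sleep(Math.max(1, sleepNanos / 1_000_000));
            windowStart = System.nanoTime();
            usedInWindow = 0;
        }
        usedInWindow++;
    }
}
```

Each publisher thread would call `acquire()` once per data point (or once per batch, with the rate divided accordingly).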
Hi, co-worker of OP. We upgraded to 2.2 and are testing that out.
We dropped the high_watermark to 100 and the low to 50. We aren't seeing anything in the logs, and I'm not seeing any errors from HTTP or PleaseThrottle responses (unless they come back as 200 response codes?). Should we investigate dropping the low/high watermarks further?
WRT pre-splitting, we do that (much easier with the salting added in 2.2!). We are currently digging in and trying to reduce the compactions in HBase, but we know there will always be region splits and compactions, so we would like to know how to handle them. We can easily add flow-control logic to our publishers; we just aren't seeing those PleaseThrottles come back from OpenTSDB.
On Tuesday, February 23, 2016 at 11:35:36 AM UTC-8, Sean Hanson wrote:
> Hi, co-worker of OP. We upgraded to 2.2 and are testing that out.
> We dropped the high_watermark to 100 and the low to 50. We aren't seeing any logging in the logs and I'm not seeing any error from HTTP or PleaseThrottle responses (unless they are 200 response codes?). Should we investigate dropping the low/high watermarks down further?

Hmm, try tracking the stat "tsd.hbase.nsre" and see if it's climbing. If so, then try dropping the HWM to 5 or so. And if it's still GCing like crazy, could you grab and post a jmap histo, please? Then we can see what's eating up the heap.
> WRT pre-splitting, we do that (much easier with the salting added in 2.2!). We are currently diving in and trying to reduce the compactions in HBase, but we know there will always be region-splits and compactions, so we would like to know how to handle them. We can easily add flow control logic to our publishers, we just aren't seeing those PleaseThrottles come back from OpenTSDB.

That's a bummer that those exceptions aren't popping back. And you are writing over HTTP and getting 204s back? Another thing you *could* try would be to tune the number of data points sent per HTTP request. If you're only sending one metric per HTTP call, that would definitely create a lot of garbage. I had good luck with 50 per call, but you could try more or less.
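A minimal sketch of that batching idea — the class and method names are illustrative, not an OpenTSDB client API, and the `send()` step is left to whatever HTTP client the publishers already use. The JSON shape follows the documented list form of the /api/put endpoint:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative batcher: accumulate data points and emit one JSON array per
// N points, so each HTTP POST to /api/put carries a full batch instead of a
// single metric. Names here are hypothetical, not an OpenTSDB API.
class PutBatcher {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    PutBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Queue one point; returns a JSON request body when a batch is full, else null. */
    String add(String metric, long timestamp, double value, String tagk, String tagv) {
        pending.add(String.format(
            "{\"metric\":\"%s\",\"timestamp\":%d,\"value\":%s,\"tags\":{\"%s\":\"%s\"}}",
            metric, timestamp, value, tagk, tagv));
        if (pending.size() >= batchSize) {
            String body = "[" + String.join(",", pending) + "]";
            pending.clear();
            return body;  // caller POSTs this to the TSD's /api/put
        }
        return null;
    }
}
```

A flush-on-timer would also be needed in practice so a partially filled batch doesn't sit forever; that's omitted here for brevity.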
Yeah, we are pushing about 512 per call, so that may be causing some of the issue here. We are writing over HTTP also.
With those 40 TSDs, how many region servers are you running?
How is the off-heap caching done? Also, with 40 TSDs do you only have a certain set of region servers running the tsd process?
Ahhh, makes sense. Are your TSDs behind something like HAProxy or another load-balancing method?
Here is a small snippet:
Process 1

 num     #instances         #bytes  class name
----------------------------------------------
   1:      77520146     6015881344  [B
   2:      36527144     3214388672  org.hbase.async.PutRequest
   3:      73054292     1753304576  [[B
   4:      36564608     1170067456  com.stumbleupon.async.Deferred
   5:      31000743     1095850256  [C
   6:      31001472      744035328  java.lang.String
Process 2

 num     #instances         #bytes  class name
----------------------------------------------
   1:      55672484     4327518680  [B
   2:      26233095     2308512360  org.hbase.async.PutRequest
   3:      23078059     1271840816  [C
   4:      52466169     1259189592  [[B
   5:      26258830      840282560  com.stumbleupon.async.Deferred
   6:      22522620      540542880  java.lang.String
On Tuesday, March 1, 2016 at 1:34:54 PM UTC-8, Anthony Caiafa wrote:
> Here is a small snippet:
> [jmap class histograms for Process 1 and Process 2, quoted above]

Did you run jmap with the "live" flag, or do these include the non-live objects? Either way it looks like you do have a backlog of PutRequests (20 to 30M), so we need to tune something there.
Another thing to try is to set an inflight limit via "hbase.region_client.inflight_limit". I forgot this isn't set in the open source branch. Try tuning that and you should start seeing the please throttles.
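For reference, that might look something like the following in opentsdb.conf — the key is the one named above, but the value shown is only an illustrative starting point, and availability of the setting depends on the build:

```
# Cap outstanding RPCs per region client so throttling kicks in before the
# heap fills with queued PutRequests (value here is an illustrative guess).
hbase.region_client.inflight_limit = 10000
```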
> Did you run jmap with the "live" flag or do these include the non-live objects? Either way it looks like you do have a backlog of PutRequests (20 to 30M) so we need to tune something there.

I ran "jmap -histo processid" against the process when it filled the heap. So yeah, 20 to 30M put requests in the backlog. Is there any reason these puts are piling up so much? I am sending about 1.5M/s with about 100 points in each put. What do you think is backing this up so much? Right now for this benchmarking I am doing 2 TSDs per region server and 9 region servers.
From taking a look at the flushQueue on HBase, it seems that at times it will spike to 300 Quadr., which seems a bit crazy to me.