OpenTSDB Heap Size Recommendations

jagmeet bali

Sep 23, 2016, 7:18:06 AM
to open...@googlegroups.com
We have been running OpenTSDB for the last 2 years.
Our setup is currently ingesting 1.5M data points/s, and the OpenTSDB
read boxes are handling well over 20K requests per second.

The issue is that we are currently using 42GB heaps for the read
cluster, which seems way too large for an OpenTSDB box handling only
reads. Each box handles around 1K requests per second.

Is there a benchmark or doc I can refer to for tuning my setup and
possibly reducing my heap size?
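
For reference, each reader JVM is launched with heap flags along
these lines, passed through the JVMARGS env var that the stock tsdb
script picks up (the GC flags here are illustrative):

    # illustrative launch; the 42g heap is what we want to shrink
    export JVMARGS="-Xms42g -Xmx42g -XX:+UseConcMarkSweepGC"
    tsdb tsd --config=/etc/opentsdb/opentsdb.conf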

Thanks

Jonathan Creasy

Sep 23, 2016, 10:41:58 AM
to jagmeet bali, OpenTSDB

That does seem way too large. I assume your read nodes are in ro mode?
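
That is, assuming TSDB 2.1 or later, something like this in the
readers' opentsdb.conf so they never accept writes:

    tsd.mode = ro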

jagmeet bali

Sep 23, 2016, 10:49:42 AM
to Jonathan Creasy, OpenTSDB
Yes, they are in ro mode.

Jonathan Creasy

Sep 26, 2016, 12:08:57 PM
to jagmeet bali, OpenTSDB

So, I run reads a little differently. I tend to have a usage pattern with lots of parallel requests, so my latest infrastructure runs 5 Docker containers on each data node, each with a 4GB heap, and HAProxy balancing requests across them.
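
Roughly, each data node runs something like this (image name, ports
and env wiring are illustrative, not a recipe):

    # five read-only TSDs per node, 4GB heap each
    for i in 1 2 3 4 5; do
      docker run -d --name tsd$i -p 1424$i:4242 \
        -e JVMARGS="-Xms4g -Xmx4g" my-opentsdb-ro-image
    done

and an HAProxy on each node spreads queries across them:

    frontend tsd_read
        bind *:4242
        mode http
        default_backend tsd_readers

    backend tsd_readers
        mode http
        balance roundrobin
        option httpchk GET /version
        server tsd1 127.0.0.1:14241 check
        server tsd2 127.0.0.1:14242 check
        server tsd3 127.0.0.1:14243 check
        server tsd4 127.0.0.1:14244 check
        server tsd5 127.0.0.1:14245 check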

This gives me 40 TSDB read instances. At around 1,200 rps, the read nodes are mostly idle in terms of resource usage.

This cluster does not have it yet, but Turn built Splicer (https://github.com/turn/splicer), a query tool for OpenTSDB. It shards each query into 1-hour blocks and caches the result blocks in Redis, which is great for installations where most of the traffic is dashboards on TV monitors. It also sends each query to the data node hosting the HBase region that holds the metric being queried, which greatly improves read performance.
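
Just to illustrate the sharding idea, splitting a query range into
hour-aligned, cacheable blocks boils down to something like this (a
minimal sketch, not Splicer's actual code):

    import java.util.ArrayList;
    import java.util.List;

    public class HourSplicer {
      static final long HOUR_MS = 3600_000L;

      // Returns [start, end) pairs aligned to hour boundaries; each
      // pair can serve as a Redis cache key for its result block.
      static List<long[]> splice(long startMs, long endMs) {
        List<long[]> blocks = new ArrayList<>();
        long cursor = (startMs / HOUR_MS) * HOUR_MS; // align down
        while (cursor < endMs) {
          blocks.add(new long[] { cursor, cursor + HOUR_MS });
          cursor += HOUR_MS;
        }
        return blocks;
      }

      public static void main(String[] args) {
        // a 2.5 hour window becomes 3 cacheable hour blocks
        for (long[] b : splice(1474600000000L, 1474609000000L)) {
          System.out.println(b[0] + " .. " + b[1]);
        }
      }
    }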

ManOLamancha

Dec 19, 2016, 8:53:32 PM
to OpenTSDB
1K is pretty good actually, but yeah, 42G is huge. We run ours with 16G. Could you maybe send over some heap dumps of your readers?

Right now one huge issue we have is that the UID cache is never bounded in size. I'm working on the LRU code right now, which will help with that.
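
To show what "sized" means here, a bounded, LRU-ish cache is along
these lines (a sketch using Guava purely for illustration; not
necessarily what the actual patch does):

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    public class BoundedUidCache {
      // Cap the name -> UID map so the heap stops growing with
      // metric/tag cardinality; entries past the cap are evicted
      // in roughly least-recently-used order.
      private final Cache<String, byte[]> nameToUid =
          CacheBuilder.newBuilder()
              .maximumSize(4_000_000)  // tune to your cardinality
              .recordStats()           // watch hit rates while sizing
              .build();

      public byte[] get(String name) { return nameToUid.getIfPresent(name); }
      public void put(String name, byte[] uid) { nameToUid.put(name, uid); }
    }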