Hi,
First of all, the title may be a bit misleading. I'm not suggesting that OpenTSDB is acting weird under high load; the issue is probably in my system / infrastructure. Still, I don't understand what's going on, so here I am.
OpenTSDB version: 2.2
OS: Amazon Linux 2016.03
Infrastructure: AWS EMR
- master: 1 x m3.xlarge (4 CPU, 15 GB RAM), running the NameNode, ZooKeeper, the HBase Master and OpenTSDB
- slaves: 4 x m3.xlarge, each running a DataNode and an HBase RegionServer
A note on the infrastructure: I know this setup is not appropriate for a proper Hadoop / HDFS cluster (a single master, no redundant ZooKeeper, only 4 datanodes, etc.). The purpose of this build is just to try out OpenTSDB and see how it behaves, test the backup strategy, and so on, before moving to a "real" production infrastructure.
Context: I'm currently testing OpenTSDB; more specifically, I'm load-testing it right now. I've set up a bunch of Locust workers (http://locust.io), written a small Python script that generates metrics replicating what will be pushed in production, and pointed everything at OpenTSDB.
Using 4 Locust processes, I'm able to generate 1,000 HTTP POSTs per second, each containing 200 data points. I can see HBase indeed doing 200k inserts per second, load-balanced over several nodes (I had pre-split the table and balanced all regions across the available HBase region servers). CPU on the master is quite high but manageable (a load average between 2 and 3, generated mainly by the OpenTSDB process).
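For reference, here is a minimal sketch of the kind of Locust script I'm using (Locust 0.x API; the metric name, tags and host below are placeholders, not my real production values):

    # Sketch of the load generator: each POST sends a batch of 200 data
    # points to OpenTSDB's standard /api/put endpoint.
    import json
    import random
    import time

    from locust import HttpLocust, TaskSet, task

    class TsdbTasks(TaskSet):
        @task
        def put_batch(self):
            now = int(time.time())
            # One POST = 200 data points, matching the figures above
            batch = [{"metric": "sys.test.value",           # placeholder metric
                      "timestamp": now,
                      "value": random.random(),
                      "tags": {"host": "host%03d" % random.randint(0, 49)}}
                     for _ in range(200)]
            self.client.post("/api/put",
                             data=json.dumps(batch),
                             headers={"Content-Type": "application/json"})

    class TsdbUser(HttpLocust):
        host = "http://tsdb-host:4242"  # placeholder hostname
        task_set = TsdbTasks
        min_wait = 0  # no think time: hammer the TSD as fast as possible
        max_wait = 0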
The issue appears when I add more load. When I start more Locust processes, the CPU load on the OpenTSDB host increases (around 5, spiking to 6). The Locust producers sustain the same throughput for a short time (less than a minute), then their throughput drops almost to a full stop (fewer than 10 POSTs/sec/process, when each process is capable of sending 250 POSTs/sec). Meanwhile, the write rate on HBase is also very low. If I stop the producers, the CPU load on the OpenTSDB host stays high for several minutes (HBase write throughput still very low), then suddenly the load drops. At that point I can restart the Locust producers and reach the previous high throughput again.
Also, I can see that during almost the whole "high load" period, OpenTSDB was not able to write points to HBase: all Grafana graphs show a big "hole" for that period.
I don't understand what happened during the very high load period. Here are some observations:
- OpenTSDB is CPU bound (it's clearly stated in the documentation). So, by generating too many POSTs, I've overloaded the system, leading to the high load (a load average above 4 on a 4-CPU machine means there are runnable processes queueing for CPU on the master).
- Since I'm generating more requests than OpenTSDB can handle, I suppose it starts to buffer them, but I don't know where. It seems to use its cache directory for this purpose (as stated in the FAQ, question "What type of hardware should I run the TSDs on?"), which should reside on tmpfs (i.e. in RAM). This raises several questions (see also the config excerpt after this list):
- Is the "tsd.http.cachedir" used to store transient data points ? If yes, it's the main "buffer" storage when OpenTSDB is under high load ?
- Can I point it at a directory on a disk partition? (I suppose it's possible, but not very clever, as the disks would quickly become I/O-bound.)
- How does OpenTSDB use its process memory? Does it store transient data points there as well? How much RAM can the OpenTSDB process take?
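For context, the relevant part of my opentsdb.conf looks roughly like this (a sketch; the paths and the ZooKeeper hostname are examples, not my exact values):

    # opentsdb.conf (excerpt)
    tsd.network.port = 4242
    # Cache directory, currently on tmpfs as the FAQ suggests
    tsd.http.cachedir = /mnt/tmpfs/opentsdb
    tsd.core.auto_create_metrics = true
    tsd.storage.hbase.zk_quorum = master-node:2181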
- If OpenTSDB is caching data points when it's overloaded, why:
- did the load stay high after I stopped the Locust producers? That could mean it's working to empty out the buffer, but...
- ...then why did the write throughput stay very low on HBase, and why did I lose all the points from the "high load" period?
- Understanding how OpenTSDB reacts under high load would be very helpful. What I have understood so far:
- OpenTSDB comes under heavy load and starts caching data points "somewhere". It can still write the points currently being processed.
- The cache fills up quickly and eventually becomes full. OpenTSDB then rejects new inbound data points (hence the very poor throughput seen from the producers).
- However, the OpenTSDB process is still able to process points, so it should be able to take points out of the cache, process them and write them to HBase. For example, if we push twice as many data points as OpenTSDB can handle, it should still process half of them and reject the other half. Yet what I see is that it drops all the points. Why? (See the monitoring sketch below for how I'm trying to observe this.)
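To try to see this from the inside, I'm polling the TSD's standard /api/stats endpoint while reproducing the load (a sketch; the hostname is a placeholder, and I'm just filtering on "rpc" / "hbase" in the counter names since the exact set of counters may vary by version):

    # Poll /api/stats every 5 seconds and print the RPC/HBase-related counters,
    # to see whether requests are piling up inside the TSD during overload.
    import time
    import requests  # already a Locust dependency

    TSD_STATS = "http://tsdb-host:4242/api/stats"  # placeholder hostname

    while True:
        stats = requests.get(TSD_STATS).json()
        # /api/stats returns a list of {"metric", "timestamp", "value", "tags"}
        interesting = {s["metric"]: s["value"]
                       for s in stats
                       if "rpc" in s["metric"] or "hbase" in s["metric"]}
        print("%s %s" % (time.strftime("%H:%M:%S"), interesting))
        time.sleep(5)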
Any advice would be greatly appreciated.
Many thanks,
Guillaume