Tuning OpenTSDB for Performance Benchmark

ma...@thomas-goldschmidt.de

unread,
Oct 8, 2013, 4:45:21 AM
to open...@googlegroups.com, anton....@se.abb.com, heiko.k...@de.abb.com, hongyu.pe...@se.abb.com, thomas.go...@de.abb.com

Dear developer(s),

 

We are researchers at ABB Corporate Research and are interested in your software for time-series data storage. We are currently setting up a large performance benchmark of three different open-source time-series data stores: KairosDB, Databus, and OpenTSDB. We intend to publish the results at an academic conference. We would like to have your input on how to best tune your software for our tests.

 

We plan to run tests for two different workload profiles:

 

The first workload profile is taken from an advanced metering infrastructure (AMI) for smart grids. Such a system collects energy usage from smart meters installed in customer homes. The system stores and analyzes the data in order to optimize energy production and distribution and to prevent outages. The concrete scenario requires storing, e.g., up to 1,000,000 meter readings per 15-minute interval within a 2-minute time window. Each meter reading includes a 32-bit float representing the energy consumed thus far. This workload mainly tests the ability of the technology to deal with high peak demands. We plan to simulate the data coming from 5 to 10 concentrator nodes, resulting in the following load characteristics, to which we will gradually scale up the tests: 5-20 users, each pushing 50.0000 data points within a time window of 2 minutes. Run every 15 minutes.

 

The second workload profile comes from electrical power engineering in the context of Wide Area Measurement Systems (WAMS), which employ Phasor Measurement Units (PMUs) to measure the electrical waves of a power grid. The use of GPS receivers allows for time synchronization of individual PMUs, thereby offering synchronized real-time measurements of multiple remote measurement points on the grid. Each PMU has fourteen analog and eight digital signals. Every second, a PMU uploads 50 samples of each of these signals. In our scenario, we have up to 3,000 PMUs sending their data to the cloud-based monitoring solution, which is similar to a deployment for a power grid of a country. This workload mainly tests the throughput ability of the evaluated technologies.

 

We want to test the above write workloads with different cluster sizes: 5, 10, 20, and 40 nodes. For each setup we want to identify the maximum number of PMUs or meters the system can sustain. Furthermore, we also have two read profiles we want to test. The first read workload retrieves the history of a single PMU for the last 10 minutes (randomly selected PMU). We want to measure the maximum number of queries/s with a maximum response time of 500 ms. The second read workload consists of reading all PMU data at two specific time points, which are 20 ms apart (randomly selected time). The measure is the same as in the first read scenario.

 

Finally, we want to combine the write and read workloads. We use a "typical" ratio between reads and writes for the workload, e.g., 100 PMUs : 1 query/s. We want to measure the maximum number of PMUs until either writes fail or the query response time exceeds 500 ms.

 

If you have any more questions feel free to ask.

 

Best regards,

Anton Jansen, Thomas Goldschmidt, Heiko Koziolek, Hongyu Pei Breivold 

ManOLamancha

unread,
Oct 8, 2013, 9:07:56 PM
to open...@googlegroups.com, anton....@se.abb.com, heiko.k...@de.abb.com, hongyu.pe...@se.abb.com, thomas.go...@de.abb.com
On Tuesday, October 8, 2013 4:45:21 AM UTC-4, ma...@thomas-goldschmidt.de wrote:

Dear developer(s),

 

We are researchers at ABB Corporate Research and are interested in your software for time-series data storage. We are currently setting up a large performance benchmark of three different open-source time-series data stores: KairosDB, Databus, and OpenTSDB. We intend to publish the results at an academic conference. We would like to have your input on how to best tune your software for our tests.


That sounds great and we'd love to see the results. Just curious, by "Databus" are you referring to LinkedIn's database streaming project? Databus is not a time-series data storage system in and of itself; rather, it just watches an existing DB for changes. Or is there some system out there using Databus as a source for time-series data?

Some general notes for your OpenTSDB testing:

- Make sure to pre-assign UIDs to your metrics and tag names/values before you start pushing data points. Assignments will slow down your initial write throughput, whereas if the UIDs are already assigned (and cached in a running TSD) the writes will be as quick as possible. (See the sketch after this list.)
- You may want to compare using the Telnet interface for writing data vs the HTTP interface. Telnet doesn't have the overhead of HTTP, but you can batch multiple data points in a single HTTP request. I don't think anyone has really compared the throughput yet.
- You'll want a good-sized HBase cluster if you're going to be testing into the millions of writes per second. This includes a dedicated node for the HDFS NameNode, ZooKeeper, and the HBase master, plus a number of region servers (probably start at 10). You'll want physical hardware as opposed to VMs.
- You may want to run OpenTSDB on dedicated servers as well, and to get to the millions of writes you'll likely want to use a load balancer in front of them such as Varnish.
- You will also need to pre-split your regions for maximum performance. If you start with a fresh table, all of the writes will be sent to a single region on a single server and that will look like your cluster isn't doing anything and throughput will be poor.
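
As a rough illustration, here is a minimal sketch of pre-assigning UIDs and of the two write paths, assuming an OpenTSDB 2.x TSD (the /api/uid/assign and /api/put endpoints and the telnet "put" line format are from the 2.x docs; the host, metric, and tag names below are made up):

    import socket
    import time

    import requests

    TSD = "http://tsd-host:4242"  # hypothetical TSD address

    # 1) Pre-assign UIDs for metrics and tag names/values before the benchmark
    #    so writes never stall on UID assignment.
    requests.post(TSD + "/api/uid/assign", json={
        "metric": ["meter.kwh"],
        "tagk":   ["customer"],
        "tagv":   ["id1", "id2"],
    })

    # 2a) Batched HTTP writes: one POST to /api/put can carry many data points.
    now = int(time.time())
    batch = [{"metric": "meter.kwh", "timestamp": now, "value": 42.5,
              "tags": {"customer": "id1"}},
             {"metric": "meter.kwh", "timestamp": now, "value": 17.3,
              "tags": {"customer": "id2"}}]
    requests.post(TSD + "/api/put", json=batch)

    # 2b) Telnet-style writes: one "put" line per data point, no HTTP overhead.
    sock = socket.create_connection(("tsd-host", 4242))
    sock.sendall(("put meter.kwh %d 42.5 customer=id1\n" % now).encode())
    sock.close()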

 We plan to run tests for two different workload profiles:

 

The first workload profile is taken from an advanced metering infrastructure (AMI) for smart grids. Such a system collects energy usage from smart meters installed in customer homes. The system stores and analyzes the data in order to optimize energy production and distribution and to prevent outages. The concrete scenario requires storing, e.g., up to 1,000,000 meter readings per 15-minute interval within a 2-minute time window. Each meter reading includes a 32-bit float representing the energy consumed thus far. This workload mainly tests the ability of the technology to deal with high peak demands. We plan to simulate the data coming from 5 to 10 concentrator nodes, resulting in the following load characteristics, to which we will gradually scale up the tests: 5-20 users, each pushing 50.0000 data points within a time window of 2 minutes. Run every 15 minutes.


What do you mean by "50.0000"? Do you mean each user (5 to 20 users) pushes 500,000 data points within 2 minutes for up to 10M data points in 2 minutes?

Naming and pre-splitting will be a little tricky for this test. Would you have a single metric such as "meter.kwh" for each customer and then 1M customers? If you set up a naming schema where each customer is a tag value, such as "meter.kwh customer=id1" and "meter.kwh customer=id2", you won't be able to pre-split the regions easily since the row key begins with the metric ID, which would always be the same in this case. Instead you may want to flip the schema (as some of our users have) where you declare the "metric" as the customer ID and a tag with the value recorded, such as "customer_1 metric=meter.kwh". This way you will have a unique metric ID assigned to each customer and you can easily pre-split the regions for maximum initial throughput.
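
To make that concrete, here is a minimal sketch of the two schemas as /api/put payloads (assuming the OpenTSDB 2.x JSON put format; the metric, tag, and customer names are hypothetical):

    import time

    now = int(time.time())

    # Schema A: one shared metric, the customer as a tag value. Every row key
    # starts with the same metric UID, so the table can't be pre-split by customer.
    point_a = {"metric": "meter.kwh", "timestamp": now, "value": 42.5,
               "tags": {"customer": "id1"}}

    # Schema B (flipped): the customer ID is the "metric" and the measured
    # quantity is a tag. Each customer gets its own metric UID, so row keys
    # spread across the key space and regions can be pre-split evenly.
    point_b = {"metric": "customer_1", "timestamp": now, "value": 42.5,
               "tags": {"metric": "meter.kwh"}}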
 

The second workload profile comes from electrical power engineering in the context of Wide Area Measurement Systems (WAMS), which employ Phasor Measurement Units (PMUs) to measure the electrical waves of a power grid. The use of GPS receivers allows for time synchronization of individual PMUs, thereby offering synchronized real-time measurements of multiple remote measurement points on the grid. Each PMU has fourteen analog and eight digital signals. Every second, a PMU uploads 50 samples of each of these signals. In our scenario, we have up to 3,000 PMUs sending their data to the cloud-based monitoring solution, which is similar to a deployment for a power grid of a country. This workload mainly tests the throughput ability of the evaluated technologies.


Naming for this example can be a bit more straightforward as you'll have 22 metrics that you can pre-split on, e.g. "pmu.phase1.frequency" or "pmu.phase1.current", with tags indicating the PMU, e.g. "pmu=pmu1" and "pmu=pmu2". So for 3K PMUs * 22 metrics * 50 samples per second you would be looking at a throughput of 3.3M data points per second, and that should be pretty easy for a few TSDs and a solid HBase cluster. Note that OpenTSDB and KairosDB (afaik) only support millisecond resolution, not the microsecond resolution it seems the PMUs supply (according to the great Wikipedia).
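
A quick back-of-the-envelope check of that throughput figure (the names and numbers just restate the scenario above):

    # 3,000 PMUs, 14 analog + 8 digital signals each, 50 samples/signal/second.
    pmus, signals, rate_hz = 3000, 14 + 8, 50
    print(pmus * signals * rate_hz)  # 3,300,000 data points per second

    # One data point per signal per sample, pre-splittable on the 22 metric names:
    example = {"metric": "pmu.phase1.frequency", "timestamp": 1381219200,
               "value": 50.02, "tags": {"pmu": "pmu1"}}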
 

We want to test the above write workloads with different cluster sizes: 5, 10, 20, and 40 nodes.


By cluster size are you referring to the number of servers for the backend (HBase or Cassandra)? You'll also want to try different numbers of front-ends, e.g. 1 TSD vs. 2 TSDs, as these may be the primary bottleneck even with a 5-node backend cluster.
 

For each setup we want to identify the maximum number of PMUs or meters the system can sustain. Furthermore, we also have two read profiles we want to test. The first read workload retrieves the history of a single PMU for the last 10 minutes (randomly selected PMU). We want to measure the maximum number of queries/s with a maximum response time of 500 ms. The second read workload consists of reading all PMU data at two specific time points, which are 20 ms apart (randomly selected time). The measure is the same as in the first read scenario.


These should be pretty fast as the data should be in HBase's cache (assuming the region servers have a decent amount of RAM) after writing. You may want to consider writing the data, shutting down all daemons and backends, restarting, and then reading, to get a worst-case scenario where data must be read from disk.

Finally, we want to combine the write and read workloads. We use a "typical" ratio between reads and writes for the workload, e.g., 100 PMUs : 1 query/s. We want to measure the maximum number of PMUs until either writes fail or the query response time exceeds 500 ms.


For your queries, are you going to be reading the data for a specific metric and a specific PMU, e.g. "return pmu.phase1.frequency for the last 10 minutes where pmu=pmu1"? Or are you interested in aggregated queries such as "return the average of pmu.phase1.frequency for the last 10 minutes for all PMUs"? The data naming schema affects the query speed.
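
For reference, the two query shapes would look roughly like this against the OpenTSDB 2.x HTTP query API (a sketch; the host, metric, and tag names are hypothetical):

    import requests

    TSD = "http://tsd-host:4242"

    # Single PMU, last 10 minutes: the tag filter narrows the scan to one series.
    requests.get(TSD + "/api/query",
                 params={"start": "10m-ago",
                         "m": "sum:pmu.phase1.frequency{pmu=pmu1}"})

    # Aggregate over all PMUs: no tag filter, so every series for the metric is
    # scanned and averaged, which is a much heavier query.
    requests.get(TSD + "/api/query",
                 params={"start": "10m-ago", "m": "avg:pmu.phase1.frequency"})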

 

If you have any more questions feel free to ask.


 Thanks for your interest in OpenTSDB and we'll be happy to help you tune!

Sai Kiran Kanuri

unread,
Oct 9, 2013, 5:45:32 AM
to open...@googlegroups.com, anton....@se.abb.com, heiko.k...@de.abb.com, hongyu.pe...@se.abb.com, thomas.go...@de.abb.com


On Wednesday, 9 October 2013 06:37:56 UTC+5:30, ManOLamancha wrote:
On Tuesday, October 8, 2013 4:45:21 AM UTC-4, ma...@thomas-goldschmidt.de wrote:

Dear developer(s),


- You will also need to pre-split your regions for maximum performance. If you start with a fresh table, all of the writes will be sent to a single region on a single server and that will look like your cluster isn't doing anything and throughput will be poor.
 

My understanding was that the OpenTSDB schema specifically works around this. Using the metric name as the start of the key offsets the region-loading issues that normally come with time-series data. Do we still need to do this?

- Kiran

ManOLamancha

unread,
Oct 9, 2013, 12:24:25 PM
to open...@googlegroups.com, anton....@se.abb.com, heiko.k...@de.abb.com, hongyu.pe...@se.abb.com, thomas.go...@de.abb.com

The schema does allow for good distribution amongst regions, however HBase doesn't store data based on a hash of the key like Cassandra does, rather it uses the key directly to determine which region should receive the data. When you create a new table in HBase only one region is created. That one region is responsible for the entire key space. A region can only be hosted by a single server, so writing data to a fresh table, regardless of the schema used, will write to that single region and server, creating a bottleneck. When the region fills up (1GB by default in the latest HBase releases) the region will split and another server will host the second region and write throughput will improve since you now have two servers to send data to. So pre-splitting is critical to achieve the highest throughput in an HBase cluster for a fresh table. You want to split the keyspace up so that each server winds up hosting at least one region and the writes are distributed evenly amongst servers. Does that help?
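
A small sketch of how you might compute evenly spaced split points over the 'tsdb' table's key space (OpenTSDB row keys start with the metric UID, 3 bytes wide by default). The resulting byte-escaped prefixes can be passed to the HBase shell's SPLITS option when creating the table; the region count here is just an assumption to adapt to your cluster:

    def split_points(regions=20, uid_width=3):
        """Evenly divide the metric-UID prefix space into `regions` ranges."""
        max_uid = 2 ** (8 * uid_width)
        step = max_uid // regions
        points = []
        for i in range(1, regions):
            prefix = (i * step).to_bytes(uid_width, "big")
            points.append("".join("\\x%02X" % b for b in prefix))
        return points

    # e.g. create 'tsdb', {NAME => 't'}, SPLITS => [...] in the HBase shell
    print(split_points())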

Christophe Salperwyck

unread,
May 5, 2015, 6:03:51 AM
to open...@googlegroups.com, heiko.k...@de.abb.com, hongyu.pe...@se.abb.com, thomas.go...@de.abb.com, anton....@se.abb.com
Hi Thomas,

Did you finish your benchmark? Would it be possible to share the results?

Thanks a lot in advance!
Christophe
Message has been deleted

Nicolas

unread,
Jul 22, 2016, 4:40:32 AM
to OpenTSDB, heiko.k...@de.abb.com, hongyu.pe...@se.abb.com, thomas.go...@de.abb.com, anton....@se.abb.com

Christophe S

unread,
Sep 7, 2016, 4:59:39 AM
to OpenTSDB, heiko.k...@de.abb.com, hongyu.pe...@se.abb.com, thomas.go...@de.abb.com, anton....@se.abb.com
Hi all,

About benchmarking, I found this blog:

OpenTSDB ranks 5.

And the results (OpenTSDB not really tested for performance)

If someone wants to add the benchmark with their workload :-).

Lots of information, but some of it needs to be rechecked I guess.

IMO this is not fully fair from the consistency/replication and back-end storage point of view.

Christophe


On Friday, July 22, 2016 at 10:40:32 AM UTC+2, Nicolas wrote:

ManOLamancha

unread,
Dec 19, 2016, 8:50:56 PM
to OpenTSDB, heiko.k...@de.abb.com, hongyu.pe...@se.abb.com, thomas.go...@de.abb.com, anton....@se.abb.com
On Wednesday, September 7, 2016 at 1:59:39 AM UTC-7, Christophe S wrote:
Hi all,

About benchmarking, I found this blog:

OpenTSDB ranks 5.

And the results (OpenTSDB not really tested for performance)

If someone wants to add the benchmark with their workload :-).

Lots of information, but some of it needs to be rechecked I guess.

IMO this is not fully fair from the consistency/replication and back-end storage point of view.

Christophe

Thanks for sharing! We also started a bit of work on YCSB core for benchmarking time-series workloads in a common format (trying to get the InfluxDB folks on board too). If anyone wants to help out, ping me/us.