Dear developer(s),
We are researchers at ABB Corporate Research and have an interest in your software for time-series data storage. We are currently setting up a large performance benchmark of three different open-source time-series data stores: KairosDB, Databus, and OpenTSDB. We intend to publish the results at an academic conference, and we would like your input on how to best tune your software for our tests.
We plan to run tests for two different workload profiles:
The first workload profile is taken from an advanced metering infrastructure (AMI) for smart grids. Such a system collects energy usage from smart meters installed in customer homes. The system stores and analyzes the data in order to optimize energy production and distribution and to prevent outages. The concrete scenario requires storing up to 1,000,000 meter readings, taken at 15-minute intervals, within a 2-minute time window. Each meter reading includes a 32-bit float representing the energy consumed so far. This workload mainly tests the ability of the technology to handle high peak demands. We plan to simulate data coming from 5 to 10 concentrator nodes, resulting in the following load characteristics, to which we will gradually scale up the tests: 5-20 users, each pushing 50,000 data points within a time window of 2 minutes, with a run every 15 minutes.
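To make the peak demand concrete, here is a back-of-the-envelope calculation (a sketch; the user count, data-point count, and 2-minute window are the numbers from the scenario above):

```python
# Back-of-the-envelope calculation of the AMI peak write rate.
users = 20                 # upper end of the 5-20 simulated users
points_per_user = 50_000   # data points each user pushes per window
window_s = 2 * 60          # the 2-minute ingestion window, in seconds

total_points = users * points_per_user  # 1,000,000 meter readings
peak_rate = total_points / window_s     # sustained writes/s during the window

print(total_points)       # 1000000
print(round(peak_rate))   # 8333
```

So at full scale the store must absorb roughly 8,300 writes per second for two minutes, every 15 minutes.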
The second workload profile comes from electrical power engineering, in the context of Wide Area Measurement Systems (WAMS), which employ Phasor Measurement Units (PMUs) to measure the electrical waves of a power grid. GPS receivers allow for time synchronization of individual PMUs, thereby offering synchronized real-time measurements from multiple remote measurement points on the grid. Each PMU has fourteen analog and eight digital signals. Every second, a PMU uploads 50 samples of each of these signals. In our scenario, up to 3,000 PMUs send their data to a cloud-based monitoring solution, which is similar to a deployment for a country-wide power grid. This workload mainly tests the throughput of the evaluated technologies.
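For the PMU workload, the aggregate ingest rate follows directly from the numbers above (a sketch of the arithmetic only):

```python
# Aggregate sample rate for the WAMS/PMU write workload.
pmus = 3000
signals_per_pmu = 14 + 8   # fourteen analog plus eight digital signals
samples_per_signal = 50    # samples uploaded per signal, per second

samples_per_second = pmus * signals_per_pmu * samples_per_signal
print(samples_per_second)  # 3300000
```

That is, at full scale this workload sustains 3.3 million data points per second.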
We want to test the above write workloads with different cluster sizes: 5, 10, 20, and 40 nodes. For each setup we want to identify the maximum number of PMUs or meters the system can sustain. Furthermore, we also have two read profiles we want to test. The first read workload retrieves the history of a single, randomly selected PMU for the last 10 minutes. We want to measure the maximum number of queries/s with a maximum response time of 500 ms. The second read workload consists of reading all PMU data at two randomly selected time points that are 20 ms apart. The measure is the same as in the first read scenario.
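If the system under test were OpenTSDB, the first read workload could be issued against its HTTP API roughly like this (a sketch only: the `/api/query` endpoint with `start` and `m` parameters exists in OpenTSDB 2.x, but the metric name `pmu.waveform` and the `pmu_id` tag are hypothetical placeholders for whatever schema the benchmark ends up using):

```python
import random

def last_10_minutes_query(host, pmu_ids):
    """Build an OpenTSDB /api/query URL for the 10-minute history
    of one randomly selected PMU (metric/tag names are hypothetical)."""
    pmu = random.choice(pmu_ids)
    return (f"http://{host}:4242/api/query"
            f"?start=10m-ago&m=avg:pmu.waveform{{pmu_id={pmu}}}")

url = last_10_minutes_query("tsdb.example.com", [42])
print(url)
# http://tsdb.example.com:4242/api/query?start=10m-ago&m=avg:pmu.waveform{pmu_id=42}
```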
Finally, we want to combine the write and read workloads. We use a "typical" read-to-write ratio for the workload, e.g., 100 PMUs : 1 query/s. We want to measure the maximum number of PMUs until either writes fail or the query response time exceeds 500 ms.
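The scaling rule of the combined workload can be sketched as a small helper (the 100:1 ratio and the per-PMU sample rate are the numbers from the scenarios above):

```python
def combined_load(pmus, pmus_per_query=100, signals=22, samples=50):
    """Derive the read and write load implied by a given PMU count,
    using the 100 PMUs : 1 query/s ratio of the combined workload."""
    writes_per_s = pmus * signals * samples
    queries_per_s = pmus / pmus_per_query
    return writes_per_s, queries_per_s

print(combined_load(3000))  # (3300000, 30.0)
```

The benchmark then increases `pmus` until either the write side fails or query latency exceeds 500 ms.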
If you have any more questions, feel free to ask.
Best regards,
Anton Jansen, Thomas Goldschmidt, Heiko Koziolek, Hongyu Pei Breivold
On Tuesday, October 8, 2013 4:45:21 AM UTC-4, ma...@thomas-goldschmidt.de wrote:
Dear developer(s),
- You will also need to pre-split your regions for maximum performance. If you start with a fresh table, all of the writes will be sent to a single region on a single server; it will look like your cluster isn't doing anything, and throughput will be poor.
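A small sketch of computing evenly spaced split points for such a pre-split, assuming (hypothetically) that regions are split on the first byte of the row key; the actual split points should be adapted to the row-key layout of the schema in use:

```python
def split_keys(num_regions):
    """Return num_regions - 1 single-byte split points, evenly spaced
    over the 0-255 range of the leading row-key byte (a hypothetical
    pre-split scheme; adapt to the actual row-key layout)."""
    step = 256 // num_regions
    return [bytes([i * step]) for i in range(1, num_regions)]

keys = split_keys(16)
print(len(keys))          # 15
print(keys[0], keys[-1])  # b'\x10' b'\xf0'
```

Passing these split points when creating the table spreads the initial write load across 16 regions instead of one.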
Hi all,

About benchmarking, I found this blog: OpenTSDB ranks 5th. And the results (OpenTSDB not really tested for performance). If someone wants to add the benchmark with their workload :-). Lots of information, but some of it needs to be rechecked, I guess. IMO this is not fully fair from the consistency/replication point of view, nor regarding the back-end storage.

Christophe