Graphite writer aggregated data feature request

Daniel Moll

Dec 1, 2014, 2:33:43 PM12/1/14
to gat...@googlegroups.com
Hi Gatling team,

I am using Graphite to store both load-related metrics from Gatling and SUT resource metrics in one place. From there I use the Graphite REST API to present my load test results in (real-time) graph dashboards, to benchmark metrics between different builds, and to organize my test runs per project.

I found that the 1-second resolution Gatling requires leads to some trouble in Graphite in my use case:

- Opening 48 hours of load test results leads to very slow queries (which makes sense considering the number of datapoints).
- Retaining datapoints at this granularity requires a lot of storage space.
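To give a rough sense of the scale (my own back-of-the-envelope numbers, not anything from the Gatling or Graphite docs), here is the per-series datapoint count for a 48-hour test at 1-second versus 10-second resolution:

```python
# Illustrative arithmetic only: datapoints stored per Graphite series
# for a 48-hour load test at two retention intervals.
test_duration_s = 48 * 3600  # 48-hour load test in seconds

points_at_1s = test_duration_s // 1    # one datapoint per second
points_at_10s = test_duration_s // 10  # one datapoint per 10 seconds

print(points_at_1s)   # 172800 points per series
print(points_at_10s)  # 17280 points per series
```

So each series carries 10x fewer points at 10-second resolution, and a dashboard that pulls dozens of series over the full run reads proportionally less data.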

To tackle these issues I decided to implement a carbon-aggregator process that aggregates the incoming Gatling data per 10 seconds using these aggregation rules:

gatling2.<simulation>.users.<scenario>.active (10) = avg gatling2.<simulation>.users.<scenario>.active
gatling2.<simulation>.users.<scenario>.done (10) = avg gatling2.<simulation>.users.<scenario>.done
gatling2.<simulation>.users.<scenario>.waiting (10) = avg gatling2.<simulation>.users.<scenario>.waiting
gatling2.<simulation>.<request>.<status>.count (10) = sum gatling2.<simulation>.<request>.<status>.count
gatling2.<simulation>.<request>.<status>.max (10) = avg gatling2.<simulation>.<request>.<status>.max
gatling2.<simulation>.<request>.<status>.min (10) = avg gatling2.<simulation>.<request>.<status>.min
gatling2.<simulation>.<request>.<status>.percentiles95 (10) = avg gatling2.<simulation>.<request>.<status>.percentiles95
gatling2.<simulation>.<request>.<status>.percentiles99 (10) = avg gatling2.<simulation>.<request>.<status>.percentiles99
 
This solved the slow queries and the storage space problem, but introduced some new issues:

- Using the "avg" aggregation method for the min, max, and percentile metrics does not forward the correct values; it produces a 10-second average of these datapoints instead.
- It seems that if, for some reason (network latency, lost packets?), requests from Gatling do not reach the carbon-aggregator in time, this results in "hiccups" in the "count" metrics. My "transactions per second" graph will then show a dip in throughput that I can't correlate to any SUT resource graph or to transaction response times.
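The first issue can be illustrated with a toy example (the numbers below are hypothetical): averaging ten per-second maxima hides the true 10-second maximum, so a single slow response largely disappears from the aggregated max series.

```python
# Toy illustration of why "avg" is the wrong aggregation for a max metric.
# Hypothetical per-second max response times (ms) over a 10-second window,
# containing one outlier request.
per_second_max = [120, 130, 110, 2500, 140, 125, 115, 135, 128, 122]

true_max_10s = max(per_second_max)                      # what the max series should show
avg_of_max = sum(per_second_max) / len(per_second_max)  # what "avg" aggregation reports

print(true_max_10s)  # 2500 ms
print(avg_of_max)    # 362.5 ms -- the outlier is smoothed away
```

A "max" aggregation rule would preserve the outlier for the max metric, but no aggregation method on the carbon side can recompute the percentile metrics correctly, since the underlying response-time distribution is already gone.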

What would be really convenient for my use case is if Gatling had an option to configure a 10-second Graphite push interval and aggregated the metric data accordingly before sending it. Would you consider implementing such a feature?


Cheers


Daniel
