Plotting latency/jitter data


Michael Mattoss

Aug 31, 2015, 12:50:42 PM
to mechanical-sympathy
Hi,

I would like to collect latency data from our system in the form of <timestamp, latency value> and later create a chart for further analysis.
However, there are 10M+ data points and I can't seem to find plotting software that can handle that much data (e.g. Excel is limited to 2^20 = 1,048,576 rows).
Just to be clear, I'm not talking about a latency curve chart (percentiles) but rather something similar to the attached image.
I was wondering if someone here knows of such software (either open source or commercial), preferably with an option to zoom in/out.

Thank you!
Michael





Wojciech Kudla

Aug 31, 2015, 1:01:53 PM
to mechanical-sympathy

For up to 20 million data points, R + ggplot2 does the trick for me. If you're after extracting a more general profile over a time series, then a point plot + alpha turns out to be useful.
Beyond 20 million data points, ggplot2 takes too long to render for my liking.
I know some people use MATLAB, but that's commercial software.


--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Anton Lebedevich

Aug 31, 2015, 2:29:46 PM
to mechanica...@googlegroups.com
There are not enough pixels on a screen to draw 10M points, so they will overlap.
As an example, there are about 120k black points on this graph: http://mabrek.github.io/anomaly-detection-devops2013/graph/two-weeks.png (2 weeks of data at 10s resolution), and it's still too dense to understand what's really going on. It might be better to split the data into short intervals (e.g. 1 second), calculate the min/median/percentiles for each interval, and get something like this: http://mabrek.github.io/img/multivariate/latencies.png
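
A minimal sketch of the per-interval summary described above, in plain Python. The function names, the choice of the 99th percentile, and the nearest-rank percentile method are assumptions made for illustration; timestamps are assumed to be in seconds.

```python
import math
from collections import defaultdict
from statistics import median


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list (p in 0..100)."""
    rank = max(1, math.ceil(len(sorted_values) * p / 100))
    return sorted_values[rank - 1]


def summarize(points, interval=1.0):
    """Group (timestamp_seconds, latency) pairs into fixed-width intervals.

    Returns a list of (interval_start, min, median, p99, max) tuples,
    ordered by interval start. Empty intervals are simply absent.
    """
    buckets = defaultdict(list)
    for ts, latency in points:
        buckets[int(ts // interval)].append(latency)

    rows = []
    for index in sorted(buckets):
        values = sorted(buckets[index])
        rows.append((index * interval, values[0], median(values),
                     percentile(values, 99), values[-1]))
    return rows
```

With 1-second intervals, 10M raw points collected over a few hours shrink to a few thousand rows, each of which can be drawn as a handful of lines (min, median, p99, max) instead of a cloud of overlapping dots.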

R + ggplot2 + xts is good for such visualizations; there is a set of recipes for dealing with overplotting (too many points on a graph) at https://rpubs.com/hadley/ggplot2-toolbox

Regards,
Anton Lebedevich
http://mabrek.github.io/

Ross Bencina

Sep 1, 2015, 2:24:33 AM
to mechanica...@googlegroups.com
Maybe try numpy+matplotlib. I can't guarantee that it will do it, but
10M doesn't sound like too many. I've certainly plotted large data sets
with it.

Ross.

Alex Bagehot

Sep 1, 2015, 2:26:59 AM
to mechanica...@googlegroups.com
Heatmaps can efficiently communicate latency distribution over time. This should be complementary to percentile line plots.


Epoch has a heatmap implementation. There are likely a few commercial solutions as well.
Otherwise you can roll your own with R (possibly) or d3.

Dygraphs can also plot large data sets and may be worth a look.

If the high nines are important, use a log scale and highlight low counts with a different colour scale.

Either way, it will likely be useful to summarise the data first, for example with HdrHistogram: keep one histogram per time interval, then dump the data into a format that the chart library can plot.
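
As a rough illustration of the "one histogram per time interval" idea, the sketch below counts points per (time interval, latency bucket) cell, which is the grid a heatmap needs. It uses plain log-spaced buckets as a stand-in for HdrHistogram and does not use the HdrHistogram API; the function name and the `buckets_per_decade` parameter are assumptions.

```python
import math
from collections import Counter


def heatmap_counts(points, interval=1.0, buckets_per_decade=10):
    """Count (timestamp_seconds, latency) pairs per heatmap cell.

    Each cell is keyed by (column, row): column is the time interval index,
    row is the log-spaced latency bucket. Latencies must be positive.
    Row b covers latencies in
    [10 ** (b / buckets_per_decade), 10 ** ((b + 1) / buckets_per_decade)).
    """
    counts = Counter()
    for ts, latency in points:
        column = int(ts // interval)
        row = math.floor(math.log10(latency) * buckets_per_decade)
        counts[(column, row)] += 1
    return counts
```

The resulting dictionary is small regardless of how many raw points went in, and it can be dumped as (column, row, count) triples for whatever chart library draws the cells. Mapping colour to the logarithm of the count is one way to keep rare high-latency cells visible.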

Thanks
Alex

Wojciech Kudla

Sep 1, 2015, 3:52:10 AM
to mechanical-sympathy


> There are not enough pixels on a screen to draw 10m points so they'll overlap.


Correct. That's when alpha becomes really handy. You will lose visibility of the outliers, though, unless alpha is a function of the absolute value.
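
One possible shape for "alpha as a function of absolute value" is sketched below: typical latencies get a very low opacity so that dense regions blend together, while outliers are drawn fully opaque. The function name, the `typical` and `outlier` thresholds, and the linear ramp between them are all assumptions chosen for illustration.

```python
def alpha_for(latency, typical, outlier, min_alpha=0.02):
    """Map a latency to an opacity in [min_alpha, 1.0].

    Values at or below `typical` get min_alpha, values at or above
    `outlier` get 1.0, and values in between are interpolated linearly.
    """
    if outlier <= typical:
        raise ValueError("outlier must be greater than typical")
    fraction = (latency - typical) / (outlier - typical)
    fraction = min(1.0, max(0.0, fraction))  # clamp to [0, 1]
    return min_alpha + (1.0 - min_alpha) * fraction
```

The per-point values can then be used as the alpha channel of each point's colour in whichever plotting tool is in use.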


Raphael Luta

Sep 1, 2015, 4:01:55 AM
to mechanica...@googlegroups.com
You can use something like OpenTSDB or InfluxDB to store your points, then use their query-time downsampling to plot a manageable number of points over larger time ranges while keeping the ability to zoom in at full resolution on anomalous areas.
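
To make the downsample-then-zoom idea concrete, here is a small in-memory sketch. It is not the OpenTSDB or InfluxDB query API; the function name, the `max_points` parameter, and the choice to keep the highest-latency point per bucket are assumptions.

```python
def downsample(points, t_start, t_end, max_points=1000):
    """Return at most max_points (timestamp, latency) pairs from [t_start, t_end).

    `points` is a list of (timestamp, latency) pairs sorted by timestamp.
    If the window already holds few enough points they are returned as-is
    (full resolution). Otherwise the window is cut into max_points equal
    buckets and only the highest-latency point of each bucket is kept.
    """
    window = [p for p in points if t_start <= p[0] < t_end]
    if len(window) <= max_points:
        return window

    width = (t_end - t_start) / max_points
    worst = {}
    for ts, latency in window:
        bucket = min(int((ts - t_start) / width), max_points - 1)
        if bucket not in worst or latency > worst[bucket][1]:
            worst[bucket] = (ts, latency)
    return [worst[b] for b in sorted(worst)]
```

Keeping the maximum rather than the mean preserves spikes in the zoomed-out view, so an anomaly stays visible and can then be inspected at full resolution by requesting a narrower window.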

I’ve successfully used OpenTSDB + Grafana for similar use cases.

— raphael


Michael Mattoss

Sep 1, 2015, 5:03:26 AM
to mechanical-sympathy
I would like to thank everyone for their great input and suggestions!
I will review them to see which one suits my needs best.

Thanks again!
Michael

Kyle Kavanagh

Sep 4, 2015, 8:40:08 PM
to mechanical-sympathy
If your system is supposed to have a (roughly) constant service time, a really nice way to locate jitter in a latency dataset is to calculate the latency difference between consecutive messages.

Any message whose latency exceeds the previous message's by more than the app's service time can be classified as an abnormal latency delta. If the app is slammed with traffic, an arriving message may have a high latency due to queuing, but its delta from the previous message should not exceed the service time. A latency delta may be < 0, showing a return from a high latency to a lower one. These deltas are irrelevant (but can be grouped with the high latency delta event to calculate the total effect of jitter). If the app is idle (not actively processing a message) when a message arrives, the latency difference is expected to be ~0 (assuming the previous message, from long ago, had a normal latency). In general, the “expected” latency delta range is (<0, service time).
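
The delta check described above can be sketched in a few lines. The function name is hypothetical, and the sketch assumes the latencies are listed in arrival order and use the same unit as the service time.

```python
def abnormal_deltas(latencies, service_time):
    """Return (index, delta) for every message whose latency rose by more
    than service_time relative to the previous message.

    Negative deltas (a return from high to lower latency) and deltas up to
    service_time fall in the expected range and are skipped.
    """
    flagged = []
    for i in range(1, len(latencies)):
        delta = latencies[i] - latencies[i - 1]
        if delta > service_time:
            flagged.append((i, delta))
    return flagged
```

For example, with a service time of 5 and latencies `[10, 11, 60, 58, 12, 13]`, only the jump from 11 to 60 (index 2) is flagged; the later drop from 58 to 12 is a negative delta and is ignored.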