Understanding metrics values


sapi...@gmail.com

Nov 9, 2014, 11:47:55 PM
to kamon...@googlegroups.com
Hey guys,

First of all - thanks for the awesome tool :)

I have a few questions about the metrics values (I'm using Datadog integration):

- actor.processing_time.count (or trace.elapsed_time.count). As I understand, it should show a number of processed messages per actor. But values in my case look like fractional numbers - 0.1, 0.45, 0.6, etc. I can believe that there is some rounding issue, and actually those numbers are 1, 4.5, 6, but in this case 4.5 doesn't make any sense to me. I'm just trying to understand how this is calculated.

- actor.mailbox_size.max. I see a lot of 0.5 values. Why?

- things like *.95percentile, are those still in nanoseconds and should be divided by 1000000 to get milliseconds?

Thanks so much!

Ivan Topolnjak

Nov 10, 2014, 7:29:34 AM
to kamon...@googlegroups.com
Hello there and welcome to the community!

Going straight to your questions:


- actor.processing_time.count (or trace.elapsed_time.count). As I understand, it should show a number of processed messages per actor. But values in my case look like fractional numbers - 0.1, 0.45, 0.6, etc. I can believe that there is some rounding issue, and actually those numbers are 1, 4.5, 6, but in this case 4.5 doesn't make any sense to me. I'm just trying to understand how this is calculated.

Your understanding is correct: the .count metric represents the number of measurements reported for a given metric. You see fractional numbers because Datadog treats all counts as per-second rates, normalized over each flush interval, which is 10 seconds by default. Here is a short extract from the Datadog documentation [1] that you might find useful:

Note that StatsD counters are normalized over the flush interval to report per-second units. In the graph above, the marker is reporting 35.33 web page views per second at ~15:24. In contrast, if one person visited the webpage each second, the graph would be a flat line at y = 1. To increment or measure values over time, please see gauges.

The .count metrics are generated automatically: we just report the latency measurements and Datadog does the math, so there is nothing we can do to change that. If you want to see the real counts, you can modify the dashboard to multiply the .count value by 10.
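
As a rough sketch of that multiplication (the helper name and the sample rates are only for illustration, and it assumes the default 10-second flush interval mentioned above), this is the arithmetic the dashboard would be doing:

```python
# Hypothetical helper, not part of Kamon or Datadog: it turns a per-second
# rate shown on a dashboard back into a count for one flush interval.
FLUSH_INTERVAL_SECONDS = 10  # assumed default flush interval


def per_flush_count(per_second_rate, flush_interval=FLUSH_INTERVAL_SECONDS):
    """Multiply the normalized per-second value by the flush interval."""
    # Rounding only hides floating-point noise from the multiplication.
    return round(per_second_rate * flush_interval, 6)


if __name__ == "__main__":
    # Sample values taken from the question above.
    for rate in (0.1, 0.45, 0.6):
        print(rate, "per second ->", per_flush_count(rate), "per flush")
```

If your agent uses a different flush interval, pass it as the second argument instead of relying on the default.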


- actor.mailbox_size.max. I see a lot of 0.5 values. Why?

That's weird. You might see fractional values in the average that Datadog displays in all dashboards, even when plotting the max, but you should never see a fractional value as a max mailbox size. Are you sure you are not referring to the value pointed to in the screenshot below?



- things like *.95percentile, are those still in nanoseconds and should be divided by 1000000 to get milliseconds?

Yes, those numbers are in nanoseconds. I just realized that reporting in nanoseconds can be annoying for some use cases, especially if the latency measurements always fall in the millisecond or microsecond range, so I created an issue [2] to improve that and allow users to set the unit in which they want to report.
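
Until then, the conversion has to happen on the reading side. A minimal sketch of it (the function name and the sample value are made up for illustration):

```python
# Hypothetical helper: convert a latency reported in nanoseconds
# (e.g. a *.95percentile value) into milliseconds.
NANOS_PER_MILLI = 1_000_000


def nanos_to_millis(nanos):
    """Divide a nanosecond value by 1,000,000 to get milliseconds."""
    return nanos / NANOS_PER_MILLI


if __name__ == "__main__":
    sample_p95_nanos = 2_500_000  # made-up sample value
    print(nanos_to_millis(sample_p95_nanos), "ms")
```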

Hope you find the info useful, let us know if you need anything else, regards!

Yaroslav Tkachenko

Nov 10, 2014, 12:36:18 PM
to kamon...@googlegroups.com
Hey Ivan, thanks for your replies!

On Mon, Nov 10, 2014 at 4:29 AM, Ivan Topolnjak <ivan...@gmail.com> wrote:
- actor.mailbox_size.max. I see a lot of 0.5 values. Why?

That's weird. You might see fractional values in the average that Datadog displays in all dashboards, even when plotting the max, but you should never see a fractional value as a max mailbox size. Are you sure you are not referring to the value pointed to in the screenshot below?



Yep, I'm sure about this one, check the screenshot:

Inline image 1

 


Ivan Topolnjak

Nov 10, 2014, 12:49:44 PM
to kamon...@googlegroups.com
Thanks for the screenshot, I'll investigate more. I'm not sure whether this is a problem with the data we are reporting or some sort of calculation made by the datadog-agent that we don't know about (similar to what happens with .count metrics). Regards!