Stackdriver custom metric units

Ben Reed

Jan 15, 2020, 3:34:58 PM

to Cloudprober

We are using Cloudprober with the Stackdriver surfacer. It uploads all the custom metrics we expect; however, the graphs display them in the wrong units. For example, latency shows values up to 30,000/s, which does not make sense. In the logs, one entry showed a latency of 8774267.556, but Stackdriver showed that latency as 3,091/s. The probe is set to an interval of 30 seconds, and metrics are uploaded every 60 seconds. Does anybody know what is going on? I'm a tad confused.

Manu Garg

Jan 15, 2020, 6:24:27 PM

to Ben Reed, Cloudprober

Hi Ben,

This is sort of a known issue with the Stackdriver surfacer. Cloudprober exports all metrics as counters[1], and Stackdriver automatically converts counters into rates for graphing (and, I think, alerting). That's where the "/s" comes from. The default latency unit is microseconds, so if you're seeing 30,000/s in graphs, that means a cumulative latency of 30 ms per second. If you are running one probe every second, that works out to an average latency of 30 ms.
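To make the arithmetic concrete, here is a minimal Python sketch of how a per-second rate and an average latency fall out of two samples of cumulative counters. The function names and the sample values are hypothetical and chosen only for illustration; this is not Cloudprober or Stackdriver code, and it assumes latency is exported in microseconds alongside a cumulative count of successful probes.

```python
def counter_rate(prev, curr, seconds):
    """Per-second rate of a cumulative counter between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (curr - prev) / seconds


def average_latency(prev_latency, curr_latency, prev_success, curr_success):
    """Average latency per successful probe between two samples.

    Returns None when no probe succeeded in the window.
    """
    probes = curr_success - prev_success
    if probes <= 0:
        return None
    return (curr_latency - prev_latency) / probes


# Hypothetical samples taken 60 seconds apart (latency in microseconds).
prev_latency_us, curr_latency_us = 1_000_000, 1_185_460
prev_success, curr_success = 10, 12  # two probes at a 30-second interval

# What a rate-based graph would plot for the latency counter.
print(counter_rate(prev_latency_us, curr_latency_us, 60))  # 3091.0 (us per second)

# What the per-probe latency actually was over the same window.
print(average_latency(prev_latency_us, curr_latency_us,
                      prev_success, curr_success))  # 92730.0 (us, about 92.7 ms)
```

Under these assumed numbers, a graphed rate of 3,091/s with one probe every 30 seconds would correspond to roughly 92.7 ms per probe. Dividing the latency delta by the success delta gives the per-probe figure that the rate alone hides.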

One solution would be to include units when exporting data to Stackdriver, but that would require non-trivial changes to how Cloudprober handles metrics, and at this point it would benefit only the Stackdriver surfacer. Could you please file an issue so that this is more visible and we can take a stab at fixing it in the future?

Also, if you're interested in more latency details, I'd suggest using latency distributions. Here is an example of using latency distribution for custom metrics:

https://github.com/google/cloudprober/blob/master/examples/external/cloudprober_aggregate.cfg#L14

Latency distribution can also be enabled for the default latency metric through this option:

https://github.com/google/cloudprober/blob/86a1d1fcd2f8505c45ff462d69458fd5b9964e5f/probes/proto/config.proto#L52

Stackdriver handles distributions pretty well.
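As a rough illustration of why a distribution carries more detail than a single cumulative sum, here is a minimal Python sketch that sorts latency samples into explicit buckets. The function name, the bucket bounds, and the sample values are hypothetical, and the bucket convention (each bound is the inclusive lower edge of the next bucket) is an assumption for this sketch rather than a statement about Cloudprober's implementation.

```python
from bisect import bisect_right


def bucket_counts(latencies, bounds):
    """Count samples per bucket for sorted, explicit bucket bounds.

    With bounds [b0, b1, ..., bn] the buckets are:
    (-inf, b0), [b0, b1), ..., [bn, +inf)
    so the result has len(bounds) + 1 entries.
    """
    counts = [0] * (len(bounds) + 1)
    for value in latencies:
        # bisect_right returns how many bounds are <= value,
        # which is the index of the bucket the value falls into.
        counts[bisect_right(bounds, value)] += 1
    return counts


# Hypothetical latency samples in milliseconds and bucket bounds.
latencies_ms = [0.5, 3, 3.5, 7, 12, 250]
bounds_ms = [1, 2, 4, 8, 16]

print(bucket_counts(latencies_ms, bounds_ms))  # [1, 0, 2, 1, 1, 1]
```

The single 250 ms outlier stays visible in the last bucket here, whereas a summed counter would fold it into one total.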

[1] Except if you configure them as gauges for external probes' custom metrics.

--

Manu Garg
Creator of Cloudprober, Page Notes & Pacparser

"Journey is the destination of life."

Ben Reed

Jan 15, 2020, 6:49:19 PM

Hi Manu,

Thanks so much for helping us! I suspected this was happening after I made some API calls to get the MetricDescriptor and TimeSeries, and the raw data matched the logs as expected. I opened google/cloudprober#349 on GitHub as you requested. I'll follow your links, and I would love to follow the progress and discussion of solutions on GitHub.

As an aside, thanks for your project. It has really helped my team :)

Ben Reed
Software Engineer
ben...@google.com
‪(650) 448-6116‬
