Metric naming conventions


Julie Stalley

Jun 27, 2018, 3:15:00 PM
to Prometheus Developers

Hi All,


We are in the process of writing client libraries for several different runtimes, and I have questions regarding naming conventions. We would like our libraries to be as consistent as possible and to use the same names for the same metrics. The docs give guidelines for process metrics, which is great, but there are no conventions to follow for things like memory, threads, etc.


Would it make sense to extend the guidelines with suggested names for more standard metrics?


Thanks,

Julie

Brian Brazil

Jun 27, 2018, 3:23:25 PM
to Julie Stalley, Prometheus Developers
On 27 June 2018 at 20:15, Julie Stalley <stalle...@gmail.com> wrote:

> We are in the process of writing client libraries for several different runtimes, and I have questions regarding naming conventions. We would like our libraries to be as consistent as possible and to use the same names for the same metrics. The docs give guidelines for process metrics, which is great, but there are no conventions to follow for things like memory, threads, etc.

Everything else should follow the standard naming rules, as memory, threads, GC handling, etc. vary from runtime to runtime.

Brian
 



--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
To post to this group, send email to prometheus-developers@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/a8be102a-d928-43f3-ad34-e063a6f3aa72%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Julie Stalley

Jun 28, 2018, 4:07:04 AM
to Prometheus Developers
I agree that some items vary between runtimes, but there are many metrics that would be the same. HTTP requests in particular would be useful to standardise, so that users can monitor apps on different runtimes and look at overall HTTP request statistics. What do you think?

Brian Brazil

Jun 28, 2018, 4:13:20 AM
to Julie Stalley, Prometheus Developers
On 28 June 2018 at 09:07, Julie Stalley <stalle...@gmail.com> wrote:
> I agree that some items vary between runtimes, but there are many metrics that would be the same. HTTP requests in particular would be useful to standardise, so that users can monitor apps on different runtimes and look at overall HTTP request statistics. What do you think?

HTTP requests is a perfect example of something which is completely non-standard. Which layer of the stack is that measured at? What labels are included? Is that before or after any middleware? Any of those varying means a different metric name, so the general rule that metric names be tied to the "library" in which they live applies.
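
As a rough illustration of that rule (a sketch only; the library names `myserver` and `myclient` and the helper `namespaced` are hypothetical and not from this thread), two libraries that both time HTTP requests would each expose the measurement under their own prefix, so the two different meanings never share a name:

```python
import re

# Pattern for valid metric names, as given in the Prometheus data model docs.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def namespaced(library: str, name: str) -> str:
    """Prefix a metric name with the (hypothetical) library that exposes it."""
    full = f"{library}_{name}"
    if not METRIC_NAME_RE.match(full):
        raise ValueError(f"invalid metric name: {full!r}")
    return full


# Same kind of measurement, taken at different layers by different libraries,
# so the resulting metric names differ.
server_metric = namespaced("myserver", "http_request_duration_seconds")
client_metric = namespaced("myclient", "http_request_duration_seconds")

print(server_metric)
print(client_metric)
```

Whether two such metrics can later be combined is then a decision for the user, who knows what each one measures.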

Brian
 




cnba...@gmail.com

Jul 2, 2018, 7:29:53 AM
to Prometheus Developers
On Thursday, 28 June 2018 09:13:20 UTC+1, Brian Brazil wrote:
> On 28 June 2018 at 09:07, Julie Stalley <stalle...@gmail.com> wrote:
>
>> I agree that some items vary between runtimes, but there are many metrics that would be the same. HTTP requests in particular would be useful to standardise, so that users can monitor apps on different runtimes and look at overall HTTP request statistics. What do you think?
>
> HTTP requests is a perfect example of something which is completely non-standard. Which layer of the stack is that measured at? What labels are included? Is that before or after any middleware? Any of those varying means a different metric name, so the general rule that metric names be tied to the "library" in which they live applies.
>
> Brian

The reason for wanting a degree of consistency is to enable users to:
* Aggregate metrics across microservices
* Have a default dashboard that works out of the box for new microservices and can subsequently be customised as required

We already have this for the process_ metrics, which is really useful: it lets us measure and aggregate memory and CPU usage across microservices.
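
A minimal sketch of that kind of aggregation, assuming three hypothetical services (`orders`, `billing`, `search`) with invented values; only the `process_` metric names are the standardised ones, and the parsing here handles unlabelled samples only:

```python
# Hypothetical scrape results from three services written in different
# languages. Service names and values are invented for illustration.
SCRAPES = {
    "orders": "process_resident_memory_bytes 52000000\nprocess_cpu_seconds_total 12.5\n",
    "billing": "process_resident_memory_bytes 31000000\nprocess_cpu_seconds_total 3.25\n",
    "search": "process_resident_memory_bytes 210000000\nprocess_cpu_seconds_total 98.0\n",
}


def sample_value(exposition: str, metric: str) -> float:
    """Return the value of an unlabelled sample from text-format output."""
    for line in exposition.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = line.split()[:2]
        if name == metric:
            return float(value)
    raise KeyError(metric)


# Because the metric has the same meaning in every service, summing is valid.
total = sum(
    sample_value(text, "process_resident_memory_bytes")
    for text in SCRAPES.values()
)
print(f"total resident memory: {total:.0f} bytes")
```

In a real deployment the equivalent would be a query such as `sum(process_resident_memory_bytes)` rather than hand-written parsing.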

What we'd like to do is extend that by providing optional guidelines for other 'common' metrics. Some of those should be relatively easy to specify (because of how they're measured); others may be harder (and so there may be no scope for conventions or guidelines for them).

The easier end of the scale is likely to be OS-level metrics. For example, there could be a convention for overall OS-level memory usage, e.g.:
# HELP os_resident_memory_bytes The OS resident memory size in bytes.
# TYPE os_resident_memory_bytes gauge

Another area might be around how info is provided. For example, the Go, Python and Java clients provide go_info, python_info and jvm_info respectively, so that could be a convention we promote to be used more widely.
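
To make the idea concrete, here is a sketch of what an info-style metric could look like for a new runtime, assuming the general shape of a gauge fixed at 1 whose labels carry the details. The metric name `myruntime_info`, its labels and the helper `render_info` are hypothetical, and this is not the exact output of any existing client:

```python
import platform


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline for the text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_info(metric: str, help_text: str, labels: dict) -> str:
    """Render an info-style metric: a gauge with value 1 and descriptive labels."""
    label_str = ",".join(
        f'{key}="{escape_label_value(value)}"'
        for key, value in sorted(labels.items())
    )
    return (
        f"# HELP {metric} {help_text}\n"
        f"# TYPE {metric} gauge\n"
        f"{metric}{{{label_str}}} 1\n"
    )


# The runtime details here come from the Python interpreter running the sketch.
print(render_info(
    "myruntime_info",
    "Hypothetical runtime information.",
    {
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
    },
))
```

Each runtime would pick its own prefix and its own labels, since the details worth reporting differ from one runtime to the next.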

Chris

Brian Brazil

Jul 2, 2018, 7:47:27 AM
to cnba...@gmail.com, Prometheus Developers
On 2 July 2018 at 12:29, <cnba...@gmail.com> wrote:
> The reason for wanting a degree of consistency is to enable users to:
> * Aggregate metrics across microservices
> * Have a default dashboard that works out of the box for new microservices and can subsequently be customised as required

You can only do that if the metrics have identical meaning, which is not something a client library can know. This would have to be done by the user of the client library.
 
> We already have this for the process_ metrics, which is really useful: it lets us measure and aggregate memory and CPU usage across microservices.

Process metrics were standardised to have the same meaning everywhere. That doesn't work for pretty much anything else, unfortunately.
 

> What we'd like to do is extend that by providing optional guidelines for other 'common' metrics. Some of those should be relatively easy to specify (because of how they're measured); others may be harder (and so there may be no scope for conventions or guidelines for them).
>
> The easier end of the scale is likely to be OS-level metrics. For example, there could be a convention for overall OS-level memory usage, e.g.:
> # HELP os_resident_memory_bytes The OS resident memory size in bytes.
> # TYPE os_resident_memory_bytes gauge

The node exporter and wmi exporter each have their own metrics for this.
 

> Another area might be around how info is provided. For example, the Go, Python and Java clients provide go_info, python_info and jvm_info respectively, so that could be a convention we promote to be used more widely.

Each of those is a different metric that works in a different way, as the runtimes are all different; hence the different metric names. The most we can say is that this sort of metric is useful: if you're writing a client library, you should consider adding one.

Brian
 


Ben Kochie

Jul 2, 2018, 8:15:36 AM
to Brian Brazil, cnba...@gmail.com, Prometheus Developers
>> The easier end of the scale is likely to be OS-level metrics. For example, there could be a convention for overall OS-level memory usage, e.g.:
>> # HELP os_resident_memory_bytes The OS resident memory size in bytes.
>> # TYPE os_resident_memory_bytes gauge
>
> The node exporter and wmi exporter each have their own metrics for this.

And even within the node_exporter, we need different metric names, because different OSes have different concepts.

A canonical example of this is the "total node memory".
* Linux does not include some memory in the total, most of it related to kernel use.
* Darwin includes all physical memory.

cnba...@gmail.com

Jul 2, 2018, 10:33:04 AM
to Prometheus Developers
On Monday, 2 July 2018 12:47:27 UTC+1, Brian Brazil wrote:
> You can only do that if the metrics have identical meaning, which is not something a client library can know. This would have to be done by the user of the client library.

Absolutely, but if the client library implementor can create a metric that has the same meaning, it would be useful if it has the same naming convention. Conversely if it has a different meaning, the implementor could ensure it has a different naming convention.

If there is a documented set of conventions, both of those become possible. Without it, you could easily have a scenario where an implementor uses the same name as is used elsewhere for a metric, but with a very different meaning.

One scenario I can imagine for this is with `http_request_duration_microseconds`, which is used by the golang client for the responsiveness of incoming requests to a server, but could equally be used by an HTTP client library to denote the responsiveness of outbound requests.

I agree that where the meaning is different, or the measurement point is significantly different, then metrics should have different labels.

> Another area might be around how info is provided. For example, the Go, Python and Java clients provide go_info, python_info and jvm_info respectively, so that could be a convention we promote to be used more widely.
>
>
>
> Each of those are different metrics that work in different ways as the runtimes are all different, thus the different metric names. The most we can say is that this sort of metric is useful, if you're writing a client library you should consider adding one.

Absolutely - this would be a recommendation of something that implementors are encouraged to add, and to do so under their own namespace.


One of the things I've been struggling with as an implementor is understanding what conventions I should follow (if any) and what best practices there are. To do this, I've had to evaluate what each of the existing clients provides and try to distill whether there are any common approaches across them. Having a more in-depth set of guidelines would really lower the barrier for new implementations.

Chris

Brian Brazil

Jul 2, 2018, 10:38:25 AM
to Chris Bailey, Prometheus Developers
On 2 July 2018 at 15:33, <cnba...@gmail.com> wrote:
> One scenario I can imagine for this is with `http_request_duration_microseconds`, which is used by the golang client for the responsiveness of incoming requests to a server, but could equally be used by an HTTP client library to denote the responsiveness of outbound requests.

That was actually removed, for reasons including this. It would be inappropriate, for example, to mix HTTP requests from different layers inside an application in one metric.


> One of the things I've been struggling with as an implementor is understanding what conventions I should follow (if any) and what best practices there are. To do this, I've had to evaluate what each of the existing clients provides and try to distill whether there are any common approaches across them. Having a more in-depth set of guidelines would really lower the barrier for new implementations.

The guidelines are at https://prometheus.io/docs/instrumenting/writing_clientlibs/. What you are talking about is not the concern of a client library, though.

Brian 
