Confused: cAdvisor, Heapster & node-exporter, and missing container_spec_cpu_limit


nata...@liveperson.com

Nov 28, 2016, 3:19:11 AM
to Prometheus Developers
Hello,

We are using Prometheus v1.3.1 with Kubernetes v1.4.

I am trying to understand which of these we should use: cAdvisor, Heapster, or node-exporter.

I found a lot of information, but it is very difficult to tell which of it is up to date.

As I understand it, cAdvisor and Heapster provide the same metrics, so using cAdvisor alone is enough. Is that correct?

cAdvisor and node-exporter are not exactly the same. They overlap a lot, but they are still kept separate because "They serve very different concerns. And you may want to change the scrape frequency between node and cgroups." (according to: https://groups.google.com/forum/#!searchin/prometheus-developers/node_exporter$200.12.0$20released$20(finally)%7Csort:relevance/prometheus-developers/C0HrBzgibDE/rgw1QPohBgAJ)

But none of the above exposes a metric for the CPU limit defined for a container:

 resources:
    limits:
       cpu: "2"
       memory: 4G

There is only a metric for the memory limit: container_spec_memory_limit_bytes

Please help me understand what to use to provide basic monitoring for Kubernetes nodes and containers.

Many thanks for your help!
Natalia

Brian Brazil

Nov 28, 2016, 3:21:51 AM
to Natalia Kagan, Prometheus Developers
On 28 November 2016 at 08:19, <nata...@liveperson.com> wrote:
> As I understand it, cAdvisor and Heapster provide the same metrics, so using cAdvisor alone is enough. Is that correct?

Yes, you should be using cadvisor and the node exporter.

> But none of the above exposes a metric for the CPU limit defined for a container.
> There is only a metric for the memory limit: container_spec_memory_limit_bytes

That sounds like a feature request for cadvisor.

Brian
 


Ben Kochie

Nov 28, 2016, 3:25:50 AM
to Natalia, Prometheus Developers
I'm not sure what the state of things is with 1.4, but with 1.3 we get metrics on the limits via container_spec_cpu_quota from the kubelet.

Matthias Rampke

Nov 28, 2016, 3:28:01 AM
to Brian Brazil, Natalia Kagan, Prometheus Developers

In addition to those you listed, github.com/kubernetes/kube-state-metrics exposes, as metrics, the things you could find out with kubectl.

There is some overlap with cAdvisor but they come at it from different angles: kube-state-metrics exposes the logical requests and limits, cAdvisor their effects on the kernel.

Please keep in mind that kube-state-metrics is not done and many metric names may still change.

/MR



nata...@liveperson.com

Nov 28, 2016, 3:33:42 AM
to Prometheus Developers, nata...@liveperson.com
> Yes, you should be using cadvisor and the node exporter.
Thanks a lot, Brian!

What is the correct Prometheus job for cAdvisor? Port 10255 (HTTP):

- job_name: 'kubernetes-apiserver-cadvisor'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__

Or port 10250 (HTTPS):

- job_name: 'kubernetes-nodes'

  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  relabel_configs:
  - source_labels: [__meta_kubernetes_node_name]
    action: replace
    target_label: node_name
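For reference, the last relabel rule in the first job rewrites each discovered node address from the kubelet's HTTPS port 10250 to its read-only HTTP port 10255. Below is a minimal Python sketch of what that single rule does, assuming Prometheus's behaviour that relabel regexes are anchored at both ends and that a non-matching replace rule leaves the target label unchanged. relabel_address is a hypothetical name for illustration, not a Prometheus API.

```python
import re

# Regex from the relabel rule; fullmatch() mirrors Prometheus anchoring it at both ends.
KUBELET_HTTPS_PORT = re.compile(r"(.*):10250")


def relabel_address(address: str) -> str:
    """Hypothetical helper imitating the '(.*):10250' -> '${1}:10255' replace rule."""
    match = KUBELET_HTTPS_PORT.fullmatch(address)
    if match is None:
        # No match: the replace action leaves __address__ as it was.
        return address
    # '${1}' in the Prometheus replacement corresponds to capture group 1 here.
    return f"{match.group(1)}:10255"


if __name__ == "__main__":
    for addr in ["10.0.0.5:10250", "10.0.0.5:9100"]:
        print(addr, "->", relabel_address(addr))
```

Only addresses ending in exactly :10250 are rewritten; anything else passes through untouched.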

Regarding node-exporter: we are using the prom/node-exporter image as is. Does it need any additional flags?

Many thanks!

nata...@liveperson.com

Nov 28, 2016, 4:00:45 AM
to Prometheus Developers, nata...@liveperson.com
On Monday, November 28, 2016 at 10:25:50 AM UTC+2, Ben Kochie wrote:
> I'm not sure what the state of things are with 1.4, but with 1.3 we get metrics on the limits via container_spec_cpu_quota from the kubelet.

Wow, you are right! This is exactly what I was looking for.

container_spec_cpu_shares "CPU share of the container."
container_spec_cpu_quota "CPU quota of the container."

So, to alert when a container reaches its CPU limit:
container_cpu_user_seconds_total / container_spec_cpu_quota
Is that right?

Thanks a lot!


nata...@liveperson.com

Nov 28, 2016, 4:07:37 AM
to Prometheus Developers, brian....@robustperception.io, nata...@liveperson.com
On Monday, November 28, 2016 at 10:28:01 AM UTC+2, Matthias Rampke wrote:
> In addition to those you listed, github.com/kubernetes/kube-state-metrics exposes things you could find out with kubectl add metrics.
>
> There is some overlap with cAdvisor but they come at it from different angles: kube-state-metrics exposes the logical requests and limits, cAdvisor their effects on the kernel.
>

Yes, I am using it as well (but metrics for ReplicationControllers and DaemonSets are missing, so I opened a feature request: https://github.com/kubernetes/kube-state-metrics/issues/50)

Thanks!


Julius Volz

Nov 30, 2016, 1:17:28 PM
to Natalia Kagan, Prometheus Developers
I don't know what unit "container_spec_cpu_quota" is in, but at least for "container_cpu_user_seconds_total", that's the total time in seconds of CPU used since the container started. You'll want to take the rate over that to get the current usage ratio:

rate(container_cpu_user_seconds_total[5m])

...or something like that.

Maybe the others know more about how to interpret the quota metric.
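To make the rate() suggestion concrete: rate() turns a cumulative counter such as container_cpu_user_seconds_total into a per-second increase, which for CPU seconds is the average number of cores in use over the window. The Python below is a simplified sketch, assuming samples arrive as (timestamp, value) pairs. Like Prometheus, it treats a drop in value as a counter reset, but unlike the real rate() it does not extrapolate to the edges of the window, so its numbers will differ slightly. simple_rate is a hypothetical name.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: Sequence[Sample]) -> float:
    """Approximate per-second increase of a counter over the given samples.

    Simplified sketch: unlike PromQL rate(), there is no extrapolation
    to the window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter was reset (e.g. the container restarted),
        # so the new value itself is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


if __name__ == "__main__":
    # 60 CPU-seconds used over 120 s of wall time: on average half a core.
    print(simple_rate([(0, 100.0), (60, 130.0), (120, 160.0)]))
```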

 

Matthias Rampke

Dec 1, 2016, 5:59:21 AM
to Julius Volz, Natalia Kagan, Prometheus Developers
The container_spec_cpu_quota is in units of container_spec_cpu_period. To get the quota in "CPU cores", use

container_spec_cpu_quota / container_spec_cpu_period
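As a worked example, assume the kubelet turns the limit cpu: "2" from the original question into a CFS period of 100000 and a quota of 200000 (Kubernetes' usual 100 ms period; check the actual metric values on your nodes). The division then yields 2 cores. Here is a small Python sketch of the same arithmetic. quota_in_cores is a hypothetical helper, and treating a non-positive quota as "no limit" is an assumption based on cgroups reporting -1 when no quota is set.

```python
from typing import Optional


def quota_in_cores(cpu_quota: float, cpu_period: float) -> Optional[float]:
    """CPU limit in cores, i.e. container_spec_cpu_quota / container_spec_cpu_period."""
    if cpu_quota <= 0 or cpu_period <= 0:
        # Assumption: a quota of -1 (cgroups' "unlimited") means no CPU limit is set.
        return None
    return cpu_quota / cpu_period


if __name__ == "__main__":
    print(quota_in_cores(200000, 100000))  # limit cpu: "2"
    print(quota_in_cores(50000, 100000))   # limit cpu: 500m
```

Because quota and period share the same unit, the result is unitless (cores), which is what makes it comparable with a rate() of CPU seconds.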

/MR


Natalia

Dec 1, 2016, 7:26:02 AM
to Prometheus Developers, juliu...@gmail.com, nata...@liveperson.com


On Thursday, December 1, 2016 at 12:59:21 PM UTC+2, Matthias Rampke wrote:
> The container_spec_cpu_quota is in units of container_spec_cpu_period. To get the quota in "CPU cores", use
>
> container_spec_cpu_quota / container_spec_cpu_period
>
> /MR

So, if I want to find all containers using more than 80% of their CPU limit, would it be something like this?

(rate(container_cpu_user_seconds_total[5m]) / (container_spec_cpu_quota / container_spec_cpu_period)) * 100 > 80
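The expression divides CPU usage in cores (from rate()) by the limit in cores (from quota / period) and scales the result to a percentage. Here is the same arithmetic as a Python sketch on made-up numbers; the helper names and the 80% default are illustrative only. One caveat worth checking: container_cpu_user_seconds_total counts only user-mode time, whereas the CFS quota throttles all of the container's CPU time, so this ratio can understate how close a container is to its limit.

```python
def percent_of_cpu_limit(usage_cores: float, cpu_quota: float, cpu_period: float) -> float:
    """Arithmetic of: rate(...[5m]) / (quota / period) * 100."""
    limit_cores = cpu_quota / cpu_period
    return usage_cores / limit_cores * 100


def exceeds_threshold(usage_cores: float, cpu_quota: float, cpu_period: float,
                      threshold_percent: float = 80.0) -> bool:
    """True when usage is above threshold_percent of the CPU limit (the '> 80' part)."""
    return percent_of_cpu_limit(usage_cores, cpu_quota, cpu_period) > threshold_percent


if __name__ == "__main__":
    # Hypothetical container limited to 2 cores (quota 200000 / period 100000).
    for usage in (1.0, 1.7):
        pct = percent_of_cpu_limit(usage, 200000, 100000)
        alert = exceeds_threshold(usage, 200000, 100000)
        print(f"{usage} cores -> {pct:.1f}% of limit, alert={alert}")
```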

Thanks!

Julius Volz

Dec 1, 2016, 3:57:21 PM
to Natalia, Prometheus Developers
That looks about right, I think.
