Aggregating Kubernetes pod metrics | combining different metrics

bahhoo

Jul 25, 2017, 6:31:00 AM
to Prometheus Users

I want to aggregate the metrics of several pods that belong to an application so that I have a continuous graph in Grafana when pods get deleted.
Since there is more than one application per namespace, I want to use the Kubernetes labels to sum the metrics by application name.

I've come across this article, which seems to do what I want, but it is outdated:


I have the kube-state-metrics container up and running on my cluster, and the Prometheus config is set to receive its metrics:

kube_pod_labels{label_app="app-one",label_application="app-one",label_deployment="app-one-latest-18",label_deploymentConfig="app-one-latest",label_deploymentconfig="app-one-latest",namespace="app-nonprod",pod="app-one-latest-18-05ggg"} 1
kube_pod_labels{label_app="app-two",label_application="app-two",label_deployment="app-two-latest-18",label_deploymentConfig="app-two-latest",label_deploymentconfig="app-two-latest",namespace="app-nonprod",pod="app-two-latest-18-vn794"} 1


One of the metrics I want to aggregate looks like this:

container_memory_working_set_bytes{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container_name="app-one",id="/system.slice/d****pe",image="172.***00/app-nonprod/app-one@sha256:***",instance="***",job="kubernetes-nodes",kubernetes_io_hostname="****",name="k8s_app-one.ccf7757a_app-one-latest-18-05ggg_app-nonprod_dfc****",namespace="app-nonprod",pod_name="app-one-latest-18-05ggg",region="primary"}


I want to sum up the memory metrics of the pods that have the label "app", so I want to combine these two metrics:

1.
sum(container_memory_working_set_bytes{namespace=~".+-nonprod",name=~".+",pod_name=~".+"}) by (pod_name, namespace)


{namespace="monitoring-nonprod",pod_name="kube-state-metrics-1-f40wt"}    22065152
{namespace="monitoring-nonprod",pod_name="prometheus-2534504536-xbptl"}    1056088064
{namespace="app-nonprod",pod_name="app-two-18-05ggg"}    940535808
{namespace="app-nonprod",pod_name="app-two-18-mjfg4"}    956335838
{namespace="app-nonprod",pod_name="app-one-18-vn794"}    952115200


2.
kube_pod_labels{label_app=~".+"} 

kube_pod_labels{instance="kube-state-metrics:8080",job="kubernetes-state-exporter",label_app="app",label_application="app",label_deployment="app-one-latest-18",label_deploymentConfig="app-one-latest",label_deploymentconfig="app-one-latest",namespace="app-nonprod",pod="app-one-latest-18-vn794"}


I'd be glad if anyone could help.
Thanks.

Tom Wilkie

Jul 25, 2017, 6:41:54 AM
to bahhoo, Prometheus Users
Hello!  I wrote that post, but it was using kube-api-exporter.  For kube-state-metrics (which is what you should be using), you need something like:

    sum by (namespace, app) (
      sum(container_memory_working_set_bytes{image!=""}) by (pod_name, namespace)
    * on (namespace, pod_name) group_left(app)
      label_replace(kube_pod_labels, "pod_name", "$1", "pod", "(.*)")
    )


Tom Wilkie

Jul 25, 2017, 6:44:08 AM
to bahhoo, Prometheus Users
Sorry, that should be:

    sum by (namespace, label_app) (
      sum(container_memory_working_set_bytes{image!=""}) by (pod_name, namespace)
    * on (namespace, pod_name) group_left(app)
      label_replace(kube_pod_labels, "pod_name", "$1", "pod", "(.*)")
    )
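
To see what the join is working with, it can help to run each side on its own first. This is only a sketch of how the query above decomposes; the label names are the ones used in this thread (pod_name on the cAdvisor metric, pod on kube_pod_labels) and may differ in other setups.

    # Left side: memory per pod, summed over the pod's containers.
    # image!="" is presumably there to drop series without an image label,
    # such as the cgroup hierarchy totals.
    sum(container_memory_working_set_bytes{image!=""}) by (pod_name, namespace)

    # Right side: kube_pod_labels with "pod" copied into a new "pod_name" label,
    # so that both sides carry the (namespace, pod_name) labels used by on(...).
    label_replace(kube_pod_labels, "pod_name", "$1", "pod", "(.*)")

Since kube_pod_labels has the value 1 in the samples shown earlier, multiplying by it leaves the memory values unchanged; the join is only there to attach labels.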

bahhoo

Jul 25, 2017, 6:59:58 AM
to Prometheus Users, bah...@gmail.com
Perfect! Thanks for the quick reply.

This query works in Prometheus, but in Grafana I get:

{
  "status": "error",
  "errorType": "execution",
  "error": "many-to-many matching not allowed: matching labels must be unique on one side",
  "message": "many-to-many matching not allowed: matching labels must be unique on one side"
}


Any idea why that might be? 

Tom Wilkie

Jul 25, 2017, 7:04:43 AM
to bahhoo, Prometheus Users
> Any idea why that might be?

Hmm, no - are you sure you have the same query in Grafana?  It works from Grafana for me.

Tom
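
One possible cause, offered as a guess rather than a diagnosis: Grafana graphs use range queries, so the expression is evaluated at many timestamps, and the join fails if at any of them two kube_pod_labels series share the same namespace and pod (for example, if kube-state-metrics is scraped through more than one target). A check along these lines would show such duplicates:

    # Any result here means the right-hand side of the join is not unique
    # per (namespace, pod), which is what the many-to-many error complains about.
    count by (namespace, pod) (kube_pod_labels) > 1

If duplicates do show up, collapsing them before the join, for example with max by (namespace, pod, label_app) (kube_pod_labels), is one way to keep the right-hand side unique, assuming the duplicates agree on label_app.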

bahhoo

Jul 25, 2017, 7:07:40 AM
to Prometheus Users, bah...@gmail.com
Yep, I just pasted the same query into the query field in my dashboard.
I am using Grafana v4.1.1. Which version do you have?

Tom Wilkie

Jul 25, 2017, 7:27:34 AM
to bahhoo, Prometheus Users
I use Grafana 4.3.2.  But I've just realised I have these queries set up as recording rules, as they tend to touch a lot of time series, and my Grafana is set up to use the rules.  So I haven't actually tried the raw query in Grafana, sorry.

Tom

bahhoo

Jul 25, 2017, 10:45:08 AM
to Prometheus Users, bah...@gmail.com
Perfect!! That helped.

After putting the query in a rule I can see the series in Grafana.

I just had to adjust the query as below to get what I really wanted.


app:container_memory_working_set_bytes:sum =
  sum by (namespace, label_app) (
      sum(container_memory_working_set_bytes{image!=""}) by (pod_name, namespace)
    * on (namespace, pod_name) group_left(app, label_app)
      label_replace(kube_pod_labels, "pod_name", "$1", "pod", "(.*)")
  )

Now I can use the following in Grafana to see different metrics for different apps:

app:container_memory_working_set_bytes:sum{namespace=~"^$namespace$"}

Tom Wilkie

Jul 25, 2017, 11:13:23 AM
to bahhoo, Prometheus Users
+1 

You can probably remove the app label from the group_left clause.

Tom
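
Following that suggestion, the rule would look something like the sketch below. The kube_pod_labels samples in this thread expose label_app but no plain app label, so group_left only needs to copy label_app; it has to be listed there because the outer sum groups by it.

    # group_left(label_app) copies label_app from kube_pod_labels onto each
    # pod's memory series, so the outer sum can group by it.
    app:container_memory_working_set_bytes:sum =
      sum by (namespace, label_app) (
          sum(container_memory_working_set_bytes{image!=""}) by (pod_name, namespace)
        * on (namespace, pod_name) group_left(label_app)
          label_replace(kube_pod_labels, "pod_name", "$1", "pod", "(.*)")
      )

Without label_app in group_left, the joined series would carry only namespace and pod_name, and the outer sum would collapse everything in a namespace into a single series with an empty label_app.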
