Add labels


Christian Oelsner

Nov 18, 2022, 4:51:41 AM
to Prometheus Users
Hello,

I am trying to add labels to metrics fetched from Confluent Cloud.
We are monitoring some 35 Kafka clusters.

scrape_configs:
  - job_name: Confluent Cloud
    scrape_interval: 1m
    scrape_timeout: 1m
    honor_timestamps: true
    static_configs:
      - targets:
          - api.telemetry.confluent.cloud
    scheme: https
    basic_auth:
      username: <Cloud API Key>
      password: <Cloud API Secret>
    metrics_path: /v2/metrics/cloud/export
    params:
      "resource.kafka.id":
        - lkc-1
        - lkc-2
        - lkc-3
        - lkc-4
        - lkc-5
        - lkc-6
        # etc


Each lkc-xxxx represents a cluster which belongs to a department.
I would like to add a departmentID to the metrics belonging to each cluster.
For example, lkc-1 and lkc-5 would belong to department "analytics".

How would I go about adding labels to the metrics?

Best regards
Christian Oelsner

Brian Candler

Nov 18, 2022, 9:05:10 AM
to Prometheus Users
> How would i go about adding labels to the metrics?

You have this:

   static_configs: 
      - targets:
        - api.telemetry.confluent.cloud 

This means you are only scraping one endpoint, one time.  If you wanted to add the same labels to every metric received from that endpoint, you would do this:

   static_configs: 
      - labels:
          foo: bar
          baz: qux
        targets:
        - api.telemetry.confluent.cloud 

Of course, that's not what you're asking.

The question now is, do the metrics that you get back all carry a label which identifies the cluster, such as {cluster="lkc-1"}?

If so, then it's a simple case of metric relabelling to add the department labels corresponding to each cluster ID.  Add to the scrape job:

    metric_relabel_configs:
      - source_labels: [cluster]
        regex: lkc-1
        target_label: departmentID
        replacement: Accounts
      - source_labels: [cluster]
        regex: lkc-2
        target_label: departmentID
        replacement: Engineering
      # etc

If you don't have such a label, then you will need to scrape the API endpoint separately, once for each value of resource.kafka.id

The dumb option is multiple scrape jobs:

scrape_configs:
  - job_name: Confluent Cloud lkc-1
    scrape_interval: 1m
    scrape_timeout: 1m
    static_configs:
      - labels:
          department: Accounts
        targets:
          - api.telemetry.confluent.cloud
    scheme: https
    basic_auth:
      username: <Cloud API Key>
      password: <Cloud API Secret>
    metrics_path: /v2/metrics/cloud/export
    params:
        "resource.kafka.id": [lkc-1]
  - job_name: Confluent Cloud lkc-2
    scrape_interval: 1m
    scrape_timeout: 1m
    static_configs:
      - labels:
          department: Engineering
        targets:
          - api.telemetry.confluent.cloud
    scheme: https
    basic_auth:
      username: <Cloud API Key>
      password: <Cloud API Secret>
    metrics_path: /v2/metrics/cloud/export
    params:
        "resource.kafka.id": [lkc-2]
  # ... etc

That should work just fine, but is annoyingly verbose and repetitive.

The second option, which I would normally use in this situation, is to set the query parameter using a __param_XXXX label:

scrape_configs:
  - job_name: Confluent Cloud
    scrape_interval: 1m
    scrape_timeout: 1m
    static_configs:
      - labels:
          department: Accounts
          "__param_resource.kafka.id": lkc-1
        targets:
          - api.telemetry.confluent.cloud
      - labels:
          department: Engineering
          "__param_resource.kafka.id": lkc-2
        targets:
          - api.telemetry.confluent.cloud
      - labels:
          department: Special Projects
          "__param_resource.kafka.id": lkc-3
        targets:
          - api.telemetry.confluent.cloud
      # etc

    scheme: https
    basic_auth:
      username: <Cloud API Key>
      password: <Cloud API Secret>
    metrics_path: /v2/metrics/cloud/export

Here, the parameter value is set to a single value each time using the magic label "__param_<paramname>" instead of using "params: { name: [ list_of_values ] }"

Unfortunately, the problem is that I'm not sure that __param supports parameter names with dots in them, because dots are technically not valid in a label name.  You would need to try it to find out if it works, and I wouldn't be surprised if it were rejected.

Aside:
- You should almost never use "honor_timestamps" so I have removed it in the examples above.  If you do use it, you have to be very sure why, and understand how it may break things.
- When there are multiple targets like this I would use file_sd_configs rather than static_configs (it's easier to maintain).
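To sketch what the file_sd_configs approach might look like (the file path is an assumption, and the same caveat about dots in the __param label name applies): the scrape job points at a file, and each entry in that file carries its own labels and query parameter:

```yaml
# In the scrape job, in place of static_configs:
#   file_sd_configs:
#     - files: ['/etc/prometheus/confluent-targets.yml']
#
# Contents of /etc/prometheus/confluent-targets.yml:
- labels:
    department: Accounts
    "__param_resource.kafka.id": lkc-1
  targets:
    - api.telemetry.confluent.cloud
- labels:
    department: Engineering
    "__param_resource.kafka.id": lkc-2
  targets:
    - api.telemetry.confluent.cloud
```

Prometheus watches the file and picks up changes without a restart, so adding a new cluster is just a new entry in the file.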

The downside to these approaches is that you are now hitting the same API endpoint N times (each returning 1/Nth of the data).  This only matters if you get charged per API call.

If you still want to fetch the responses in a single API call as you are now, then you will have to use metric relabelling, and somehow decide for each metric that comes back which kafka cluster it came from by examining the labels - which is the first approach I proposed.

HTH,

Brian.

Christian Oelsner

Nov 19, 2022, 6:43:26 AM
to Prometheus Users
Hi Brian,
Thanks for your input, I will try working with these suggestions.

I put in honor_timestamps only because it was in the example config provided in the Confluent Cloud Metrics API documentation.
The reason I am fetching the metrics all in one call is that Confluent imposes a limit of 60 requests per hour, and we found that we often hit that limit and received an HTTP 429 Too Many Requests. After that we were "locked out" for 15-20 minutes. This was not optimal.

A quick query in prometheus for example gives me this:
confluent_kafka_server_retained_bytes{instance="api.telemetry.confluent.cloud:443", job="Confluent-Cloud", kafka_id="lkc-0x3v22", topic="confluent-kafka-connect-qa.confluent-kafka_configs"}

Does that mean that I have a label simply called kafka_id?

I did in fact try to wrap my head around using file_sd_configs but could not work out how to handle the params part of it, so I gave up on that. It would be nice though, since our list of clusters keeps growing every week.

Let me try some of your thoughts over the weekend and report back here.

Thanks again.

/Christian Oelsner

Brian Candler

Nov 20, 2022, 5:36:57 AM
to Prometheus Users
On Saturday, 19 November 2022 at 11:43:26 UTC christia...@gmail.com wrote:
A quick query in prometheus for example gives me this:
confluent_kafka_server_retained_bytes{instance="api.telemetry.confluent.cloud:443", job="Confluent-Cloud", kafka_id="lkc-0x3v22", topic="confluent-kafka-connect-qa.confluent-kafka_configs"}

Does that mean that i have a label simply called kafka_id?

Yes indeed.  So if you can relate the values of that to the department, then you can use the simple metric relabelling I showed originally to add the departmentID label. But you need a separate rewrite rule for each kafka_id to department mapping - so you'll have to update the config every time you add a new cluster (which you're already doing to add the new query params).
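Concretely, with the kafka_id label from your example, the relabelling from my first post would look like this (the department names are of course placeholders for your own mapping):

```yaml
    metric_relabel_configs:
      - source_labels: [kafka_id]
        regex: lkc-0x3v22
        target_label: departmentID
        replacement: Engineering
      - source_labels: [kafka_id]
        regex: lkc-0x3v25
        target_label: departmentID
        replacement: Accounts
      # one rule per kafka_id -> department mapping
```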

There is another approach to consider: you can make a separate set of static timeseries with the metadata bindings, like this:

kafka_cluster_info{kafka_id="lkc-0x3v22", departmentID="Engineering", env="production"} 1
kafka_cluster_info{kafka_id="lkc-0x3v25", departmentID="Accounts", env="test"} 1
...

(A static timeseries can be made using node_exporter textfile_collector, or a static web page that you scrape)
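For example, with node_exporter's textfile collector, the info series could come from a small file dropped into the collector directory (the filename and directory are whatever you configure via --collector.textfile.directory):

```
# kafka_clusters.prom, in the node_exporter textfile directory
# HELP kafka_cluster_info Static metadata for Confluent Kafka clusters
# TYPE kafka_cluster_info gauge
kafka_cluster_info{kafka_id="lkc-0x3v22",departmentID="Engineering",env="production"} 1
kafka_cluster_info{kafka_id="lkc-0x3v25",departmentID="Accounts",env="test"} 1
```

Updating the department mapping then becomes a matter of editing this one file, with no Prometheus config change.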

The "kafka_id" label here has to match the "kafka_id" label values in the scraped data.  Then whenever you do a query on one of the main metrics, you can do a join to add the extra metadata labels, something like this:
 
confluent_kafka_server_retained_bytes * on (kafka_id) group_left(departmentID,env) kafka_cluster_info

Or you can do filtering on the metadata to select only the clusters belonging to a particular department or for a particular environment, e.g.

confluent_kafka_server_retained_bytes * on (kafka_id) group_left(departmentID) kafka_cluster_info{env="production"}

For the full details of this approach see:


The tradeoff here is that your queries get more complex whenever you need the departmentID or environment labels, especially in alerting rules.  Adding the extra labels at scrape time keeps your queries simpler.

You can also combine both approaches: use recording rules with join queries like those above, to create new metrics with the extra labels.
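A sketch of such a recording rule (the rule and group names here are arbitrary choices):

```yaml
groups:
  - name: kafka_metadata_joins
    rules:
      - record: confluent_kafka_server_retained_bytes:by_department
        expr: >
          confluent_kafka_server_retained_bytes
            * on (kafka_id) group_left(departmentID, env)
          kafka_cluster_info
```

Dashboards and alerts can then query the recorded metric directly, with the extra labels already attached.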

 
I did infact try to wrap my head around using file_sd_configs but could not work out how the params part of it, so i gave up on that. It would be nice though, since our list of clusters keps growing every week.

If you're only scraping the API once (because you have an API limit to avoid) then a single target with static_configs is fine.

Regards,

Brian.

Christian Oelsner

Nov 22, 2022, 7:24:05 AM
to Prometheus Users
Hi Brian,
Once again, thanks a lot for your assistance.
I went with the metric_relabel_configs you showed in your first post.
It worked nicely.

Cheers :)

Regards
Christian Oelsner