prometheus-cloudwatch-exporter stops working while scraping larger number of metrics

mordowiciel

May 10, 2020, 2:01:58 PM

to Prometheus Users

Hi everyone!

I'm having a problem setting up prometheus-cloudwatch-exporter on a Kubernetes cluster. I'm installing it from the following Helm chart.

The exporter config looks as follows:

region: eu-west-1
metrics:
  - aws_namespace: AWS/RDS
    aws_metric_name: CPUUtilization
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]
  - aws_namespace: AWS/RDS
    aws_metric_name: DatabaseConnections
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]
  - aws_namespace: AWS/RDS
    aws_metric_name: FreeableMemory
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]
  - aws_namespace: AWS/RDS
    aws_metric_name: ReadIOPS
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]
  - aws_namespace: AWS/RDS
    aws_metric_name: WriteIOPS
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]

After the deployment, when I try to access the /metrics endpoint of the exporter, the request takes a very long time: sometimes it times out, and sometimes I get a response after 30-40s. I'm also unable to query the metrics from the Prometheus console (the query returns no data).

However, when I reduce the number of gathered metrics, for example to the following:

region: eu-west-1
metrics:
  - aws_namespace: AWS/RDS
    aws_metric_name: CPUUtilization
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]

The /metrics endpoint then always responds in ~5s, and I can see the scraped CPUUtilization metric in the Prometheus console.
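One possible explanation for the difference, offered as an assumption rather than a diagnosis: Prometheus's default scrape_timeout is 10s, so a ~5s response fits within it while a 30-40s response would be recorded as a failed scrape and store no samples. A minimal sketch of a scrape job with a longer timeout is shown here; the job name and target address are hypothetical placeholders, not values from the Helm chart.

```yaml
scrape_configs:
  - job_name: cloudwatch-exporter              # hypothetical job name
    scrape_interval: 2m                        # must be at least as long as scrape_timeout
    scrape_timeout: 60s                        # Prometheus default is 10s
    static_configs:
      - targets: ["cloudwatch-exporter:9106"]  # hypothetical service address
```

If the timeout is the cause, the target would typically show as down on the Prometheus targets page with a timeout error, even though the exporter itself logs nothing unusual.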

I've looked at the exporter and Prometheus logs and didn't find anything interesting there: no stack traces, errors, etc.

For every metric listed above, the CloudWatch API returns ~450 DBInstanceIdentifiers. It looks like the exporter becomes overloaded when I query the full set of RDS metrics. Has anyone encountered a similar problem? Is it somehow possible to "scale" the exporter so that it can handle scraping larger amounts of CloudWatch data?
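One way to reduce the work done per scrape, sketched here under assumptions, is to restrict which dimension values the exporter fetches using its aws_dimension_select_regex option. With 5 metrics and ~450 instances, each scrape may involve on the order of 5 × 450 = 2250 time series to fetch, so filtering the instances should shrink that proportionally. The "prod-.*" pattern is a hypothetical naming convention, not something taken from the setup above.

```yaml
region: eu-west-1
metrics:
  - aws_namespace: AWS/RDS
    aws_metric_name: CPUUtilization
    aws_dimensions: [DBInstanceIdentifier]
    # Only fetch instances whose identifier matches one of these regexes.
    # "prod-.*" is a hypothetical pattern; adjust it to your own naming scheme.
    aws_dimension_select_regex:
      DBInstanceIdentifier: ["prod-.*"]
    aws_statistics: [Average]
```

The same filter would need to be repeated for each metric entry. Whether this is sufficient depends on how many instances actually need monitoring; splitting the metrics across several exporter deployments, each scraped as its own target, is another option that may be worth trying.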

Sally Lehman

May 16, 2020, 1:05:02 AM

to Prometheus Users

First of all, thanks! :) Your example config helped me isolate that my cloudwatch_exporter issue was just a config file problem. :)

I don't know why you would get a delay or timeout, but I just tried both of your configs, and I get a response within 15s for the longer metrics list that you specified. I'm using the debug logging from this pull request to see exactly when the scrape happens and what's in it: https://github.com/prometheus/cloudwatch_exporter/pull/225
