Hi everyone!
I'm having a problem setting up prometheus-cloudwatch-exporter on a Kubernetes cluster. I'm installing it from the following
Helm chart.
The config for the exporter looks as follows:
region: eu-west-1
metrics:
- aws_namespace: AWS/RDS
  aws_metric_name: CPUUtilization
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]
- aws_namespace: AWS/RDS
  aws_metric_name: DatabaseConnections
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]
- aws_namespace: AWS/RDS
  aws_metric_name: FreeableMemory
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]
- aws_namespace: AWS/RDS
  aws_metric_name: ReadIOPS
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]
- aws_namespace: AWS/RDS
  aws_metric_name: WriteIOPS
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]
After the deployment, when I try to access the /metrics endpoint of the exporter, the request takes a very long time: sometimes it times out, and sometimes I get a response after 30-40s. I'm also unable to query the metrics from the Prometheus console (the query returns a "no data" response).
However, when I reduce the number of gathered metrics, for example to the following form:
region: eu-west-1
metrics:
- aws_namespace: AWS/RDS
  aws_metric_name: CPUUtilization
  aws_dimensions: [DBInstanceIdentifier]
  aws_statistics: [Average]
the /metrics endpoint always responds in ~5s and I can see the scraped CPUUtilization metric in the Prometheus console.
I've looked at the exporter and Prometheus logs and didn't find anything interesting there: no stack traces, errors, etc.
For every metric listed above, the CloudWatch API returns ~450 DBInstanceIdentifiers. It looks like the exporter becomes overloaded when I try to query the full set of RDS metrics. Has anyone encountered a similar problem? Is it somehow possible to "scale" the exporter so that it can handle scraping larger amounts of CloudWatch data?
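To give a rough sense of the scale involved, here is a back-of-the-envelope sketch in Python. It assumes the exporter issues one CloudWatch request per metric per DBInstanceIdentifier on every scrape, and that the requests run roughly one after another. Both are my assumptions about its behavior, not something I've confirmed from the logs. The function name and constants are only for illustration.

```python
# Back-of-the-envelope estimate of CloudWatch requests per scrape.
# ASSUMPTION: one request per (metric, DBInstanceIdentifier) pair per scrape,
# issued roughly sequentially. Not confirmed from the exporter logs.

def requests_per_scrape(metric_count: int, instance_count: int) -> int:
    """Number of (metric, instance) pairs fetched on a single scrape."""
    return metric_count * instance_count

INSTANCE_COUNT = 450            # approximate number of DBInstanceIdentifiers
REDUCED_SCRAPE_SECONDS = 5.0    # observed response time with one metric

reduced_requests = requests_per_scrape(1, INSTANCE_COUNT)
full_requests = requests_per_scrape(5, INSTANCE_COUNT)

# Linear extrapolation from the one-metric timing to the five-metric config.
estimated_full_seconds = REDUCED_SCRAPE_SECONDS * full_requests / reduced_requests

print(f"reduced config: {reduced_requests} requests per scrape")
print(f"full config:    {full_requests} requests per scrape")
print(f"estimated full scrape time: {estimated_full_seconds:.0f}s")
```

Under these assumptions the full config needs 2250 requests per scrape versus 450 for the reduced one, and a linear extrapolation from the ~5s case gives about 25s, which is in the same ballpark as the 30-40s I'm seeing.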