metrics from cloudwatch_exporter are not digested

Moses Moore

Aug 28, 2019, 12:59:19 PM
to Prometheus Users
I've deployed the official Amazon CloudWatch exporter (https://github.com/prometheus/cloudwatch_exporter) and I'm satisfied with the metrics it offers. When I instruct Prometheus to scrape this exporter, Prometheus says it successfully picked up all the metrics (`scrape_samples_scraped` matches `curl -s localhost:9106/metrics |grep -cv \#`)
... but when I ask Prometheus for the metrics, almost all of them are missing.

I've been at this for a few days now and I am stumped.  I hope someone else has encountered this problem before.

Config files:
/etc/prometheus/cloudwatch.yml
```
---
metrics:
- aws_dimensions: [Class,Resource,Service,Type]
  aws_metric_name: CallCount
  aws_namespace: AWS/Usage
  aws_statistics: [Average,Maximum]
region: ca-central-1
```
Launched with `/usr/bin/java -jar /opt/prometheus/cloudwatch_exporter.jar 9106 /etc/prometheus/cloudwatch.yml`
`curl -s localhost:9106/metrics` gives me:
```
cloudwatch_requests_total 64.0
aws_usage_call_count_maximum{job="aws_usage",instance="",type="API",resource="ListMetrics",service="CloudWatch",class="None",} 1.0 1567009560000
aws_usage_call_count_maximum{job="aws_usage",instance="",type="API",resource="GetMetricStatistics",service="CloudWatch",class="None",} 1.0 1567009560000
aws_usage_call_count_average{job="aws_usage",instance="",type="API",resource="ListMetrics",service="CloudWatch",class="None",} 1.0 1567009560000
aws_usage_call_count_average{job="aws_usage",instance="",type="API",resource="GetMetricStatistics",service="CloudWatch",class="None",} 1.0 1567009560000
cloudwatch_exporter_scrape_duration_seconds 0.0508605
cloudwatch_exporter_scrape_error 0.0
```
So far so good. Having "job" as a label and an explicitly empty "instance" label is weird, but this is the official exporter so I guess they know what they're doing.
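
For readers wondering what happens to those exposed "job" and "instance" labels: the following is a simplified Python model of how Prometheus is documented to resolve conflicts between scraped labels and target labels, depending on the `honor_labels` scrape setting. It is a sketch under that assumption, not Prometheus's actual code; `resolve_labels` is a hypothetical helper name, and the label values are taken from the output above.

```python
def resolve_labels(scraped, target, honor_labels=False):
    """Model how scraped labels and target labels are merged for one sample."""
    merged = dict(scraped)
    for name, value in target.items():
        if honor_labels:
            # Scraped labels win, even when they are explicitly empty.
            if name not in scraped:
                merged[name] = value
        else:
            # Target labels win; a non-empty scraped value is kept
            # under exported_<name>.
            if scraped.get(name, "") != "":
                merged["exported_" + name] = scraped[name]
            merged[name] = value
    # A label with an empty value is equivalent to the label being absent.
    return {k: v for k, v in merged.items() if v != ""}


scraped = {"job": "aws_usage", "instance": "", "type": "API"}
target = {"job": "aws-cloudwatch", "instance": "localhost:9106"}

default_result = resolve_labels(scraped, target)
honored_result = resolve_labels(scraped, target, honor_labels=True)
print(default_result)
print(honored_result)
```

Under this model, the default (`honor_labels: false`) would store the exporter's job value as `exported_job="aws_usage"` while `job` and `instance` come from the scrape target, so the exposed labels alone should not cause samples to be dropped.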

Prometheus config to scrape this exporter is:
```
---
global: null
alerting: {alertmanagers: null}
rule_files: null
scrape_configs:
- job_name: aws-cloudwatch
  static_configs:
  - { targets: ['localhost:9106'] }
```

After waiting a minute, I ask Prometheus for metrics:
```
$ curl -gs 'localhost:9090/api/v1/query?query=scrape_samples_scraped{job="aws-cloudwatch"}' |jq -rS '.data.result[]|"\(.metric.__name__)  \(.value[1])"'
scrape_samples_scraped  7
```
Seven lines from the exporter, seven samples scraped; looks good. So now I try to get one of the metrics:
```
curl -gs 'localhost:9090/api/v1/query?query=aws_usage_call_count_maximum'
{"status":"success","data":{"resultType":"vector","result":[]}}
```
Empty. But if I ask for one of the only three metrics that don't have job/instance labels, e.g. cloudwatch_requests_total:

```
curl -gs 'localhost:9090/api/v1/query?query=cloudwatch_requests_total' |jq .data.result[0]
{
  "metric": {
    "__name__": "cloudwatch_requests_total",
    "instance": "localhost:9106",
    "job": "aws-cloudwatch-AWS"
  },
  "value": [ 1567010659.893, "88" ]
}
```
There it is.

What am I doing wrong here that's causing Prometheus to pick up 7 metrics but only keep 3 of them? I launched Prometheus with --log.level=info and I don't see any error messages from Prometheus.

Moses Moore

Aug 28, 2019, 5:18:57 PM
to Prometheus Users
I'm also not getting the metrics when I use "/federate" instead of "/api/v1/query".
I should mention I'm still using Prometheus v2.11.0, built with Go v1.12.

Waitaminute, maybe it's the timestamps -- I can't query for the current state of the metrics that cloudwatch_exporter appends timestamps to.
Aren't we supposed to not put timestamps in our exporter output? As per https://prometheus.io/docs/instrumenting/writing_exporters/:
> Accordingly, you should not set timestamps on the metrics you expose, let Prometheus take care of that. If you think you need timestamps, then you probably need the Pushgateway instead.

Am I still doing something wrong, or is it time to file a bug report?
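
One way to test the timestamp theory is to compute how old each timestamped sample is at query time. The following is a minimal Python sketch, not part of Prometheus or the exporter: `sample_ages` and its regex are hypothetical, and it assumes the simple `name{labels} value timestamp_ms` line shape seen in the exporter output earlier in the thread. The `now_seconds` value is the evaluation time from the query response in the first message, which was taken at a different moment than the `curl` of the exporter, so the resulting age only illustrates the calculation.

```python
import re

# Matches "name{labels} value timestamp_ms"; labels and timestamp are optional.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?P<labels>\{.*\})?\s+'
    r'(?P<value>\S+)'
    r'(?:\s+(?P<ts_ms>-?\d+))?\s*$'
)


def sample_ages(exposition_text, now_seconds):
    """Return {series: age in seconds} for samples that carry a timestamp."""
    ages = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or m.group("ts_ms") is None:
            continue  # no explicit timestamp: Prometheus uses the scrape time
        series = m.group("name") + (m.group("labels") or "")
        ages[series] = now_seconds - int(m.group("ts_ms")) / 1000.0
    return ages


example = """cloudwatch_requests_total 64.0
aws_usage_call_count_maximum{resource="ListMetrics"} 1.0 1567009560000
"""
ages = sample_ages(example, now_seconds=1567010659.893)
# Series older than the default 5m (300s) lookback window.
stale = [series for series, age in ages.items() if age > 300]
print(stale)
```

With these two values the timestamped sample comes out roughly 1100 seconds old, well outside a 5-minute window, while `cloudwatch_requests_total` carries no timestamp and is not reported at all.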

Moses Moore

Mar 10, 2020, 4:38:35 PM
to Prometheus Users
Okay I figured out what's going on, but I'm still scratching my head about how to federate the metrics gleaned by cloudwatch_exporter.

The timestamps are there for a reason. The exporter can ask CloudWatch for metrics and specify "300s ago please", and CloudWatch will return "well, I've got CPU usage from this machine from 385s ago and from that machine from 445s ago...". cloudwatch_exporter does the right thing and emits those timestamps with the metrics (unless I tell it to omit them, in which case Prometheus assumes the sample time is always "now"). So far so good.

Now the problem I have is when I try to bring these metrics back from the small collector Prometheus to the datacentre, where I can afford things like 14d of local storage, Thanos, and Grafana, and where I have permission to emit alerts to email, PagerDuty, et al. If I ask for the latest metrics with "/federate" I get up-to-date stuff from node_exporter, process_exporter, and mysqld_exporter... but the cloudwatch_exporter metrics, because they're dated anywhere from 6 to 10 minutes ago, never appear. I tried adding --query.lookback-delta=8m, but I've been warned elsewhere on this mailing list that I'm going to regret bumping it up from the default 5m... and it doesn't always work for things that are <7m old.
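
The lookback behaviour described above can be modelled in a few lines. This is a simplified Python sketch, assuming an instant query (or a /federate request) returns the newest sample of a series whose timestamp falls within the lookback window before the evaluation time. `visible_sample` is a hypothetical helper; staleness markers and exact boundary handling, which may differ between Prometheus versions, are ignored.

```python
DEFAULT_LOOKBACK_SECONDS = 300.0  # default for --query.lookback-delta (5m)


def visible_sample(samples, eval_time, lookback=DEFAULT_LOOKBACK_SECONDS):
    """Return the newest (timestamp, value) inside the lookback window, or None.

    samples: list of (timestamp_seconds, value) tuples for one series.
    """
    candidates = [
        s for s in samples
        if eval_time - lookback <= s[0] <= eval_time
    ]
    return max(candidates, key=lambda s: s[0]) if candidates else None


now = 1000.0
# A sample dated 385s ago is outside the default 5m window...
print(visible_sample([(now - 385, 1.0)], now))
# ...inside an 8m (480s) window...
print(visible_sample([(now - 385, 1.0)], now, lookback=480.0))
# ...but a sample dated 10 minutes ago is outside even the 8m window.
print(visible_sample([(now - 600, 1.0)], now, lookback=480.0))
```

In this model, samples dated 6 to 10 minutes in the past only show up when their age happens to be below the configured lookback, which is consistent with an 8m setting helping for some series and not others.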

I could tell cloudwatch_exporter to drop the timestamps on all the metrics it gleans, but then comparing ELB traffic from this exporter against request count from another exporter will always be out of sync (and because of that 385s...445s thing above, I can't just use a constant offset to correct it). So I'd rather avoid that if I can.

How can I get a /federate scrape to bring back the latest, albeit late, metrics published by cloudwatch_exporter?