I am running Prometheus on a 4-core Amazon EC2 instance, and its CPU usage during the day is either pegged close to 100% or very spiky. Is it worth moving to an instance with a few more cores in my case? I'm attaching a graph below. I have also noticed that disk utilisation is very spiky.
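For reference, here is roughly how I'm looking at the CPU and disk usage (a sketch in PromQL against the self-scrape and node-exporter targets configured below; `node_cpu` and `node_disk_io_time_ms` are the pre-0.16 node_exporter metric names as far as I know, so adjust if yours differ):

```
# CPU seconds/second consumed by the Prometheus process itself (max 4 on a 4-core box)
rate(process_cpu_seconds_total{job="prometheus",instance="localhost:9090"}[5m])

# Non-idle CPU on the host, broken down by mode
sum by (mode) (rate(node_cpu{mode!="idle"}[5m]))

# Fraction of time the EBS volume is busy (older node_exporter reports this in ms)
rate(node_disk_io_time_ms[5m]) / 1000
```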
**Environment**
AWS EC2. "m3.xlarge" (4 x vCPU and 15GB of RAM). We are using an external EBS volume for storage.
* System information:
Linux 4.4.0-1013-aws x86_64
* Prometheus version:
prometheus, version 1.6.2 (branch: master, revision: b38e977fd8cc2a0d13f47e7f0e17b82d1a908a9a)
build user: root@c99d9d650cf4
build date: 20170511-12:59:13
go version: go1.8.1
* Container running config:
"/bin/prometheus -config.file=/etc/prometheus/prometheus.yml -storage.local.path=/prometheus -alertmanager.url=
http://alertmanager:9093 -storage.local.target-heap-size=
8053063680"
* Prometheus configuration file:
```
# my global config
global:
  scrape_interval: 15s     # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  - "alert.rules"
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: 'consul'
    scrape_interval: 4s
    metrics_path: '/__prometheus/pull'
    consul_sd_configs:
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*,http,.*
        action: keep
      - source_labels: [__meta_consul_tags]
        regex: '.*,(http),.*'
        replacement: '${1}'
        target_label: instance

  - job_name: 'pushgateway'
    scrape_interval: 4s
    honor_labels: true
    metrics_path: '/metrics'
    static_configs:
      - targets:
          - 'pushgateway:9091'
    metric_relabel_configs:
      - source_labels: [__scheme__]
        target_label: instance
        replacement: 'http'

  - job_name: 'prometheus'
    scrape_interval: 10s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['localhost:9090', 'cadvisor:8080', 'node-exporter:9100']
```
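Since the consul and pushgateway jobs scrape every 4s, I'm also trying to see how much of the load is ingestion volume (again a sketch; `scrape_samples_scraped` is the automatically generated per-target series and `prometheus_local_storage_ingested_samples_total` the 1.x ingestion counter, if I have the names right):

```
# Overall sample ingestion rate
rate(prometheus_local_storage_ingested_samples_total[5m])

# Samples per scrape, summed by job, to see which 4s-interval job is heaviest
sum by (job) (scrape_samples_scraped)
```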