Prometheus server increases CPU usage beyond 200%

Isabel Noronha

May 14, 2020, 4:45:22 AM

to Prometheus Users

Hi,

Server config where Prometheus is running:
160 CPU cores
500 GB RAM
2 TB hard disk

Prometheus version: 2.18.0
cAdvisor version: 0.36.0

Prometheus is running inside a container.
I have already done relabeling.
Retention period is 15 days.

I am using cAdvisor to get metrics from around 4K containers.
I have done relabeling for the container metrics as well.

Scrape interval is 40s.

I use the top command to check the CPU usage.
To my surprise, Prometheus was exceeding 200% CPU usage.
The server where Prometheus is running has around 2K containers.
Another target has 2K containers, so overall 4K containers.

Could anyone help me understand the possible reasons for Prometheus's increased CPU usage?

prometheus.yml:

# my global config
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus-monitor'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  #- 'alert.rules'
  # - "first.rules"
  - "alert_rules.yml"

# alert
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "server:9093"

# Scrape configurations. The first job scrapes Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 40 seconds.
    scrape_interval: 40s
    scrape_timeout: 40s
    static_configs:
      - targets: ['localhost:9010']

  - job_name: 'cadvisor'
    # Override the global default and scrape targets from this job every 40 seconds.
    scrape_interval: 40s
    scrape_timeout: 40s
    static_configs:
      - targets: ['server1:8080','server2:8080','server3:8080']
    metric_relabel_configs:
      - source_labels: [__name__]
        action: keep

  - job_name: 'node-exporter'
    # Scrape targets from this job every 15 seconds.
    scrape_interval: 15s
    scrape_timeout: 15s
    static_configs:
      - targets: ['server1:9100','server2:9100','server3:9100']
    metric_relabel_configs:
      - source_labels: [__name__]
        action: keep

  - job_name: 'docker'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    scrape_interval: 5s
    static_configs:
      - targets: ['172.17.0.1:9999']

Thank you!

Stuart Clark

May 14, 2020, 4:50:22 AM

to Isabel Noronha, Prometheus Users

On 2020-05-14 09:45, Isabel Noronha wrote:
> Hi,
>
> Server config where Prometheus is running:
> 160 CPU cores
> 500 GB RAM
> 2 TB hard disk
>
> Prometheus version: 2.18.0
> cAdvisor version: 0.36.0
>
> Prometheus is running inside a container.
> I have already done relabeling.
> Retention period is 15 days.
>
> I am using cAdvisor to get metrics from around 4K containers.
> I have done relabeling for the container metrics as well.
>
> Scrape interval is 40s.
>
> I use the top command to check the CPU usage.
> To my surprise, Prometheus was exceeding 200% CPU usage.
> The server where Prometheus is running has around 2K containers.
>

Memory, CPU and disk usage will be down to a number of different
tasks:

- Scraping (more targets/time series, more resources)
- Recording rules (more rules touching more data, more resources)
- Queries (more queries and more complex ones, more resources)
- WAL processing, compaction and expiry (more time series, more
resources)
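
To get a feel for how the scraping item scales, here is a back-of-the-envelope sketch in Python. The target counts and intervals are taken from the posted prometheus.yml; the series_per_target figures are invented placeholders (the thread gives no real numbers), so replace them with values measured on your own server before drawing any conclusions.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    targets: int            # number of scrape targets in the job
    series_per_target: int  # ASSUMED number of series each target exposes
    interval_s: float       # scrape_interval in seconds


def samples_per_second(job: Job) -> float:
    """Each scrape ingests one sample per series, once per interval."""
    return job.targets * job.series_per_target / job.interval_s


# Targets and intervals follow the posted config. Every series_per_target
# value below is a made-up placeholder, not a measurement.
JOBS = [
    Job("prometheus", targets=1, series_per_target=1_000, interval_s=40),
    Job("cadvisor", targets=3, series_per_target=100_000, interval_s=40),
    Job("node-exporter", targets=3, series_per_target=1_500, interval_s=15),
    Job("docker", targets=1, series_per_target=500, interval_s=5),
]


def total_samples_per_second(jobs) -> float:
    """The per-job ingestion rates simply add up."""
    return sum(samples_per_second(job) for job in jobs)


for job in JOBS:
    print(f"{job.name:15s} {samples_per_second(job):10.1f} samples/s")
print(f"{'total':15s} {total_samples_per_second(JOBS):10.1f} samples/s")
```

With numbers like these the cAdvisor job dominates everything else, which is why the number of series per container usually matters more than the scrape interval of the smaller jobs.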

Those different usages add together. Prometheus exposes metrics about
itself that show the number of scrapes, time series, queries, etc.
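
A minimal sketch of how such metrics could be pulled from the Prometheus HTTP API (`/api/v1/query`). It assumes Prometheus scrapes itself under job="prometheus", as in the posted config; the base URL you pass in and the 5m rate window are assumptions, and the set of queries is only one reasonable starting point, not an official checklist.

```python
import json
import urllib.parse
import urllib.request

# One PromQL expression per area of work. The metric names are standard
# Prometheus self-metrics; which ones to look at first is a judgment call.
QUERIES = {
    "CPU cores used by Prometheus":
        'rate(process_cpu_seconds_total{job="prometheus"}[5m])',
    "series in the head block":
        "prometheus_tsdb_head_series",
    "samples ingested per second":
        "rate(prometheus_tsdb_head_samples_appended_total[5m])",
    "samples per scrape, by job":
        "sum by (job) (scrape_samples_scraped)",
    "last rule group evaluation time (s)":
        "prometheus_rule_group_last_duration_seconds",
    "query engine seconds per second, by phase":
        "sum by (slice) (rate(prometheus_engine_query_duration_seconds_sum[5m]))",
}


def query_url(base_url: str, expr: str) -> str:
    """Build the instant-query URL for one PromQL expression."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_vector(body: dict) -> list:
    """Flatten an instant-vector response into (labels, value) pairs."""
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    return [(s["metric"], float(s["value"][1])) for s in body["data"]["result"]]


def run(base_url: str) -> None:
    """Run every query against a live server and print the results."""
    for title, expr in QUERIES.items():
        with urllib.request.urlopen(query_url(base_url, expr), timeout=10) as resp:
            rows = parse_vector(json.load(resp))
        print(title)
        for labels, value in rows:
            print(f"  {labels} {value:.3f}")
```

Calling `run("http://localhost:9010")` would match the self-scrape target in the posted config, assuming that is the address Prometheus listens on. Comparing the per-job sample counts with the rule and query timings should show which of the tasks above accounts for most of the CPU.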

--
Stuart Clark
