Prometheus is very slow in k8s


Redneck_prm

Aug 31, 2020, 4:59:17 AM
to Prometheus Users

Hi all (and sorry about my English level)
I have an issue with Prometheus in k8s (deployed by the Prometheus operator).
After the container starts, it is very slow and unresponsive.
The prometheus process consumes 100-150% CPU (1-1.5 cores vs 0.7 cores on another, identical cluster), and responses take 5-20 seconds.
The load average on the node grows to 7-10, but it is not a lack of resources on the node: other containers work fine. The Prometheus container has no CPU or memory limits. The scrape targets are responsive.

Inside the container, the ps and top commands are also slow. Other commands run normally.


[master1 ~]$ time curl http://10.233.67.83:9090
<a href="/graph">Found</a>.
real    0m8.400s
user    0m0.004s
sys     0m0.014s
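To see where those 8 seconds go, curl's `--write-out` timing variables (a standard curl feature) can split the request into phases. A minimal sketch; `PROM_URL` defaults to the pod address from the transcript above and should be substituted:

```shell
# Split a slow HTTP request into phases using curl's --write-out
# timing variables.  curl prints the -w output even on failure or
# timeout, so "|| true" just keeps the script going.
PROM_URL="${PROM_URL:-http://10.233.67.83:9090}"
timing=$(curl -o /dev/null -s -m 5 -w \
  'dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}' \
  "$PROM_URL" || true)
echo "$timing"
```

A total dominated by `ttfb` points at the server itself; one dominated by `connect` points at the network or kube-proxy path instead.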


I have tried different versions of Prometheus and of the Prometheus operator; it did not help.
I have also tried a deployment without the operator, with the same result.

There is nothing useful in the logs, even in debug mode.

Can anyone help with diagnostics and debugging methods?


helm values.yaml:

# helm install prom-operator stable/prometheus-operator --namespace monitoring -f values.yaml --version 9.3.1

prometheus:
  ingress:
    enabled: true
    ...

  prometheusSpec:
#    logLevel: debug
    retention: 30m
    image:
      tag: v2.20.1
    externalLabels:
      k8s_cluster: "cluster1"

Brian Candler

Aug 31, 2020, 6:22:15 AM
to Prometheus Users
This is a standard Unix performance debugging problem - finding the bottleneck.

The key area you don't mention is disk I/O.  What sort of backend storage does your prometheus container use?

Redneck_prm

Aug 31, 2020, 6:31:39 AM
to Prometheus Users
inside the container:

/prometheus $ dd if=/dev/zero of=file.txt count=1024 bs=1048576
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 1.156714 seconds, 885.3MB/s
/prometheus $ dd if=file.txt of=file2.txt count=1024 bs=1048576
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 3.352250 seconds, 305.5MB/s

The disk is network storage, with limits of 500 Mb/s and 5000 IOPS.
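The dd run above goes through the page cache, so it mostly measures sequential throughput; Prometheus's write-ahead log cares more about small synced writes. A rough latency sketch using GNU dd's `oflag=dsync` (the "1.0GB" formatting above suggests busybox dd, which may only offer `conv=fsync` instead):

```shell
# Rough sync-write latency probe: oflag=dsync flushes every 4 KiB
# block to the backing store before the next write, so the reported
# rate reflects storage round-trip latency rather than the page cache.
f="/tmp/ddlat.$$"
out=$(dd if=/dev/zero of="$f" bs=4k count=250 oflag=dsync 2>&1)
rm -f "$f"
echo "$out"
```

On a healthy local disk this typically reports at least a few MB/s; a rate far below that suggests high per-write latency on the network volume.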


I am curious why ps and top are slow inside the container. What do they need?


On Monday, 31 August 2020 at 15:22:15 UTC+5, b.ca...@pobox.com wrote:

Brian Candler

Aug 31, 2020, 6:39:37 AM
to Prometheus Users
On Monday, 31 August 2020 11:31:39 UTC+1, Redneck_prm wrote:
disk is some network storage

Meaning what exactly: block storage like iSCSI for example?  Prometheus does not work well with NFS storage.

I am curious why ps and top are slow inside the container. What do they need?

In that case, it's clearly not a prometheus problem, it's something to do with your container or your k8s environment.  Try using strace to see the system calls, e.g.

strace -s 128 -f ps ....etc...  2>strace.out

then go through the calls in strace.out.  Adding "-ttt" can give you better timestamps.

Some possibilities: a problem with DNS; a problem with uid/gid resolution in nsswitch (especially if you're using something like an LDAP backend); a problem with one of your filesystems.
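Each of those suspects can be timed directly before reaching for strace; `getent` (part of glibc/busybox) walks the same nsswitch path that ps and top use for uid-to-name lookup. A quick sketch:

```shell
# Time the implicit lookups that tools like ps and top perform.
# Any of these taking noticeably long points at the matching
# subsystem: nsswitch/LDAP, DNS, or /proc itself.
time getent passwd 0 > /dev/null           # uid -> name via nsswitch
time getent hosts localhost > /dev/null    # hostname resolution path
time cat /proc/self/stat > /dev/null       # baseline /proc read
```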

Redneck_prm

Aug 31, 2020, 7:02:14 AM
to Prometheus Users
Meaning what exactly: block storage like iSCSI for example? 

I think it is iSCSI. It is a disk on an RHVM virtual machine.

In that case, it's clearly not a prometheus problem, it's something to do with your container or your k8s environment.

ps and top slow down only when prometheus is using a full core; at startup they are responsive.

strace -s 128 -f ps ....etc...  2>strace.out


strace of ps hangs on:
open("/proc/1/cmdline", O_RDONLY)       = 3
read(3,

and cat /proc/1/cmdline takes a long time

strace of the prometheus process:

strace -p 41850
strace: Process 41850 attached
futex(0x44daec8, FUTEX_WAIT_PRIVATE, 0, NULL


On Monday, 31 August 2020 at 15:39:37 UTC+5, b.ca...@pobox.com wrote:

Brian Candler

Aug 31, 2020, 8:34:22 AM
to Prometheus Users
On Monday, 31 August 2020 12:02:14 UTC+1, Redneck_prm wrote:
and cat /proc/1/cmdline takes a long time

Inside the container, is pid 1 the prometheus process, or are you running with an init process? (in the case of k8s this could be "shareProcessNamespace: true" and pid 1 would be "pause")

What does cat /proc/1/cmdline show as its result?

Something very weird is going on if this does not return immediately.

Brian Candler

Aug 31, 2020, 8:49:03 AM
to Prometheus Users
On Monday, 31 August 2020 12:02:14 UTC+1, Redneck_prm wrote:
strace prometheus process:

strace -p 41850
strace: Process 41850 attached
futex(0x44daec8, FUTEX_WAIT_PRIVATE, 0, NULL


You'll need to add -f flag to get useful output, e.g.

strace -f -p 41850

Aside: does this mean you were running strace outside the container, i.e. pid 1 inside the container is pid 41850 on the host? If so, what happens if you do "cat /proc/41850/cmdline" ?
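Whichever side the pid belongs to, comparing a /proc read for the suspect process against one for a known-healthy process localizes the stall; 41850 here is the pid from the strace above and should be substituted as needed:

```shell
# Compare /proc/PID/cmdline read time for the suspect pid against a
# known-good process.  Reading another process's cmdline can block
# while that process is busy, so a large gap localizes the problem.
PID="${PID:-41850}"
time cat "/proc/$PID/cmdline" 2>/dev/null | tr '\0' ' '; echo
time cat /proc/self/cmdline | tr '\0' ' '; echo
```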

Redneck_prm

Aug 31, 2020, 9:12:31 AM
to Prometheus Users
cat /proc/1/cmdline
/bin/prometheus--web.console.templates=/etc/prometheus/consoles--web.console.libraries=/etc/prometheus/console_libraries--config.file=/etc/prometheus/config_out/prometheus.env.yaml--storage.tsdb.path=/prometheus--storage.tsdb.retention.time=30m--web.enable-lifecycle--storage.tsdb.no-lockfile--web.external-url=http://prometheus.loc/--web.route-prefix=//


strace -f
the stream hangs for a few seconds in different places:

[pid 15911] <... read resumed> 0xc0043b7000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid 15338] tgkill(1, 13, SIGURG <unfinished ...>
[pid 15911] epoll_pwait(4, [{EPOLLIN|EPOLLOUT, {u32=976213944, u64=140562370910136}}], 128, 0, NULL, 99926) = 1
[pid 15349] madvise(0xc00d600000, 4194304, MADV_HUGEPAGE <unfinished ...>
[pid 15911] read(92,  <unfinished ...>
[pid 15338] <... tgkill resumed> )      = 0

[pid 15347] setsockopt(92, SOL_SOCKET, SO_BROADCAST, [1], 4 <unfinished ...>
[pid 15911] write(69, "\27\3\3\0%\0\0\0\0\0\0\1\323\17&\324eXA\352?\375\323-\2175d\341\t\247f\333"..., 42 <unfinished ...>
[pid 15349] <... epoll_ctl resumed> )   = 0
[pid 15347] <... setsockopt resumed> )  = 0
[pid 15346] <... epoll_ctl resumed> )   = -1 EPERM (Operation not permitted)
[pid 15341] write(63, "\27\3\3\0%\0\0\0\0\0\0\0\21\313\227@\244^U\273C\200\203\344\206K\250bM\201U4"..., 42 <unfinished ...>
[pid 15911] <... write resumed> )       = 42

[pid 27265] mmap(0xc010000000, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 15353] read(21,  <unfinished ...>
[pid 15352] epoll_ctl(4, EPOLL_CTL_DEL, 22, 0xc00155e6f4 <unfinished ...>
[pid 15347] <... epoll_pwait resumed> [], 128, 0, NULL, 99928) = 0
[pid 15346] <... tgkill resumed> )      = 0
[pid 15317] <... nanosleep resumed> NULL) = 0


On Monday, 31 August 2020 at 17:49:03 UTC+5, b.ca...@pobox.com wrote:

Brian Candler

Aug 31, 2020, 12:47:08 PM
to Prometheus Users
On Monday, 31 August 2020 14:12:31 UTC+1, Redneck_prm wrote:
cat /proc/1/cmdline
/bin/prometheus--web.console.templates=/etc/prometheus/consoles--web.console.libraries=/etc/prometheus/console_libraries--config.file=/etc/prometheus/config_out/prometheus.env.yaml--storage.tsdb.path=/prometheus--storage.tsdb.retention.time=30m--web.enable-lifecycle--storage.tsdb.no-lockfile--web.external-url=http://prometheus.loc/--web.route-prefix=//


--web.route-prefix=//  looks wrong.

--storage.tsdb.no-lockfile doesn't even appear to be documented.

But if prometheus operator sets these by default, then I guess they must be OK.
 

strace -f
the stream hangs for a few seconds in different places:

[pid 15911] <... read resumed> 0xc0043b7000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid 15338] tgkill(1, 13, SIGURG <unfinished ...>
[pid 15911] epoll_pwait(4, [{EPOLLIN|EPOLLOUT, {u32=976213944, u64=140562370910136}}], 128, 0, NULL, 99926) = 1
[pid 15349] madvise(0xc00d600000, 4194304, MADV_HUGEPAGE <unfinished ...>
[pid 15911] read(92,  <unfinished ...>
[pid 15338] <... tgkill resumed> )      = 0


Those are different threads mixed up so it's hard to see what's what, and you'd also need to look at more context to find what these fd's actually refer to (e.g. fd 92 for the read, find the corresponding open or connect call).  These might just be scrapes.
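For this kind of trace, strace itself can do more of the work: `-T` appends the time spent in each syscall and `-y`/`-yy` decode fd numbers into paths or socket endpoints (standard strace flags, though `-yy` needs a reasonably recent build). The fd can also be resolved by hand through /proc, e.g.:

```shell
# Resolve what fd 92 of pid 41850 points at via /proc: the symlink
# target is a path for regular files, or "socket:[inode]" for sockets.
readlink "/proc/41850/fd/92" 2>/dev/null || echo "pid or fd gone"
# For the next trace, include per-syscall durations and decoded fds:
# strace -f -T -yy -p 41850 2>strace.out
```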

Sorry, I can't help you. I've not seen the problem you describe.