High CPU and mem in Prometheus 2.3.2


anume...@gmail.com

Mar 4, 2019, 12:49:21 PM
to Prometheus Users
Hello,

I have been running Prometheus under high stress (126-135 time series) over the weekend, and I notice that its CPU utilization is above 100%. Over a 5-minute window, the CPU utilization goes above 100% about every minute, but when averaged over a longer period, it is above 100% continuously. I initially thought it was the Go garbage collector, but each spike lasts for more than 1 second, so I am not sure. Please see the graph below.


HighCPUProm2.3.2.png

Could you please let me know what could be going on? I can share the heap and CPU profiles if required. Also, the OS-reported memory for Prometheus (under /proc/<pid>/) is ~450MB, while the pprof heap profile of Prometheus shows only ~120MB. I am not sure why that is either.

Any help is much appreciated.

Thank you
Anu

Simon Pasquier

Mar 5, 2019, 6:03:37 AM
to anume...@gmail.com, Prometheus Users
A CPU profile would help to understand what is happening. You would also need to provide more details about your setup (number of targets, series, machine specifications).
I wouldn't be too worried about the discrepancy between the heap profile and the process metrics as lots of factors come into play here.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To post to this group, send email to promethe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/84b55914-fdbe-4889-a912-0c172ab49ac6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

anume...@gmail.com

Mar 5, 2019, 1:08:40 PM
to Prometheus Users
Could you please tell me which factors account for the difference between the heap profile and the process metrics? I have about 126-130 time series; I will get more information about the number of targets and the machine specs.


I took two 30-second CPU profiles back to back, since I was seeing the spikes every 1-3 minutes.

[Two inline images of the CPU profiles]

Simon Pasquier

Mar 6, 2019, 3:54:24 AM
to anume...@gmail.com, Prometheus Users
On Tue, Mar 5, 2019 at 7:08 PM <anume...@gmail.com> wrote:
>
> Could you please tell me what are the factors that could come into consideration between the heap profile and process metrics? I have about 126-130 time series, I will get more information about number of targets and machine specs.


The heap profile reports only the memory allocated for the heap. It
doesn't account for all the memory used by the process.
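
To illustrate the point above, here is a minimal Go sketch (not Prometheus code; the helper name heapVersusSys is made up for this example) that compares the bytes held by live heap objects with the total the Go runtime has obtained from the OS. The second figure includes stacks, GC metadata and heap space that is reserved but not in use, so it is normally larger. Memory that lives outside the Go runtime, such as memory-mapped files, is not counted by either number, which is one more reason OS-level figures can be higher still.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapVersusSys returns the bytes of allocated heap objects and the total
// bytes the Go runtime has obtained from the OS (heap, stacks, GC metadata
// and other runtime structures). The name is hypothetical.
func heapVersusSys() (heapAlloc, sys uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, m.Sys
}

func main() {
	heapAlloc, sys := heapVersusSys()
	fmt.Printf("allocated heap objects: %d KiB\n", heapAlloc/1024)
	fmt.Printf("obtained from the OS:   %d KiB\n", sys/1024)
}
```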

> I took two cpu profiles back to back for 30sec, since I was seeing the spikes every 1-3minute.

Sorry, the picture resolution is too low and I can't read them.
As Ben noted in the GitHub issue, the source of truth is the
"process_cpu_seconds_total" metric which doesn't exhibit any



Anu Mercian

Mar 6, 2019, 7:14:58 PM
to Simon Pasquier, Prometheus Users
Hi Simon,

The heap profile shows the bulk of the memory going to JSON encoding (json-iterator/go). I am attaching the SVG files. It would be good to understand why the OS memory for Prometheus is 450MB while the heap profile shows much less. What are the factors? Is it virtual memory? Would you be able to describe them? Sorry if my question is very naive.

Thank you for confirming that the CPU query shows Prometheus is not causing the high CPU usage.
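
For reference, the check based on "process_cpu_seconds_total" boils down to dividing the increase of a cumulative CPU-seconds counter by the elapsed wall-clock time. The Go sketch below shows only that arithmetic; the type, the function name and the sample values are hypothetical and not taken from this thread. A result of 1.0 means 100% of one core, so values above 100% are possible on a multi-core machine.

```go
package main

import "fmt"

// sample is one reading of a cumulative CPU-seconds counter.
type sample struct {
	unixSeconds float64 // time of the reading
	cpuSeconds  float64 // counter value at that time
}

// coresUsed returns the average number of CPU cores used between two
// readings. 1.0 corresponds to 100% of one core.
func coresUsed(first, last sample) (float64, error) {
	elapsed := last.unixSeconds - first.unixSeconds
	if elapsed <= 0 {
		return 0, fmt.Errorf("samples are out of order or share a timestamp")
	}
	if last.cpuSeconds < first.cpuSeconds {
		return 0, fmt.Errorf("counter was reset between the samples")
	}
	return (last.cpuSeconds - first.cpuSeconds) / elapsed, nil
}

func main() {
	// Hypothetical readings taken five minutes apart.
	first := sample{unixSeconds: 0, cpuSeconds: 100}
	last := sample{unixSeconds: 300, cpuSeconds: 160}

	cores, err := coresUsed(first, last)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("average usage: %.0f%% of one core\n", cores*100)
}
```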

Regards,
Anu


heap3.svg
heap1.svg

Simon Pasquier

Mar 7, 2019, 5:49:26 AM
to Anu Mercian, Prometheus Users
On Thu, Mar 7, 2019 at 1:15 AM Anu Mercian <anume...@gmail.com> wrote:
>
> Hi Simon,
>
> The heap profile seems to be taking the bulk for json encoding (json-iterator/go). Attaching the svg files. It would be good if I can understand why the OS memory for Prometheus is 450MB while heap shows much less. What are the factors? Is it virtual memory? Would you be able to define them? Sorry if my question is very naive.

The profiles show that most of the memory is used by the query API
endpoint, meaning that you're running a query that returns lots of
datapoints.
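
As a rough way to judge how large such a response can get, the Go sketch below estimates an upper bound on the datapoints a range query returns: one point per series per evaluation step. The function name and the numbers in main are hypothetical, and the real count can be lower when series have gaps.

```go
package main

import "fmt"

// rangeQueryPoints estimates the maximum number of datapoints returned by a
// range query that is evaluated every stepSeconds over windowSeconds, for
// the given number of matching series. Both ends of the window are
// evaluated, hence the +1.
func rangeQueryPoints(series int, windowSeconds, stepSeconds int64) int64 {
	if series <= 0 || windowSeconds < 0 || stepSeconds <= 0 {
		return 0
	}
	steps := windowSeconds/stepSeconds + 1
	return int64(series) * steps
}

func main() {
	// Hypothetical query: 1000 series over 24 hours at a 15-second step.
	points := rangeQueryPoints(1000, 24*60*60, 15)
	fmt.Println("datapoints (upper bound):", points)
}
```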

See also https://golang.org/doc/faq#Why_does_my_Go_process_use_so_much_virtual_memory
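
On the "is it virtual memory?" question: on Linux, /proc/<pid>/status reports both the virtual size (VmSize) and the resident set size (VmRSS), so it is worth checking which of the two a given figure refers to. The Go sketch below parses text in that format; the function name is made up and the sample text in main is hypothetical, not output from the Prometheus process discussed here.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memoryFromStatus extracts VmSize (virtual) and VmRSS (resident), both in
// kB, from text in the format of Linux /proc/<pid>/status.
func memoryFromStatus(status string) (vmSizeKB, vmRSSKB uint64, err error) {
	var haveSize, haveRSS bool
	scanner := bufio.NewScanner(strings.NewReader(status))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue
		}
		switch fields[0] {
		case "VmSize:":
			if vmSizeKB, err = strconv.ParseUint(fields[1], 10, 64); err != nil {
				return 0, 0, err
			}
			haveSize = true
		case "VmRSS:":
			if vmRSSKB, err = strconv.ParseUint(fields[1], 10, 64); err != nil {
				return 0, 0, err
			}
			haveRSS = true
		}
	}
	if err = scanner.Err(); err != nil {
		return 0, 0, err
	}
	if !haveSize || !haveRSS {
		return 0, 0, fmt.Errorf("VmSize or VmRSS not found")
	}
	return vmSizeKB, vmRSSKB, nil
}

func main() {
	// Hypothetical sample; on Linux, read /proc/<pid>/status instead.
	sample := "Name:\tprometheus\nVmSize:\t 2000000 kB\nVmRSS:\t  450000 kB\n"
	size, rss, err := memoryFromStatus(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("virtual: %d kB, resident: %d kB\n", size, rss)
}
```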

Anu Mercian

Mar 7, 2019, 8:18:23 AM
to Simon Pasquier, Prometheus Users
Thank you for the clarifications and explanations.