Smokeping_prober CPU usage optimization possible?

Alexander Wilke

Feb 20, 2024, 12:22:15 AM
to Prometheus Users
Hello,
I am running smokeping_prober on one VM to monitor around 500 destinations.
Around 30 devices are probed at a 0.2 s interval and the others at a 1.65 s interval.

Prometheus scrapes every 5 s.

So there are roughly 600 ICMP IPv4 pings per second, 24 bytes each.
CPU usage jumps between 700% and 1200% according to "top".

Apart from lengthening the interval or reducing the host count, what else could help to reduce CPU usage?
Is the UDP socket "better", or is there any other optimization that could be relevant for this type of traffic? This is running on RHEL 8.

Does anyone see similar CPU usage with this number of pings per second? Maybe others ping 6,000 destinations every 10 s?

Ben Kochie

Feb 20, 2024, 4:27:10 AM
to Alexander Wilke, Prometheus Users
Best thing you can do is capture some pprof data. That will show you what it's spending the time on.

:9374/debug/pprof/heap 
:9374/debug/pprof/profile?seconds=30

You can post the results to https://pprof.me/ for sharing.
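
A minimal sketch of capturing both profiles with curl. It assumes the prober is reachable on localhost (only the port, 9374, comes from the endpoints above); the function names and output file names are made up for illustration.

```shell
# Assumption: the prober listens on localhost:9374; override PROBER if not.
PROBER="${PROBER:-http://localhost:9374}"

# Build the pprof URL for a profile name and an optional query string.
pprof_url() {
  local profile="$1" query="${2:-}"
  printf '%s/debug/pprof/%s%s\n' "$PROBER" "$profile" "${query:+?$query}"
}

# Download a heap snapshot and a 30-second CPU profile into the current directory.
# The CPU profile request blocks for the full 30 seconds while it samples.
fetch_profiles() {
  curl -sS -o heap.pprof "$(pprof_url heap)"
  curl -sS -o cpu.pprof "$(pprof_url profile seconds=30)"
}
```

Call `fetch_profiles` on the VM, then either upload the files or, if a Go toolchain is installed, run `go tool pprof -top cpu.pprof` for a quick local summary.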

Alexander Wilke

Feb 22, 2024, 6:40:09 AM
to Prometheus Users

Alexander Wilke

Feb 25, 2024, 1:08:33 PM
to Prometheus Users
Hello,
Any chance you could look into the reports? Any suggestions?

Ben Kochie

Feb 25, 2024, 1:22:35 PM
to Alexander Wilke, Prometheus Users
Looking at the CPU profile, I'm seeing almost all the time spent in the Go runtime. Mostly the ICMP packet receiving code and garbage collection. I'm not sure there's a lot we can optimize here as it's core Go code for ICMP packet handling.

Can you also post graphs of a few metric queries?

rate(process_cpu_seconds_total{job="smokeping_prober"}[30s])
rate(go_gc_duration_seconds_count{job="smokeping_prober"}[5m])
rate(go_gc_duration_seconds_sum{job="smokeping_prober"}[5m])


Alexander Wilke

Feb 25, 2024, 5:18:03 PM
to Prometheus Users
Hello,

I attached a few screenshots showing the results and graphs for 1h and 6h.
In addition I added a screenshot from node_exporter metrics to give you an overview of the system itself.
On the same system there are prometheus, grafana, snmp_exporter (200-800% CPU), smokeping_prober, node_exporter and blackbox_exporter.
The main CPU consumers are snmp_exporter and smokeping_prober.

smokeping_queries_graph_6h_01.JPG
smokeping_queries_graph_1h_03.JPG
smokeping_queries_graph_6h_03.JPG
smokeping_queries_graph_6h_02.JPG
smokeping_queries_graph_1h_02.JPG
node_exporter_6h.JPG
smokeping_queries.JPG
smokeping_queries_graph_1h_01.JPG

Ben Kochie

Feb 27, 2024, 10:59:31 AM
to Alexander Wilke, Prometheus Users
Interesting, thanks for the data. It does seem like the process is spending a lot of time doing GC like I thought.

One trick you could try is to increase the memory allocated to the prober, which would reduce the time spent on GC.

The default setting is GOGC=100.

You could try increasing this by setting the GOGC environment variable.

Try something like GOGC=200 or GOGC=300.

This will make the process use more memory, but it should reduce the CPU time spent.
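
A small sketch of setting GOGC for a single process only, so nothing else on the machine is affected. The wrapper name and the binary path in the comment are hypothetical; substitute however you actually start the prober.

```shell
# Run a command with GOGC set for that one process.
# The calling shell's own environment is left unchanged.
run_with_gogc() {
  local percent="$1"
  shift
  GOGC="$percent" "$@"
}

# Hypothetical install path, adjust to your setup:
# run_with_gogc 200 /usr/local/bin/smokeping_prober
```

With GOGC=200 the Go runtime lets the heap grow to roughly three times the live data before the next collection, instead of two times at the default of 100, which is where the extra memory use comes from.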

Alexander Wilke

Feb 27, 2024, 2:12:19 PM
to Prometheus Users
Hello Ben,

I googled a little bit and found this:

As far as I understand, this variable no longer works or is no longer used?
I tried in a test environment:

export GOGC=200

Then I restarted (not reloaded) Prometheus, and in the UI under "Runtime & Build Information" the GOGC field is still empty.

1.) Is this environment variable set correctly?
2.) Is the variable still working?
3.) If it is still working, can I apply it only to smokeping_prober and not to other services like Prometheus? It sounds like a higher GOGC has tradeoffs for queries against the Prometheus TSDB.
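
One way to check question 1 on Linux is to read the environment the process was actually started with from /proc. This is only a sketch: the helper name is made up, and it needs to run as root or as the user that owns the process. Note that a service started by systemd does not inherit variables exported in a login shell, which would explain an empty field if the services here are run that way (an assumption).

```shell
# Print NAME=value from the environment a process was started with (Linux /proc only).
# Prints nothing and returns non-zero if the variable was not set for that process.
proc_env() {
  local pid="$1" name="$2"
  tr '\0' '\n' < "/proc/$pid/environ" | grep "^${name}="
}

# Example, assuming exactly one running process named smokeping_prober:
# proc_env "$(pidof smokeping_prober)" GOGC
```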

Ben Kochie

Feb 27, 2024, 4:24:55 PM
to Alexander Wilke, Prometheus Users
The variable change is for smokeping_prober, not Prometheus.

I don't know how you run your services.
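
If the prober happens to run under systemd, which is common on RHEL 8 but is an assumption here, a drop-in file is one way to set the variable for that unit only. The unit name smokeping_prober.service, the file name gogc.conf and the helper function are all hypothetical.

```shell
# Write a systemd drop-in that sets GOGC for one unit only.
# $1 is the drop-in directory of the unit, $2 is the GOGC value.
write_gogc_dropin() {
  local dir="$1" percent="$2"
  mkdir -p "$dir"
  printf '[Service]\nEnvironment=GOGC=%s\n' "$percent" > "$dir/gogc.conf"
}

# Example (as root), assuming the unit is called smokeping_prober.service:
# write_gogc_dropin /etc/systemd/system/smokeping_prober.service.d 200
# systemctl daemon-reload && systemctl restart smokeping_prober
```

Other units, such as the one running Prometheus, keep their own environment and are not affected by this drop-in.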
