Prometheus cannot collect metrics correctly after binary update


nesa...@gmail.com

Jun 30, 2020, 12:57:10 PM
to Prometheus Users
Hi,
We have a primary Prometheus server, plus a second server that used federation to gather all the metrics from the primary every 90 seconds. We have now decided to stop using federation as the backup solution and to manage the Prometheus configuration with Ansible, so that both servers have the same configuration at the same time. Both the primary and the federation server were running version 2.15.2. After replacing the federation server's configuration with the same configuration as the primary, I updated its binary to 2.19.2.
At first, I got a "service unavailable" alert for about 10 minutes. After an hour, we started to get lots of alerts claiming that exporters had no data.
The problem went away after 30 minutes, but a few hours later we started getting lots of these alerts again.
When I checked the logs, I saw that all jobs were failing with "context deadline exceeded" (during the problem I couldn't connect to the web interface either), whereas the primary Prometheus raises no alerts and everything works fine there.
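One way to see which targets are affected, and with which error, is to ask the secondary server's HTTP API for its active targets. The following is only a minimal sketch: it assumes the server is reachable at `http://localhost:9090` (taken from the listen address in the unit file) and relies on the `/api/v1/targets` endpoint; the helper names `fetch_targets` and `failing_targets` are made up for illustration.

```python
import json
import urllib.request


def failing_targets(payload):
    """Return (job, instance, lastError) for each active target that is not 'up'.

    `payload` is the decoded JSON body of GET /api/v1/targets.
    """
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", ""),
            t.get("labels", {}).get("instance", ""),
            t.get("lastError", ""),
        )
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090", timeout=10):
    """Fetch the target list; base_url is an assumption, adjust as needed."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `failing_targets(fetch_targets())` on the secondary server while the alerts are firing would list the jobs and instances whose last scrape failed. If the web interface itself is unreachable at that moment, this request will time out as well, which is a useful data point in its own right.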
Here is my systemd configuration for the secondary Prometheus:

[Unit]
Description=Prometheus
After=network-online.target

[Service]
Type=simple
Environment="GOMAXPROCS=8"
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/sbin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=275GB \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.console.templates=/etc/prometheus/consoles \
  --web.enable-admin-api \
  --web.listen-address=0.0.0.0:9090

CapabilityBoundingSet=CAP_SET_UID
LimitNOFILE=65000
LockPersonality=true
NoNewPrivileges=true
MemoryDenyWriteExecute=true
PrivateDevices=true
PrivateTmp=true
ProtectHome=true
RemoveIPC=true
RestrictSUIDSGID=true
CPUAccounting=yes
MemoryAccounting=yes
#SystemCallFilter=@signal @timer

ReadWritePaths=/var/lib/prometheus

PrivateUsers=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=strict

SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target

Yashar Nesabian

Jun 30, 2020, 12:59:59 PM
to Prometheus Users
I'm sorry for the problem with how the systemd configuration was displayed. Here is the correct configuration:
[Unit]
Description=Prometheus
After=network-online.target

[Service]
Type=simple
Environment="GOMAXPROCS=8"
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/sbin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=275GB \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.console.templates=/etc/prometheus/consoles \
  --web.enable-admin-api \
  --web.listen-address=0.0.0.0:9090 

CapabilityBoundingSet=CAP_SET_UID
LimitNOFILE=65000
LockPersonality=true
NoNewPrivileges=true
MemoryDenyWriteExecute=true
PrivateDevices=true
PrivateTmp=true
ProtectHome=true
RemoveIPC=true
RestrictSUIDSGID=true
CPUAccounting=yes
MemoryAccounting=yes
#SystemCallFilter=@signal @timer

ReadWritePaths=/var/lib/prometheus

PrivateUsers=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=strict


SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target

Julien Pivotto

Jun 30, 2020, 1:34:19 PM
to nesa...@gmail.com, Prometheus Users
Can you check if you have leftover JAEGER environment variables by any
chance?
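A quick way to check this for the running process, rather than for a login shell, is to read the environment the service was actually started with. The sketch below assumes a Linux host where `/proc/<pid>/environ` is readable (same user or root) and that the variables of interest share the `JAEGER_` prefix used by the Jaeger client libraries, for example `JAEGER_AGENT_HOST`; the function names are illustrative only.

```python
def jaeger_vars(environ_bytes, prefix="JAEGER_"):
    """Extract variables whose name starts with `prefix` from the raw,
    NUL-separated contents of /proc/<pid>/environ."""
    found = {}
    for entry in environ_bytes.split(b"\0"):
        if not entry:
            continue
        name, _, value = entry.decode("utf-8", "replace").partition("=")
        if name.startswith(prefix):
            found[name] = value
    return found


def read_environ(pid):
    """Read the initial environment of a process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return fh.read()
```

With the PID of the Prometheus process (for instance the `MainPID` reported by systemd), `jaeger_vars(read_environ(pid))` returns an empty dict when no such variables were set at start-up.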

--
Julien Pivotto
@roidelapluie

Yashar Nesabian

Jun 30, 2020, 1:40:15 PM
to Prometheus Users
Hi Julien, I double-checked the environment variables and we don't have any JAEGER ones set.

Yashar Nesabian

Jul 7, 2020, 2:37:50 AM
to Prometheus Users
When we encounter this problem, I see lots of soft interrupts on one of our CPUs.

This server has 8 CPU cores, but only one of them is being used, even though I have set GOMAXPROCS=8 in the systemd unit.
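To put numbers on the soft-interrupt observation, the per-CPU counters in `/proc/softirqs` can be sampled twice and compared. This is a rough sketch assuming a Linux host; `parse_softirqs` and `per_cpu_delta` are hypothetical helper names, and the code only reports how the counters grew per CPU, not why.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs into (cpu_names, {softirq_name: [count per CPU]})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        name, _, rest = ln.partition(":")
        table[name.strip()] = [int(x) for x in rest.split()]
    return cpus, table


def per_cpu_delta(before, after):
    """Total increase in softirq counts per CPU between two parsed samples."""
    width = max((len(counts) for counts in after.values()), default=0)
    totals = [0] * width
    for name, counts in after.items():
        previous = before.get(name, [0] * len(counts))
        for i, (new, old) in enumerate(zip(counts, previous)):
            totals[i] += new - old
    return totals


def read_softirqs():
    """Read the current counters (Linux only)."""
    with open("/proc/softirqs") as fh:
        return fh.read()
```

Taking one sample with `parse_softirqs(read_softirqs())`, waiting a few seconds, taking a second one and passing both tables to `per_cpu_delta` shows whether a single CPU is absorbing most of the soft interrupts. Looking at the individual rows (for example `NET_RX`) in the same way narrows down which kind dominates.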

