one pod of prometheus always restarts after 'Received SIGTERM, exiting gracefully...'

Павел Александров

Nov 27, 2020, 7:02:20 AM
to Prometheus Users

What did you do?
Updated Prometheus to version 2.21.0 via Helm 3 (roughly as in the sketch after these answers).
What did you expect to see?
Both Prometheus replicas (prometheus-prometheus-0 and prometheus-prometheus-1) should keep working fine.
What did you see instead? Under which circumstances?
prometheus-prometheus-0 restarts every 5 minutes
prometheus-prometheus-1 works fine
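
For reference, the upgrade would have been along these lines; the release name ("prometheus"), namespace ("monitoring"), chart and values file are assumptions for illustration, not details taken from this thread:

# hypothetical sketch of the Helm 3 upgrade described above
helm repo update
helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f values.yaml   # values.yaml pins the Prometheus image tag to v2.21.0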
Environment
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.10", GitCommit:"afea77eefe3272c0f53c4552183ecb16c091db41", GitTreeState:"clean", BuildDate:"2020-09-17T06:32:50Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

  • System information:

AKS

  • Prometheus version:

prom = v2.21.0
thanos-sidecar = v0.16.0

  • Logs:
{"caller":"main.go:346","level":"info","msg":"Starting Prometheus","ts":"2020-11-27T11:17:34.498Z","version":"(version=2.21.0, branch=HEAD, revision=e83ef207b6c2398919b69cd87d2693cfc2fb4127)"} {"build_context":"(go=go1.15.2, user=***@***, date=20200911-11:35:02)","caller":"main.go:347","level":"info","ts":"2020-11-27T11:17:34.498Z"} {"caller":"main.go:348","host_details":"(Linux 4.15.0-1091-azure #101~16.04.1-Ubuntu SMP Fri Jun 26 01:14:12 UTC 2020 x86_64 prometheus-prometheus-infrastructure-prometheus-0 (none))","level":"info","ts":"2020-11-27T11:17:34.498Z"} {"caller":"main.go:349","fd_limits":"(soft=1048576, hard=1048576)","level":"info","ts":"2020-11-27T11:17:34.498Z"} {"caller":"main.go:350","level":"info","ts":"2020-11-27T11:17:34.498Z","vm_limits":"(soft=unlimited, hard=unlimited)"} {"caller":"main.go:701","level":"info","msg":"Starting TSDB ...","ts":"2020-11-27T11:17:34.500Z"} {"address":"0.0.0.0:9090","caller":"web.go:523","component":"web","level":"info","msg":"Start listening for connections","ts":"2020-11-27T11:17:34.500Z"} {"caller":"head.go:644","component":"tsdb","level":"info","msg":"Replaying on-disk memory mappable chunks if any","ts":"2020-11-27T11:17:34.505Z"} {"caller":"head.go:658","component":"tsdb","duration":"29.203µs","level":"info","msg":"On-disk memory mappable chunks replay completed","ts":"2020-11-27T11:17:34.505Z"} {"caller":"head.go:664","component":"tsdb","level":"info","msg":"Replaying WAL, this may take a while","ts":"2020-11-27T11:17:34.505Z"} {"caller":"head.go:716","component":"tsdb","level":"info","maxSegment":4,"msg":"WAL segment loaded","segment":0,"ts":"2020-11-27T11:17:34.505Z"} {"caller":"head.go:716","component":"tsdb","level":"info","maxSegment":4,"msg":"WAL segment loaded","segment":1,"ts":"2020-11-27T11:18:30.088Z"} {"caller":"head.go:716","component":"tsdb","level":"info","maxSegment":4,"msg":"WAL segment loaded","segment":2,"ts":"2020-11-27T11:18:35.593Z"} {"caller":"head.go:716","component":"tsdb","level":"info","maxSegment":4,"msg":"WAL segment loaded","segment":3,"ts":"2020-11-27T11:18:40.579Z"} {"caller":"head.go:716","component":"tsdb","level":"info","maxSegment":4,"msg":"WAL segment loaded","segment":4,"ts":"2020-11-27T11:18:40.579Z"} {"caller":"head.go:719","checkpoint_replay_duration":"41.804µs","component":"tsdb","level":"info","msg":"WAL replay completed","total_replay_duration":"1m6.074749616s","ts":"2020-11-27T11:18:40.580Z","wal_replay_duration":"1m6.074648107s"} {"caller":"main.go:721","fs_type":"EXT4_SUPER_MAGIC","level":"info","ts":"2020-11-27T11:18:47.614Z"} {"caller":"main.go:724","level":"info","msg":"TSDB started","ts":"2020-11-27T11:18:47.614Z"} {"caller":"main.go:850","filename":"/etc/prometheus/config_out/prometheus.env.yaml","level":"info","msg":"Loading configuration file","ts":"2020-11-27T11:18:47.614Z"} {"caller":"kubernetes.go:263","component":"discovery manager scrape","discovery":"kubernetes","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-11-27T11:18:47.620Z"} {"caller":"kubernetes.go:263","component":"discovery manager scrape","discovery":"kubernetes","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-11-27T11:18:47.621Z"} {"caller":"kubernetes.go:263","component":"discovery manager scrape","discovery":"kubernetes","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-11-27T11:18:47.622Z"} {"caller":"kubernetes.go:263","component":"discovery manager scrape","discovery":"kubernetes","level":"info","msg":"Using pod 
service account via in-cluster config","ts":"2020-11-27T11:18:47.622Z"} {"caller":"kubernetes.go:263","component":"discovery manager scrape","discovery":"kubernetes","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-11-27T11:18:47.623Z"} {"caller":"kubernetes.go:263","component":"discovery manager scrape","discovery":"kubernetes","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-11-27T11:18:47.624Z"} {"caller":"kubernetes.go:263","component":"discovery manager scrape","discovery":"kubernetes","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-11-27T11:18:47.625Z"} {"caller":"kubernetes.go:263","component":"discovery manager scrape","discovery":"kubernetes","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-11-27T11:18:47.625Z"} {"caller":"main.go:881","filename":"/etc/prometheus/config_out/prometheus.env.yaml","level":"info","msg":"Completed loading of configuration file","notify":"1.1µs","notify_sd":"1.8µs","query_engine":"1.2µs","remote_storage":"2.201µs","rules":"555.852µs","scrape":"506.947µs","scrape_sd":"6.454105ms","totalDuration":"12.469367ms","ts":"2020-11-27T11:18:47.627Z","web_handler":"600ns"} {"caller":"main.go:673","level":"info","msg":"Server is ready to receive web requests.","ts":"2020-11-27T11:18:47.627Z"} {"caller":"main.go:551","level":"warn","msg":"Received SIGTERM, exiting gracefully...","ts":"2020-11-27T11:22:31.176Z"} {"caller":"main.go:574","level":"info","msg":"Stopping scrape discovery manager...","ts":"2020-11-27T11:22:31.176Z"} {"caller":"main.go:588","level":"info","msg":"Stopping notify discovery manager...","ts":"2020-11-27T11:22:31.176Z"} {"caller":"main.go:610","level":"info","msg":"Stopping scrape manager...","ts":"2020-11-27T11:22:31.176Z"} {"caller":"main.go:570","level":"info","msg":"Scrape discovery manager stopped","ts":"2020-11-27T11:22:31.176Z"} {"caller":"klog.go:86","component":"k8s_client_runtime","func":"Warningf","level":"warn","msg":"/app/discovery/kubernetes/kubernetes.go:428: watch of *v1.Pod ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding","ts":"2020-11-27T11:22:31.177Z"} {"caller":"klog.go:86","component":"k8s_client_runtime","func":"Warningf","level":"warn","msg":"/app/discovery/kubernetes/kubernetes.go:428: watch of *v1.Pod ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding","ts":"2020-11-27T11:22:31.177Z"} {"caller":"klog.go:86","component":"k8s_client_runtime","func":"Warningf","level":"warn","msg":"/app/discovery/kubernetes/kubernetes.go:428: watch of *v1.Pod ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding","ts":"2020-11-27T11:22:31.178Z"} {"caller":"klog.go:86","component":"k8s_client_runtime","func":"Warningf","level":"warn","msg":"/app/discovery/kubernetes/kubernetes.go:426: watch of *v1.Endpoints ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding","ts":"2020-11-27T11:22:31.178Z"} {"caller":"klog.go:86","component":"k8s_client_runtime","func":"Warningf","level":"warn","msg":"/app/discovery/kubernetes/kubernetes.go:427: watch of *v1.Service ended with: an error on the server (\"unable to decode an event from the watch 
stream: context canceled\") has prevented the request from succeeding","ts":"2020-11-27T11:22:31.178Z"} {"caller":"klog.go:86","component":"k8s_client_runtime","func":"Warningf","level":"warn","msg":"/app/discovery/kubernetes/kubernetes.go:428: watch of *v1.Pod ended with: an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding","ts":"2020-11-27T11:22:31.178Z"} {"caller":"main.go:584","level":"info","msg":"Notify discovery manager stopped","ts":"2020-11-27T11:22:31.179Z"} {"caller":"manager.go:908","component":"rule manager","level":"info","msg":"Stopping rule manager...","ts":"2020-11-27T11:23:33.678Z"} {"caller":"main.go:604","level":"info","msg":"Scrape manager stopped","ts":"2020-11-27T11:23:33.678Z"} {"caller":"manager.go:918","component":"rule manager","level":"info","msg":"Rule manager stopped","ts":"2020-11-27T11:23:35.384Z"} {"caller":"notifier.go:601","component":"notifier","level":"info","msg":"Stopping notification manager...","ts":"2020-11-27T11:23:35.428Z"} {"caller":"main.go:778","level":"info","msg":"Notifier manager stopped","ts":"2020-11-27T11:23:35.428Z"} {"caller":"main.go:790","level":"info","msg":"See you next time!","ts":"2020-11-27T11:23:35.430Z"}
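
The log above shows a clean shutdown after SIGTERM rather than a crash, so the next step is to find out why the signal was sent. A minimal sketch of how to check the restart count and the last termination reason, assuming the pod runs in a namespace called "monitoring" and the container is named "prometheus" (neither is stated in this thread):

kubectl -n monitoring get pod prometheus-prometheus-0                        # RESTARTS column
kubectl -n monitoring describe pod prometheus-prometheus-0                   # Last State / Reason / Exit Code per container
kubectl -n monitoring logs prometheus-prometheus-0 -c prometheus --previous  # output of the previously terminated container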

b.ca...@pobox.com

Nov 27, 2020, 8:42:57 AM
to Prometheus Users
> Received SIGTERM, exiting gracefully...

Kubernetes told the pod to exit. Most likely the pod was evicted. Look at the Kubernetes events and the pod's status to see why it decided to evict it. The node could be under memory or disk pressure, for instance.
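
One way to look, assuming the same "monitoring" namespace as above (adjust to wherever the StatefulSet actually runs):

kubectl -n monitoring get events --sort-by=.lastTimestamp | grep prometheus-prometheus-0   # Killing / Evicted / OOM events for the pod
kubectl get events -A --field-selector reason=Evicted                                      # evictions anywhere in the cluster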

Павел Александров

Dec 1, 2020, 10:42:48 AM
to Prometheus Users
Thanks. I don't know exactly why it happened; I think it was because of a problem with one node in the cluster.
Friday, November 27, 2020 at 16:42:57 UTC+3, b.ca...@pobox.com:
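
If a single unhealthy node was the cause, its condition flags usually show it; for example (<node-name> is a placeholder, and kubectl top needs metrics-server installed):

kubectl get nodes                                             # any NotReady nodes?
kubectl describe node <node-name> | grep -A 8 "Conditions:"   # MemoryPressure / DiskPressure / PIDPressure
kubectl top node                                              # current memory / CPU headroom per node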