Readiness probe failed: HTTP probe failed with statuscode: 503


Mark Lyck

Sep 30, 2019, 4:28:35 PM
to Prometheus Users
I'm having a problem with Prometheus never quite finishing its startup. The pod never becomes ready, and navigating to `/` returns a 502 Server Error. Startup log:

level=info ts=2019-09-30T20:15:44.548Z caller=main.go:285 msg="no time or size retention was set so using the default time retention" duration=15d
level=info ts=2019-09-30T20:15:44.548Z caller=main.go:321 msg="Starting Prometheus" version="(version=2.9.2, branch=HEAD, revision=d3245f15022551c6fc8281766ea62db4d71e2747)"
level=info ts=2019-09-30T20:15:44.548Z caller=main.go:322 build_context="(go=go1.12.4, user=root@1d43b6951e8f, date=20190424-15:32:31)"
level=info ts=2019-09-30T20:15:44.549Z caller=main.go:323 host_details="(Linux 4.14.65+ #1 SMP Thu Oct 25 10:42:50 PDT 2018 x86_64 platform-prometheus-server-0 (none))"
level=info ts=2019-09-30T20:15:44.549Z caller=main.go:324 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-09-30T20:15:44.549Z caller=main.go:325 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-09-30T20:15:44.552Z caller=main.go:640 msg="Starting TSDB ..."
level=info ts=2019-09-30T20:15:44.554Z caller=web.go:416 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-09-30T20:15:44.560Z caller=repair.go:47 component=tsdb msg="found healthy block" mint=1569346927411 maxt=1569391200000 ulid=01DNM17ZHJV5NRN4D2QFW2QQ7D
level=info ts=2019-09-30T20:15:44.562Z caller=repair.go:47 component=tsdb msg="found healthy block" mint=1569391200000 maxt=1569456000000 ulid=01DNNR5YZ9DJ9HX4FRFCE3G6QM
level=info ts=2019-09-30T20:15:44.563Z caller=repair.go:47 component=tsdb msg="found healthy block" mint=1569456000000 maxt=1569520800000 ulid=01DNQNZJ6EMV7WN8TYEP0NNJVJ
level=info ts=2019-09-30T20:15:44.564Z caller=repair.go:47 component=tsdb msg="found healthy block" mint=1569520800000 maxt=1569585600000 ulid=01DNSKV8ZCSBTAAFYF8MPAK0QG
level=info ts=2019-09-30T20:15:44.565Z caller=repair.go:47 component=tsdb msg="found healthy block" mint=1569650400000 maxt=1569657600000 ulid=01DNVHJA2AJVQD57GX20EJQVGC
level=info ts=2019-09-30T20:15:44.567Z caller=repair.go:47 component=tsdb msg="found healthy block" mint=1569585600000 maxt=1569650400000 ulid=01DNVHJHNHD3AYB926ZFHYVEXV
level=info ts=2019-09-30T20:15:44.567Z caller=repair.go:47 component=tsdb msg="found healthy block" mint=1569657600000 maxt=1569664800000 ulid=01DNVRE1A6BCV8D18X8KB7NEM1
level=info ts=2019-09-30T20:15:44.568Z caller=repair.go:47 component=tsdb msg="found healthy block" mint=1569664800000 maxt=1569672000000 ulid=01DNVZ9RJ68H48E14XREP0R1MN


Output of `kubectl describe`:

Events:
  Type     Reason     Age                  From                                 Message
  ----     ------     ----                 ----                                 -------
  Normal   Scheduled  3m15s                default-scheduler                    Successfully assigned platform/platform-prometheus-server-0 to sin-de080d0b-oesg-497e0cc5
  Normal   Created    3m13s                kubelet, sin-de080d0b-oesg-497e0cc5  Created container
  Normal   Started    3m13s                kubelet, sin-de080d0b-oesg-497e0cc5  Started container
  Normal   Pulled     3m13s                kubelet, sin-de080d0b-oesg-497e0cc5  Container image "busybox:latest" already present on machine
  Normal   Pulled     3m12s                kubelet, sin-de080d0b-oesg-497e0cc5  Container image "us.gcr.io/colony-develop/prometheus-colony-file-configs:1.5.9" already present on machine
  Normal   Pulled     3m12s                kubelet, sin-de080d0b-oesg-497e0cc5  Container image "jimmidyson/configmap-reload:v0.2.2" already present on machine
  Normal   Created    3m12s                kubelet, sin-de080d0b-oesg-497e0cc5  Created container
  Normal   Started    3m12s                kubelet, sin-de080d0b-oesg-497e0cc5  Started container
  Normal   Started    3m11s                kubelet, sin-de080d0b-oesg-497e0cc5  Started container
  Normal   Created    3m11s                kubelet, sin-de080d0b-oesg-497e0cc5  Created container
  Normal   Pulled     85s (x2 over 3m12s)  kubelet, sin-de080d0b-oesg-497e0cc5  Container image "prom/prometheus:v2.9.2" already present on machine
  Normal   Started    85s (x2 over 3m12s)  kubelet, sin-de080d0b-oesg-497e0cc5  Started container
  Normal   Created    85s (x2 over 3m12s)  kubelet, sin-de080d0b-oesg-497e0cc5  Created container
  Warning  Unhealthy  85s                  kubelet, sin-de080d0b-oesg-497e0cc5  Readiness probe failed: Get http://10.33.23.92:9090/-/ready: dial tcp 10.33.23.92:9090: connect: connection refused
  Warning  Unhealthy  45s (x9 over 2m35s)  kubelet, sin-de080d0b-oesg-497e0cc5  Readiness probe failed: HTTP probe failed with statuscode: 503


^ The readiness probe failures above are the error I'm seeing.

prometheus-server:
    Container ID:  docker://dcf5f40d9a3d1051b852acd7d649805974b6b6826a833d1004c144bddabbfe11
    Image:         prom/prometheus:v2.9.2
    Image ID:      docker-pullable://prom/prometheus@sha256:05350e0d1a577674442046961abf56b3e883dcd82346962f9e73f00667958f6b
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --config.file=/etc/config/prometheus.yml
      --storage.tsdb.path=/data
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    State:          Running
      Started:      Mon, 30 Sep 2019 16:24:18 -0400
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 30 Sep 2019 16:22:31 -0400
      Finished:     Mon, 30 Sep 2019 16:24:18 -0400
    Ready:          False
    Restart Count:  1
    Limits:
      cpu:     2
      memory:  8G
    Requests:
      cpu:        1
      memory:     8G
    Liveness:     http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
      /etc/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from platform-prometheus-server-token-z6v4l (ro)

Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  storage-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  storage-volume-platform-prometheus-server-0
    ReadOnly:   false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      platform-prometheus-server
    Optional:  false
  platform-prometheus-server-token-z6v4l:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  platform-prometheus-server-token-z6v4l
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s



As far as I can see, it should have enough CPU, memory, and storage available.

Any ideas why it never finishes starting up?



Simon Pasquier

Oct 2, 2019, 5:39:14 AM
to Mark Lyck, Prometheus Users
The kubectl output says "Reason: OOMKilled", so I suppose you need to
allocate more RAM.
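
To make the OOM diagnosis easy to spot, here is a minimal Python sketch that pulls the last termination reason and exit code out of `kubectl describe` text. It assumes the output has already been captured as a string and has the same layout as the paste above. The function names `last_termination` and `killing_signal` are hypothetical helpers, not part of kubectl or Prometheus. By shell convention, an exit code above 128 means 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends.

```python
import re

# Small lookup of common signal numbers; hard-coded for illustration only.
SIGNAL_NAMES = {9: "SIGKILL", 15: "SIGTERM"}


def last_termination(describe_text):
    """Return (reason, exit_code) from a 'Last State: Terminated' section, or None."""
    match = re.search(
        r"Last State:\s+Terminated\s+Reason:\s+(\S+)\s+Exit Code:\s+(\d+)",
        describe_text,
    )
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def killing_signal(exit_code):
    """Map an exit code above 128 to a signal name, if it is one we know."""
    if exit_code > 128:
        return SIGNAL_NAMES.get(exit_code - 128)
    return None


# Excerpt copied from the describe output in this thread.
sample = """\
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
"""

reason, code = last_termination(sample)
print(reason, code, killing_signal(code))  # prints: OOMKilled 137 SIGKILL
```

Also note that Kubernetes treats the `G` suffix as decimal, so `memory: 8G` is 8,000,000,000 bytes (about 7.45 GiB), slightly less than `8Gi`.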
