Hi,
I am trying to deploy a Prometheus monitoring stack in a Kubernetes sandbox on a Google Compute Engine VM.
I have Kubernetes, Helm, and Tiller up and running, and I followed this blog post to install the Prometheus server.
As the post suggested, I changed my CPU and memory limits and requests, and tried setting the extra args both ways:
extraArgs:
  storage.tsdb.retention: 744h

and

extraArgs:
  storage.local.retention: 744h
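For reference, my understanding (an assumption on my part about the chart layout, not something I have confirmed) is that in the stable/prometheus chart these flags need to sit under the server: key, and that the storage.tsdb.* flags are the Prometheus 2.x form while storage.local.* only applied to 1.x:

```yaml
# Assumed nesting for the stable/prometheus chart; rendered as
# --storage.tsdb.retention=744h on the server container.
server:
  extraArgs:
    storage.tsdb.retention: 744h
```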
I then thought the issue might be caused by the incorrect CPU and memory limits I had set.
Yesterday I initially tried these values:
resources:
  limits:
    cpu: 4Gi
    memory: 4Gi
  requests:
    cpu: 4Gi
    memory: 4Gi
and today I went ahead and tried these:
resources:
  limits:
    cpu: 2
    memory: 4Gi
  requests:
    cpu: 2
    memory: 4Gi
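For what it's worth, my understanding is that CPU quantities are expressed in cores or millicores rather than Gi (which would make my first attempt, cpu: 4Gi, invalid), while memory uses Mi/Gi. A sketch of what I believe a well-formed block looks like:

```yaml
# Sketch of a resources block with (what I believe are) valid units:
# CPU in whole cores or millicores, memory in Mi/Gi.
resources:
  limits:
    cpu: 2          # 2 cores; could also be written as 2000m
    memory: 4Gi
  requests:
    cpu: 500m       # half a core
    memory: 2Gi
```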
Now the Prometheus server pod is crashing on startup.
Here is a gist of the values.yaml I am using to install the chart:
https://gist.githubusercontent.com/iamdeadman/6f60810cb66f0d253ca7f85660ff5144/raw/31084e69cc86017647ea3f021b4c75ef233c3210/values_backup.yaml
and here is the command I am using to deploy the chart:
sudo helm install -f values_backup.yaml stable/prometheus
Here are the logs from the container, which suggest there is a configuration mistake somewhere:
urtutors_dev@bitnami-kubernetessandbox-dm-8e00:~/prometheus$ kubectl logs -f kneeling-marsupial-prometheus-server-5f94f9595f-4cxb9 prometheus-server
level=info ts=2018-05-01T13:50:26.466206807Z caller=main.go:225 msg="Starting Prometheus" version="(version=2.1.0, branch=HEAD, revision=85f23d82a045d103ea7f3c89a91fba4a93e6367a)"
level=info ts=2018-05-01T13:50:26.466276087Z caller=main.go:226 build_context="(go=go1.9.2, user=root@6e784304d3ff, date=20180119-12:01:23)"
level=info ts=2018-05-01T13:50:26.466299226Z caller=main.go:227 host_details="(Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 kneeling-marsupial-prometheus-server-5f94f9595f-4cxb9 (none))"
level=info ts=2018-05-01T13:50:26.466316603Z caller=main.go:228 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-05-01T13:50:26.470127889Z caller=web.go:383 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-05-01T13:50:26.470120709Z caller=main.go:499 msg="Starting TSDB ..."
level=info ts=2018-05-01T13:50:26.480127293Z caller=main.go:509 msg="TSDB started"
level=info ts=2018-05-01T13:50:26.480199354Z caller=main.go:585 msg="Loading configuration file" filename=/etc/config/prometheus.yml
level=info ts=2018-05-01T13:50:26.480803755Z caller=main.go:386 msg="Stopping scrape discovery manager..."
level=info ts=2018-05-01T13:50:26.4808297Z caller=main.go:400 msg="Stopping notify discovery manager..."
level=info ts=2018-05-01T13:50:26.480840617Z caller=main.go:424 msg="Stopping scrape manager..."
level=info ts=2018-05-01T13:50:26.480921603Z caller=main.go:382 msg="Scrape discovery manager stopped"
level=info ts=2018-05-01T13:50:26.480969882Z caller=manager.go:460 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-05-01T13:50:26.48099561Z caller=manager.go:466 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-05-01T13:50:26.481006337Z caller=notifier.go:493 component=notifier msg="Stopping notification manager..."
level=info ts=2018-05-01T13:50:26.481007667Z caller=manager.go:59 component="scrape manager" msg="Starting scrape manager..."
level=info ts=2018-05-01T13:50:26.481019699Z caller=main.go:570 msg="Notifier manager stopped"
level=info ts=2018-05-01T13:50:26.481027179Z caller=main.go:418 msg="Scrape manager stopped"
level=info ts=2018-05-01T13:50:26.481032803Z caller=main.go:396 msg="Notify discovery manager stopped"
level=error ts=2018-05-01T13:50:26.481112947Z caller=main.go:579 err="Error loading config couldn't load configuration (--config.file=/etc/config/prometheus.yml): parsing YAML file /etc/config/prometheus.yml: yaml: line 160: mapping values are not allowed in this context"
level=info ts=2018-05-01T13:50:26.481159397Z caller=main.go:581 msg="See you next time!"
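Since the parser points at line 160 of the rendered /etc/config/prometheus.yml, one thing I tried was printing a numbered window around that line. The snippet below demonstrates the idea on a generated stand-in file; against the real pod I would first copy the file out with kubectl cp, using the pod and container names from the describe output further down:

```shell
# Stand-in for the rendered config: 200 numbered lines. Against the real
# cluster you would first copy the file out of the pod, e.g.:
#   kubectl cp <pod-name>:/etc/config/prometheus.yml rendered.yml -c prometheus-server
seq 1 200 | sed 's/^/setting_/' > /tmp/rendered.yml

# Print a numbered window around line 160, the line the parser rejects.
awk 'NR>=158 && NR<=162 {printf "%d: %s\n", NR, $0}' /tmp/rendered.yml
```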
and here is the output of kubectl describe pod <prometheus-server>:
Name: kneeling-marsupial-prometheus-server-5f94f9595f-4cxb9
Namespace: default
Node: bitnami-kubernetessandbox-dm-8e00/10.128.0.2
Start Time: Tue, 01 May 2018 13:49:58 +0000
Labels: app=prometheus
component=server
pod-template-hash=1950951519
release=kneeling-marsupial
Annotations: <none>
Status: Running
IP: 10.32.0.73
Controlled By: ReplicaSet/kneeling-marsupial-prometheus-server-5f94f9595f
Init Containers:
init-chown-data:
Container ID: docker://fa1426a69f5506815abad677cd929947336f95cc082582c9849e78a1b4fa8625
Image: busybox:latest
Image ID: docker-pullable://busybox@sha256:58ac43b2cc92c687a32c8be6278e50a063579655fe3090125dcb2af0ff9e1a64
Port: <none>
Host Port: <none>
Command:
chown
-R
65534:65534
/data
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 01 May 2018 13:50:01 +0000
Finished: Tue, 01 May 2018 13:50:01 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/data from storage-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kneeling-marsupial-prometheus-server-token-jkb48 (ro)
Containers:
prometheus-server-configmap-reload:
Container ID: docker://f59c7ea8563093d4db5509ceda4c7b591098da42b9159832a6bd328979ee97fe
Image: jimmidyson/configmap-reload:v0.1
Image ID: docker-pullable://jimmidyson/configmap-reload@sha256:2d40c2eaa6f435b2511d0cfc5f6c0a681eeb2eaa455a5d5ac25f88ce5139986e
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9090/-/reload
State: Running
Started: Tue, 01 May 2018 13:50:03 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kneeling-marsupial-prometheus-server-token-jkb48 (ro)
prometheus-server:
Container ID: docker://7d553aef080dc20a3e3a1a8aa6cbcfce839e5acc1991326f4beb9bd1adf948ae
Image: prom/prometheus:v2.1.0
Image ID: docker-pullable://prom/prometheus@sha256:7b987901dbc44d17a88e7bda42dbbbb743c161e3152662959acd9f35aeefb9a3
Port: 9090/TCP
Host Port: 0/TCP
Args:
--config.file=/etc/config/prometheus.yml
--storage.tsdb.path=/data
--web.console.libraries=/etc/prometheus/console_libraries
--web.console.templates=/etc/prometheus/consoles
--web.enable-lifecycle
--storage.tsdb.retention=744h
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 01 May 2018 14:00:52 +0000
Finished: Tue, 01 May 2018 14:00:52 +0000
Ready: False
Restart Count: 7
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 2
memory: 4Gi
Liveness: http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kneeling-marsupial-prometheus-server-token-jkb48 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kneeling-marsupial-prometheus-server
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: kneeling-marsupial-prometheus-server
ReadOnly: false
kneeling-marsupial-prometheus-server-token-jkb48:
Type: Secret (a volume populated by a Secret)
SecretName: kneeling-marsupial-prometheus-server-token-jkb48
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned kneeling-marsupial-prometheus-server-5f94f9595f-4cxb9 to bitnami-kubernetessandbox-dm-8e00
Normal SuccessfulMountVolume 11m kubelet, bitnami-kubernetessandbox-dm-8e00 MountVolume.SetUp succeeded for volume "config-volume"
Normal SuccessfulMountVolume 11m kubelet, bitnami-kubernetessandbox-dm-8e00 MountVolume.SetUp succeeded for volume "kneeling-marsupial-prometheus-server-token-jkb48"
Normal SuccessfulMountVolume 11m kubelet, bitnami-kubernetessandbox-dm-8e00 MountVolume.SetUp succeeded for volume "local-pv-3d955afd"
Normal Started 11m kubelet, bitnami-kubernetessandbox-dm-8e00 Started container
Normal Pulled 11m kubelet, bitnami-kubernetessandbox-dm-8e00 Container image "busybox:latest" already present on machine
Normal Created 11m kubelet, bitnami-kubernetessandbox-dm-8e00 Created container
Normal Pulling 11m kubelet, bitnami-kubernetessandbox-dm-8e00 pulling image "prom/prometheus:v2.1.0"
Normal Pulled 11m kubelet, bitnami-kubernetessandbox-dm-8e00 Container image "jimmidyson/configmap-reload:v0.1" already present on machine
Normal Started 11m kubelet, bitnami-kubernetessandbox-dm-8e00 Started container
Normal Created 11m kubelet, bitnami-kubernetessandbox-dm-8e00 Created container
Normal Pulled 11m kubelet, bitnami-kubernetessandbox-dm-8e00 Successfully pulled image "prom/prometheus:v2.1.0"
Normal Created 11m (x3 over 11m) kubelet, bitnami-kubernetessandbox-dm-8e00 Created container
Normal Started 11m (x3 over 11m) kubelet, bitnami-kubernetessandbox-dm-8e00 Started container
Normal Pulled 10m (x3 over 11m) kubelet, bitnami-kubernetessandbox-dm-8e00 Container image "prom/prometheus:v2.1.0" already present on machine
Warning BackOff 1m (x49 over 11m) kubelet, bitnami-kubernetessandbox-dm-8e00 Back-off restarting failed container
So, as far as I can tell I am not modifying any other values in this values file.
Still, if there is a configuration error in my gist, can someone point out how to resolve it, or how to debug this issue further?
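In case it helps anyone spot the problem, here are two quick checks I have been running against the YAML itself: YAML forbids tab indentation, and (as far as I know) an unquoted scalar of the form key: value: value is a common cause of exactly this "mapping values are not allowed" error. The demo file below just stands in for my real values/config file:

```shell
# Demo file standing in for the real YAML (replace with your own path).
printf 'good: 1\nbad: url: http://x\n\ttabbed: 2\n' > /tmp/check.yml

# Lines indented with tabs (illegal in YAML).
grep -n "$(printf '\t')" /tmp/check.yml

# Unquoted "a: b: c" scalars, a common cause of this parser error.
grep -nE '^[^#]*: .*: ' /tmp/check.yml
```

On the demo file the first check flags the tab-indented line and the second flags the unquoted double-colon line; I plan to run the second one against the lines around 160 of the rendered config next.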