./kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
prometheus-deployment-5cfdf8f756-mpctk 1/1 Running 0 4d
Could the name of my pod be causing the issue? Also, I noticed that the example config file below does not have the Alertmanager details, but I thought the UI should pick up some of the metrics running on the cluster automatically?
Thanks in advance for any help!!
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
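For the 'kubernetes-pods' job above to pick a pod up automatically, the pod has to carry the prometheus.io/* annotations that the relabel rules key on; pods without them are dropped by the keep rule. A minimal sketch, where the app name, image, and port are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                  # placeholder name
  annotations:
    prometheus.io/scrape: "true"     # matched by the keep rule on ..._annotation_prometheus_io_scrape
    prometheus.io/path: "/metrics"   # rewritten into __metrics_path__
    prometheus.io/port: "8080"       # rewritten into __address__
spec:
  containers:
    - name: example-app
      image: example/app:latest      # placeholder image
      ports:
        - containerPort: 8080
```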
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/7311214f-b398-47c6-8319-b341e5c53f06%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi Simon
./kubectl get pods --namespace=monitoring
NAME READY STATUS RESTARTS AGE
prometheus-deployment-5cfdf8f756-mpctk 1/1 Running 0 4d
but when I ran the following it did not find any such pod:
./kubectl logs prometheus-deployment-5cfdf8f756-mpctk
Error from server (NotFound): pods "prometheus-deployment-5cfdf8f756-mpctk" not found
I then ran ./kubectl get pods and it doesn't show my prometheus pod there -
NAME READY STATUS RESTARTS AGE
cassandra-0 1/1 Running 0 5d
cassandra-1 1/1 Running 0 5d
cassandra-2 1/1 Running 0 5d
metricgen-0 1/1 Running 0 5d
metricrest-0 1/1 Running 0 5d
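The pod only "disappears" because kubectl defaults to the default namespace when no flag is given; kubectl logs needs the same namespace flag as the kubectl get that found the pod:

```shell
# the pod lives in the monitoring namespace, so logs needs the flag too
./kubectl logs prometheus-deployment-5cfdf8f756-mpctk -n monitoring
# equivalently
./kubectl logs prometheus-deployment-5cfdf8f756-mpctk --namespace=monitoring
```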
level=error ts=2018-03-06T11:00:09.923995347Z caller=main.go:221 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:296: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:default\" cannot list pods at the cluster scope"
level=error ts=2018-03-06T11:00:09.924030858Z caller=main.go:221 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:354: Failed to list *v1.Node: nodes is forbidden: User \"system:serviceaccount:monitoring:default\" cannot list nodes at the cluster scope"
level=error ts=2018-03-06T11:00:09.924066031Z caller=main.go:221 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:268: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:default\" cannot list endpoints at the cluster scope"
level=error ts=2018-03-06T11:00:09.924076209Z caller=main.go:221 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:269: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:default\" cannot list services at the cluster scope"
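The "forbidden ... at the cluster scope" errors mean the service account the Prometheus pod runs as (monitoring:default here) has no RBAC permission to list the resources the kubernetes_sd_configs need. A sketch of what an rbac-setup.yml granting those permissions typically looks like (the actual file from the guide being followed may differ in detail):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: [nodes, services, endpoints, pods]
    verbs: [get, list, watch]
  - nonResourceURLs: [/metrics]
    verbs: [get]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```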
Hi Simon, I changed the namespace to 'monitoring' in the file and ran kubectl apply -f rbac-setup.yaml. It seemed to run OK and created the service account 'prometheus', but when I checked the logs again they still give me the same errors as above, so the permissions must still not be set properly. Could you advise any further, please?
./kubectl apply -f rbac-setup.yml
clusterrole "prometheus" created
serviceaccount "prometheus" created
clusterrolebinding "prometheus" created
Mary-Jos-MBP:darwin-amd64 maryjomcguinness$ ./kubectl get serviceaccounts --namespace=monitoring
NAME SECRETS AGE
default 1 5d
prometheus 1 24s
Many thanks in advance!
On Tuesday, 6 March 2018 15:19:23 UTC, M McGuinness wrote:
Thanks Simon, so if I use the rbac-setup.yaml file and change the namespace to the one I am using, 'monitoring', what command should I use to apply the changes for the prometheus service account? Thanks in advance!
Hi Simon
./kubectl apply -f rbac-setup.yml
clusterrole "prometheus" configured
serviceaccount "prometheus" unchanged
clusterrolebinding "prometheus" configured
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
Many thanks for your help!
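Creating the 'prometheus' service account on its own is not enough: the log errors still name monitoring:default, which suggests the Prometheus pod is still running as the default account. The deployment's pod template has to opt in to the new account and the pod has to be restarted for it to take effect. A sketch of the relevant fragment (the deployment name is a placeholder; the rest of the spec stays as it was):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment   # placeholder; use the actual deployment name
  namespace: monitoring
spec:
  template:
    spec:
      serviceAccountName: prometheus   # run the pod as the new service account
      # ... containers etc. unchanged
```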
## alertmanager ConfigMap entries
##
alertmanagerFiles:
  alertmanager.yml: |-
    global:
      # slack_api_url: ''
      resolve_timeout: 20s
    receivers:
      - name: default-receiver
        # slack_configs:
        #   - channel: '@you'
        #     send_resolved: true
      - name: 'webhook'
        webhook_configs:
          - send_resolved: true
            url: '<webhook>'
    route:
      group_wait: 10s
      group_interval: 5m
      receiver: webhook
      repeat_interval: 3h

Many thanks again!
./kubectl create -f ./config-map.yaml -n monitoring
error: error validating "./config-map.yaml": error validating data: [ValidationError(ConfigMap): unknown field "alertmanagerFiles" in io.k8s.api.core.v1.ConfigMap, ValidationError(ConfigMap): unknown field "serverFiles" in io.k8s.api.core.v1.ConfigMap]; if you choose to ignore these errors, turn validation off with --validate=false
Would you mind taking a look at my file below and let me know if you notice what could be wrong please and thanks?
## alertmanager ConfigMap entries
##
alertmanagerFiles:
  alertmanager.yml: |-
    global:
      # slack_api_url: ''
      resolve_timeout: 20s
    receivers:
      - name: default-receiver
        # slack_configs:
        #   - channel: '@you'
        #     send_resolved: true
      - name: 'webhook'
        webhook_configs:
          - send_resolved: true
            url: '<normalizer webhook>'
    route:
      group_wait: 10s
      group_interval: 5m
      receiver: webhook
      repeat_interval: 3h
## Prometheus server ConfigMap entries
##
serverFiles:
  rules: ""
  alerts: |-
    # host rules
    ALERT high_node_load
      IF node_load1 > 20
      FOR 10s
      LABELS { severity = "critical" }
      ANNOTATIONS {
        # summary defines the status if the condition is met
        summary = "Node usage exceeded threshold",
        # description reports the situation of the event
        description = "Instance {{ $labels.instance }}, Job {{ $labels.job }}, Node load {{ $value }}",
      }
    ALERT high_memory_usage
      IF (( node_memory_MemTotal - node_memory_MemFree ) / node_memory_MemTotal) * 100 > 90
      FOR 10s
      LABELS { severity = "warning" }
      ANNOTATIONS {
        # summary defines the status if the condition is met
        summary = "Memory usage exceeded threshold",
        # description reports the situation of the event
        description = "Instance {{ $labels.instance }}, Job {{ $labels.job }}, Memory usage {{ humanize $value }}%",
      }
    ALERT high_storage_usage
      IF (node_filesystem_size{fstype="ext4"} - node_filesystem_free{fstype="ext4"}) / node_filesystem_size{fstype="ext4"} * 100 > 90
      FOR 10s
      LABELS { severity = "warning" }
      ANNOTATIONS {
        # summary defines the status if the condition is met
        summary = "Storage usage exceeded threshold",
        # description reports the situation of the event
        description = "Instance {{ $labels.instance }}, Job {{ $labels.job }}, Storage usage {{ humanize $value }}%",
      }
  prometheus.yml: |-
    rule_files:
      - /etc/config/rules
      - /etc/config/alerts
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets:
              - localhost:9090
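The validation error above happens because alertmanagerFiles: and serverFiles: are not ConfigMap fields at all; they look like top-level values for the stable/prometheus Helm chart, in which case the file would be passed to Helm (e.g. helm install stable/prometheus -f values.yaml) rather than to kubectl create. If the intent is to create the object with kubectl directly, each file instead has to live under data: of a real ConfigMap, roughly like this (the ConfigMap name is a placeholder):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-alertmanager-conf   # placeholder name
  namespace: monitoring
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 20s
    receivers:
      - name: 'webhook'
        webhook_configs:
          - send_resolved: true
            url: '<webhook>'
    route:
      group_wait: 10s
      group_interval: 5m
      receiver: webhook
      repeat_interval: 3h
```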