cluster name in HipChat notification message text from alertmanager

joshua...@pearson.com

Feb 28, 2018, 7:51:42 PM
to Prometheus Users
Hello,

Could someone please help me figure out how to get the cluster name (e.g. "sandbox", "dev", etc.) to show up as part of the message text in notifications sent to HipChat from Alertmanager? (I'm running kube-prometheus/self-hosted-deploy.)

Here's an example of which labels (i.e. "alertname" through "severity") are included in a notification from Alertmanager to HipChat:
$ amtool alert query instance="172.17.0.8:8443" alertname="DeploymentReplicasNotUpdated" | head -2
Labels                                                                                                                                                                                                                                                 Annotations                                                                                                              Starts At                Ends At                  Generator URL
alertname="DeploymentReplicasNotUpdated" deployment="crasher" endpoint="https-main" instance="172.17.0.8:8443" job="kube-state-metrics" namespace="default" pod="kube-state-metrics-5799bbb88c-pc5d9" service="kube-state-metrics" severity="warning"  description="Replicas are not updated and available for deployment /crasher" summary="Deployment replicas are outdated"  2018-02-28 14:55:59 UTC  0001-01-01 00:00:00 UTC  http://prometheus-k8s-0:9090/graph?g0.expr=%28%28kube_deployment_status_replicas_updated+%21%3D+kube_deployment_spec_replicas%29+or+%28kube_deployment_status_replicas_available+%21%3D+kube_deployment_spec_replicas%29%29+unless+%28kube_deployment_spec_paused+%3D%3D+1%29&g0.tab=1

Idea #1: I came across the Controlling the instance label blog post, which describes how to change the "instance" label's value to the EC2 node's name by adding a "relabel_configs" section to the prometheus.yaml config file. I'm using custom Terraform scripts to create my K8s cluster in AWS, and my scripts do include the cluster name in the name of each node, so this would get the cluster name into the "instance" label (the JOBS AND INSTANCES page mentions that "instance" is one of the labels always attached to targets scraped by Prometheus). But I believe (based on Prometheus Operator with custom prometheus.yml) that with kube-prometheus the only way to modify the prometheus.yaml would be to fork my own copy of prometheus-operator and make a custom modification to the code, which I'd rather not do.
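Roughly, that kind of relabel rule would look something like this (a sketch only, not the blog post's exact config; the job name, region, and "Name" tag below are just placeholders):

scrape_configs:
- job_name: 'ec2-nodes'              # hypothetical job name
  ec2_sd_configs:
  - region: us-east-1                # assumption: your AWS region
  relabel_configs:
  # replace the default "instance" value (host:port) with the EC2 Name tag
  - source_labels: [__meta_ec2_tag_Name]
    target_label: instance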

Idea #2: So then I thought I'd create my own label on each node, e.g. "cluster=sandbox", and then somehow get that label to show up in Alertmanager.
The Target labels, not static scraped labels post mentions creating a label for a target, but I Googled and couldn't find info about how exactly to do this.
When creating my cluster, I passed a `--node-labels='cluster=sandbox'` arg to the kubelet service. Then I ran `kubectl get node my-node-name --show-labels` and sure enough the label was there.
But how do I get that label to be picked up by Prometheus?
  • Please excuse a Prometheus newbie question, but does Prometheus only include labels that you specifically tell it to (via a "relabel_configs")? Here's the evidence that seems to suggest the answer is yes:
    • It seems like every target in Prometheus has the same labels (endpoint, instance, namespace, pod, & service). From this call to append in promcfg.go, it looks like prometheus-operator specifically adds the "namespace" (and other) labels.

Idea #3: I could add the following (in the "global:" section of prometheus.yaml):
  external_labels:
    cluster: 'sandbox'

So the only ideas I've been able to come up with look like they'd involve forking my own copy of prometheus-operator. Could someone please tell me which idea (not necessarily limited to the ones I described) you think is best, and how I should go about getting the cluster name into HipChat notifications?

Any assistance would be greatly appreciated. Thanks!

Brian Brazil

Mar 1, 2018, 3:29:00 AM
to joshua...@pearson.com, Prometheus Users
On 1 March 2018 at 00:51, <joshua...@pearson.com> wrote:
> Idea #2: So then I thought I'd create my own label on each node, e.g. "cluster=sandbox", and then somehow get that label to show up in Alertmanager.
> When creating my cluster, I passed a `--node-labels='cluster=sandbox'` arg to the kubelet service. Then I ran `kubectl get node my-node-name --show-labels` and sure enough the label was there.
> But how do I get that label to be picked up by Prometheus?

You can use a replace relabel action to copy it from __meta_kubernetes_node_label_cluster. 
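A sketch of that kind of rule (assuming the targets are discovered with a kubernetes_sd_configs node role; the job name is just a placeholder):

scrape_configs:
- job_name: 'kubernetes-nodes'       # placeholder job name
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # copy the Kubernetes node label "cluster" onto every target of this job
  - source_labels: [__meta_kubernetes_node_label_cluster]
    target_label: cluster
    action: replace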
> Please excuse a Prometheus newbie question, but does Prometheus only include labels that you specifically tell it to (via a "relabel_configs")?
Yes.

> Idea #3: I could add the following (in the "global:" section of prometheus.yaml):
>   external_labels:
>     cluster: 'sandbox'

If you're running one Prometheus per cluster (which would be usual), this is the way to do it.

Brian

joshua...@pearson.com

Mar 1, 2018, 9:03:55 AM
to Prometheus Users
Thanks, Brian. I think I'll go with #3 then.

Can someone please tell me whether it's the norm that most prometheus-operator users end up needing to fork & modify the repo to fit their needs (e.g. for #3)? That doesn't sound so bad; it's just that when I occasionally need to sync/copy the latest contents from the official prometheus-operator repo to mine, I might need to resolve merge conflicts in the files I customized in my private repo.

joshua...@pearson.com

Mar 2, 2018, 10:33:16 PM
to Prometheus Users
I figured out how to do this (i.e. customize the prometheus.yaml config file from prometheus-operator so that a specified cluster name is included in every notification sent from Alertmanager to HipChat). Here's more info, for anyone who might be interested:

NOTE: Frederic Branczyk said "If you want to use the custom configuration because there is something you can't do with the Prometheus Operator it is likely a missing feature".
 * It wasn't until I was almost done writing this that I had the idea to search the prometheus-operator issues for external_labels. And then when I read issue 455 I did a major facepalm, realizing all I needed to do was simply this:
~/prometheus-operator/contrib/kube-prometheus/manifests
$ git diff prometheus/prometheus-k8s.yaml
diff --git a/contrib/kube-prometheus/manifests/prometheus/prometheus-k8s.yaml b/contrib/kube-prometheus/manifests/prometheus/prometheus-k8s.yaml
   serviceMonitorSelector:
     matchExpressions:
     - {key: k8s-app, operator: Exists}
+  externalLabels:
+    cluster: josh-minikube
 * but I still decided to post this, in case it helps anyone who truly needs to modify prometheus.yaml for something that prometheus-operator doesn't support.

-----------------------------

Initial Setup

Before testing this in a K8s cluster on AWS, I wanted to try it out via minikube (on my Mac).

The following is basically borrowed from kube-prometheus/README.md:
$ minikube start --memory 4096 --kubernetes-version v1.8.0 --bootstrapper=kubeadm --extra-config=kubelet.authentication-token-webhook=true --extra-config=kubelet.authorization-mode=Webhook --extra-config=scheduler.address=0.0.0.0 --extra-config=controller-manager.address=0.0.0.0

And then I deployed kube-prometheus:
$ cd ~ 
$ cd prometheus-operator/contrib/kube-prometheus/
$ ./hack/cluster-monitoring/minikube-deploy

Kudos to Joe Creager

I found a great blog post that gives an example of specifying your own prometheus.yaml (and your own rule files, though my use case is simpler because I don't need to do that; also, that post uses prometheus-operator but not kube-prometheus): Custom Configurations with Prometheus Operator

Let's take a look at how prometheus-operator sets things up out-of-the-box

promcfg.go generates the prometheus.yaml - and here's how to retrieve it:
$ kubectl get secret prometheus-k8s -o json -n monitoring | jq -r '.data["prometheus.yaml"]' | base64 -D >~/raw_prometheus.yaml

Here's the other piece of data in this secret:
$ kubectl get secret prometheus-k8s -o json -n monitoring | jq -r '.data["configmaps.json"]' | base64 -D >~/raw_configmaps.json
$ cat ~/raw_configmaps.json
{"items":[{"key":"monitoring/prometheus-k8s-rules","checksum":"f8e6d9b3bc5cf6fdb4185b8d4f3ebc0d501fa54b9f17ba6895413d8f85c40665"}]}

Side note: 
 * for the life of me I couldn't figure out how to reproduce that checksum. Based on the make_secrets.sh script from the aforementioned blog post, it looks like the following command should reproduce it (but as you can see, the checksum comes out different):
~/prometheus-operator
$ cat contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml | shasum -b -a 256
3ffc6601640fa837fc1cba95f65d282c55713eba4aadab72afd929da10ec1ff7 *-
 * but after more Googling I found that Frederic Branczyk said "The checksum is only to signify whether the content of a ConfigMap has changed". So then I decided it didn't matter :)

(I didn't need to modify the configmaps.json / prometheus rules, but in a moment you'll see I needed to retrieve them in order to not blow them away.)

Disable the serviceMonitorSelector

The aforementioned blog post kind of glosses over this, but it's explicitly mentioned in custom-configuration.md ("the serviceMonitorSelector field has to be left empty"). Here's what you need to do: make the following change (and to be thorough, I then ran minikube-teardown followed by minikube-deploy):
~/prometheus-operator/contrib/kube-prometheus/manifests
$ git diff prometheus/prometheus-k8s.yaml
diff --git a/contrib/kube-prometheus/manifests/prometheus/prometheus-k8s.yaml b/contrib/kube-prometheus/manifests/prometheus/prometheus-k8s.yaml
-  serviceMonitorSelector:
-    matchExpressions:
-    - {key: k8s-app, operator: Exists}
+  #serviceMonitorSelector:
+  #  matchExpressions:
+  #  - {key: k8s-app, operator: Exists}

Customize alertmanager.yaml (to point to your own HipChat server & room)


I created my own alertmanager config file:
$ edit ./alertmanager.yaml
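For reference, a minimal alertmanager.yaml along these lines should do it (the HipChat API URL, auth token, and room ID below are placeholders, and the message template is just one way to surface the cluster label in the notification text):

global:
  resolve_timeout: 5m
  hipchat_api_url: 'https://my-hipchat-server.example.com/'   # placeholder
  hipchat_auth_token: '<room-notification-token>'             # placeholder

route:
  receiver: 'hipchat-notifications'

receivers:
- name: 'hipchat-notifications'
  hipchat_configs:
  - room_id: '1234'                                           # placeholder
    notify: true
    # prepend the cluster label (added via external_labels) to the default message
    message: '[{{ .CommonLabels.cluster }}] {{ template "hipchat.default.message" . }}'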

Then I applied my file to the running alertmanager:
$ kubectl -n monitoring create secret generic alertmanager-main --from-literal=alertmanager.yaml="$(< ./alertmanager.yaml)" --dry-run -oyaml | kubectl -n monitoring replace secret --filename=-
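(Optional) To sanity-check that the replace took effect, you can read the secret back the same way the prometheus-k8s secret was retrieved earlier:
$ kubectl get secret alertmanager-main -o json -n monitoring | jq -r '.data["alertmanager.yaml"]' | base64 -D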

Customize prometheus.yaml (to add the cluster name label)

I made my change to customize the config:
$ cp ~/raw_prometheus.yaml ~/raw_prometheus_MODIFIED.yaml
$ edit ~/raw_prometheus_MODIFIED.yaml
$ diff ~/raw_prometheus.yaml ~/raw_prometheus_MODIFIED.yaml
4c4,5
<   external_labels: {}
---
>   external_labels:
>     cluster: josh-minikube

Then I updated the secret with my modified prometheus.yaml (and I retained the same configmaps.json):
$ kubectl -n monitoring create secret generic prometheus-k8s --from-literal=prometheus.yaml="$(< ~/raw_prometheus_MODIFIED.yaml)" --from-literal=configmaps.json="$(< ~/raw_configmaps.json)" --dry-run -oyaml | kubectl -n monitoring replace secret --filename=-

Cause an alert to fire (& see the notification in HipChat)

Borrowing a trick from another blog post (10 Most Common Reasons Kubernetes Deployments Fail (Part 1)), here's how I caused some alerts to fire:
$ kubectl run crasher --image=rosskukulinski/crashing-app
$ kubectl run fail --image=rosskukulinski/dne:v1.0.0
$ kubectl get pods --all-namespaces | egrep -i "fail|crash"
default       crasher-679745dd49-kkh4j               0/1       CrashLoopBackOff    5          5m
default       fail-ddd94648b-nqc9c                   0/1       ErrImagePull        0          5m

Though DeadMansSwitch alone is sufficient to verify that the external label I created shows up (the Alertmanager README.md has more info about amtool):
$ amtool alert
Labels                                                        Annotations                                                                                                                                Starts At                Ends At                  Generator URL
alertname="DeadMansSwitch" cluster="josh-minikube" severity="none"  description="This is a DeadMansSwitch meant to ensure that the entire Alerting pipeline is functional." summary="Alerting DeadMansSwitch"  2018-03-03 00:47:37 UTC  0001-01-01 00:00:00 UTC  http://prometheus-k8s-1:9090/graph?g0.expr=vector%281%29&g0.tab=1

I hope this helps!

-Josh