Alertmanager deployed in a cluster is not sending any emails

Nikhil Rao JL

Oct 26, 2021, 11:48:02 AM
to Prometheus Users
I have deployed Alertmanager using the prometheus-community/alertmanager Helm chart and configured values.yaml to send an email when an alert is received.
Below is the config:
```yaml
# Default values for alertmanager.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

image:
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: ""

extraArgs: {}

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name:

podSecurityContext:
  fsGroup: 65534
dnsConfig: {}
  # nameservers:
  #   - 1.2.3.4
  # searches:
  #   - ns1.svc.cluster-domain.example
  #   - my.dns.search.suffix
  # options:
  #   - name: ndots
  #     value: "2"
  #   - name: edns0
securityContext:
  # capabilities:
  #   drop:
  #   - ALL
  # readOnlyRootFilesystem: true
  runAsUser: 65534
  runAsNonRoot: true
  runAsGroup: 65534

additionalPeers: []

service:
  annotations: {}
  type: ClusterIP
  port: 9093
  # if you want to force a specific nodePort. Must be use with service.type=NodePort
  # nodePort:

ingress:
  enabled: false
  className: ""
  annotations: {}
    # kubernetes.io/tls-acme: "true"
  hosts:
    - paths:
        - path: /
          pathType: ImplementationSpecific
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 10m
  #   memory: 32Mi

nodeSelector: {}

tolerations: []

affinity: {}

statefulSet:
  annotations: {}

podAnnotations: {}
podLabels: {}

podDisruptionBudget: {}
  # maxUnavailable: 1
  # minAvailable: 1

command: []

persistence:
  enabled: true
  ## Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ## set, choosing the default provisioner.
  ##
  # storageClass: "-"
  accessModes:
    - ReadWriteOnce
  size: 50Mi

config:
  global: {}

  receivers:
    - name: email-me
      email_configs:
      - to: <gmail account 1>
        from: <gmail account 2>
        smarthost: smtp.gmail.com:587
        auth_username: <gmail account 2>
        auth_identity: <gmail account 2>
        auth_password: <gmail account 2 password>

  route:
    receiver: email-me
## Monitors ConfigMap changes and POSTs to a URL
##
configmapReload:
  ## If false, the configmap-reload container will not be deployed
  ##
  enabled: false

  ## configmap-reload container name
  ##
  name: configmap-reload

  ## configmap-reload container image
  ##
  image:
    repository: jimmidyson/configmap-reload
    tag: v0.5.0
    pullPolicy: IfNotPresent

  ## configmap-reload resource requests and limits
  ##
  resources: {}

templates: {}
#   alertmanager.tmpl: |-

```
The only thing modified from the default values.yaml is the config key (to send email).
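
For reference, the chart is applied roughly like this (the release name, namespace, and values file name below are just placeholders for my setup):
```sh
# Install/upgrade the prometheus-community/alertmanager chart with the
# modified values.yaml; release name and namespace are placeholders.
helm upgrade --install alertmanager prometheus-community/alertmanager \
  --namespace monitoring --create-namespace \
  -f values.yaml
```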

Then, when I create an alert using the command below:
```sh
curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"alert_rules"}}]' localhost:9093/api/v1/alerts
```
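(localhost:9093 is reached through a port-forward to the chart's service; the service name below is a placeholder for whatever name the chart generated:)
```sh
# Forward the Alertmanager service so its API/UI is reachable on localhost:9093;
# "alertmanager" is a placeholder for the generated service name.
kubectl port-forward svc/alertmanager 9093:9093
```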
The alert shows up in the Alertmanager dashboard, but the email isn't being sent.
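
One way to double-check that the rendered config inside the pod actually contains the email settings (the pod name below is a placeholder; the file path is the one shown in the logs further down):
```sh
# Dump the configuration file Alertmanager loaded, per the "Loading
# configuration file" log line; "alertmanager-0" is a placeholder pod name.
kubectl exec alertmanager-0 -- cat /etc/alertmanager/alertmanager.yml
```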

kubectl logs shows a connection-refused error:
```
level=info ts=2021-10-26T15:27:23.959Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=HEAD, revision=61046b17771a57cfd4c4a51be370ab930a4d7d54)"
level=info ts=2021-10-26T15:27:23.959Z caller=main.go:226 build_context="(go=go1.16.7, user=root@e21a959be8d2, date=20210825-10:48:55)"
level=info ts=2021-10-26T15:27:23.960Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=172.17.0.5 port=9094
level=info ts=2021-10-26T15:27:23.961Z caller=cluster.go:671 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2021-10-26T15:27:23.981Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/alertmanager.yml
level=info ts=2021-10-26T15:27:23.981Z caller=coordinator.go:126 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/alertmanager.yml
level=info ts=2021-10-26T15:27:23.983Z caller=main.go:518 msg=Listening address=:9093
level=info ts=2021-10-26T15:27:23.983Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
level=info ts=2021-10-26T15:27:25.961Z caller=cluster.go:696 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000616465s
level=info ts=2021-10-26T15:27:33.964Z caller=cluster.go:688 component=cluster msg="gossip settled; proceeding" elapsed=10.003041734s
level=warn ts=2021-10-26T15:30:00.606Z caller=notify.go:724 component=dispatcher receiver=email-me integration=email[0] msg="Notify attempt failed, will retry later" attempts=1 err="establish connection to server: dial tcp 95.216.67.149:587: connect: connection refused"
level=warn ts=2021-10-26T15:30:57.631Z caller=notify.go:724 component=dispatcher receiver=email-me integration=email[0] msg="Notify attempt failed, will retry later" attempts=2 err="establish connection to server: dial tcp 195.201.199.239:587: i/o timeout"
level=error ts=2021-10-26T15:32:50.573Z caller=dispatch.go:354 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="email-me/email[0]: notify retry canceled after 5 attempts: establish connection to server: dial tcp 95.216.67.149:587: connect: connection refused"
```
The same config and test work from a Docker container.
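
Since it works from Docker but not from the cluster, one thing I can try is checking DNS resolution and outbound access to smtp.gmail.com:587 from inside the cluster, roughly like this (the debug pod name and busybox image are arbitrary choices):
```sh
# Throwaway pod to test name resolution and TCP reachability of the SMTP
# smarthost from inside the cluster; pod name and image are placeholders.
kubectl run smtp-debug --rm -it --image=busybox --restart=Never -- sh -c \
  'nslookup smtp.gmail.com; nc -vz -w 5 smtp.gmail.com 587'
```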

Is there something wrong with the config, or am I missing something? Please help me understand this.
