[ALERTMANAGER][ERROR] err="Post <redacted>: x509: certificate signed by unknown authority"


BDT

Mar 7, 2020, 12:01:26 PM
to Prometheus Users
Hi all !

I have a problem sending alerts to Slack via webhook. I have a Traefik proxy and Alertmanager, both running in Docker Swarm.
Prometheus talks to Alertmanager over the Docker network service (alertmanager:9093).

Traefik generates certificates with ACME (Let's Encrypt) and that works well, but when Alertmanager pushes an alert to Slack I get this error: "Post <redacted>: x509: certificate signed by unknown authority"
I don't know whether something has changed with the Let's Encrypt certificate, the Slack webhook, or the Alertmanager version (v0.19), but it worked before.

My alertmanager configuration:

route:
  group_by: [alertname, job]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  # If an alert isn't caught by a route, send it to Slack.
  receiver: slack_general
  routes:
  # Send severity=slack alerts to slack.
  - match:
      severity: info|warning|critical
    receiver: slack_general
receivers:
- name: slack_general
  slack_configs:
  - api_url: *****
    channel: '****'
    username: "****"
    send_resolved: true
    color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
    title: '{{ template "custom.title" . }}'
    title_link: '{{ template "slack.default.titlelink" . }}'
    pretext: '{{ .CommonAnnotations.summary }}'
    text: |-
      {{ range .Alerts }}
        *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
        *Description:* {{ .Annotations.description }}
        *Details:*
        {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
        {{ end }}
      {{ end }}
    fallback: '{{ template "slack.default.fallback" . }}'
    icon_url: '{{ template "slack.default.iconurl" . }}'
templates:
- /etc/alertmanager/templates/notification.tmpl

I have no TLS configuration in Alertmanager, but I don't think I need one, because Traefik is what encrypts the alert and sends it to Slack. Am I wrong?

Thanks for your time.

Christian Hoffmann

Mar 8, 2020, 4:35:58 PM
to BDT, Prometheus Users
Hi,

On 3/7/20 6:01 PM, BDT wrote:
> I have a problem to send alerts to slack via webhook. I have a traefik
> proxy and alertmanager which run in docker swarm.
> So the communication between prometheus and alert is done by docker
> network service (alermanager:9093).
>
> Traefik generates certficates with acme let's encrypt and working well
> but when alertmanager push an alert to slack, i get this error: "Post
> <redacted>: x509: certificate signed by unknown authority"
> I don't know if something has changed with let s encrypt certificate or
> slack webhook or alertmanager version (v0.19) but it worked before.

Not sure if I understand your setup completely. Some ideas nevertheless:

Could it be that you are affected by the recent Let's encrypt cert
revocations?
https://community.letsencrypt.org/t/revoking-certain-certificates-on-march-4/114864

If you have confirmed that this is not the case, it may help to get some
more debugging insights:

- Increase --log.level to debug
- Capture the traffic using tcpdump and analyze it (wireshark is
probably helpful) -- what is the actual certificate? does it look alright?

Kind regards,
Christian

BDT

Mar 9, 2020, 5:12:41 AM
to Prometheus Users
Hi,

OK, I made a mistake when I explained my setup. I do have a Traefik reverse proxy, but Alertmanager sends the alert directly to Slack. There is no Traefik in between.
(alertmanager -> loadbalancer ovh -> internet -> slack)

I have enabled debug level:

=dispatch.go:104 component=dispatcher msg="Received alert" alert=InstanceDown[921d528][active]

level=debug ts=2020-03-09T08:46:12.118Z caller=dispatch.go:104 component=dispatcher msg="Received alert" alert=InstanceDown[0d9a507][active]

level=debug ts=2020-03-09T08:46:12.118Z caller=dispatch.go:104 component=dispatcher msg="Received alert" alert=InstanceDown[ef6c116][active]

level=debug ts=2020-03-09T08:46:12.119Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}:{alertname=\"InstanceDown\", job=\"dockerd-exporter\"}" msg=flushing alerts="[InstanceDown[921d528][active] InstanceDown[0d9a507][active] InstanceDown[ef6c116][active]]"

level=debug ts=2020-03-09T08:46:12.143Z caller=notify.go:667 component=dispatcher msg="Notify attempt failed" attempt=1 integration=slack receiver=slack_general err="Post <redacted>: x509: certificate signed by unknown authority"


My certificate is valid and I have checked for revocation, it's ok.

You can go to https://alertmanager.cloud.patrowl.io and check. Just cancel the http auth



Logs for tcpdump port https - Alertmanager container:


09:01:36.128928 IP (tos 0x0, ttl 64, id 43540, offset 0, flags [DF], proto TCP (6), length 60)
    ****.36078 > server-54-240-168-90.ams54.r.cloudfront.net.443: Flags [S], cksum 0x9b9c (incorrect -> 0x0dac), seq 1115653932, win 29200, options [mss 1460,sackOK,TS val 1409908855 ecr 0,nop,wscale 7], length 0
09:01:36.135810 IP (tos 0x0, ttl 240, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    server-54-240-168-90.ams54.r.cloudfront.net.443 > ****.36078: Flags [S.], cksum 0xcf86 (correct), seq 3649616458, ack 1115653933, win 28960, options [mss 1460,sackOK,TS val 105751778 ecr 1409908855,nop,wscale 8], length 0
09:01:36.135860 IP (tos 0x0, ttl 64, id 43541, offset 0, flags [DF], proto TCP (6), length 52)
    ****.36078 > server-54-240-168-90.ams54.r.cloudfront.net.443: Flags [.], cksum 0x9b94 (incorrect -> 0x6e88), ack 1, win 229, options [nop,nop,TS val 1409908862 ecr 105751778], length 0
09:01:36.136204 IP (tos 0x0, ttl 64, id 43542, offset 0, flags [DF], proto TCP (6), length 267)
    ****.36078 > server-54-240-168-90.ams54.r.cloudfront.net.443: Flags [P.], cksum 0x9c6b (incorrect -> 0x8d07), seq 1:216, ack 1, win 229, options [nop,nop,TS val 1409908863 ecr 105751778], length 215
09:01:36.143236 IP (tos 0x0, ttl 240, id 7848, offset 0, flags [DF], proto TCP (6), length 52)
    server-54-240-168-90.ams54.r.cloudfront.net.443 > ****.36078: Flags [.], cksum 0x6e1e (correct), ack 216, win 118, options [nop,nop,TS val 105751779 ecr 1409908863], length 0
09:01:36.143255 IP (tos 0x0, ttl 240, id 7849, offset 0, flags [DF], proto TCP (6), length 2948)
    server-54-240-168-90.ams54.r.cloudfront.net.443 > ****.36078: Flags [.], cksum 0xa6e4 (incorrect -> 0x9386), seq 1:2897, ack 216, win 118, options [nop,nop,TS val 105751779 ecr 1409908863], length 2896
09:01:36.143302 IP (tos 0x0, ttl 64, id 43543, offset 0, flags [DF], proto TCP (6), length 52)
    ****.36078 > server-54-240-168-90.ams54.r.cloudfront.net.443: Flags [.], cksum 0x9b94 (incorrect -> 0x622b), ack 2897, win 274, options [nop,nop,TS val 1409908870 ecr 105751779], length 0
09:01:36.145411 IP (tos 0x0, ttl 240, id 7851, offset 0, flags [DF], proto TCP (6), length 1102)
    server-54-240-168-90.ams54.r.cloudfront.net.443 > ****.36078: Flags [P.], cksum 0x46af (correct), seq 2897:3947, ack 216, win 118, options [nop,nop,TS val 105751779 ecr 1409908863], length 1050
09:01:48.657107 IP (tos 0x0, ttl 64, id 64508, offset 0, flags [DF], proto TCP (6), length 60)
    ****.36086 > server-54-240-168-90.ams54.r.cloudfront.net.443: Flags [S], cksum 0x9b9c (incorrect -> 0x8c7c), seq 1465565832, win 29200, options [mss 1460,sackOK,TS val 1409921383 ecr 0,nop,wscale 7], length 0
09:01:48.664267 IP (tos 0x0, ttl 240, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    server-54-240-168-90.ams54.r.cloudfront.net.443 > ****.36086: Flags [S.], cksum 0x1678 (correct), seq 284513267, ack 1465565833, win 28960, options [mss 1460,sackOK,TS val 101911527 ecr 1409921383,nop,wscale 8], length 0
09:01:48.664321 IP (tos 0x0, ttl 64, id 64509, offset 0, flags [DF], proto TCP (6), length 52)
    ****.36086 > server-54-240-168-90.ams54.r.cloudfront.net.443: Flags [.], cksum 0x9b94 (incorrect -> 0xb578), ack 1, win 229, options [nop,nop,TS val 1409921391 ecr 101911527], length 0
09:01:48.664630 IP (tos 0x0, ttl 64, id 64510, offset 0, flags [DF], proto TCP (6), length 267)
    ****.36086 > server-54-240-168-90.ams54.r.cloudfront.net.443: Flags [P.], cksum 0x9c6b (incorrect -> 0x4b95), seq 1:216, ack 1, win 229, options [nop,nop,TS val 1409921391 ecr 101911527], length 215
09:01:48.671642 IP (tos 0x0, ttl 240, id 22556, offset 0, flags [DF], proto TCP (6), length 52)
    server-54-240-168-90.ams54.r.cloudfront.net.443 > ****.36086: Flags [.], cksum 0xb50f (correct), ack 216, win 118, options [nop,nop,TS val 101911528 ecr 1409921391], length 0
09:01:48.671874 IP (tos 0x0, ttl 240, id 22557, offset 0, flags [DF], proto TCP


Feel free to ask questions if anything is unclear.


Thanks for your help


Best regards.

Brian Candler

Mar 9, 2020, 6:44:28 AM
to Prometheus Users
On Monday, 9 March 2020 09:12:41 UTC, BDT wrote:

> level=debug ts=2020-03-09T08:46:12.143Z caller=notify.go:667 component=dispatcher msg="Notify attempt failed" attempt=1 integration=slack receiver=slack_general err="Post <redacted>: x509: certificate signed by unknown authority"



The bit you've redacted - at least, the hostname in the URL - is the important part.  The error seems to be saying the *remote* server's certificate is bad, but not knowing the hostname you're connecting to, we can't check that.

Is it showing the same URL as the slack "api_url: *****" in your config?

Can you post to it using curl -v?  That would prove whether the certificate is OK or not.

Are you behind some sort of nasty corporate firewall which breaks TLS by performing man-in-the-middle decryption?
 

> My certificate is valid and I have checked for revocation, it's ok.
>
> You can go to https://alertmanager.cloud.patrowl.io and check. Just cancel the http auth



Canceling the HTTP auth just gives 401 Unauthorized.

BDT

Mar 9, 2020, 7:06:59 AM
to Prometheus Users
Hi Brian,

I can't share the full URL, but I have configured:
- api_url: https://hooks.slack.com/services/****

I have tested from my alertmanager container:

curl  https://hooks.slack.com/services/**** -d '{"text": "Hello, world."}'
ok

I received the message.

To check whether the certificate is OK, just cancel the HTTP auth, click the padlock in the address bar, and inspect the TLS certificate.
I am behind OVH's firewall, but I didn't configure it to perform MITM decryption.

I also set up an integration platform with Traefik on Friday and generated a new certificate, but I get the same problem with Alertmanager.

Thanks for your time

Best regards

Jakub Jakubik

Mar 9, 2020, 7:08:34 AM
to Brian Candler, Prometheus Users
Check whether the proper CAs are available inside the Alertmanager Docker image:
/etc/ssl/certs or the equivalent for your distro. If they are not, there is no source of truth for trusted CAs.
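
One way to run that check from a shell inside the container is sketched below. The helper name `find_ca_bundle` and the list of candidate paths are assumptions for illustration: they are common Linux locations for a CA bundle, not an exhaustive list, and your distro may use a different one.

```shell
# Print the first non-empty CA bundle found under an optional root prefix.
# The candidate paths are assumptions: common Linux locations, not a
# complete list of everything a TLS library may consult.
find_ca_bundle() {
  root="${1:-}"
  for candidate in \
    /etc/ssl/certs/ca-certificates.crt \
    /etc/pki/tls/certs/ca-bundle.crt \
    /etc/ssl/ca-bundle.pem \
    /etc/ssl/cert.pem
  do
    if [ -s "${root}${candidate}" ]; then
      printf '%s\n' "${root}${candidate}"
      return 0
    fi
  done
  echo "no CA bundle found under '${root:-/}'" >&2
  return 1
}
```

Called with no argument, `find_ca_bundle` checks the real filesystem root; the optional argument is a root prefix, which can help when inspecting an unpacked image. A non-zero exit status would suggest the image has no system trust store at any of these paths.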


--

Kuba Jakubik

SRE Tech Lead

Netguru - Building software for world changers

jakub....@netguru.com
netguru.com

BDT

Mar 9, 2020, 8:49:49 AM
to Prometheus Users
Slack's CA is DigiCert Global Root CA, and I have it in my /etc/ssl/certs folder.

I was also able to send a message through the Slack API using the webhook URL, and it worked.




Brian Candler

Mar 9, 2020, 9:59:17 AM
to Prometheus Users
On Monday, 9 March 2020 11:06:59 UTC, BDT wrote:
> To check whether the certificate is OK, just cancel the HTTP auth, click the padlock in the address bar, and inspect the TLS certificate.
> I am behind OVH's firewall, but I didn't configure it to perform MITM decryption.


The problem *appears* to be that alertmanager is saying the certificate of "hooks.slack.com" is wrong.  The certificate looks OK to me, although it's a wildcard:

    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, O=DigiCert Inc, CN=DigiCert SHA2 Secure Server CA
        Validity
            Not Before: Feb  8 00:00:00 2018 GMT
            Not After : Feb 12 12:00:00 2021 GMT
        Subject: C=US, ST=CA, L=San Francisco, O=Slack Technologies, Inc., CN=slack.com
...
            X509v3 Subject Alternative Name:
                DNS:slack.com, DNS:*.slack.com

So as was already suggested: first check that certificate validation *inside your alertmanager docker container* is working.  e.g.

docker exec -it <containername> bash

Maybe you are missing ca-certificates inside the container?

If not, then I don't know.  Presumably lots of other people are sending to hooks.slack.com successfully, which means that the wildcard cert validation is working.

Note: slack is fronted by cloudfront, and I don't get the certificate unless I include servername (SNI) extension:

$ openssl s_client -connect hooks.slack.com:443 -servername hooks.slack.com

I'm *fairly* sure golang/prometheus will do this by default, but there's a way to override it if necessary:

http_config:
  tls_config:
    server_name: hooks.slack.com

The other thing you could try temporarily, just while you debug the problem, is:

http_config:
  tls_config:
    insecure_skip_verify: true

If that makes the problem go away, you know for sure it's something to do with alertmanager incorrectly validating the certificate from slack.  If it doesn't, then you at least don't need to keep barking up the wrong tree.

BDT

Mar 9, 2020, 10:59:45 AM
to Prometheus Users
OK, I will try adding:
http_config:
  tls_config:
    insecure_skip_verify: true

The Alertmanager documentation says:

# Configures the TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]


# CA certificate to validate the server certificate with.
[ ca_file: <filepath> ]

# Certificate and key files for client cert authentication to the server.
[ cert_file: <filepath> ]
[ key_file: <filepath> ]

# ServerName extension to indicate the name of the server.
# http://tools.ietf.org/html/rfc4366#section-3.1
[ server_name: <string> ]

# Disable validation of the server certificate.
[ insecure_skip_verify: <boolean> | default = false]

So I tried this:

tls_config:
# CA certificate to validate the server certificate with.
ca_file: /etc/ssl/DigiCert_Global_Root_CA.pem

# ServerName extension to indicate the name of the server.
# http://tools.ietf.org/html/rfc4366#section-3.1
server_name: hooks.slack.com

# Disable validation of the server certificate.
insecure_skip_verify: false

I get an error in the config. I must have missed something ^^

Brian Candler

Mar 9, 2020, 11:12:59 AM
to Prometheus Users
On Monday, 9 March 2020 14:59:45 UTC, BDT wrote:
> The doc of alertmanager:


At the top level is an item called "receivers"

Under this is a list of items of type <receiver>

One possibility for <receiver> is slack_configs

Under slack_configs is a list of items of type <slack_config>

This type includes various settings, one of which is http_config of type <http_config>

<http_config> has various settings including tls_config of type <tls_config>

This includes various settings, including insecure_skip_verify.

So at the top level, you will end up with something like this (untested):

route:
  ...
receivers:
- name: slack_general
  slack_configs:
    - channel: ...
      actions: ...
      http_config:
        tls_config:
          insecure_skip_verify: true

BDT

Mar 9, 2020, 12:08:30 PM
to Prometheus Users
OK, your trick fixed my problem ^^
I had imported the CA but didn't specify it in the Alertmanager configuration file:

receivers:
- name: slack_general
  slack_configs:
  - api_url: ***
    channel: '****'
    username: "****"
    send_resolved: true
    http_config:
      tls_config:
        ca_file: /usr/local/share/ca-certificates/DigiCertGlobalRootCA.crt
        server_name: "hooks.slack.com"

The thing is, I could run curl -v https://hooks.slack.com and it worked, so I couldn't imagine it was a certificate problem. But never mind, it's working!

Thank you Brian and thank you guys !

BDT

Mar 9, 2020, 12:19:43 PM
to Prometheus Users
I haven't verified this, but I forgot to tell you something about the Docker image: I built my own, as shown below. If the Alertmanager image contains CA certificates, I think they were not copied automatically into my new image.
I'm not an expert in multi-stage Docker builds, but this could be my problem. I have to be careful ...

ARG ALERT_TAG=latest
FROM prom/alertmanager:${ALERT_TAG} as build

FROM alpine:3.10.2
RUN apk add gettext

COPY --from=build /bin/alertmanager /bin/alertmanager


Brian Candler

Mar 9, 2020, 1:54:57 PM
to Prometheus Users
Cool!

If you installed it into /usr/local/share/ca-certificates then running "update-ca-certificates" should pick it up into the global trust store (Ubuntu at least).  However, I would have thought installing the package "ca-certificates" ought to bring in all the normal ones, so not sure why alertmanager isn't using it.

Also, see if you can remove the "server_name" option now, and just leave "ca_file".

Brian Candler

Mar 9, 2020, 1:59:26 PM
to Prometheus Users


On Monday, 9 March 2020 16:19:43 UTC, BDT wrote:
> I haven't verified this, but I forgot to tell you something about the Docker image: I built my own, as shown below. If the Alertmanager image contains CA certificates, I think they were not copied automatically into my new image.
> I'm not an expert in multi-stage Docker builds, but this could be my problem. I have to be careful ...
>
> ARG ALERT_TAG=latest
> FROM prom/alertmanager:${ALERT_TAG} as build
>
> FROM alpine:3.10.2
> RUN apk add gettext
>
> COPY --from=build /bin/alertmanager /bin/alertmanager


Ah I see - you just pulled out the binary.  So it would be up to you to set up everything else required, such as "apk add ca-certificates".
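
A minimal sketch of that suggestion, written as a shell helper that generates the amended Dockerfile so it can be compared with the original. The only change from the Dockerfile quoted above is the extra `ca-certificates` package on the `apk add` line. The function name `write_dockerfile` is hypothetical, and the thread does not confirm that this change alone resolves the error, so treat it as untested.

```shell
# Write an amended copy of the Dockerfile from the thread into a directory.
# Only the 'apk add' line differs from the original: it also installs
# ca-certificates so that a system trust store exists in the final image.
write_dockerfile() {
  dir="${1:-.}"
  cat > "${dir}/Dockerfile" <<'EOF'
ARG ALERT_TAG=latest
FROM prom/alertmanager:${ALERT_TAG} as build

FROM alpine:3.10.2
# gettext is from the original file; ca-certificates is the addition.
RUN apk add gettext ca-certificates

COPY --from=build /bin/alertmanager /bin/alertmanager
EOF
}
```

Running `write_dockerfile .` followed by `docker build -t <imagename> .` would rebuild the image. With the trust store in place, the explicit `ca_file` setting may no longer be needed, but that would have to be checked against the running container.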