implement smtp connection with blackbox exporter

437 views
Skip to first unread message

nina guo

unread,
May 18, 2022, 6:34:19 AM5/18/22
to Prometheus Users
Hi,

Can I use the following mode to implement Nagios check_smtp?

smtp_starttls:

prober: tcp

timeout: 5s

tcp:

query_response:

- expect: "^220 ([^ ]+) ESMTP (.+)$"

- send: "EHLO prober\r"

- expect: "^250-STARTTLS"

- send: "STARTTLS\r"

- expect: "^220"

- starttls: true

- send: "EHLO prober\r"

- expect: "^250-AUTH"

- send: "QUIT\r"

smtp_starttls:

prober: tcp

timeout: 5s

tcp:

query_response:

- expect: "^220 ([^ ]+) ESMTP (.+)$"

- send: "EHLO prober\r"

- expect: "^250-STARTTLS"

- send: "STARTTLS\r"

- expect: "^220"

- starttls: true

- send: "EHLO prober\r"

- expect: "^250-AUTH"

- send: "QUIT\r"

Matthias Rampke

unread,
May 18, 2022, 7:18:17 AM5/18/22
to nina guo, Prometheus Users
Yes, that should work. If you have trouble, open(curl) the scrape URL but add &debug=true to see what happens.

/MR

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/22fa8e2d-dd7f-4a02-8c26-efb1b12824edn%40googlegroups.com.

nina guo

unread,
May 18, 2022, 7:36:37 AM5/18/22
to Prometheus Users
Thank you very much.

nina guo

unread,
May 18, 2022, 7:55:28 AM5/18/22
to Prometheus Users
Is there a way to know what does smtp_startTLS module exactly do?

nina guo

unread,
May 18, 2022, 8:11:00 AM5/18/22
to Prometheus Users

still have 2 questions:
- which module can be used to implement Nagios check_ldap_startTLS?
- Is there a way to know what does smtp_startTLS module exactly do?

Matthias Rampke

unread,
May 18, 2022, 8:21:43 AM5/18/22
to nina guo, Prometheus Users
It's really just TCP with some scripted interactions. Look at the steps in your configuration to see exactly what they are.

/MR

nina guo

unread,
May 18, 2022, 8:41:37 AM5/18/22
to Prometheus Users
could you help to review if the following configuration is right?

- job_name: smtp
        metrics_path: /probe
        params:
          module: [smtp_starttls]
        static_configs:
          - targets:
              - 127.0.0.1
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: prometheus-blackbox-exporter:9115

smtp_starttls:
        prober: tcp
        timeout: 5s
        tcp:
          query_response:
            - expect: "^220 ([^ ]+) ESMTP (.+)$"
            - send: "EHLO prober\r"
            - expect: "^250-STARTTLS"
            - send: "STARTTLS\r"
            - expect: "^220"
            - starttls: true
            - send: "EHLO prober\r"
            - expect: "^250-AUTH"
            - send: "QUIT\r"

Brian Candler

unread,
May 18, 2022, 11:00:50 AM5/18/22
to Prometheus Users
On Wednesday, 18 May 2022 at 13:41:37 UTC+1 ninag...@gmail.com wrote:
        static_configs:
          - targets:
              - 127.0.0.1
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: prometheus-blackbox-exporter:9115


This is saying that you are talking to blackbox_exporter running on host "prometheus-blackbox-exporter", but blackbox_exporter will be testing the SMTP server on 127.0.0.1, i.e. the mail server that you are testing is running on the same server as blackbox_exporter.

If that's what you want, then it looks OK to me.  If you test it and it doesn't do what you expect, then you can start debugging it.

> - which module can be used to implement Nagios check_ldap_startTLS?

LDAP is a binary protocol. You won't be able to test this using blackbox_exporter, except for checking that a connection is accepted on port 389.

You could look for a specific LDAP exporter.  Or you could write a script which performs the check (e.g. by shelling out to "ldapsearch") and returns prometheus metrics.  You can then run this script from cron and get it to write a file to be picked up by node_exporter's textfile collector; or you can run it under exporter_exporter; or you can write your own exporter (basically just a small webserver which listens for scrapes).

Alternatively, if you already have a Nagios check module which does what you want, you can run it under nrpe_exporter.

> - Is there a way to know what does smtp_startTLS module exactly do?

It negotiates a TLS connection on the TCP stream.  If it's successful, all the subsequent communication on this stream is TLS-encrypted.  I'm not sure what else to say about it!

nina guo

unread,
May 19, 2022, 10:40:06 AM5/19/22
to Prometheus Users
Thank you Brian.

If I only want to test if SMTP connection can be setup or not, like Nagios check_smtp, can I use smtp_starttls module? I got failed status of smtp_starttls, and from the log of blackbox exporter, I even cannot see the output of "smtp_starttls" module. I curled, the response returned all the HTLM info without useful information.

Brian Candler

unread,
May 19, 2022, 3:28:59 PM5/19/22
to Prometheus Users
Hit blackbox_exporter with curl, and add &debug=true to see what's happening.

My guess is that the target mailserver doesn't have a valid TLS certificate. This is almost certainly true if you are connecting to it as "127.0.0.1" instead of using its real hostname.

nina guo

unread,
Jul 4, 2022, 5:27:32 AM7/4/22
to Prometheus Users
Hi Brian,

I test and got the following info.

curl -g 'http://0.0.0.0:9115/probe?module=smtp_starttls&target=1.1.1.1:25&debug=true'
Logs for the probe:
ts=2022-07-04T07:40:59.871275007Z caller=main.go:320 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Beginning probe" probe=tcp timeout_seconds=5
ts=2022-07-04T07:40:59.871360243Z caller=tcp.go:40 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Resolving target address" ip_protocol=ip6
ts=2022-07-04T07:40:59.871381401Z caller=tcp.go:40 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Resolving target address" ip_protocol=ip4
ts=2022-07-04T07:40:59.871398489Z caller=tcp.go:40 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Resolved target address" ip=100.70.222.51
ts=2022-07-04T07:40:59.871417521Z caller=tcp.go:121 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Dialing TCP without TLS"
ts=2022-07-04T07:40:59.882814373Z caller=main.go:130 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Successfully dialed"
ts=2022-07-04T07:40:59.882850655Z caller=main.go:130 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Processing query response entry" entry_number=0
ts=2022-07-04T07:40:59.89189959Z caller=main.go:130 module=smtp_starttls target= 1.1.1.1  :25 level=debug msg="Read line" line="220 ... ESMTP"
ts=2022-07-04T07:41:04.871897159Z caller=main.go:130 module=smtp_starttls target= 1.1.1.1  :25 level=error msg="Error reading from connection" err="read tcp4 0.0.0.0:35004->1.1.1.1:25: i/o timeout"
ts=2022-07-04T07:41:04.871975741Z caller=main.go:320 module=smtp_starttls target= 1.1.1.1  :25 level=error msg="Probe failed" duration_seconds=5.000650342



Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 5.7111e-05
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 5.000650342
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.493392506e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0



Module configuration:
prober: tcp
timeout: 5s
http:
    ip_protocol_fallback: true
    follow_redirects: true
tcp:
    ip_protocol_fallback: true

    query_response:
        - expect: ^220 ([^ ]+) ESMTP (.+)$
        - send: "EHLO prober\r"
        - expect: ^250-STARTTLS
        - send: "STARTTLS\r"
        - expect: ^220
        - starttls: true
        - send: "EHLO prober\r"
        - expect: ^250-AUTH
        - send: "QUIT\r"
icmp:
    ip_protocol_fallback: true
dns:
    ip_protocol_fallback: true

Brian Candler

unread,
Jul 4, 2022, 5:55:09 AM7/4/22
to Prometheus Users
The following works for me with blackbox_exporter 0.21.0 (maybe your first 'expect' regexp needs to be quoted?)

modules:
  smtp_starttls:
      prober: tcp
      timeout: 5s
      tcp:
        query_response:
          - expect: "^220 ([^ ]+) ESMTP (.+)$"
            send: "EHLO prober"
          - expect: "^250-STARTTLS"
          - expect: "^250 .*$"
            send: "STARTTLS"
          - expect: "^220"

            starttls: true
          - send: "EHLO prober"
          - expect: "^250 .*$"
            send: "QUIT"


Result:

# curl -g 'localhost:9115/probe?module=smtp_starttls&target=aspmx.l.google.com:25&debug=true'
Logs for the probe:
ts=2022-07-04T09:50:46.764604425Z caller=main.go:351 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Beginning probe" probe=tcp timeout_seconds=5
ts=2022-07-04T09:50:46.764875668Z caller=tcp.go:40 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Resolving target address" target=aspmx.l.google.com ip_protocol=ip6
ts=2022-07-04T09:50:46.765954096Z caller=tcp.go:40 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Resolved target address" target=aspmx.l.google.com ip=2a00:1450:400c:c07::1b
ts=2022-07-04T09:50:46.766011111Z caller=tcp.go:121 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Dialing TCP without TLS"
ts=2022-07-04T09:50:46.776962616Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Successfully dialed"
ts=2022-07-04T09:50:46.77708032Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Processing query response entry" entry_number=0
ts=2022-07-04T09:50:46.788075017Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line="220 mx.google.com ESMTP u1-20020a056000038100b0021bbcef3a69si36572085wrf.542 - gsmtp"
ts=2022-07-04T09:50:46.788168204Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Regexp matched" regexp="^220 ([^ ]+) ESMTP (.+)$" line="220 mx.google.com ESMTP u1-20020a056000038100b0021bbcef3a69si36572085wrf.542 - gsmtp"
ts=2022-07-04T09:50:46.788223914Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Sending line" line="EHLO prober"
ts=2022-07-04T09:50:46.788362926Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Processing query response entry" entry_number=1
ts=2022-07-04T09:50:46.801755535Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line="250-mx.google.com at your service, [xx:xx:xx:xx::33]"
ts=2022-07-04T09:50:46.801856147Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line="250-SIZE 157286400"
ts=2022-07-04T09:50:46.801889927Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line=250-8BITMIME
ts=2022-07-04T09:50:46.801922312Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line=250-STARTTLS
ts=2022-07-04T09:50:46.801955499Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Regexp matched" regexp=^250-STARTTLS line=250-STARTTLS
ts=2022-07-04T09:50:46.801989646Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Processing query response entry" entry_number=2
ts=2022-07-04T09:50:46.802029066Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line=250-ENHANCEDSTATUSCODES
ts=2022-07-04T09:50:46.802063318Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line=250-PIPELINING
ts=2022-07-04T09:50:46.802094713Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line=250-CHUNKING
ts=2022-07-04T09:50:46.802126198Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line="250 SMTPUTF8"
ts=2022-07-04T09:50:46.802159858Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Regexp matched" regexp="^250 .*$" line="250 SMTPUTF8"
ts=2022-07-04T09:50:46.802191065Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Sending line" line=STARTTLS
ts=2022-07-04T09:50:46.802271697Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Processing query response entry" entry_number=3
ts=2022-07-04T09:50:46.812211682Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line="220 2.0.0 Ready to start TLS"
ts=2022-07-04T09:50:46.812279339Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Regexp matched" regexp=^220 line="220 2.0.0 Ready to start TLS"
ts=2022-07-04T09:50:46.833573801Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="TLS Handshake (client) succeeded."
ts=2022-07-04T09:50:46.833793385Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Processing query response entry" entry_number=4
ts=2022-07-04T09:50:46.833838422Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Sending line" line="EHLO prober"
ts=2022-07-04T09:50:46.833960852Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Processing query response entry" entry_number=5
ts=2022-07-04T09:50:46.84464181Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line="250-mx.google.com at your service, [xx:xx:xx:xx::33]"
ts=2022-07-04T09:50:46.844771942Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line="250-SIZE 157286400"
ts=2022-07-04T09:50:46.844839074Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line=250-8BITMIME
ts=2022-07-04T09:50:46.844910271Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line=250-ENHANCEDSTATUSCODES
ts=2022-07-04T09:50:46.844974538Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line=250-PIPELINING
ts=2022-07-04T09:50:46.845061058Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line=250-CHUNKING
ts=2022-07-04T09:50:46.84512579Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Read line" line="250 SMTPUTF8"
ts=2022-07-04T09:50:46.845193403Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Regexp matched" regexp="^250 .*$" line="250 SMTPUTF8"
ts=2022-07-04T09:50:46.84525809Z caller=main.go:144 module=smtp_starttls target=aspmx.l.google.com:25 level=debug msg="Sending line" line=QUIT
ts=2022-07-04T09:50:46.845583228Z caller=main.go:351 module=smtp_starttls target=aspmx.l.google.com:25 level=info msg="Probe succeeded" duration_seconds=0.080912196




Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.001103057

# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.080912196

# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.766777767e+09

# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 6
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry date
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.661764429e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in unixtime
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.661764429e+09

# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1




Module configuration:
prober: tcp
timeout: 5s
http:
    ip_protocol_fallback: true
    follow_redirects: true
    enable_http2: true

tcp:
    ip_protocol_fallback: true
    query_response:
        - expect: ^220 ([^ ]+) ESMTP (.+)$
          send: EHLO prober
        - expect: ^250-STARTTLS
        - expect: ^250 .*$
          send: STARTTLS
        - expect: ^220

          starttls: true
        - send: EHLO prober
        - expect: ^250 .*$
          send: QUIT

icmp:
    ip_protocol_fallback: true
dns:
    ip_protocol_fallback: true
    recursion_desired: true

nina guo

unread,
Jul 4, 2022, 6:14:23 AM7/4/22
to Prometheus Users

Yes in my code the first expect is quoted.

smtp_starttls:
        prober: tcp
        timeout: 5s
        tcp:
          query_response:
            - expect: "^220 ([^ ]+) ESMTP (.+)$"
            - send: "EHLO prober\r"
            - expect: "^250-STARTTLS"
            - send: "STARTTLS\r"
            - expect: "^220"
            - starttls: true
            - send: "EHLO prober\r"
            - expect: "^250-AUTH"
            - send: "QUIT\r"

Probably due to the "timeout" is too less? Should I adjust 5s to 15s or 60s?

Brian Candler

unread,
Jul 4, 2022, 7:22:08 AM7/4/22
to Prometheus Users
Could you first just try copy-pasting the working config I gave you, and see if it gives a different result to yours?

I doubt a mail server would take more than 5 seconds to give a banner, but it's possible.  Simply doing "telnet x.x.x.x 25" will show you how your mail server behaves.

nina guo

unread,
Jul 4, 2022, 7:50:47 AM7/4/22
to Prometheus Users
Still received the same error msg:

curl -g 'http://0.0.0.0:9115/probe?module=smtp_starttls&target=1.1.1.1:25&debug=true'
Logs for the probe:
ts=2022-07-04T11:47:59.071097704Z caller=main.go:320 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Beginning probe" probe=tcp timeout_seconds=5
ts=2022-07-04T11:47:59.071251831Z caller=tcp.go:40 module=smtp_starttls target=1 1.1.1.1  :25 level=info msg="Resolving target address" ip_protocol=ip6
ts=2022-07-04T11:47:59.071292289Z caller=tcp.go:40 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Resolving target address" ip_protocol=ip4
ts=2022-07-04T11:47:59.071327224Z caller=tcp.go:40 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Resolved target address" ip= 1.1.1.1
ts=2022-07-04T11:47:59.071354819Z caller=tcp.go:121 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Dialing TCP without TLS"
ts=2022-07-04T11:47:59.072469713Z caller=main.go:130 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Successfully dialed"
ts=2022-07-04T11:47:59.072508076Z caller=main.go:130 module=smtp_starttls target= 1.1.1.1  :25 level=info msg="Processing query response entry" entry_number=0
ts=2022-07-04T11:47:59.073453639Z caller=main.go:130 module=smtp_starttls target= 1.1.1.1  :25 level=debug msg="Read line" line="220  ESMTP"
ts=2022-07-04T11:48:04.072081695Z caller=main.go:130 module=smtp_starttls target= 1.1.1.1  :25 level=error msg="Error reading from connection" err="read tcp4 0.0.0.0:53872->1.1.1.1:25: i/o timeout"
ts=2022-07-04T11:48:04.072269643Z caller=main.go:320 module=smtp_starttls target= 1.1.1.1  :25 level=error msg="Probe failed" duration_seconds=5.00106792




Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.000101683

# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 5.00106792

# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.493392506e+09

# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4

# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0




Module configuration:
prober: tcp
timeout: 5s
http:
    ip_protocol_fallback: true
    follow_redirects: true
tcp:
    ip_protocol_fallback: true
    query_response:
        - expect: ^220 ([^ ]+) ESMTP (.+)$
          send: EHLO prober
        - expect: ^250-STARTTLS
        - expect: ^250 .*$
          send: STARTTLS
        - expect: ^220
          starttls: true
        - send: EHLO prober
        - expect: ^250 .*$
          send: QUIT
icmp:
    ip_protocol_fallback: true
dns:
    ip_protocol_fallback: true

Brian Candler

unread,
Jul 4, 2022, 8:44:49 AM7/4/22
to Prometheus Users
And if you try it with Google's mail server?
curl -g 'localhost:9115/probe?module=smtp_starttls&target=aspmx.l.google.com:25&debug=true'

Message has been deleted

nina guo

unread,
Jul 4, 2022, 11:11:48 PM7/4/22
to Prometheus Users
Probe failed due to TLS issue...it is not the same issue with my mail server..
There is no "i/o timeout" issue for google's mail server.

curl -g 'http://0.0.0.0:9115/probe?module=smtp_starttls&target=142.250.102.27:25&debug=true'
Logs for the probe:
ts=2022-07-05T02:04:33.373763474Z caller=main.go:320 module=smtp_starttls target=142.250.102.27:25 level=info msg="Beginning probe" probe=tcp timeout_seconds=5
ts=2022-07-05T02:04:33.373850167Z caller=tcp.go:40 module=smtp_starttls target=142.250.102.27:25 level=info msg="Resolving target address" ip_protocol=ip6
ts=2022-07-05T02:04:33.373867876Z caller=tcp.go:40 module=smtp_starttls target=142.250.102.27:25 level=info msg="Resolving target address" ip_protocol=ip4
ts=2022-07-05T02:04:33.373882263Z caller=tcp.go:40 module=smtp_starttls target=142.250.102.27:25 level=info msg="Resolved target address" ip=142.250.102.27
ts=2022-07-05T02:04:33.373903614Z caller=tcp.go:121 module=smtp_starttls target=142.250.102.27:25 level=info msg="Dialing TCP without TLS"
ts=2022-07-05T02:04:33.385587664Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Successfully dialed"
ts=2022-07-05T02:04:33.385613189Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Processing query response entry" entry_number=0
ts=2022-07-05T02:04:33.419235886Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line="220 mx.google.com ESMTP dn22-20020a17090794d600b006dfc3945326si6666807ejc.152 - gsmtp"
ts=2022-07-05T02:04:33.419280493Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Regexp matched" regexp="^220 ([^ ]+) ESMTP (.+)$" line="220 mx.google.com ESMTP dn22-20020a17090794d600b006dfc3945326si6666807ejc.152 - gsmtp"
ts=2022-07-05T02:04:33.419310506Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Sending line" line="EHLO prober"
ts=2022-07-05T02:04:33.419356415Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Processing query response entry" entry_number=1
ts=2022-07-05T02:04:33.43519464Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line="250-mx.google.com at your service, [130.214.226.41]"
ts=2022-07-05T02:04:33.435229899Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line="250-SIZE 157286400"
ts=2022-07-05T02:04:33.435242182Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line=250-8BITMIME
ts=2022-07-05T02:04:33.435254457Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line=250-STARTTLS
ts=2022-07-05T02:04:33.435270364Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Regexp matched" regexp=^250-STARTTLS line=250-STARTTLS
ts=2022-07-05T02:04:33.435285968Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Processing query response entry" entry_number=2
ts=2022-07-05T02:04:33.435310813Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line=250-ENHANCEDSTATUSCODES
ts=2022-07-05T02:04:33.435323258Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line=250-PIPELINING
ts=2022-07-05T02:04:33.435337373Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line=250-CHUNKING
ts=2022-07-05T02:04:33.435373931Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line="250 SMTPUTF8"
ts=2022-07-05T02:04:33.435392537Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Regexp matched" regexp="^250 .*$" line="250 SMTPUTF8"
ts=2022-07-05T02:04:33.435409638Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Sending line" line=STARTTLS
ts=2022-07-05T02:04:33.435451742Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Processing query response entry" entry_number=3
ts=2022-07-05T02:04:33.449015201Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line="220 2.0.0 Ready to start TLS"
ts=2022-07-05T02:04:33.449039996Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Regexp matched" regexp=^220 line="220 2.0.0 Ready to start TLS"
ts=2022-07-05T02:04:33.476697731Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=error msg="TLS Handshake (client) failed" err="x509: cannot validate certificate for 142.250.102.27 because it doesn't contain any IP SANs"
ts=2022-07-05T02:04:33.47676603Z caller=main.go:320 module=smtp_starttls target=142.250.102.27:25 level=error msg="Probe failed" duration_seconds=0.102953678




Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 5.1452e-05

# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.102953678

# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 9.60720384e+08
Message has been deleted
Message has been deleted

nina guo

unread,
Jul 5, 2022, 12:03:50 AM7/5/22
to Prometheus Users
I may find the root cause - comparing the output between google's mail server and my mail server

my mail server
ts=2022-07-05T02:29:41.608566903Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Successfully dialed"
ts=2022-07-05T02:29:41.608652361Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Processing query response entry" entry_number=0
ts=2022-07-05T02:29:41.609572861Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line="220 mail.com ESMTP"
ts=2022-07-05T02:29:46.607798497Z caller=main.go:130 module=smtp_starttls target=11.1.1.1:25 level=error msg="Error reading from connection" err="read tcp4 0.0.0.0:57074->1.1.1.1:25: i/o timeout"

google's mail server
ts=2022-07-05T02:04:33.385587664Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Successfully dialed"
ts=2022-07-05T02:04:33.385613189Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Processing query response entry" entry_number=0
ts=2022-07-05T02:04:33.419235886Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=debug msg="Read line" line="220 mx.google.com ESMTP dn22-20020a17090794d600b006dfc3945326si6666807ejc.152 - gsmtp"
ts=2022-07-05T02:04:33.419280493Z caller=main.go:130 module=smtp_starttls target=142.250.102.27:25 level=info msg="Regexp matched" regexp="^220 ([^ ]+) ESMTP (.+)$" line="220 mx.google.com ESMTP dn22-20020a17090794d600b006dfc3945326si6666807ejc.152 - gsmtp"

From the output it shows the regex is not matched for my mail server - ^220 ([^ ]+) ESMTP (.+)$

In config file
smtp_starttls:
        prober: tcp
        timeout: 20s

        tcp:
          query_response:
            - expect: "^220 ([^ ]+) ESMTP (.+)$"
              send: "EHLO prober\r"
            - expect: "^250-STARTTLS"
            - expect: "^250 .*$"
              send: "STARTTLS\r"

            - expect: "^220"
              starttls: true
            - send: "EHLO prober\r"
            - expect: "^250 .*$"
              send: "QUIT\r"
On Tuesday, July 5, 2022 at 11:40:06 AM UTC+8 nina guo wrote:
I searched this info  blackbox_exporter for SMTP monitoring w/ TCP module : PrometheusMonitoring (reddit.com) , added "\r", but still got the same issue..

On Tuesday, July 5, 2022 at 11:26:49 AM UTC+8 nina guo wrote:
telnet got succeed.

telnet 1.1.1.1 25
Trying 1.1.1.1...
Connected to 1.1.1.1.
Escape character is '^]'.
220 ... ESMTP

nina guo

unread,
Jul 5, 2022, 12:53:46 AM7/5/22
to Prometheus Users
I tried to change the first expect to ^220 ([^ ]+) ESMTP$. But I'm not sure if the change is acceptable from blackbox's view.

smtp_starttls:
        prober: tcp
        timeout: 20s
        tcp:
          query_response:
            #- expect: "^220 ([^ ]+) ESMTP (.+)$"
            - expect: "^220 ([^ ]+) ESMTP$"

              send: "EHLO prober\r"
            - expect: "^250-STARTTLS"
            - expect: "^250 .*$"
              send: "STARTTLS\r"
            - expect: "^220"
              starttls: true
            - send: "EHLO prober\r"
            - expect: "^250 .*$"
              send: "QUIT\r"


Curled and got the following error. Does it mean the mail server doesn't contain any IP SANs? And on mail server certificate cannot be validated?
ts=2022-07-05T04:18:10.539326118Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=error msg="TLS Handshake (client) failed" err="x509: cannot validate certificate for 1.1.1.1 because it doesn't contain any IP SANs"
ts=2022-07-05T04:18:10.527958378Z caller=main.go:320 module=smtp_starttls target=1.1.1.1:25 level=info msg="Beginning probe" probe=tcp timeout_seconds=20
ts=2022-07-05T04:18:10.528040568Z caller=tcp.go:40 module=smtp_starttls target=1.1.1.1:25 level=info msg="Resolving target address" ip_protocol=ip6
ts=2022-07-05T04:18:10.528063021Z caller=tcp.go:40 module=smtp_starttls target=1.1.1.1:25 level=info msg="Resolving target address" ip_protocol=ip4
ts=2022-07-05T04:18:10.528076698Z caller=tcp.go:40 module=smtp_starttls target=1.1.1.1:25 level=info msg="Resolved target address" ip=1.1.1.1
ts=2022-07-05T04:18:10.528115109Z caller=tcp.go:121 module=smtp_starttls target=1.1.1.1:25 level=info msg="Dialing TCP without TLS"
ts=2022-07-05T04:18:10.529223463Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Successfully dialed"
ts=2022-07-05T04:18:10.529258037Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Processing query response entry" entry_number=0
ts=2022-07-05T04:18:10.530236839Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line="220 mail.com ESMTP"
ts=2022-07-05T04:18:10.530278493Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Regexp matched" regexp="^220 ([^ ]+) ESMTP$" line="220 mail.com ESMTP"
ts=2022-07-05T04:18:10.53030557Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Sending line" line="EHLO prober\r"
ts=2022-07-05T04:18:10.530348868Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Processing query response entry" entry_number=1
ts=2022-07-05T04:18:10.530919204Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line=250-mail.com
ts=2022-07-05T04:18:10.530955761Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line=250-PIPELINING
ts=2022-07-05T04:18:10.530971243Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line="250-SIZE 10240000"
ts=2022-07-05T04:18:10.530985701Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line=250-ETRN
ts=2022-07-05T04:18:10.531006196Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line=250-STARTTLS
ts=2022-07-05T04:18:10.531025088Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Regexp matched" regexp=^250-STARTTLS line=250-STARTTLS
ts=2022-07-05T04:18:10.531045174Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Processing query response entry" entry_number=2
ts=2022-07-05T04:18:10.531064549Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line="250-AUTH PLAIN LOGIN"
ts=2022-07-05T04:18:10.531077325Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line=250-ENHANCEDSTATUSCODES
ts=2022-07-05T04:18:10.531088466Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line=250-8BITMIME
ts=2022-07-05T04:18:10.53110099Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line="250 DSN"
ts=2022-07-05T04:18:10.531134286Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Regexp matched" regexp="^250 .*$" line="250 DSN"
ts=2022-07-05T04:18:10.531186418Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Sending line" line="STARTTLS\r"
ts=2022-07-05T04:18:10.531234743Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Processing query response entry" entry_number=3
ts=2022-07-05T04:18:10.531762621Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line="220 2.0.0 Ready to start TLS"
ts=2022-07-05T04:18:10.531779841Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=info msg="Regexp matched" regexp=^220 line="220 2.0.0 Ready to start TLS"
ts=2022-07-05T04:18:10.539326118Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=error msg="TLS Handshake (client) failed" err="x509: cannot validate certificate for 1.1.1.1 because it doesn't contain any IP SANs"
ts=2022-07-05T04:18:10.53937693Z caller=main.go:320 module=smtp_starttls target=1.1.1.1:25 level=error msg="Probe failed" duration_seconds=0.011375261
Message has been deleted

Brian Candler

unread,
Jul 5, 2022, 3:58:09 AM7/5/22
to Prometheus Users
Problem 1: timeout / regexp mismatch

> ts=2022-07-05T02:29:41.609572861Z caller=main.go:130 module=smtp_starttls target=1.1.1.1:25 level=debug msg="Read line" line="220 mail.com ESMTP"
> ts=2022-07-05T02:29:46.607798497Z caller=main.go:130 module=smtp_starttls target=11.1.1.1:25 level=error msg="Error reading from connection" err="read tcp4 0.0.0.0:57074->1.1.1.1:25: i/o timeout"

You asked it to match "^220 ([^ ]+) ESMTP (.+)$" which means:

- "220"
- <space>
- 1 or more <non-space>
- <space>
- "ESMTP"
- <space>
- 1 or more <any-character>
- end of line

But in the banner you got back from the target host, there's nothing after the word "ESMTP": the line ends there immediately.  Hence you were right to change the regexp.

"I'm not sure if the change is acceptable from blackbox's view"  - Blackbox will do whatever you ask it to do.  If you want to match both cases (i.e. mail servers which do and don't put any text after "ESMTP"), then use a regexp which does that:

"^220 ([^ ]+) ESMTP( .+)?$"

or

"^220 ([^ ]+) ESMTP.*$"

or even just

"^220 ([^ ]+) ESMTP"

(no need to match all the way to the end of the line)

Problem 2: TLS handshake failed

* You asked Blackbox Exporter to connect to a host by IP address (1.1.1.1), instead of using a DNS name (smtp.example.com)
* Therefore, Blackbox Exporter cannot verify the certificate unless the certificate itself contains an IP address

It's the same as if you put https://1.1.1.1/ in your web browser.  There's no way the browser can validate the certificate, unless the certificate itself contains IP address "1.1.1.1".  And that is a very uncommon sort of certificate.

Please, try *exactly* the command I gave you before:
curl -g 'localhost:9115/probe?module=smtp_starttls&target=aspmx.l.google.com:25&debug=true'

No changes. Copy-paste exactly that, with the hostname "aspmx.l.google.com" in it.  You should find that it verifies correctly, as long as the ESMTP regexp matches OK.

Then to get it to work with your own mail server, you will need to do exactly the same:
* Refer to your mail server by its DNS name, not IP address, when asking blackbox_exporter to test it
* The mail server must have a certificate which contains this name [^1]
* The mail server's certificate must be signed by a certificate authority that blackbox_exporter trusts [^2]
* The mail server's certificate must not have expired, i.e. the current time is within its notBefore and notAfter times

[^1] If you still want to connect to the server by IP address but check for a different name in the certificate, then use the "server_name" setting in tls_config: see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config

[^2] If the certificate authority that signed your mail server's certificate is not in the system trust store then use the "ca_file" setting in tls_config: see https://github.com/prometheus/blackbox_exporter/blob/v0.21.1/example.yml#L53-L54

Or you can give up, and use "insecure_skip_verify: true" to disable certificate validation altogether: https://github.com/prometheus/blackbox_exporter/blob/v0.21.1/example.yml#L27-L28
(although then you lose valuable tests, because if your certificate was valid and becomes invalid you won't know.  I also note that if you can't get blackbox_exporter to validate the certificate, then your clients probably won't be able to validate it either)

At this point, it isn't really an issue with blackbox_exporter any more: it's a case of understanding how certificates and certificate validation work.  

nina guo

unread,
Jul 5, 2022, 4:09:08 AM7/5/22
to Prometheus Users
Thank you very much.

I changed to connect with DNS name. So according to the error msg, following your kind suggestion, I need to go with [^2], am I right?

ts=2022-07-05T07:32:17.495596017Z caller=main.go:130 module=smtp_starttls target=mailserver:25 level=error msg="TLS Handshake (client) failed" err="x509: certificate signed by unknown authority"

nina guo

unread,
Jul 5, 2022, 4:21:23 AM7/5/22
to Prometheus Users
Added as below:

smtp_starttls:
        prober: tcp
        timeout: 20s
        tcp:
          tls_config:  # add here?
              ca_file: "/certs/my_cert.crt" # add here?

          query_response:
            - expect: "^220 ([^ ]+) ESMTP"
              send: "EHLO prober\r"
            - expect: "^250-STARTTLS"
            - expect: "^250 .*$"
              send: "STARTTLS\r"
            - expect: "^220"
              starttls: true
            - send: "EHLO prober\r"
            - expect: "^250 .*$"
              send: "QUIT\r"

My blackbox is running as a pod in a k8s cluster. Do you have good suggestion to store the cert file?

Brian Candler

unread,
Jul 5, 2022, 5:10:24 AM7/5/22
to Prometheus Users
> My blackbox is running as a pod in a k8s cluster. Do you have good suggestion to store the cert file?

A configMap is how I'd do it.

nina guo

unread,
Jul 5, 2022, 5:20:28 AM7/5/22
to Prometheus Users
Thank you Brian for your great help.
Reply all
Reply to author
Forward
Message has been deleted
Message has been deleted
0 new messages