Notifier does not seem to detect the 200 response from the webhook endpoint, resulting in continuous retries


Christopher Parsons

Feb 1, 2021, 6:45:43 PM
to Prometheus Users
Hello,

I have a third-party tool that listens for incoming HTTP messages on a specific port.  When we point Alertmanager at this endpoint, notifications are delivered, consumed by the tool, and inserted into our monitoring database successfully.  However, Alertmanager keeps retrying each notification, backing off gradually between retries.

To figure out what's going on, we set up packet captures on the Alertmanager server and on the endpoint server.  As a control, we also configured a simple Node.js server that responds with a 200.  As far as I can tell from the TCP streams in Wireshark, the only difference is the response headers.  The server that causes Alertmanager to retry continuously replies with a bare "HTTP/1.1 200 OK".  The Node.js control, which does not trigger any retries, responds with something like...

HTTP/1.1 200 OK
Date: Mon, 01 Feb 2021 22:56:09 GMT
Connection: keep-alive
Transfer-Encoding: chunked

Does Go's HTTP client expect more than a bare "HTTP/1.1 200 OK" as a response?  The third-party tool is not open source, but I may be able to get the vendor to issue an update if this is considered unexpected behavior.  Any suggestions or workarounds would be greatly appreciated.  I've also attached a couple of screenshots of the Wireshark TCP stream traces for reference.
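For anyone who wants to reproduce the control setup without Node.js, a roughly equivalent endpoint can be sketched in Python. This is only an illustration and not the server used in the captures above; `ControlHandler` and `make_server` are hypothetical names, and the handler simply discards the notification body and answers 200.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class ControlHandler(BaseHTTPRequestHandler):
    """Accept any POST, discard the payload, and reply with an empty 200."""

    def do_POST(self):
        # Read and discard the notification body so the connection stays clean.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        # end_headers() writes the blank line that closes the header section.
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def make_server(port: int = 0) -> HTTPServer:
    """Build a control server on localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), ControlHandler)
```

Calling `make_server(5001).serve_forever()` would start it on port 5001 (an arbitrary port chosen for this example), which could then be used as the webhook URL for a test receiver.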

Normal behavior:

logserver normal behavior.PNG

Repeat notification behavior:
mon101 repeat behavior.PNG

Please let me know if there are any other details I can provide!

Thanks,
Chris

Julius Volz

Feb 2, 2021, 4:25:13 AM
to Christopher Parsons, Prometheus Users
Hi Christopher,

It's not clear from your Wireshark stream (maybe show the data as hex instead): does the problematic server send an empty line after the "HTTP/1.1 200 OK" line and then terminate the connection? That is, a CRLF to end the 200 line and another CRLF for the empty line. This is required for the response to be read as a valid HTTP response (the body is optional).

Julius

--
Julius Volz
PromLabs - promlabs.com

Christopher Parsons

Feb 2, 2021, 9:35:05 AM
to Prometheus Users
It appears there's just a single CRLF (0d 0a) after the 200 OK line.  Are you saying there needs to be a second one for Alertmanager to detect a valid 200 response?

mon101 response with hex.PNG

Julius Volz

Feb 2, 2021, 5:46:05 PM
to Christopher Parsons, Prometheus Users
Yep, exactly, you need two newlines (2 times 0x0d0a): the first ends the status line and the second produces a completely empty line, which HTTP uses to signal that the header section is over.
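The check described above can be sketched in a few lines of Python, for example to run against bytes copied out of a packet capture. The function names are made up for illustration, and the two sample byte strings are reconstructions of what the thread describes, not the actual captured data.

```python
CRLF = b"\r\n"


def header_section_is_terminated(raw: bytes) -> bool:
    """True if `raw` contains the empty line (CRLF CRLF) that ends the header section."""
    return (CRLF + CRLF) in raw


def tail_hex(raw: bytes, count: int = 4) -> str:
    """Hex dump of the last `count` bytes, in the style of a Wireshark hex view."""
    return raw[-count:].hex(" ")


# Assumed reconstructions of the two responses discussed in the thread.
single_crlf = b"HTTP/1.1 200 OK\r\n"
double_crlf = b"HTTP/1.1 200 OK\r\n\r\n"

for sample in (single_crlf, double_crlf):
    print(tail_hex(sample), header_section_is_terminated(sample))
```

A response ending in `0d 0a 0d 0a` passes the check; one ending in a single `0d 0a` does not, which matches the hex view of the problematic server.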

Christopher Parsons

Feb 2, 2021, 6:02:37 PM
to Prometheus Users
Thanks Julius!  Is a single newline ever considered valid? I'm wondering whether my next step is to file a bug/feature request asking the Alertmanager project to support single-newline responses, or to work with the third-party vendor to change how their server responds.  I am pretty sure the service just uses standard Perl packages to handle the incoming HTTP messages, so I'm not sure why it would omit the second newline.

Julius Volz

Feb 2, 2021, 6:29:02 PM
to Christopher Parsons, Prometheus Users
If it's really just emitting a single newline, then that is indeed an incorrect HTTP response by your third-party vendor and not a problem in Alertmanager. As far as Alertmanager is concerned, it's simply waiting for the headers to continue.
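To see what "waiting for the headers to continue" looks like from the client side, here is a small self-contained sketch. It uses Python's `http.client` as a stand-in for Alertmanager's Go client, so it is an analogy rather than Alertmanager's actual code path, and it assumes the server leaves the connection open after replying. `serve_once` and `try_response` are hypothetical helpers.

```python
import http.client
import socket
import threading
import time


def serve_once(listener: socket.socket, response: bytes, hold_open: float) -> None:
    """Accept one connection, send `response`, then keep the socket open for a while."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)       # read (and ignore) the incoming request
        conn.sendall(response)
        time.sleep(hold_open)  # simulate a server that does not close the connection


def try_response(response: bytes, timeout: float = 0.5) -> str:
    """POST to a one-shot local server that answers with `response`; report what the client sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=serve_once, args=(listener, response, timeout * 2), daemon=True
    )
    server.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        client.request("POST", "/", body=b"{}")
        return f"status {client.getresponse().status}"
    except socket.timeout:
        # The client never saw the empty line, so it kept waiting for more headers.
        return "timed out waiting for the end of the headers"
    finally:
        client.close()
        server.join()
        listener.close()


if __name__ == "__main__":
    print(try_response(b"HTTP/1.1 200 OK\r\n\r\n"))  # status line plus empty line
    print(try_response(b"HTTP/1.1 200 OK\r\n"))      # status line only
```

With the empty line present the client reports the 200 immediately; with only the status line it blocks until its own timeout fires, and a client that treats that as a failure would be expected to retry.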

Christopher Parsons

Feb 2, 2021, 8:59:35 PM
to Julius Volz, Prometheus Users
Great, thanks!  In parallel, I opened a ticket with the vendor to report the findings, and they referenced an update, which I have not yet moved to, in which this behavior may be corrected.  I am going to run some tests and see where that gets me.

Thanks for the speedy replies everyone!  I was racking my brain over this one, so it's nice to have some closure!