tcp-client never seems to recover from a server crash

Fabien Wernli

Nov 9, 2023, 12:59:44 PM
to Riemann Users
Hi again,

I'm tracking down a problem in my monitoring stack.
Forwarding events to another Riemann instance on the network fails after
the receiving server crashes, and never recovers.

I made a minimal config to show the problem using a docker-compose file.
You can find it here:


There's also a README with instructions.
It boils down to this: if Riemann A sends to Riemann B using `forward` with a `tcp-client`, and Riemann B dies and comes back, Riemann A never recovers, even when using an async-queue.

Thanks so much in advance if you can have a look!

Fabien Wernli

Nov 10, 2023, 6:33:50 AM
to Riemann Users
It seems the `forward` stream never times out, if I understand its source code correctly.
So I'm now experimenting with the following alternative:

; forward with timeout
(defn forward-timeout
  "Sends an event or a collection of events through a Riemann client. Times out after :timeout milliseconds."
  [client & {:keys [timeout] :or {timeout 5000}}]
  (fn stream [es]
    (if (map? es)
      (deref (riemann.client/send-event client es) timeout ::timeout)
      (deref (riemann.client/send-events client es) timeout ::timeout))))
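The function above relies on the three-argument form of `deref`. Here is a minimal sketch of that behaviour, using plain Clojure promises as stand-ins for whatever the Riemann client returns (the stand-ins and the names `never-acked`, `acked`, `timed-out` and `delivered` are assumptions for illustration, not part of Riemann):

```clojure
;; Stand-in for a send whose acknowledgement never arrives (hypothetical).
(def never-acked (promise))

;; Stand-in for a send that the server acknowledged (hypothetical).
(def acked (promise))
(deliver acked :ok)

;; Timed deref waits at most 100 ms, then returns the sentinel value
;; instead of blocking forever. It does not throw.
(def timed-out (deref never-acked 100 ::timeout))

;; When the value is already there, timed deref returns it immediately.
(def delivered (deref acked 100 ::timeout))

(println "never acked ->" timed-out)
(println "acked       ->" delivered)
```

Note that a timeout here is an ordinary return value, so the calling stream carries on as if the send had finished.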

Fabien Wernli

Nov 10, 2023, 7:03:16 AM
to Riemann Users
The good news is that the upstream Riemann now recovers from a failed destination.
The bad news is that it doesn't queue messages, and I don't understand why.
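One possible reading, not confirmed anywhere in this thread: because a timed `deref` returns `::timeout` as a normal value, the stream function returns normally and nothing upstream can tell that the event was not delivered. A sketch of making the timeout visible as an exception, again with a plain promise as a stand-in for the client's return value (`deref-or-throw` is a hypothetical helper, not a Riemann function):

```clojure
(defn deref-or-throw
  "Hypothetical helper: waits up to timeout-ms for p and throws
  if no value arrives in time, instead of returning a sentinel."
  [p timeout-ms]
  (let [result (deref p timeout-ms ::timeout)]
    (if (= ::timeout result)
      (throw (ex-info "send timed out" {:timeout-ms timeout-ms}))
      result)))

;; Stand-in for a send that is never acknowledged: the timeout now
;; surfaces as an exception that the caller can catch and act on.
(def outcome
  (try
    (deref-or-throw (promise) 100)
    (catch clojure.lang.ExceptionInfo e
      (ex-data e))))

(println "outcome:" outcome)
```

Whether the surrounding queue would retry or buffer the event after such an exception is a separate question that this sketch does not answer.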

Fabien Wernli

Jan 18, 2024, 9:26:33 AM
to Riemann Users
If anyone has some time to reproduce my docker-compose config, I'd be grateful!