merging expired events


Nickolaos Kas

Sep 2, 2022, 11:18:15 AM
to Riemann Users

Hello everyone,
I am monitoring various services using the process and threshold modules. Every server checks the collectd process, and many of them also check other processes such as docker or carbon.cache.
For alerting, I send an alert whenever a process count_ps event expires. The annoyance is that when a system that monitors multiple processes crashes, it generates multiple email alerts.
I was wondering what a good strategy would be to merge those events and send a single alert covering all the processes that expired. I am not sure whether I could delay for a short period, gather all the events that expired during that period, and then send the alert.
I am already using rollup to limit emails to 5 per hour, so I was hoping to use some other mechanism.

Thanks
Nick

ap...@aphyr.com

Sep 2, 2022, 11:59:48 AM
to rieman...@googlegroups.com
Yeah, this is just what rollup is for! Use it twice. :-)


Nickolaos Kas

Sep 5, 2022, 8:22:29 AM
to Riemann Users
(rollup 0 15 <children>)
does the trick. For some reason I thought that if the first number was 0, it would keep rolling events up forever.

Thanks for the help
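
For readers unfamiliar with rollup, here is a minimal sketch of the batching form mentioned above. It assumes, based on the Riemann documentation for rollup, that at most n events per dt seconds pass through right away and the rest are held and delivered as a single vector when the window ends. With n set to 0 nothing passes right away, so everything seen in a 15-second window arrives as one batch. The logging child is only a placeholder and is not from the thread.

```clojure
; Sketch only, meant to sit inside (streams ...) in riemann.config.
; n = 0, dt = 15: hold every event, then hand the children one
; vector of events at the end of each 15-second window.
(rollup 0 15
  ; Placeholder child: log how many events arrived in the batch.
  (fn [events]
    (info "rolled up" (count events) "events")))
```

Because the children receive a vector rather than a single event, anything downstream that expects one event needs a step that folds the vector first.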

Nickolaos Kas

Sep 6, 2022, 6:02:31 AM
to Riemann Users
OK, there is another complication related to this issue that I am trying to work out.

I have changed the code to merge the events coming from the same host, using the following:
(by :host
  (rollup 0 15
    ; rollup passes a vector of events; fold it into a single event
    (smap (fn [events] (reduce merge_events (peek events) (pop events)))
      #(info %))))

The merge_events function used in the reduce is shown below for reference:
(defn merge_events
  "Appends the :service and :metric of e2 to those of e1, one entry per line."
  [e1 e2]
  (assoc e1
         :service (str (:service e1) "\n\t\t" (:service e2))
         :metric  (str (:metric e1) "\n\t\t" (:metric e2))))
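
To make the effect of the reduce concrete, here is a small worked example that runs in a plain Clojure REPL without Riemann. The three sample events (host web1, metric 0) are hypothetical stand-ins for one rolled-up batch; only merge_events and the reduce expression come from the post above.

```clojure
; merge_events as defined in the post.
(defn merge_events
  [e1 e2]
  (assoc e1
         :service (str (:service e1) "\n\t\t" (:service e2))
         :metric  (str (:metric e1) "\n\t\t" (:metric e2))))

; Hypothetical batch, shaped like the vector that rollup hands to smap.
(def sample-events
  [{:host "web1" :service "collectd"     :metric 0}
   {:host "web1" :service "docker"       :metric 0}
   {:host "web1" :service "carbon.cache" :metric 0}])

; Same fold as in the stream: the last event is the base,
; and the earlier events are appended to it in order.
(def merged
  (reduce merge_events (peek sample-events) (pop sample-events)))

(println (:service merged))
; prints carbon.cache, then collectd and docker on their own
; lines, each indented by two tabs
```

Note that the merged :metric becomes a string, and a nil metric contributes an empty string, since str turns nil into "".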

The problem I have now is that I have split the stream per host using '(by :host ...)', and I would like to merge those per-host streams back together. The first rollup, shown above, merges the events from a single host. I would like to use a second rollup to limit the number of alerts I receive by email, so I need to merge the streams from all hosts and apply the rollup there. Any suggestions on how this can be achieved?

Thanks
Nick

Nickolaos Kas

Sep 6, 2022, 8:04:35 AM
to Riemann Users
OK, I finally managed to figure it out with the help of the official howto.
I modified the code to bind the second rollup to a local symbol, aggregate, in a let, so the merged events from every host go into that one rollup:

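; aggregate: at most 2 emails per 1800 s, shared by all hosts
(let [aggregate (rollup 2 1800 (email "mye...@test.com"))]
  (by :host
    ; per host: batch events for 14 s, then merge them into one event
    (rollup 0 14
      (smap (fn [events] (reduce merge_events (peek events) (pop events)))
        aggregate))))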
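
A note on why the let matters. Per the Riemann documentation for by, a fresh copy of the child streams is created for every distinct :host. A rollup written inline under by would therefore be instantiated once per host, while a stream bound in a let outside by is created once and shared. The sketch below shows the inline variant for contrast. It is an illustration, not something from the thread, and it reuses the email stream and merge_events from the posts above.

```clojure
; Contrast sketch, meant to sit inside (streams ...) in riemann.config.
; Here the email rollup lives inside `by`, so every host gets its own
; instance and its own budget of 2 emails per 1800 seconds.
(by :host
  (rollup 0 14
    (smap (fn [events] (reduce merge_events (peek events) (pop events)))
      (rollup 2 1800 (email "mye...@test.com")))))
```

With many hosts failing at once, this inline variant could still send up to 2 emails per host, which is the behavior the shared aggregate avoids.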