Peculiar stream behaviour


Nickolaos Kas

Oct 20, 2022, 1:37:26 PM
to Riemann Users
Hello again,
I have another peculiar issue, which looks more like a Riemann issue than a Clojure one, but I could be wrong.
I am trying to build a library of functions that makes creating new alerts more straightforward. I have a problem with an if in my code that I don't understand. The code is pasted below, and I have isolated the issue to the condition of the if statement, (if (> rate_threshold 0) ...).

(defn alert_performance
   [event service_list keyword email_list window_duration threshold rate_threshold rate_deadline]
   (project* (vec (map (fn [metric] (fn [e] (= (:service e) metric) )) service_list))
      (smap folds/sum
         (coalesce 3
            (smap flatten_one_vector
               (moving-time-window window_duration
                  (where (> (count event) (- (* 3/12 window_duration) 1))
                     (smap flatten_multiple_vectors
                        #(info (type rate_threshold) (type 0) %)
                        (if (> rate_threshold 0)
                           (alert_threshold_with_rate event keyword threshold email_list rate_threshold rate_deadline)
                           (alert_threshold event keyword threshold email_list)

                        )))))))))

(defn performance_alerts
   [event]
   (by :host
      (alert_performance event
            ["memory/percent-used" "memory/percent-slab_unrecl"]
            "Memory"
            default_mailist
            60 95 70 20
      )))

The code above gives me this error message:
WARN [2022-10-20 19:24:35,537] riemann task 2 - riemann.streams - renesas.etc.alerts$alert_threshold_specific$stream__9318__auto____1176@24a271d5 threw
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number

If I put true or false as the if condition, the code executes normally. The info call I placed just above the if prints the following message, which clearly identifies both rate_threshold and 0 as java.lang.Long:
INFO [2022-10-20 19:24:35,535] riemann task 2 - renesas.etc.alerts - java.lang.Long java.lang.Long #riemann.codec.Event{:host ree-du1rws3, :service memory/percent-used, :state warning, :description nil, :metric 89.20038750319594, :tags [collectd vm etx], :time 1666286669, :ttl 6.0, :plugin memory, :type percent, :ds_name value, :type_instance used, :ds_type gauge, :ds_index 0, :rate -1.6856476024429412E-4}

I am sure the problem must be something simple that I don't see, and I need an extra pair of eyes. Any help is greatly appreciated.

Sanel Zukan

Oct 20, 2022, 2:36:47 PM
to Riemann Users
The pasted error points to the "alert_threshold_specific" function, which
isn't shown here. You'll also need to provide more context, such as how it
is used within the alert_* functions, and show the full stack trace, if possible.

Best,
Sanel

Nickolaos Kas

Oct 20, 2022, 5:12:37 PM
to Riemann Users
I changed the name of alert_threshold_specific to alert_performance just to declutter the code and concentrate on the more important parts.
In practice I have another function in between performance_alerts and alert_performance, which is a multi-arity function calling alert_performance with a few default value combinations. For example, there is a version that takes default values for the various thresholds and rates. I can paste the full unmodified code and the stack trace of the error tomorrow if you wish.

Nickolaos Kas

Oct 21, 2022, 8:02:24 AM
to Riemann Users
I actually found the issue: the downstream function in the else branch had its parameters the wrong way round. Two of the parameters (highlighted in the original post) were swapped, so a string (the keyword) was compared with a number. That misled me into thinking the problem was in the if, because when I put '(if true ...)' it actually worked.

(defn alert_threshold
   ([event keyword]                      (alert_threshold_specific event 92 keyword default_mailist false 0 0))
   ([event keyword email_list]           (alert_threshold_specific event 92 keyword email_list false 0 0))
   ([event keyword threshold email_list] (alert_threshold_specific event threshold keyword email_list false 0 0))
)

At least I improved my understanding of how to read the stack trace, which was quite scary until now.
Thanks for the help.
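
For anyone hitting the same ClassCastException, here is a minimal sketch of how a swapped string/number argument produces it, and how a :pre condition can surface the mistake at the call site instead of deep inside a stream. The names above? and check_threshold are hypothetical and are not part of the alerts code in this thread.

```clojure
;; Hypothetical helper: a bare numeric comparison, like the one inside a stream.
(defn above?
  [metric threshold]
  (> metric threshold))

;; Hypothetical function that expects a numeric threshold and a string keyword.
;; The :pre conditions are checked every time the function is called.
(defn check_threshold
  [metric threshold keyword]
  {:pre [(number? threshold) (string? keyword)]}
  (when (above? metric threshold)
    (str keyword " usage above " threshold)))

;; Correct order: number compared against number.
(println (check_threshold 95.0 92 "Memory"))

;; Without a guard, a string in the threshold position fails inside the
;; comparison with a ClassCastException, as in the log above.
(try
  (above? 95.0 "Memory")
  (catch ClassCastException e
    (println "comparison failed:" (.getMessage e))))

;; With the guard, swapped arguments fail immediately with an AssertionError
;; whose message names the violated condition.
(try
  (check_threshold 95.0 "Memory" 92)
  (catch AssertionError e
    (println "bad call:" (.getMessage e))))
```

The guard does not fix the argument order, but it moves the failure to the function whose arguments were actually wrong.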

Sanel Zukan

Oct 21, 2022, 9:47:36 AM
to Riemann Users
Great work!

This is the reason why, for multi-arity functions, it is good practice
to keep the order of parameters consistent, otherwise things can get messy.
Even better if you can express the function through itself with default
values, e.g.:

(defn alert_threshold
  ([event keyword]
   (alert_threshold event keyword default_mailist 92))

  ([event keyword email_list]
   (alert_threshold event keyword email_list 92))

  ([event keyword email_list threshold]
   (alert_threshold_specific event threshold keyword email_list false 0 0)))

If you want to make it even more flexible, use a map and fall back to
default values through keyword arguments [1]:

(defn alert_threshold [args]
  (let [{:keys [event keyword threshold email_list]
         :or {threshold 92, email_list default_mailist}} args] ; default values
    (alert_threshold_specific event threshold keyword email_list false 0 0)))

And call it with:

(alert_threshold {:event event, :keyword "xxx", :threshold 92})

Best,
Sanel

[1] https://clojure.org/guides/destructuring#_keyword_arguments
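
A self-contained sketch of the map-based version, so it can be tried in a plain Clojure REPL. Here alert_threshold_specific is replaced by a stub that just returns what it received, and default_mailist holds a placeholder address; both are assumptions for illustration only. The map is destructured directly in the parameter vector, which should behave the same as destructuring it in a let.

```clojure
;; Placeholder so the example runs outside Riemann (assumption).
(def default_mailist ["ops@example.com"])

(defn alert_threshold_specific
  "Stub: returns the arguments it received instead of building a stream."
  [event threshold keyword email_list & _]
  {:event event :threshold threshold :keyword keyword :email_list email_list})

(defn alert_threshold
  "Takes a single map. Keys missing from the map fall back to the :or defaults."
  [{:keys [event keyword threshold email_list]
    :or   {threshold 92, email_list default_mailist}}]
  (alert_threshold_specific event threshold keyword email_list false 0 0))

;; Only the keys that differ from the defaults need to be passed,
;; and their order in the map does not matter.
(println (alert_threshold {:event {:metric 95} :keyword "Memory"}))
(println (alert_threshold {:keyword "Memory" :threshold 80 :event {:metric 95}}))
```

Because every value is named at the call site, two arguments cannot be swapped silently the way positional arguments can.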


Nickolaos Kas

Oct 21, 2022, 11:48:30 AM
to Riemann Users
Thanks for the reply.
I am not sure I fully understand the second example of how to replace the multi-arity functions. I am guessing I have to somehow use the args parameter and pass a map to this function. I understand the first example, though, and have already incorporated it into the code.

I have another issue now that is quite weird. It happens downstream. I am pasting the full code with no changes; I had to add a 'dummy' (where true ...) statement in alert_threshold_specific to be able to print the current variables and event. Line 172, which is the last in the stack trace, was highlighted in red in the original post.

(defn alert_threshold_specific
   [event threshold keyword email_list rate_threshold rate_deadline]
   (where true
   #(info "alert_threshold_specific" threshold keyword email_list rate_threshold rate_deadline %)
   (where (> (:metric event) threshold)
      (with {:service keyword
             :description (str keyword " at critical level")}
         (throttle 1 43200
            #(warn (str keyword " usage above " threshold) %)
            (email email_list))
      )
      (else
         (when (> rate_threshold 0)
            (alert_rate event rate_threshold rate_deadline keyword email_list))))))

(defn alert_threshold_with_rate
   ([event keyword]                                                   (alert_threshold_with_rate event 92 keyword default_mailist 0 0))
   ([event keyword email_list]                                        (alert_threshold_with_rate event 92 keyword email_list 0 0))
   ([event keyword threshold email_list]                              (alert_threshold_with_rate event threshold keyword email_list 0 0))
   ([event keyword threshold email_list rate_threshold rate_deadline] (alert_threshold_specific  event threshold keyword email_list rate_threshold rate_deadline)))


(defn alert_performance_specific
   [event tag service_list keyword email_list window_duration threshold rate_threshold rate_deadline]

   (project* (vec (map (fn [metric] (fn [e] (= (:service e) metric) )) service_list))
      (smap folds/sum
         (coalesce 3
            (smap flatten_one_vector
               (moving-time-window window_duration
                  ;Note for the 0.25 * window_duration. We need to wait fill the window before generating alerts. We are looking for 75%
                  ;of the window events to be present. Since we receive a new event every 3 seconds, the factor is 0.75/3
                  (where (>= (count event) (* 0.25 window_duration) )
                     (smap flatten_multiple_vectors
                        (alert_threshold_with_rate event keyword threshold email_list rate_threshold rate_deadline)))))))))
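
The 0.25 factor in the comment above can be checked with a little arithmetic. A sketch, assuming one event every 3 seconds and a 75% fill requirement as the comment states; min-events is a hypothetical helper name, not part of the alerts code.

```clojure
(defn min-events
  "Number of events required in the window: the `fill` fraction of the events
   expected in `window_duration` seconds at one event per `interval` seconds."
  [window_duration interval fill]
  (* window_duration (/ fill interval)))

;; 60-second window, one event every 3 seconds, 75% of them required:
;; 60 / 3 = 20 expected events, and 75% of 20 is 15 events.
(println (min-events 60 3 3/4))

;; The factor applied to window_duration is 3/4 divided by 3, that is 1/4.
(println (/ 3/4 3))
```

For whole-number counts, the earlier form (> (count event) (- (* 3/12 window_duration) 1)) tests the same boundary as (>= (count event) (* 0.25 window_duration)).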


Here is the error with the stack trace. It also includes the info printout before the if statement. Unless I am reading something wrong, there is no nil in there.

INFO [2022-10-21 17:10:35,477] riemann task 3 - renesas.etc.alerts - alert_threshold_specific 30 Memory (bob.the...@example.com) 0 0 #riemann.codec.Event{:host ree-du1rws3, :service memory/percent-buffered, :state ok, :description nil, :metric 1.1666448619992935, :tags [collectd vm etx], :time 1666365026, :ttl 6.0, :plugin memory, :type percent, :ds_name value, :type_instance buffered, :ds_type gauge, :ds_index 0, :rate 0.5132878512984761}
WARN [2022-10-21 17:10:35,480] riemann task 3 - renesas.etc.alerts -  threw
java.lang.NullPointerException: null
    at renesas.etc.alerts$alert_threshold_specific$stream__9318__auto____1176$fn__1205.invoke(alerts.clj:172)
    at renesas.etc.alerts$alert_threshold_specific$stream__9318__auto____1176.invoke(alerts.clj:172)
    at renesas.etc.alerts$alert_threshold_specific$stream__9318__auto____1235$fn__1240.invoke(alerts.clj:170)
    at renesas.etc.alerts$alert_threshold_specific$stream__9318__auto____1235.invoke(alerts.clj:170)
    at riemann.streams$smap$stream__7207$fn__7222.invoke(streams.clj:175)
    at riemann.streams$smap$stream__7207.invoke(streams.clj:175)
    at renesas.etc.alerts$alert_performance_specific$stream__9318__auto____1304$fn__1309.invoke(alerts.clj:205)
    at renesas.etc.alerts$alert_performance_specific$stream__9318__auto____1304.invoke(alerts.clj:205)
    at riemann.streams$moving_time_window$stream__7565$fn__7588.invoke(streams.clj:353)
    at riemann.streams$moving_time_window$stream__7565.invoke(streams.clj:353)
    at riemann.streams$smap$stream__7207$fn__7222.invoke(streams.clj:175)
    at riemann.streams$smap$stream__7207.invoke(streams.clj:175)
    at riemann.streams$coalesce$callback__8658$fn__8677.invoke(streams.clj:1237)
    at riemann.streams$coalesce$callback__8658.invoke(streams.clj:1237)
    at riemann.streams$periodically_until_expired$wrapper__7695.invoke(streams.clj:515)
    at riemann.time.Every.run(time.clj:55)
    at riemann.time$run_tasks_BANG_$fn__5402$fn__5403.invoke(time.clj:154)
    at riemann.time$run_tasks_BANG_$fn__5402.invoke(time.clj:153)
    at riemann.time$run_tasks_BANG_.invokeStatic(time.clj:147)
    at riemann.time$run_tasks_BANG_.invoke(time.clj:142)
    at riemann.time$start_BANG_$fn__5422$fn__5423.invoke(time.clj:193)
    at clojure.lang.AFn.applyToHelper(AFn.java:152)
    at clojure.lang.AFn.applyTo(AFn.java:144)
    at clojure.core$apply.invokeStatic(core.clj:657)
    at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1965)
    at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1965)
    at clojure.lang.RestFn.invoke(RestFn.java:425)
    at clojure.lang.AFn.applyToHelper(AFn.java:156)
    at clojure.lang.RestFn.applyTo(RestFn.java:132)
    at clojure.core$apply.invokeStatic(core.clj:661)
    at clojure.core$bound_fn_STAR_$fn__5471.doInvoke(core.clj:1995)
    at clojure.lang.RestFn.invoke(RestFn.java:397)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:750)

Is there a better way to find which symbol has a null value other than the stack trace?

Nickolaos Kas

Oct 22, 2022, 4:27:15 PM
to Riemann Users
I have actually figured it out: the issue was 8 lines down, inside the else of that where:
    (else
         (when (> rate_threshold 0)
            (alert_rate event rate_threshold rate_deadline keyword email_list))))))

changing the second line from
(when (> rate_threshold 0)
to
(where (> rate_threshold 0)
did the trick.

I am not quite sure why this was the issue, though. If I am not mistaken, Riemann builds the streams when it first runs and then passes the events through those streams. The when/where condition is static, since rate_threshold is defined as a constant in the configuration, so something must be null in the next statement, which is the alert_rate function. Given that I was printing the event and parameters with an info call, and they appear before the stack trace, I am puzzled as to what the source of the null may be.

Thanks for the help. If anyone can clarify the above, I would really appreciate it.
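
One possible source of the null, sketched in plain Clojure. The info line in the log shows rate_threshold as 0, so (when (> rate_threshold 0) ...) returns nil while the streams are being built, and that nil would then sit where a child stream function is expected. Calling nil as a function throws a NullPointerException. The names else-child and call-child are hypothetical; call-child only imitates a parent stream passing an event to a child and is not Riemann's code.

```clojure
(def rate_threshold 0)

;; Evaluated once, when the stream is built. With rate_threshold 0 the test
;; is false, so `when` returns nil instead of a stream function.
(def else-child
  (when (> rate_threshold 0)
    (fn [event] (println "rate alert for" (:service event)))))

(defn call-child
  "Imitates a parent stream handing an event to one of its children."
  [child event]
  (try
    (child event)
    :ok
    (catch NullPointerException _
      :null-pointer)))

(println "child is nil?" (nil? else-child))
(println "result:" (call-child else-child {:service "memory/percent-used"}))
```

None of the printed parameters is nil in this scenario; the nil is the child stream itself, which would explain why the info output looks healthy.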

Sanel Zukan

Oct 26, 2022, 2:30:04 PM
to Riemann Users
The issue here is that you are mixing "when" and "where". "when" is a
Clojure construct that will evaluate its body only when the expression is
true, but "where" is a Riemann construct that will create a function which
will *call children streams* with the appropriate event when the expression
is true.

For example:

(when (> rate_threshold 0)
  (alert_rate ...))

The code above will call (alert_rate), which returns a function, and
then do nothing with it. However:

(where (> rate_threshold 0)
  (alert_rate ...))

will call (alert_rate) and then call the function it returns with
an event. "where" is a magic macro that rewrites the code, taking
placeholders like "event" or "metric" and filling them with event values.
See the documentation [1].

Also, "where" will look for "else" expression, but "when" will not.

In short, you should never mix "when" and "where", unless you know what you
are doing. "when" should go inside Clojure functions, and "where" inside
Riemann DSL code. For example:

;; A less magic 'where' that accepts a function but does not rewrite the code
(where* (fn [e]
          (when (> (:metric e) 100)
            true))
  #(info "test:" %))

;; is the same as
(where (> metric 100)
  #(info "test:" %))

[1] http://riemann.io/api/riemann.streams.html#var-where

Best,
Sanel
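
The per-event behaviour described above can be modelled in plain Clojure, without Riemann. The sketch below uses a hypothetical where-model function that behaves roughly like the where* described in this post: it returns a stream function that tests each event and calls its children only when the test passes. It is an illustration only, not Riemann's implementation, and seen and record are likewise made-up names.

```clojure
(defn where-model
  "Returns a stream: a function of one event that calls every child
   with that event only when (pred event) is truthy."
  [pred & children]
  (fn [event]
    (when (pred event)
      (doseq [child children]
        (child event)))))

;; A child stream that records the metric of each event it receives.
(def seen (atom []))
(defn record [event] (swap! seen conj (:metric event)))

;; Built once; the predicate then runs for every event passed through.
(def stream (where-model (fn [e] (> (:metric e) 100)) record))

(stream {:metric 150})
(stream {:metric 50})

(println @seen)   ; only the event with a metric above 100 reached the child
```

Inside where-model the "when" is ordinary Clojure running on each event, which matches the advice above: "when" belongs inside functions, and the stream-building construct wraps them.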



Nickolaos Kas

Oct 29, 2022, 3:43:26 PM
to Riemann Users
Thank you for the info. It makes more sense now.