Meaning of exception in Riemann


Daniel MB

Jul 18, 2018, 4:31:37 AM
to Riemann Users
Hi all,

As I am not a coder myself, I was wondering if the community could help me determine a potential root cause of the following exception and how I could correct it.

My understanding from the exception description is that Riemann (0.2.14) is running short of resources to process all events. In particular, the exception says that 128 out of 128 threads are in use and therefore tasks are being queued; with 10000 tasks already in the queue, new ones are rejected.


WARN [2018-07-18 09:55:57,239] riemann task 2 - riemann.streams - riemann.streams$execute_on$stream__7093@7d108f6e threw
java.util.concurrent.RejectedExecutionException: Task clojure.core$bound_fn_STAR_$fn__4671@e11791e rejected from java.util.concurrent.ThreadPoolExecutor@441d62aa[Running, pool size = 128, active threads = 128, queued tasks = 10000, completed tasks = 716]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
        at riemann.service.ExecutorServiceService.execute(service.clj:173)
        at riemann.streams$execute_on$stream__7093.invoke(streams.clj:266)
        at riemann.streams$batch$flush__7957$fn__7969.invoke(streams.clj:1154)
        at riemann.streams$batch$flush__7957.invoke(streams.clj:1154)
        at riemann.streams$part_time_simple$tick__7301.invoke(streams.clj:610)
        at riemann.time.Once.run(time.clj:42)
        at riemann.time$run_tasks_BANG_$fn__5370$fn__5371.invoke(time.clj:154)
        at riemann.time$run_tasks_BANG_$fn__5370.invoke(time.clj:153)
        at riemann.time$run_tasks_BANG_.invokeStatic(time.clj:147)
        at riemann.time$run_tasks_BANG_.invoke(time.clj:142)
        at riemann.time$start_BANG_$fn__5386$fn__5387.invoke(time.clj:189)
        at clojure.lang.AFn.applyToHelper(AFn.java:152)
        at clojure.lang.AFn.applyTo(AFn.java:144)
        at clojure.core$apply.invokeStatic(core.clj:646)
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1881)
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1881)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at clojure.lang.AFn.applyToHelper(AFn.java:156)
        at clojure.lang.RestFn.applyTo(RestFn.java:132)
        at clojure.core$apply.invokeStatic(core.clj:650)
        at clojure.core$bound_fn_STAR_$fn__4671.doInvoke(core.clj:1911)
        at clojure.lang.RestFn.invoke(RestFn.java:397)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.lang.Thread.run(Thread.java:745)
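For context, the message comes from java.util.concurrent.ThreadPoolExecutor. While it is running, it rejects a task (through its default AbortPolicy, visible at the top of the trace) only when every thread is busy and its bounded queue is full. The minimal Java sketch below reproduces this with tiny stand-in numbers: 2 threads and 3 queue slots instead of the 128 threads and 10000 slots in the log. The class and method names are hypothetical, and the tasks simply block to imitate a stuck downstream.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SaturatedPool {
    /** Submits `tasks` blocking tasks to a saturated pool; returns how many were rejected. */
    public static int countRejections(int threads, int queueSlots, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,                      // fixed-size pool for simplicity
                5, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueSlots), // bounded queue
                new ThreadPoolExecutor.AbortPolicy()); // default policy: throw on overflow
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each task holds its thread until released, like a stuck downstream.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // all threads busy and queue full
                }
            }
        } finally {
            release.countDown(); // unblock the accepted tasks so the pool can shut down
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 running + 3 queued tasks are accepted; the other 5 are rejected.
        System.out.println(countRejections(2, 3, 10)); // 5
    }
}
```

Raising the thread count or the queue size only moves the point at which rejections start if the tasks never finish. This is why the slow consumer matters more than the pool size.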


My question is whether there is a way to locate which stream is complaining. My config file has several streams and I am not sure which one is complaining, so I don't know where to increase the assigned thread pool. Of course I could iterate over all of them, but I was hoping not to work in the dark.

Thanks everyone,
Daniel

aphyr

Jul 18, 2018, 8:45:06 AM
to rieman...@googlegroups.com
I think this might involve an async-queue in your config, in which case Riemann itself should be emitting metrics that show the status of each queue! Or, if you look at the stack trace, you might recognize some of the streams being called: there's a batch, for instance.
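Reading the stream names out of a trace can be done mechanically. Clojure compiles a function defined in riemann.streams to a JVM class named riemann.streams$name (with inner functions appended after further $ signs), and it munges dashes to underscores. The Java sketch below lists the distinct stream functions found in a trace, innermost first. The class name is hypothetical, and it undoes only a few common munging rules. It identifies the kind of stream (in this trace, batch handing work to execute-on), not which of several batch calls in a config is involved.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamFrames {
    // First function name after "riemann.streams$",
    // e.g. "riemann.streams$batch$flush__7957" -> "batch".
    private static final Pattern FRAME =
            Pattern.compile("riemann\\.streams\\$([A-Za-z0-9_]+)");

    /** Undoes a subset of Clojure's name munging. */
    static String demunge(String s) {
        return s.replace("_BANG_", "!")
                .replace("_QMARK_", "?")
                .replace("_STAR_", "*")
                .replace('_', '-');
    }

    /** Distinct riemann.streams functions in the trace, in order of appearance. */
    public static Set<String> streamsIn(String stackTrace) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = FRAME.matcher(stackTrace);
        while (m.find()) {
            names.add(demunge(m.group(1)));
        }
        return names;
    }

    public static void main(String[] args) {
        String trace = String.join("\n",
                "at riemann.streams$execute_on$stream__7093.invoke(streams.clj:266)",
                "at riemann.streams$batch$flush__7957$fn__7969.invoke(streams.clj:1154)",
                "at riemann.streams$batch$flush__7957.invoke(streams.clj:1154)",
                "at riemann.streams$part_time_simple$tick__7301.invoke(streams.clj:610)");
        System.out.println(streamsIn(trace)); // [execute-on, batch, part-time-simple]
    }
}
```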


Daniel MB

Jul 24, 2018, 5:52:09 AM
to Riemann Users
Thank you for the reply! I had not noticed the "batch" in the stack trace, thanks for pointing it out!

You're right: in a couple of places in my config file, a batch stream passes events to an async queue, like this:

    ; Forward events to downstream Riemann
    (batch 1000 1 fw2riemann)

where fw2riemann has been defined like this:

    ; forward to riemann without filtering
    fw2riemann
    (async-queue! :fw2
                  {:queue-size     20000   ; 20000 events max
                   :core-pool-size 4       ; Minimum 4 threads
                   :max-pool-size  100}    ; Maximum 100 threads
                  (forward
                    (riemann.client/tcp-client :host "172.21.8.93" :port 5555)))
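One detail worth checking in a config like this depends on an assumption. Suppose async-queue! is backed by a standard ThreadPoolExecutor over a bounded LinkedBlockingQueue. The stack trace confirms the ThreadPoolExecutor, but the queue type is an assumption. In that case the executor starts threads beyond the core size only once the queue is full, as documented in the ThreadPoolExecutor javadoc. The Java sketch below uses hypothetical names and plugs in core 4, max 100, queue 20000. It shows the pool staying at 4 threads while 20000 tasks are waiting, and growing only after the queue overflows.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowth {
    /** Submits `tasks` blocking tasks and returns the pool size while they are all stuck. */
    public static int poolSizeAfter(int core, int max, int queueSlots, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 5, TimeUnit.SECONDS, new LinkedBlockingQueue<>(queueSlots));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // hold the thread, like a slow downstream
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // queue full and max threads reached; ignored in this sketch
                }
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(poolSizeAfter(4, 100, 20000, 20000)); // 4: extra tasks wait in the queue
        System.out.println(poolSizeAfter(4, 100, 20000, 20010)); // 10: six threads added after the queue filled
    }
}
```

If that assumption holds, a larger maximum pool size has no effect until the queue is already full.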


As far as I understand from here, the queue is spilling over because there are too many unacknowledged events, so it fills up. This would point to a problem at the other end: the downstream Riemann is not able to send the corresponding ACKs.
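This reasoning can be made concrete with a simple fluid model. The model is an illustration, not something Riemann computes. Suppose batches arrive faster than the downstream acknowledges them, starting from an empty queue. The queue then grows by the difference every second, so any persistent shortfall fills it eventually, and a bigger queue only delays the overflow. The rates and the method name in the Java sketch below are hypothetical.

```java
public class FillTime {
    /**
     * Seconds until an initially empty bounded queue fills, assuming tasks
     * arrive and are drained at constant rates (tasks per second).
     * Returns infinity when the drain keeps up.
     */
    public static double secondsToFill(int queueSlots, double arrivalsPerSec, double drainedPerSec) {
        double growthPerSec = arrivalsPerSec - drainedPerSec;
        return growthPerSec <= 0 ? Double.POSITIVE_INFINITY : queueSlots / growthPerSec;
    }

    public static void main(String[] args) {
        // Hypothetical: 10 batches/s arrive, the downstream acknowledges 8/s.
        System.out.println(secondsToFill(20000, 10, 8)); // 10000.0 (about 2.8 hours)
        // Doubling the queue only doubles the time until it overflows.
        System.out.println(secondsToFill(40000, 10, 8)); // 20000.0
    }
}
```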

I will dig a bit more to see whether I find anything at the other end. One possible reason could be a mismatch between the number of sending connection threads and listening threads, which would leave the receiver unable to process the amount of data being sent.

Thanks a lot once again!

Fabien Wernli

Jul 26, 2018, 3:03:08 AM
to Riemann Users
Let me point out that if the batch stream processes more than 1000 events per second, your queue will fill after 20 million events (20'000 batches of 1000 events). This seems rather a lot to me (20GB of memory if one event is 1kB).
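Fabien's estimate can be written out as a small Java calculation. The names are hypothetical. The 1 kB per event is his assumption, and the calculation assumes every queued task is a full batch of 1000 events.

```java
public class QueueMemory {
    /** Worst case: every queued task is a full batch. */
    public static long maxQueuedEvents(long queueSlots, long batchSize) {
        return queueSlots * batchSize;
    }

    public static void main(String[] args) {
        long events = maxQueuedEvents(20_000, 1_000); // queue-size 20000, (batch 1000 ...)
        long bytesPerEvent = 1_000;                    // assumption: about 1 kB per event
        long bytes = events * bytesPerEvent;
        System.out.println(events);                         // 20000000
        System.out.println(bytes / 1_000_000_000L + " GB"); // 20 GB
    }
}
```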