On 03/06/2013 02:35 AM, Jonas Tehler wrote:
> INFO [2013-03-06 10:16:10,768] Thread-7 - riemann.config - lb-75 nil
> load
>
> WARN [2013-03-06 10:16:10,769] Thread-7 - riemann.streams -
> riemann.streams$smap$stream__4498@94a3f4b threw
> java.lang.ArithmeticException: Divide by zero
>
> The value for the events somehow is nil which leads to the divide by
> zero error.
Good catch; this happens because there were zero events in a window, and
the mean of zero events is undefined. There's nothing that (folds/mean)
can do in this case, but I can suppress the error message. I'll add a
ticket to do that.
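In the meantime, one possible workaround (a sketch only; it assumes you're computing the mean over fixed windows with smap and folds/mean, and that your Riemann version's smap drops nil results) is to skip empty windows before folding:

```clojure
; Sketch: guard against empty windows so folds/mean never sees zero
; events. Window length and the child stream are placeholders.
(fixed-time-window 5
  (smap (fn [events]
          ; An empty window yields nil, which smap discards instead of
          ; forwarding to children, so no divide-by-zero is attempted.
          (when (seq events)
            (folds/mean events)))
        index))
```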
> Other metrics with lower number of events per second sends to librato
> without problem (a few events every five seconds). I'm using the
> latest Riemann from github.
This is probably a consequence of backpressure. Riemann's TCP protocol
is synchronous; it won't return an ack for a given event until it's been
processed by all streams. The Librato Metrics stream pushes events
synchronously to Librato, which means:
1. You're *guaranteed* that by the time the client receives an ack, your
event has actually gone to Librato.
2. Clients won't overload Riemann; the whole system slows down to avoid
backing up queues.
3. Your client has to wait for the round-trip latency (TCP handshake +
HTTP req) to Librato.
One option is just to add more clients, but eventually you'll back up
behind the Netty stream executor pool. The right thing to do for
improved throughput is to hand events off asynchronously to a
LinkedBlockingQueue and ThreadPoolExecutor, like so:
(let [librato-gauge (async-queue! :librato-gauge
                                  {:queue-size     1000
                                   :core-pool-size 10
                                   :max-pool-size  50}
                                  (librato :gauge))]
  (streams
    librato-gauge))
This comes with all the fun of managing queues, so you'll probably want
to watch the logs to make sure messages aren't being dropped; you also
lose the guarantee that an event made it to Librato by the time the ack
arrives. You may want to wrap the async-queue! stream in an
(exception-stream (email "...")) to inform you about errors.
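For instance, that wrapping might look like this (a sketch: the address and queue settings are placeholders, not recommendations):

```clojure
; Sketch only. exception-stream turns exceptions thrown by its children
; into exception events and passes them to its first argument -- here,
; an email stream. Note this catches exceptions raised at enqueue time
; (e.g. a rejected submission when the queue is full), not failures
; that occur later on the pool's worker threads.
(let [librato-gauge (async-queue! :librato-gauge
                                  {:queue-size 1000}
                                  (librato :gauge))]
  (streams
    (exception-stream (email "ops@example.com")
      librato-gauge)))
```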
Later, I'm going to extend the librato-metrics streams to accept vectors
of events, and then you can use things like (fixed-event-window) to
batch your metrics together into single HTTP requests. That should
further improve performance.
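Since that API doesn't exist yet, this is purely illustrative, but with (fixed-event-window) the batched version might look something like:

```clojure
; Illustrative only: fixed-event-window passes vectors of n events to
; its children; this assumes a future librato stream that accepts
; vectors and submits them as a single HTTP request.
(streams
  (fixed-event-window 100
    (librato :gauge)))
```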
Warning: this feature is very new! I have not tuned it in production and
I have no idea how it will behave for your particular balance of IO,
concurrency, latency, client demand, etc. Queues are a dark art. ;-)
--Kyle