On 01/31/2014 07:10 AM, Tommy Atkinson wrote:
I don't have a fix, per se, but here's what I can tell you.
Your heap size (256MB) is small--which is nice for me to analyze, but
probably puts Riemann under a lot more GC load than you'd really like.
I'm guessing (we'll get to that) that you're putting a lot of events
through Riemann, so a bigger heap is almost certainly a good idea and
may prevent the memory issues you saw.
Almost all the memory in this dump is consumed by 6 TCP channels, each
of which is retaining 40MB in its writebufferqueue: 6 x 40MB = 240MB of
your 256MB heap. That's a *huge* number of acknowledgement messages that
Riemann is trying to send back to the client but that the kernel couldn't
flush.
This suggests two things to me.
One: your client is likely (intentionally or otherwise) not respecting
Riemann's backpressure; it sent a ton of messages in a row without
waiting for Riemann's acknowledgements. Use the synchronous TCP methods,
or limit the number of outstanding messages you put on the wire. Some
pipelining is desirable for performance, but you can cut memory use
dramatically by limiting in-flight requests to, I dunno, fewer than a
thousand.
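For illustration, here's a rough sketch of capping in-flight messages on
the client side. It's Python and it is *not* any Riemann client's real
API: `transport_send` and `on_ack` are hypothetical hooks standing in for
"write one message to the socket" and "read one acknowledgement back",
and the `demo` function fakes the server with an in-process queue. The
idea is simply that the sender blocks once `max_in_flight` messages are
unacknowledged, instead of letting them pile up on either end.

```python
import queue
import threading


class BoundedPipeline:
    """Caps the number of unacknowledged messages on one connection.

    Hypothetical sketch: `transport_send` writes one message to the wire,
    and the caller must invoke `on_ack()` once per acknowledgement received.
    """

    def __init__(self, transport_send, max_in_flight=1000):
        self._send = transport_send
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak_in_flight = 0

    def send(self, msg):
        # Block here once max_in_flight messages are awaiting acks.
        self._slots.acquire()
        with self._lock:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        self._send(msg)

    def on_ack(self):
        # One acknowledgement frees one slot for the next send.
        with self._lock:
            self.in_flight -= 1
        self._slots.release()


def demo(n_messages=5000, max_in_flight=100):
    wire = queue.Queue()  # stand-in for the TCP connection
    pipeline = BoundedPipeline(wire.put, max_in_flight)

    def fake_server():
        # Stand-in for Riemann: acknowledge every message it receives.
        for _ in range(n_messages):
            wire.get()
            pipeline.on_ack()

    server = threading.Thread(target=fake_server)
    server.start()
    for i in range(n_messages):
        pipeline.send({"service": "demo", "metric": i})
    server.join()
    return pipeline
```

With `max_in_flight=1` this degenerates to the synchronous
send-then-wait-for-ack pattern; a few hundred keeps most of the
pipelining throughput while bounding how much can be queued per
connection.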
Two: something *weird* happened--possibly a network hiccup? You noticed
thousands of closed connections, which suggests to me either that the
Riemann process was already heavily overloaded and this pushed it over
the edge, or that the network was delivering packets in one direction
but not the other. Or maybe this is normal behavior for your clients?
I dunno!
I'm not *exactly* sure how to address this problem. On the one hand, I
consider any crash in Riemann a bug, so we need to figure out how to fix
this. On the other hand, it's not clear how to distinguish this case
from an intentionally highly pipelined connection, other than by the
depth of the write queue--and it's also not clear where the right place
is to apply backpressure on the TCP stack via set_writable. I've got
some feelers out to the Netty channel for advice, but in the meantime,
you may want to take a look at how your client uses its connections.
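To make the "depth of the write queue" idea concrete, here's a toy
model of server-side backpressure using high/low water marks. This is a
hypothetical sketch, not Riemann's or Netty's actual code; `AckWriteQueue`,
`high`, `low`, and `flush` are made-up names. When the bytes of unflushed
acks for one connection pass `high`, the server stops reading new
requests from that client; once the kernel drains the queue below `low`,
it resumes.

```python
from collections import deque


class AckWriteQueue:
    """Per-connection queue of acknowledgements waiting to be flushed.

    Hypothetical model: stop reading from the client above `high` queued
    bytes, resume once the queue drains below `low`.
    """

    def __init__(self, high=64 * 1024, low=32 * 1024):
        assert low < high
        self.high = high
        self.low = low
        self.pending = deque()
        self.pending_bytes = 0
        self.reading = True  # whether we accept more requests from this client

    def enqueue_ack(self, ack: bytes):
        self.pending.append(ack)
        self.pending_bytes += len(ack)
        if self.reading and self.pending_bytes > self.high:
            self.reading = False  # apply backpressure: stop reading the socket

    def flush(self, budget: int) -> int:
        """Drain whole acks up to `budget` bytes, as if the kernel took them."""
        flushed = 0
        while self.pending and flushed + len(self.pending[0]) <= budget:
            flushed += len(self.pending.popleft())
        self.pending_bytes -= flushed
        if not self.reading and self.pending_bytes < self.low:
            self.reading = True  # drained enough; resume reading
        return flushed
```

Against a peer that never reads, `flush` returns 0, `reading` stays
False, and the queue stops growing at roughly `high` bytes instead of
40MB. The catch is the one described above: a client that is merely
deeply pipelined looks the same until its acks drain.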
Hope this helps!
--Kyle