Production monitoring tools?


Chris Toomey

Mar 25, 2014, 10:25:38 AM
to akka...@googlegroups.com
We're developing our first Akka application and are thus new to the larger ecosystem.  I've spent the last hour or so searching for Akka monitoring tools but haven't come across much of anything as far as production monitoring tools.  I see that Typesafe Console is aimed at development time and not recommended for production usage, and I've seen a few things on github, but that's about it.

What tools are folks using or would you recommend using to monitor production Akka applications for Akka-specific metrics like message rates, mailbox sizes, actor counts, etc.?

thx,
Chris

Chris Toomey

Mar 26, 2014, 2:48:42 PM
to akka...@googlegroups.com
Are any of you folks in this group running production Akka systems, and if so, which of the following classes of metrics are you collecting and monitoring?

1) Generic JVM metrics

2) Generic Akka metrics (mailbox sizes, message rates, etc.)

3) Application-specific metrics

If you're collecting/monitoring generic Akka metrics, how are you doing it?

If you're not, is it because 1) and 3) are sufficient, or because you've not found a good way to do 2)?

Thanks for any guidance.

Chris

Edward Steel

Mar 26, 2014, 3:55:46 PM
to Akka User List
We're using a combination of things at work. Sensu when alerts are needed and graphite/statsd for collecting, displaying, monitoring metrics. Statsd's pretty powerful, and graphite does the job as far as powering dashboards (there are some alternatives though).

More specifically:

1) java.lang.management._ + statsd -- an actor schedules itself at the rate we need to report heap size and gc time. We also monitor sub-jvm system stats in statsd using diamond.

2) Statsd, ad-hoc as needed. Mailbox size isn't really accessible, and trying to monitor it is discouraged (http://letitcrash.com/post/17707262394/why-no-mailboxsize-in-akka-2). Instead you can monitor time between the message being sent from one actor and worked on by another (by injecting timestamps in the message), or you can monitor all requests and see if there's a time lag between reducing the load on the sender and a corresponding reduction appearing on the receiver (while it chews through its mailbox).  However if you monitor message count and various timings, especially if you measure timing for specific components and compare them, you can generally see bottlenecks pretty clearly.

3) Statsd again. We use it a ton. And since the client is nio/udp and supports sampling, we never see a problem with overhead. The worst you'll see is strange graphs if you overload graphite itself.
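For item 1), the sampling step could look roughly like the sketch below. It only reads heap usage and cumulative GC time through java.lang.management and renders them in the statsd plain-text gauge format (`name:value|g`). The class name, the metric names and the prefix are made up for illustration; scheduling the sampler from an actor and sending the lines over UDP with a statsd client are left out.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: samples heap and GC time and renders statsd gauge lines. */
final class JvmStatsSampler {
    private final String prefix; // illustrative metric prefix, e.g. "myapp.jvm"

    JvmStatsSampler(String prefix) {
        this.prefix = prefix;
    }

    /** statsd plain-text gauge format: name:value|g */
    static String gauge(String name, long value) {
        return name + ":" + value + "|g";
    }

    /** Takes one sample; call this at whatever rate you want to report. */
    List<String> sample() {
        List<String> lines = new ArrayList<>();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        lines.add(gauge(prefix + ".heap.used", heap.getUsed()));
        lines.add(gauge(prefix + ".heap.committed", heap.getCommitted()));

        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        // Cumulative since JVM start; a dashboard would typically graph the derivative.
        lines.add(gauge(prefix + ".gc.time_ms", gcMillis));
        return lines;
    }
}
```

Calling `new JvmStatsSampler("myapp.jvm").sample()` returns three lines that can be handed to whatever statsd client you already use.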

You're very welcome to try our statsd client. It's on github at github.com/hootsuite/statsd-client . It's pretty much a wrapper around etsy's client with some scala niceties and typesafe config.  It's missing some examples but is pretty straightforward (see https://github.com/hootsuite/statsd-client/blob/master/src/main/scala/com/hootsuite/statsd/StatsdReporting.scala)

Would always love to hear of better solutions! I've heard vague rumours typesafe are working on something for production.

Cheers,
Edd


--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Dragisa Krsmanovic

Mar 26, 2014, 4:04:55 PM
to akka...@googlegroups.com
We use pillage (https://github.com/bigtoast/pillage) to capture application metrics and Graphite for display.

Ivan Topolnjak

Mar 26, 2014, 6:00:51 PM
to akka...@googlegroups.com
Chris,

1) Generic JVM metrics

There are many approaches to getting JVM metrics out there, and Edward's suggestion seems pretty cool, so I'll just skip this item.

2) Generic Akka metrics (mailbox sizes, message rates, etc.)

Here comes the tricky part. I totally agree that having access to the mailbox size inside an actor is a bad idea and have no complaints about its removal from the actor API, but I would never say that monitoring mailbox sizes for some of your actors is discouraged! In my experience with Akka, the most common cause of OutOfMemoryErrors is unbounded mailboxes piling up messages because some part of the application got slower than we thought, in areas of our apps where we didn't think back-pressure control was necessary. Sometimes a small increase in an actor's processing time seems harmless if you only look at processing times, but over time it can pile up thousands of messages in a mailbox, and a simple processing-time increase can turn into an app crash. You definitely want to keep an eye on some of the mailboxes in your application. Also, by monitoring min and max mailbox sizes you get answers to questions like "do I need to make this actor a router?" or "do I need to tweak this dispatcher?". Never underestimate the amount of info you can infer from looking at some mailbox metrics!

You mentioned in your first post that the Typesafe Console is not recommended for production usage, but as far as I know there is no such limitation. I suggest you ask for more information on the typesafe-console mailing list [1] to make sure you get the most up-to-date info. The applications we currently have in production are monitored by Kamon [2] and we are pushing our metrics data to NewRelic, but Kamon can offer you the metrics you want and you can then push those metrics to whatever metrics backend you prefer. We have heard a lot about statsd lately, so I just created an issue to track integration with statsd [3], which should be really easy to achieve. As mentioned on the "Why no mailbox size in Akka 2?" blog, the collections backing some mailbox implementations have O(n) time implementations of .size(), we are experimenting on attaching counters to message queues to overcome this problem and have a constant time implementation of .numberOfMessages, but that shouldn't be a limitation to monitoring mailboxes. I'll be writing more docs about how we do metrics collection and will update you guys as soon as they are available on our site. Oh! We also have processing-time and time-in-mailbox metrics!
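To make the time-in-mailbox idea concrete, here is a minimal sketch of the timestamp-injection approach Edward described: the sender wraps the payload together with a clock reading, and the receiver computes the wait when it starts working on the message. This is not how Kamon measures it; the `Timestamped` class and its method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical envelope: a payload plus the System.nanoTime() reading taken at send time. */
final class Timestamped<T> {
    final T payload;
    final long sentAtNanos;

    Timestamped(T payload, long sentAtNanos) {
        this.payload = payload;
        this.sentAtNanos = sentAtNanos;
    }

    /** Wraps the payload using the current monotonic clock. */
    static <T> Timestamped<T> now(T payload) {
        return new Timestamped<>(payload, System.nanoTime());
    }

    /**
     * Milliseconds between send and the given dequeue instant.
     * nanoTime readings are only comparable within one JVM, so this works
     * for local actors; remote messages would need wall-clock timestamps
     * and some tolerance for clock skew.
     */
    long waitedMillis(long dequeuedAtNanos) {
        return TimeUnit.NANOSECONDS.toMillis(dequeuedAtNanos - sentAtNanos);
    }
}
```

The receiver would call `msg.waitedMillis(System.nanoTime())` as the first thing it does with a message and report the result as a timer metric.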

3) Application-specific metrics

Kamon also provides you with tracing facilities [4]. The current documentation on the site only talks about passing a TraceContext around, but that's not the only thing we do with traces: once a trace is finished we generate elapsed-time metrics for the entire trace, and Kamon can also automatically recognize when a request starts/ends for Spray applications (Play! support is almost there!). Kamon also has custom metrics support, but its API needs some more love before we spread the word. I can't show our NewRelic custom metrics dashboard since it contains some sensitive information, but here is an overview page for one of our applications [5] that is generated with the data we get out of Kamon.

Currently Kamon can do a lot more than what you can see on our site, just give us some time and the info will be there :), feel free to ask anything you need on the mailing list [6], we are here to help. I hope you find this information useful, best regards!

Chris Toomey

Mar 26, 2014, 8:08:03 PM
to akka...@googlegroups.com
Thanks Edward, very helpful.   We already use statsd and graphite for our other apps and are also big fans of it.  Will take a look at your client, thanks for sharing.

Chris


On Wed, Mar 26, 2014 at 12:55 PM, Edward Steel <edward...@gmail.com> wrote:
You received this message because you are subscribed to a topic in the Google Groups "Akka User List" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/akka-user/lZ3UjPoy1go/unsubscribe.
To unsubscribe from this group and all its topics, send an email to akka-user+...@googlegroups.com.

Chris Toomey

Mar 26, 2014, 8:12:16 PM
to akka...@googlegroups.com
Thanks Dragisa, haven't come across pillage before, will take a look at that.

Chris



Chris Toomey

Mar 26, 2014, 8:28:46 PM
to akka...@googlegroups.com
Thanks Ivan, I'll definitely check out Kamon and inquire further about Typesafe Console.

One thing you mentioned that I didn't understand: "As mentioned on the "Why no mailbox size in Akka 2?" blog, the collections backing some mailbox implementations have O(n) time implementations of .size(), we are experimenting on attaching counters to message queues to overcome this problem and have a constant time implementation of .numberOfMessages, but that shouldn't be a limitation to monitoring mailboxes."

Are you saying that the O(n) time .size() implementation shouldn't stop folks from monitoring mailboxes (contrary to the blog post and you guys working on queue counters instead)?  If so, can you explain, or if not, what did you mean?

thx,
Chris





Ivan Topolnjak

Mar 26, 2014, 8:52:10 PM
to akka...@googlegroups.com
Chris,

sorry, I didn't express myself correctly. What I wanted to say is that not having constant-time .size() implementations by default doesn't mean that we shouldn't monitor mailbox sizes; instead we should find an alternative, efficient way to get those numbers, which in my opinion are very important when monitoring Akka actors. Glad to know you are interested in Kamon, hope you find it useful and that this answer clarifies your doubt, best regards!
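One way to picture "attaching counters to message queues" is the sketch below: a queue whose size() walks the list (java.util.concurrent.ConcurrentLinkedQueue documents its size() as not constant-time) is paired with an atomic counter that is updated on every enqueue and dequeue. This is only an illustration under that assumption, not Kamon's or Akka's implementation; `CountingQueue` is a made-up name, and `numberOfMessages` simply borrows the name used in the discussion.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper that keeps a constant-time message count next to the queue. */
final class CountingQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicLong count = new AtomicLong();

    void enqueue(T item) {
        // Increment first so a racing dequeue can never push the counter below zero;
        // the price is that the count may briefly run one ahead of the queue.
        count.incrementAndGet();
        queue.offer(item); // always succeeds: the queue is unbounded
    }

    /** Returns the next item, or null when the queue is empty. */
    T dequeue() {
        T item = queue.poll();
        if (item != null) {
            count.decrementAndGet();
        }
        return item;
    }

    /** O(1), approximate under concurrent access, which is fine for monitoring. */
    long numberOfMessages() {
        return count.get();
    }
}
```

A metrics reporter can then poll `numberOfMessages()` as often as it likes without touching the queue itself.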

Roland Kuhn

Mar 27, 2014, 3:27:59 AM
to akka-user
Hi Ivan,

I completely agree that monitoring the queues in your system (which translates to Actor mailboxes here) is a very useful thing to do, and the reasoning for removing access to the mailbox size from the Actor itself does indeed not apply to external monitoring. My thinking goes into the direction of using bounded queues way more than today—both for efficiency reasons and to maintain responsiveness as well (the fewer things queued before your actor, the shorter the response latency of that actor). The mailbox could for example fire monitoring events when its size crosses certain watermarks or runs full, alerting ops personnel of the condition immediately (where unexpected). This can be done extremely efficiently for an array-backed ring-buffer mailbox.

We currently don’t have resources to implement this in the core team, but I know that there are several skilled hakkers out there who might eventually get around to such a fun project ;-)
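The watermark idea above could be prototyped outside of Akka along these lines. The sketch wraps a bounded, array-backed java.util.concurrent.ArrayBlockingQueue and notifies a listener when the size crosses a high watermark, drops back to a low watermark, or the queue runs full. Everything here (class name, event names, the low/high hysteresis) is an assumption for illustration; it is not an Akka mailbox, and the coarse synchronized blocks are there for clarity rather than for the efficiency a purpose-built ring buffer could reach.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical bounded queue that emits monitoring events at size watermarks. */
final class WatermarkQueue<T> {
    enum Event { HIGH_WATERMARK, BACK_TO_NORMAL, FULL }

    private final ArrayBlockingQueue<T> queue;
    private final int low;
    private final int high;
    private final Consumer<Event> listener;
    private boolean aboveHigh = false;

    WatermarkQueue(int capacity, int low, int high, Consumer<Event> listener) {
        if (low < 0 || low >= high || high > capacity) {
            throw new IllegalArgumentException("need 0 <= low < high <= capacity");
        }
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.low = low;
        this.high = high;
        this.listener = listener;
    }

    /** Returns false (and reports FULL) when the bound is reached. */
    synchronized boolean offer(T item) {
        if (!queue.offer(item)) {
            listener.accept(Event.FULL);
            return false;
        }
        if (!aboveHigh && queue.size() >= high) {
            aboveHigh = true; // report the crossing once, not on every message
            listener.accept(Event.HIGH_WATERMARK);
        }
        return true;
    }

    /** Returns the next item, or null when empty. */
    synchronized T poll() {
        T item = queue.poll();
        if (aboveHigh && queue.size() <= low) {
            aboveHigh = false; // low < high gives hysteresis, so alerts do not flap
            listener.accept(Event.BACK_TO_NORMAL);
        }
        return item;
    }
}
```

With capacity 4, low 1 and high 3, the third accepted message triggers HIGH_WATERMARK, a fifth offer triggers FULL, and draining down to one queued message triggers BACK_TO_NORMAL. In a real system the listener should hand the event off quickly (for example to a metrics client), since it runs while the lock is held.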

Regards,

Roland




Dr. Roland Kuhn
Akka Tech Lead
Typesafe – Reactive apps on the JVM.
twitter: @rolandkuhn


Roland Kuhn

Mar 27, 2014, 3:22:10 AM
to akka-user
Thanks for sharing, this will be very useful for people searching this forum!

Regards,

Roland

Patrik Nordwall

Mar 27, 2014, 4:43:48 AM
to akka...@googlegroups.com
The LoggingMailbox has helped me debug bottlenecks many times.
/Patrik

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw

Ivan Topolnjak

Mar 27, 2014, 10:05:59 AM
to akka...@googlegroups.com
Roland,

I certainly feel like moving towards bounded queues as the rule rather than the exception is the way to go. Setting reasonable bounds is a task requiring deep knowledge of the application domain, actor/dispatcher dynamics and running platforms (just to name a few), and it is difficult to propose a checklist that will guide people to the optimal bounds for their case, but I guess that with time and more information taken out of live systems the community will develop a better understanding of these topics and it will become a lot easier and more common to set bounds. Maybe presenting bounds in terms of low latency and performance might raise more interest in the topic.

Your proposal about firing monitoring events upon reaching some mailbox size watermarks seems pretty neat, and we will certainly consider adding some of that goodness to Kamon :). Is there any ticket tracking that feature in assembla/github? Maybe we can give a hand in adding this feature to Akka as well.

Patrik,

thanks for mentioning LoggingMailbox, it certainly looks pretty nice and people can use it right away, will keep it in my toolbox.


have a nice day guys, best regards!

Edward Steel

Mar 27, 2014, 11:49:45 AM
to Akka User List

I completely agree that monitoring the queues in your system (which translates to Actor mailboxes here) is a very useful thing to do, and the reasoning for removing access to the mailbox size from the Actor itself does indeed not apply to external monitoring.

Good to know! I'll look out for developments in this area.

Thanks Ivan for the information, I'll also keep an eye on kamon, which looks really cool.

Patrik, is LoggingMailbox part of any release? Or just some useful code to keep around? (I certainly will)

Cheers,
Edd

Patrik Nordwall

Mar 27, 2014, 12:27:36 PM
to akka...@googlegroups.com



On 27 Mar 2014, at 16:49, Edward Steel <edward...@gmail.com> wrote:

Patrik, is LoggingMailbox part of any release? Or just some useful code to keep around? (I certainly will)

It's only something to copy (and modify) into your project. It has significant overhead, so don't use it as default mailbox in prod.

/Patrik

Chris Toomey

Mar 27, 2014, 1:31:45 PM
to akka...@googlegroups.com
Thanks Patrik, that looks like a great, simple template for monitoring mailbox size with statsd, etc.

Chris


Eugene Vigdorchik

Mar 28, 2014, 5:15:41 AM
to akka...@googlegroups.com


On Thursday, March 27, 2014 at 12:43:48 UTC+4, Patrik Nordwall wrote:
The LoggingMailbox has helped me debug bottlenecks many times.
/Patrik

Hi Patrik,
this is very nice indeed, I'll incorporate this into rxmon.

Thanks,
Eugene.

Eugene Vigdorchik

Mar 28, 2014, 5:30:50 AM
to akka...@googlegroups.com
Hi Ivan,
In rxmon https://github.com/vigdorchik/rxmon I try to allow for dynamic rule configuration. That is, the user should be able to specify a rule that makes the actor route to subactors based on the stream of values that are obtained. One such rule may be to set up a router if the mailbox size grows too fast. Whether this is specific to a particular application or is universally applicable still remains to be investigated though.
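As a rough illustration of a "mailbox size grows too fast" rule, the sketch below compares consecutive (time, size) samples and fires when the growth rate exceeds a configured limit. It does not use or imitate the rxmon API; `GrowthRule`, `observe` and the messages-per-second threshold are hypothetical names and choices, and what to do when the rule fires (for example putting a router in front of the actor) is left to the caller.

```java
/** Hypothetical rule: fires when a sampled size grows faster than a limit. */
final class GrowthRule {
    private final double maxGrowthPerSecond;
    private long lastSize;
    private long lastNanos;
    private boolean hasSample = false;

    GrowthRule(double maxGrowthPerSecond) {
        this.maxGrowthPerSecond = maxGrowthPerSecond;
    }

    /**
     * Feeds one sample. Returns true when the growth since the previous
     * sample, in messages per second, exceeds the configured limit.
     * The first sample never fires because there is nothing to compare with.
     */
    boolean observe(long nowNanos, long size) {
        boolean fired = false;
        if (hasSample && nowNanos > lastNanos) {
            double seconds = (nowNanos - lastNanos) / 1e9;
            double rate = (size - lastSize) / seconds;
            fired = rate > maxGrowthPerSecond;
        }
        lastSize = size;
        lastNanos = nowNanos;
        hasSample = true;
        return fired;
    }
}
```

Comparing only two consecutive samples is noisy; a real rule would probably smooth over a window before reacting.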

Cheers,
Eugene.

On Thursday, March 27, 2014 at 18:05:59 UTC+4, Ivan Topolnjak wrote:

Ivan Topolnjak

Mar 28, 2014, 10:40:20 AM
to akka...@googlegroups.com
Eugene, using rxmon seems like an interesting approach, I'll keep an eye on that and let you know if something comes out of it!

Eugene Vigdorchik

Mar 31, 2014, 4:07:19 AM
to akka...@googlegroups.com


On Friday, March 28, 2014 at 18:40:20 UTC+4, Ivan Topolnjak wrote:
Eugene, using rxmon seems like an interesting approach, I'll keep an eye on that and let you know if something comes out of it!
Hi Ivan,
I'm really curious to know about your findings!
Eugene.