Performance Monitoring in NodeJS without external Services (StrongLoop, AppDynamics, nodetime, etc.)


Andreas Marschke

Nov 10, 2014, 3:03:17 PM
to nod...@googlegroups.com
Hi,

First off, don't get me wrong: I like the fact that there are companies out there supporting Node.js with
extra services and "consulting". I would, however, prefer to be the master of my own data and learn from it
myself, without a frontend that takes away most of the insights they (probably) learned the hard way.

So, my question is this: what do you currently use to monitor your applications and profile their performance?

As far as I've seen, there is only a limited number of still-viable solutions. I'm about to check out tracegl, but I feel
it won't give me much helpful insight, seeing that the project itself - in full Mozilla fashion (see
i18n-abide for another example) - is barely documented and currently seems to support only a limited set of platforms
(mostly those that support WebGL and have it enabled, which is not the case for me most of the time).

Other solutions (e.g. node-profiler) have been gutted to an extent, as their upstream dependency (V8) has
removed most of the functionality they so heavily depend on.

Another pain point, as I see it, in profiling my Node code is that I don't run on illumos, which makes
30-40% of the documentation in the best practices for debugging/optimizing/profiling from Joyent ("patron" of Node.js)
worthless to me.

Is there something I'm missing at this point? I would LOVE to understand where my code may go wrong or
where it is excessively brittle performance-wise, but at this stage it is very hard for me to see much in the 
way of a proper toolchain here. 

Have there been any noteworthy projects around Node performance that I have simply been missing? 
Is it worth reading the Node C/C++ code to get a full grasp of how it works? Is it simply an issue of
promotion, in the sense that not all parts of the community have embraced it?

I am happy for any and every suggestion you may write underneath this. Feel free also to lambaste me for my
maybe-ignorant views towards services like nodetime and StrongLoop.

Thanks in advance, 

Andreas Marschke,

Ben Noordhuis

Nov 10, 2014, 7:13:28 PM
to nod...@googlegroups.com
Full disclosure: I'm the author of node-profiler and node-heapdump and
a StrongLoop founder.

Thorsten Lorenz maintains a curated list of profiling and debugging
tools, see [0]. Be warned that the free software landscape is still
pretty scattered and that won't change anytime soon. Writing good,
integrated performance/debugging tools takes time and is not very
rewarding in itself; most of the existing tools scratch a particular
itch and stop there.

If I can circle back to a comment you made at the start of your post -
being the master of your own data: the StrongLoop agent is moving to a
model where it simply collects the raw or aggregated metrics and
leaves it up to you to process or store them. We provide integration
for popular services like statsd[1], Splunk[2], Datadog[3], etc. Here
is an example of our statsd integration[4]:

var agent = require('strong-agent');
var statsd = require('strong-agent-statsd')();
agent.use(statsd); // that's all, metrics are now reported to statsd

You can find more details in the README[5]; scroll down to the section
on the metrics API. Questions or suggestions welcome.

[0] https://github.com/thlorenz/v8-perf

[1] https://github.com/etsy/statsd

[2] http://www.splunk.com/

[3] https://www.datadoghq.com/

[4] https://github.com/strongloop/strong-agent-statsd

[5] https://github.com/strongloop/strong-agent/blob/master/README.md

Andreas Marschke

Nov 11, 2014, 6:04:36 AM
to nod...@googlegroups.com
Hi Ben, 

thanks for your pointers. 

The fact that I need an API key and (if I understand you correctly) still have to send my data to the service provider (in the worst case
even through the same network interface that the data I need to process comes in on) means I still have no control over my data,
and, sorry, but some more security-conscious potential customers may take issue with that. Besides, the fact that I would have to
doubly saturate my existing bandwidth to ship it somewhere else is slightly questionable. What I actually meant by "master of
my own data" (granted, the terminology is a bit messy here) is that I keep my data on my premises and get to validate and evaluate
it myself. One solution (that would still let me use StrongLoop) would be to provide an appliance or installation package for an
"on-premise" StrongLoop deployment.

Since you said that the landscape is "scattered" regarding performance, profiling and the like: would it be an improvement to
convene something like a roundtable of the elders and start a "Working Group" (in theory more agile than a committee) that would
define, support and create standards for what to profile? Maybe even bring people from the companies in the community (i.e.
yourself and brethren) together with Blink/V8/Node developers to think of a proper way to guide the community as a whole towards
the common goal of a good standard and the development of good tools. I've seen it work for years in the networking community (see the
RIPE working groups); sure, there will be the occasional vitriol and unnecessary bikeshedding, but it would be worth it if it helps more than a handful of
developers to better understand the Node environment from a performance point of view. This could be a sister project to what has
been instantiated alongside the recently unveiled Node.js group.

Just throwing it out there, maybe it'll stick.

Either way, thanks for the input. I will review the sources.

Cheers,

Andreas Marschke.

Bruno Jouhier

Nov 11, 2014, 6:56:01 PM
to nod...@googlegroups.com
Hi Andreas,

My little contribution to the topic: https://github.com/Sage/streamline-flamegraph. It does not depend on an external service but it assumes that you are writing your code with streamline.js.

The cool feature is that it gives you insights into your "long" stack traces and it generates two graphs: with and without I/O waits. So you can quickly figure out where your program is spending its time, both in terms of elapsed time (including I/O) and CPU time. The drawback is that it does not tell you anything about your sync code. It lets you identify the CPU-bound operations at a macro level but you may need another tool to dive into them. So, as it is, this is a tool for the application level, not the driver level.

I implemented the instrumentation inside streamline but something similar could probably be done with other sync solutions (fibers or generator based). You just need a small set of instrumentation hooks in the library to do this. On the other hand, it won't be of any help if the code is written with raw callbacks.

Standards could help. For example, I'm using a Perl script to convert the stack trace recordings into a flamegraph. If we had a standard JSON format for stack trace recordings, maybe someone would come up with a JS flamegraph converter or with cool JS widgets to manipulate the flamegraphs (my solution only provides static graphs - it would be nice to be able to zoom into a specific subtree).
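
To make the idea concrete, a strawman record could look something like this (field names invented on the spot, just to illustrate the kind of data a recorder would emit):

// One hypothetical sample record:
var sample = {
  stack: ["main", "handleRequest", "loadUser_"], // outermost frame first
  cpu: 12,      // milliseconds on-CPU attributed to this stack
  elapsed: 87,  // wall-clock milliseconds, including I/O waits
  count: 3      // how many times this exact stack was recorded
};

Any recorder (streamline, fibers, generators) could emit an array of such records, and any converter or widget could consume them without caring where they came from.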

Bruno

Ben Noordhuis

Nov 11, 2014, 7:11:54 PM
to nod...@googlegroups.com
On Tue, Nov 11, 2014 at 12:04 PM, Andreas Marschke
<andreas....@gmail.com> wrote:
> Hi Ben,
>
> thanks for your pointers.
>
> The fact that I need an API key and (if I understand you correctly) still
> have to send my data to the service provider (in the worst case even
> through the same network interface that the data I need to process comes
> in on) means I still have no control over my data, and, sorry, but some
> more security-conscious potential customers may take issue with that.
> Besides, the fact that I would have to doubly saturate my existing
> bandwidth to ship it somewhere else is slightly questionable. What I
> actually meant by "master of my own data" (granted, the terminology is a
> bit messy here) is that I keep my data on my premises and get to validate
> and evaluate it myself. One solution (that would still let me use
> StrongLoop) would be to provide an appliance or installation package for
> an "on-premise" StrongLoop deployment.

Maybe I didn't explain it well but in metrics-only mode, no data is
sent to our collector. Basically, the agent collects metrics
in-process and your agent.use() callback is the data sink. We have
more advanced options for clustered applications but that's the basic
mode of operation.
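
Roughly, a custom sink looks something like this (simplified; see the
metrics API section of the README for the exact callback signature):

var agent = require('strong-agent');

// Custom sink: metrics stay in-process and you decide where they go
// (a file, a local statsd, your own database, ...).
agent.use(function (name, value) {
  console.log('metric %s = %s', name, value);
});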

> Since you said that the landscape is "scattered" regarding performance,
> profiling and the like: would it be an improvement to convene something
> like a roundtable of the elders and start a "Working Group" (in theory
> more agile than a committee) that would define, support and create
> standards for what to profile? Maybe even bring people from the companies
> in the community (i.e. yourself and brethren) together with Blink/V8/Node
> developers to think of a proper way to guide the community as a whole
> towards the common goal of a good standard and the development of good
> tools. I've seen it work for years in the networking community (see the
> RIPE working groups); sure, there will be the occasional vitriol and
> unnecessary bikeshedding, but it would be worth it if it helps more than
> a handful of developers to better understand the Node environment from a
> performance point of view. This could be a sister project to what has
> been instantiated alongside the recently unveiled Node.js group.

There was a "birds of a feather" session in Vancouver in August; my
co-worker Sam represented StrongLoop.

To the best of my knowledge, not much came of it and I suspect that's
because there is not much overlap in the wants and needs of the
stakeholders. About the only thing everyone could agree on is that it
would be great if we don't have to monkey-patch everything when
instrumenting applications and that is what Trevor Norris's
async-listener work is trying to do.
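
To make that concrete: "monkey-patching" here means wrapping library
functions by hand, roughly like this (a deliberately simplified sketch,
not any particular vendor's agent):

var http = require('http');

// Replace a core API with a timing wrapper; real agents repeat this
// dance for dozens of modules and callback shapes, which is the pain.
var originalGet = http.get;
http.get = function (options, callback) {
  var start = Date.now();
  return originalGet.call(http, options, function (res) {
    console.log('http.get answered after', Date.now() - start, 'ms');
    if (callback) callback(res);
  });
};

An async-listener style API would let tools hook the transitions between
async contexts once, instead of patching every library separately.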

The situation is different from node-forward. With that project, it's
clear what the problems are that need to be addressed, and there is
broad consensus on how to make it happen.

// ravi

Nov 11, 2014, 7:32:02 PM
to nod...@googlegroups.com
On Nov 11, 2014, at 5:59 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:

> Maybe I didn't explain it well but in metrics-only mode, no data is
> sent to our collector.  Basically, the agent collects metrics
> in-process and your agent.use() callback is the data sink.  We have
> more advanced options for clustered applications but that's the basic
> mode of operation.


This is great, thanks. Definitely something I would want to try out.


> There was a "birds of a feather" session in Vancouver in August; my
> co-worker Sam represented StrongLoop.
>
> To the best of my knowledge, not much came of it and I suspect that's
> because there is not much overlap in the wants and needs of the
> stakeholders.


I find this surprising. Aren’t the problems (at least the common subset) the same as in any medium to large scale project? Find out where the CPU cycles are spent, figure out if memory is leaking, etc. Or am I off on the wrong tangent w.r.t this thread?

—ravi

Stephen Belanger

Nov 11, 2014, 8:31:46 PM
to nod...@googlegroups.com

That meeting was about finding common ground in tracing-related needs to determine what can be done to improve the situation for everyone. It was somewhat successful at highlighting those commonalities too.

The issue is not a lack of consistency of tracing needs, but that the tooling to trace async js automatically doesn't exist and would thus require someone stepping up to develop it in the open and lead the community to start using it.

TJ Fontaine did a bit of experimentation with node-tracing, and Forrest Norvell made CLS. Trevor Norris had also created AsyncListener to track changes between async contexts. But all the approaches so far have been too naive or simplistic and ultimately flawed.

The largest obstacle I've seen to overcoming this is unwillingness or inability to put in the time or resources necessary and to work together to find a proper solution. Companies selling performance monitoring, including the one I work at, are focused on adding value to their own products. Putting significant effort into projects that devalue those is unappealing.

Until node APM products shift their core value to something beyond simply collecting and displaying data, those will remain features core to their product and thus often deemed not acceptable for significant open development.

The pursuit of competitive advantage can be problematic when it comes to underdeveloped markets such as this.


Andreas Marschke

Nov 11, 2014, 8:58:16 PM
to nod...@googlegroups.com


On Wednesday, 12 November 2014 01:11:54 UTC+1, Ben Noordhuis wrote:

> Maybe I didn't explain it well but in metrics-only mode, no data is
> sent to our collector.  Basically, the agent collects metrics
> in-process and your agent.use() callback is the data sink.  We have
> more advanced options for clustered applications but that's the basic
> mode of operation.


Ah, my bad. Thanks for the information, I will check it out.
 

> There was a "birds of a feather" session in Vancouver in August; my
> co-worker Sam represented StrongLoop.
>
> To the best of my knowledge, not much came of it and I suspect that's
> because there is not much overlap in the wants and needs of the
> stakeholders.  About the only thing everyone could agree on is that it
> would be great if we don't have to monkey-patch everything when
> instrumenting applications and that is what Trevor Norris's
> async-listener work is trying to do.

The fact that you came to no consensus during the discussion makes me wonder how the BoF worked and/or how you went about finding consensus.
Sure, there are always existing tools that could be improved. But the question is whether you can, with an acceptable amount of effort, improve them to serve the
whole or most parts of the Node community.

The much more frustrating thing about this (to me at least) is that Node as a scripting platform is not the first to experience these pain points and surely
will not be the last. And I cannot imagine that 99% of the able-minded community around Node.js went straight from writing jQuery to writing Node all day.
(Not to undermine or devalue those who HAVE come from a pure web-development background.)

Most of us (IME) either have a CS degree or some other form of prior knowledge of these things, and have seen other projects/languages struggle
under similar conditions in other areas.

More to the point: given that Node/JavaScript -- and V8 for that matter -- came from a frontend world where performance is king, especially when improving on a
byte level what has (sometimes) been ruined on the BL level, it boggles my mind that there is not much else except the web toolkit from Blink/WebKit to understand
what went wrong.

The question is also how much of what is to be analysed and understood is actually Node and not V8, and where do the V8 developers stand on this issue?
 
> The situation is different from node-forward.  With that project, it's
> clear what the problems are that need to be addressed, and there is
> broad consensus on how to make it happen.

I'll have a look at node-forward. Granted, it is not necessarily performance-focused, but being a sort of self-help/support group makes it worth checking out.
 

Bruno Jouhier

Nov 12, 2014, 7:54:22 AM
to nod...@googlegroups.com

> TJ Fontaine did a bit of experimentation with node-tracing, and Forrest Norvell made CLS. Trevor Norris had also created AsyncListener to track changes between async contexts. But all the approaches so far have been too naive or simplistic and ultimately flawed.

 Isn't this a bit of a red herring?

If you extend the language with an async/await construct (which is what streamline.js does), then the continuations are captured at the "language" level and you can instrument the language runtime to propagate a context across continuations and to track performance counters with "long stacktrace" semantics. This is completely transparent. The instrumentation service is provided by the "language" runtime and does not require any special support from the specific libraries that you are using (although I haven't tried it this way, streamline-flamegraph recording should work in the browser).

If you don't extend the language with async/await then you need ad-hoc mechanisms. You need special mechanisms in node.js to propagate a context across continuations. This can be done in APIs like fs that are aligned on the continuation callback pattern but it won't work with other APIs that use events for continuation (for example a connect call that continues with connect/error events, or a read call that continues with readable/error events).
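
To illustrate what "propagate a context across continuations" means in practice, here is a toy sketch (names made up, and this is not streamline's actual implementation):

var currentContext = null;

// Capture the context that is current when the continuation is created,
// and restore it whenever the continuation runs.
function captureContinuation(fn) {
  var ctx = currentContext;
  return function () {
    var saved = currentContext;
    currentContext = ctx;
    try { return fn.apply(this, arguments); }
    finally { currentContext = saved; }
  };
}

currentContext = { requestId: 42 };
setTimeout(captureContinuation(function () {
  console.log(currentContext.requestId); // 42: the context followed the continuation
}), 10);
currentContext = null; // the "main" path moves on

With a language extension, every generated continuation goes through something like captureContinuation automatically; with raw callbacks, every library (and every event-based API) has to opt in by hand, which is exactly the problem.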

I see a lot of effort being put in trying to provide context propagation features around raw callbacks (domains, long-stacktrace, async listeners). These APIs are of no use for people like me who are relying on a "language-level" construct and who get these features for free from their language runtime. They add complexity and they don't seem to hit their target (at least so far).

IMO, raw callbacks should just be raw callbacks. They should be as fast as possible and they should not try to deal with context propagation. If developers want context propagation they should use a language extension (a la async/await) that captures the "continuation threads". They will also get other benefits (like robust exception handling). 

Better to be explicit about what your program means (express your continuations with async/await) than to rely on ad-hoc mechanisms that try to reconstruct the continuation semantics from callbacks and special APIs.

My 2 cents.

Bruno

Andreas Marschke

Nov 12, 2014, 2:40:15 PM
to nod...@googlegroups.com
+1

The important question here, I guess, is: how do you get this library-neutral instrumentation into the Node runtime? Which is also what I alluded to when asking whether it would make sense to have people from the V8 and Node core teams sit at the same table with people from the Node community and start working out a standardized solution, without stepping on each other's toes, to make Node and subsequently V8 a better engine to work with. Imagine the benefits the browsers using V8 could reap from a central orchestration of changes.

Just saying.

--
Kind regards,

Andreas Marschke.
_________________________________________________________

Stephen Belanger

Nov 12, 2014, 3:37:11 PM
to nod...@googlegroups.com

Unfortunately, Google/V8 doesn't really care about nodejs. They only care about the browser, for which they deem the WebKit inspector to be sufficient. StrongLoop put so much effort into their node-inspector port of those tools for that exact reason.


Alex Gounares

Nov 14, 2014, 5:48:28 PM
to nod...@googlegroups.com
Hi Andreas

Another option to consider for performance monitoring is Concurix (http://concurix.com).  Full disclosure--I am the founder and CEO there!

We have a fully on-premise profiling and monitoring solution for Node.js.  There is no API key or signup needed, and it runs on stock Node.js on any OS.

Concurix works by injecting performance instrumentation at load time at the source level, versus using the V8 profiler.  This has the advantage of providing monitoring/profiling across the entire app, not just a limited set of libraries.  It also means you can run the profiler continuously, even in production.  However, it does mean the profiling is at a higher semantic level (the function/module level) versus, say, the V8 profiler.
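
To give a rough feel for how load-time instrumentation works in general, here is a toy sketch that wraps module exports as they are required (this illustrates the general technique only, not our actual source-level implementation):

var Module = require('module');

var originalRequire = Module.prototype.require;
Module.prototype.require = function (path) {
  var exports = originalRequire.call(this, path);
  // Toy version: only handles modules whose export is a single function,
  // and drops any properties attached to it. A real tool has to do far more.
  if (typeof exports === 'function') {
    var original = exports;
    return function () {
      var start = Date.now();
      try {
        return original.apply(this, arguments);
      } finally {
        console.log(path, 'took', Date.now() - start, 'ms');
      }
    };
  }
  return exports;
};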

Give it a whirl and we'll be happy to help you out!

regards,
alex

Andreas Marschke

Nov 14, 2014, 10:05:33 PM
to nod...@googlegroups.com
Hi Alex,

thank you very much. Has any of the work that you and your company have done filtered back to the community in any way? Just interested,
because, as we've already realized in this thread, the tools in the community are very barebones and not entirely interoperable.

At best -- to my mind -- there should be a better (maybe even standardized?) toolset or set of libraries to better understand the layers and what is going on in the application.
Sure, Node being Node means some optimizations are already there (V8 coming from Chrom(e/ium)), but you still have JavaScript as your "footgun" (see Crockford).

Like I suggested, there could/should be a concerted effort towards that.

Thanks! 

<!-- please stop me if I've become rambly about this... -->

Nodar Chkuaselidze

Nov 15, 2014, 1:56:17 AM
to nod...@googlegroups.com
Is there a SET of tools that you can work with while being offline?

So you don't need to communicate with another server or log in to another system... Just like node-inspector, but for profiling and monitoring?

Mark Getz

Nov 17, 2014, 10:30:24 AM
to nod...@googlegroups.com
Alex,
Does your code work well with streamline code yet? We liked your software, but in earlier tests the overhead brought our system down, because we use streamline, which adds a lot of extra functions when it rewrites the code.

We are currently writing our own flavor of performance monitoring using the hooks provided in streamline. It adds a 10% overhead at the moment, but it looks like it will give us good timing and code-path metrics for our system.  Most of the overhead when using the streamline hooks is in saving the data; collecting the data is minimal overhead.
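
One way to keep the saving cost down is to buffer samples in memory and flush them in batches; a rough sketch (file name and flush interval are arbitrary):

var fs = require('fs');

var buffer = [];

// Called from the instrumentation hooks; just queues the sample.
function record(sample) {
  buffer.push(JSON.stringify(sample));
}

// Flush once per second instead of once per sample.
setInterval(function () {
  if (buffer.length === 0) return;
  var lines = buffer.join('\n') + '\n';
  buffer = [];
  fs.appendFile('perf-samples.log', lines, function (err) {
    if (err) console.error('flush failed:', err);
  });
}, 1000).unref(); // don't keep the process alive just for the flusher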

Thanks.