Improving the performance of stacktrace generation

Charles Oliver Nutter

unread,

Jul 7, 2012, 6:03:34 PM7/7/12

to Da Vinci Machine Project, JVM Languages

Today I have a new conundrum for you all: I need stack trace
generation on Hotspot to be considerably faster than it is now.

In order to simulate many Ruby features, JRuby (over)uses Java stack
traces. We recently (JRuby 1.6, about a year ago) moved to using the
Java stack trace as the source of our Ruby backtrace information,
mining out compiled frames and using interpreter markers to peel off
interpreter frames. The result is that a Ruby trace with mixed
compiled and interpreted code like this
(https://gist.github.com/3068210) turns into this
(https://gist.github.com/3068213). I consider this a great deal better
than the plain Java trace, and I know other language implementers have
lamented the verbosity of stack traces coming out of their languages.

The unfortunate thing is that stack trace generation is very expensive
in the JVM, and in order to generate normal exceptions and emulate
other features we sometimes generate a lot of them. I think there's
value in exploring how we can make stack trace generation cheaper at
the JVM level.

Here's a few cases in Ruby where we need to use Java stack traces to
provide the same features:

* Exceptions as non-exceptional or moderately-exceptional method results

In this case I'm specifically thinking about Ruby's tendency to
propagate errno values as exceptions; EAGAIN/EWOULDBLOCK for example
are thrown from nonblocking IO methods when there's no data available.

You will probably say "that's a horrible use for exceptions" and I
agree. But there are a couple reasons why it's nicer too:
- using return value sigils requires you to propagate them back out
through many levels of calls
- exception-handling is cleaner in code than having all your errno
handling logic spliced into regular program flow

In any case, the cost of generating a stack trace for potentially
every non-blocking IO call is obviously too high. In JRuby, we default
to having EAGAIN/EWOULDBLOCK exceptions not generate a stack trace,
and you must pass a flag for them to do so. The justification is that
these exceptions are almost always used to branch back to the top of a
nonblocking IO loop, so the backtrace is useless.

* Getting the current or previous method's name/file/line

Ruby supports a number of features that allow you to get basic
information about the method currently being executed or the method
that called it. The most general of these features is the "caller"
method, which provides an array of all method name + file + line that
would appear in a stack trace at this point. This feature is often
abused to get only the current or previous frame, and so in Ruby 1.9
they added __method__ to get the currently-executing method's
name+file+line.

In both cases, we must generate a full Java trace for these methods
because the name of a method body is not necessarily statically known.
We often want only the current frame or the current and previous
frames, but we pay the cost of generating an entire Java stack trace
to get them.

* Warnings that actually report the line of code that triggered them

In Ruby, it is possible to generate non-fatal warnings to stderr. In
many cases, these warnings automatically include the file and line at
which the triggering code lives. Because the warning logic is
downstream from the Ruby code, we again must use a full Java stack
trace to find the most recent (on stack) Ruby frame. This causes
warnings to be as expensive as regular exceptions.

Because the use of frame introspection (in this case through stack
traces) has largely been ignored on current JVMs, I suspect there's a
lot of improvement possible. At a minimum, the ability to only grab
the top N frames from the stack trace could be a great improvement
(Hotspot even has flags to restrict how large a trace it will
generate, presumably to avoid the cost of accounting for deep stacks
and generating traces from them).

Any thoughts on this? Does anyone else have need for lighter-weight
name/file/line inspection of the call stack?

- Charlie

Rémi Forax

unread,

Jul 7, 2012, 6:31:12 PM7/7/12

to Da Vinci Machine Project, Charles Oliver Nutter, JVM Languages

I don't see how to do more.

>
> * Getting the current or previous method's name/file/line
>
> Ruby supports a number of features that allow you to get basic
> information about the method currently being executed or the method
> that called it. The most general of these features is the "caller"
> method, which provides an array of all method name + file + line that
> would appear in a stack trace at this point. This feature is often
> abused to get only the current or previous frame, and so in Ruby 1.9
> they added __method__ to get the currently-executing method's
> name+file+line.
>
> In both cases, we must generate a full Java trace for these methods
> because the name of a method body is not necessarily statically known.
> We often want only the current frame or the current and previous
> frames, but we pay the cost of generating an entire Java stack trace
> to get them.

You can use Throwable.getStackTraceElement()
which is package visible and OpenJDK specific but at least
it will be faster for all VMs that uses OpenJDK.

>
> * Warnings that actually report the line of code that triggered them
>
> In Ruby, it is possible to generate non-fatal warnings to stderr. In
> many cases, these warnings automatically include the file and line at
> which the triggering code lives. Because the warning logic is
> downstream from the Ruby code, we again must use a full Java stack
> trace to find the most recent (on stack) Ruby frame. This causes
> warnings to be as expensive as regular exceptions.

Please never optimize warnings, they are here to bug users
until they fix the thing. So they should be slow :)

>
> Because the use of frame introspection (in this case through stack
> traces) has largely been ignored on current JVMs, I suspect there's a
> lot of improvement possible. At a minimum, the ability to only grab
> the top N frames from the stack trace could be a great improvement
> (Hotspot even has flags to restrict how large a trace it will
> generate, presumably to avoid the cost of accounting for deep stacks
> and generating traces from them).
>
> Any thoughts on this? Does anyone else have need for lighter-weight
> name/file/line inspection of the call stack?
>
> - Charlie

R�mi

Alex Cruise

unread,

Jul 7, 2012, 6:35:05 PM7/7/12

to jvm-la...@googlegroups.com

Forgive me if this is a naive question, but as regards the caller/current-method question, since you have the complete AST in front of you, why not just synthesize extra arguments to any method that happens to use the caller or __method__ magic?

If it's at most the immediate caller that the callee is interested in, well, that is a service the compiler can provide, no stack trace skullduggery required.

And, of course, for errno, there is a middle ground between exceptions and magical in-band return values--in Scala-land we call it Either. When you're a whole-program compiler (or are allowed to pretend you are one until a monkey comes along) you can feel free to rewrite everyone's returns for your own optimization. Whether that turns out to be a win in the long run is, of course, a different question. :)

Alas, for want of an effect system... :)

-0xe1a (sorry for top-posting, on my phone)

--
You received this message because you are subscribed to the Google Groups "JVM Languages" group.
To post to this group, send email to jvm-la...@googlegroups.com.
To unsubscribe from this group, send email to jvm-language...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.

Charles Oliver Nutter

unread,

Jul 7, 2012, 6:50:35 PM7/7/12

to Rémi Forax, Da Vinci Machine Project, JVM Languages

On Saturday, July 7, 2012, Rémi Forax wrote:

You can use Throwable.getStackTraceElement()
which is package visible and OpenJDK specific but at least
it will be faster for all VMs that uses OpenJDK.

I'll certainly explore that to see if it improves the situation. If it's faster, it might be a way out in some circumstances.

It seems like an official public API is needed here...

Please never optimize warnings, they are here to bug users
until they fix the thing. So they should be slow :)

Yes, except when *everyone else* has faster warnings than you. Then you're just giving people another reason to not use your stuff.

Granted, JRuby is now becoming far and away the fastest Ruby implementation for exactly the reasons that stack traces are slow, but often making 90% of the code 5x faster but 10% of the code 100x slower means people give up on you without bothering.

Remember, I agree with you...but I also feel like this attitude has allowed stack trace generation to avoid optimization effort for a long time, and there *are* useful things you can do with an introspectable call stack.

- Charlie

Charles Oliver Nutter

unread,

Jul 7, 2012, 6:58:00 PM7/7/12

to jvm-la...@googlegroups.com

On Saturday, July 7, 2012, Alex Cruise wrote:

Forgive me if this is a naive question, but as regards the caller/current-method question, since you have the complete AST in front of you, why not just synthesize extra arguments to any method that happens to use the caller or __method__ magic?

Again, one of JRuby's "features" gets in the way of a simple approach; we have an interpreter too.

The dance goes sort of like this:

* JRuby is mixed-mode, and has an interpreted phase before code becomes JVM bytecode

* Generating back traces (or otherwise inspecting the call stack) requires that we splice together the "native" and interpreter traces

* As a result, any stack inspection requires a full Java stack trace plus the splicing logic (and the stack trace is by far the most expensive part of this)

It may be possible for us to pass it from the compiler and if it's not present pull off interpreter backtrace frames, but it's obviously more complicated. My ideal situation would be asking the JVM for the top N frames and reducing the cost of stack inspection by M/N where M is the normal full stack trace it generates for us.

If it's at most the immediate caller that the callee is interested in, well, that is a service the compiler can provide, no stack trace skullduggery required.

And, of course, for errno, there is a middle ground between exceptions and magical in-band return values--in Scala-land we call it Either. When you're a whole-program compiler (or are allowed to pretend you are one until a monkey comes along) you can feel free to rewrite everyone's returns for your own optimization. Whether that turns out to be a win in the long run is, of course, a different question. :)

Alas, for want of an effect system... :)

The idea of having a separate out-of-band mechanism isn't bad, but unfortunately people have written a lot of Ruby code around the idea that errnos are exceptions. So either we'd need to check return values for the exceptional path and raise it ourselves, or they'd have to change all that code.

For what it's worth, people *are* starting to realize (with great effort on our part) that anything that triggers an exception or inspects the stack is probably way more expensive than its worth. As JRuby increases in performance over the other implementations, the benefits of avoiding stack inspection should start to become apparent.

But in the short term...I'd like to understand what makes stack trace generation in the JVM slow and help explore what we can do to improve it.

- Charlie

Rémi Forax

unread,

Jul 7, 2012, 7:57:45 PM7/7/12

to Charles Oliver Nutter, Da Vinci Machine Project, JVM Languages

Throwable.getStackTraceElement() let you walk the stack trace without
generating the whole stack trace,
in my opinion it's enough.

BTW, I also think you are responsible too for the cost of generating
stacktraces because the cost depends on the size of the stacktrace
and an AST interpreter is a big consumer of stack frames. I am guilty
too, PHP.reboot has the same issue and
I try to mitigate that by avoiding to have an helper methods in the
stacktrace of calls (not that easy without tailcall).

>
> - Charlie

Rémi

Jason Zaugg

unread,

Jul 8, 2012, 5:40:21 AM7/8/12

to jvm-la...@googlegroups.com, Da Vinci Machine Project

On Sunday, July 8, 2012 12:03:34 AM UTC+2, Charles Oliver Nutter wrote:

* Exceptions as non-exceptional or moderately-exceptional method results

In this case I'm specifically thinking about Ruby's tendency to
propagate errno values as exceptions; EAGAIN/EWOULDBLOCK for example
are thrown from nonblocking IO methods when there's no data available.

You will probably say "that's a horrible use for exceptions" and I
agree. But there are a couple reasons why it's nicer too:
- using return value sigils requires you to propagate them back out
through many levels of calls
- exception-handling is cleaner in code than having all your errno
handling logic spliced into regular program flow

In any case, the cost of generating a stack trace for potentially
every non-blocking IO call is obviously too high. In JRuby, we default
to having EAGAIN/EWOULDBLOCK exceptions not generate a stack trace,
and you must pass a flag for them to do so. The justification is that
these exceptions are almost always used to branch back to the top of a
nonblocking IO loop, so the backtrace is useless.

That's also the approach used in Scala for non local returns from closures and breaks [1] [2].

-jason

[1] https://github.com/scala/scala/blob/master/src/library/scala/util/control/NoStackTrace.scala

[2] https://github.com/scala/scala/blob/master/src/library/scala/util/control/ControlThrowable.scala

Bob Foster

unread,

Jul 7, 2012, 7:32:38 PM7/7/12

to jvm-la...@googlegroups.com, Da Vinci Machine Project

> Any thoughts on this? Does anyone else have need for lighter-weight name/file/line inspection of the call stack?

Well, yes. Profilers do.

Recall Cliff Click bragging a couple of years ago at the JVM Language Summit about how fast stack trace generation is in Azul Systems' OSs...and knocking Hotspot for being so slow. It turns out that stack trace generation is a very significant overhead in profiling Hotspot using JVMTI. Even CPU sampling on 20 ms. intervals can add 3% or more to execution time, almost entirely due to the delay in reaching a safe point (which also guarantees the profile will be incorrect) and generating a stack trace for each thread.

But 3% is peanuts compared to the cost of memory profiling, which can require a stack trace on every new instance creation. In a profiler I wrote using JVMTI, I discovered that it was faster to call into JNI code on every method entry and exit (and exception catch), keeping a stack trace dynamically than to call into JNI only when memory was allocated and request a stack trace each time. The "fast" technique is about 3-10 times slower than running without profiling. The Netbeans profiler doesn't use this optimization, and its memory profiler when capturing every allocation, as I did, is 2-3 ORDERS OF MAGNITUDE slower than normal (non-server) execution.