Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Improving the performance of stacktrace generation
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  8 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Charles Oliver Nutter  
View profile  
 More options Jul 7 2012, 6:03 pm
From: Charles Oliver Nutter <head...@headius.com>
Date: Sat, 7 Jul 2012 17:03:34 -0500
Local: Sat, Jul 7 2012 6:03 pm
Subject: Improving the performance of stacktrace generation
Today I have a new conundrum for you all: I need stack trace
generation on Hotspot to be considerably faster than it is now.

In order to simulate many Ruby features, JRuby (over)uses Java stack
traces. We recently (JRuby 1.6, about a year ago) moved to using the
Java stack trace as the source of our Ruby backtrace information,
mining out compiled frames and using interpreter markers to peel off
interpreter frames. The result is that a Ruby trace with mixed
compiled and interpreted code like this
(https://gist.github.com/3068210) turns into this
(https://gist.github.com/3068213). I consider this a great deal better
than the plain Java trace, and I know other language implementers have
lamented the verbosity of stack traces coming out of their languages.

The unfortunate thing is that stack trace generation is very expensive
in the JVM, and in order to generate normal exceptions and emulate
other features we sometimes generate a lot of them. I think there's
value in exploring how we can make stack trace generation cheaper at
the JVM level.

Here's a few cases in Ruby where we need to use Java stack traces to
provide the same features:

* Exceptions as non-exceptional or moderately-exceptional method results

In this case I'm specifically thinking about Ruby's tendency to
propagate errno values as exceptions; EAGAIN/EWOULDBLOCK for example
are thrown from nonblocking IO methods when there's no data available.

You will probably say "that's a horrible use for exceptions" and I
agree. But there are a couple reasons why it's nicer too:
- using return value sigils requires you to propagate them back out
through many levels of calls
- exception-handling is cleaner in code than having all your errno
handling logic spliced into regular program flow

In any case, the cost of generating a stack trace for potentially
every non-blocking IO call is obviously too high. In JRuby, we default
to having EAGAIN/EWOULDBLOCK exceptions not generate a stack trace,
and you must pass a flag for them to do so. The justification is that
these exceptions are almost always used to branch back to the top of a
nonblocking IO loop, so the backtrace is useless.

* Getting the current or previous method's name/file/line

Ruby supports a number of features that allow you to get basic
information about the method currently being executed or the method
that called it. The most general of these features is the "caller"
method, which provides an array of all method name + file + line that
would appear in a stack trace at this point. This feature is often
abused to get only the current or previous frame, and so in Ruby 1.9
they added __method__ to get the currently-executing method's
name+file+line.

In both cases, we must generate a full Java trace for these methods
because the name of a method body is not necessarily statically known.
We often want only the current frame or the current and previous
frames, but we pay the cost of generating an entire Java stack trace
to get them.

* Warnings that actually report the line of code that triggered them

In Ruby, it is possible to generate non-fatal warnings to stderr. In
many cases, these warnings automatically include the file and line at
which the triggering code lives. Because the warning logic is
downstream from the Ruby code, we again must use a full Java stack
trace to find the most recent (on stack) Ruby frame. This causes
warnings to be as expensive as regular exceptions.

Because the use of frame introspection (in this case through stack
traces) has largely been ignored on current JVMs, I suspect there's a
lot of improvement possible. At a minimum, the ability to only grab
the top N frames from the stack trace could be a great improvement
(Hotspot even has flags to restrict how large a trace it will
generate, presumably to avoid the cost of accounting for deep stacks
and generating traces from them).

Any thoughts on this? Does anyone else have need for lighter-weight
name/file/line inspection of the call stack?

- Charlie


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rémi Forax  
View profile   Translate to Translated (View Original)
 More options Jul 7 2012, 6:31 pm
From: Rémi Forax <fo...@univ-mlv.fr>
Date: Sun, 08 Jul 2012 00:31:12 +0200
Local: Sat, Jul 7 2012 6:31 pm
Subject: Re: Improving the performance of stacktrace generation
On 07/08/2012 12:03 AM, Charles Oliver Nutter wrote:

I don't see how to do more.

You can use Throwable.getStackTraceElement()
which is package visible and OpenJDK specific but at least
it will be faster for all VMs that uses OpenJDK.

> * Warnings that actually report the line of code that triggered them

> In Ruby, it is possible to generate non-fatal warnings to stderr. In
> many cases, these warnings automatically include the file and line at
> which the triggering code lives. Because the warning logic is
> downstream from the Ruby code, we again must use a full Java stack
> trace to find the most recent (on stack) Ruby frame. This causes
> warnings to be as expensive as regular exceptions.

Please never optimize warnings, they are here to bug users
until they fix the thing. So they should be slow :)

> Because the use of frame introspection (in this case through stack
> traces) has largely been ignored on current JVMs, I suspect there's a
> lot of improvement possible. At a minimum, the ability to only grab
> the top N frames from the stack trace could be a great improvement
> (Hotspot even has flags to restrict how large a trace it will
> generate, presumably to avoid the cost of accounting for deep stacks
> and generating traces from them).

> Any thoughts on this? Does anyone else have need for lighter-weight
> name/file/line inspection of the call stack?

> - Charlie

R�mi

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alex Cruise  
View profile   Translate to Translated (View Original)
 More options Jul 7 2012, 6:35 pm
From: Alex Cruise <acru...@gmail.com>
Date: Sat, 7 Jul 2012 15:35:05 -0700
Local: Sat, Jul 7 2012 6:35 pm
Subject: Re: [jvm-l] Improving the performance of stacktrace generation

Forgive me if this is a naive question, but as regards the
caller/current-method question, since you have the complete AST in front of
you, why not just synthesize extra arguments to any method that happens to
use the caller or __method__ magic?

If it's at most the immediate caller that the callee is interested in,
well, that is a service the compiler can provide, no stack trace
skullduggery required.

And, of course, for errno, there is a middle ground between exceptions and
magical in-band return values--in Scala-land we call it Either. When you're
a whole-program compiler (or are allowed to pretend you are one until a
monkey comes along) you can feel free to rewrite everyone's returns for
your own optimization. Whether that turns out to be a win in the long run
is, of course, a different question. :)

Alas, for want of an effect system... :)

-0xe1a (sorry for top-posting, on my phone)
On Jul 7, 2012 3:04 PM, "Charles Oliver Nutter" <head...@headius.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Charles Oliver Nutter  
View profile  
 More options Jul 7 2012, 6:50 pm
From: Charles Oliver Nutter <head...@headius.com>
Date: Sat, 7 Jul 2012 17:50:35 -0500
Local: Sat, Jul 7 2012 6:50 pm
Subject: Re: Improving the performance of stacktrace generation

On Saturday, July 7, 2012, Rémi Forax wrote:
> You can use Throwable.**getStackTraceElement()
> which is package visible and OpenJDK specific but at least
> it will be faster for all VMs that uses OpenJDK.

I'll certainly explore that to see if it improves the situation. If it's
faster, it might be a way out in some circumstances.

It seems like an official public API is needed here...

Please never optimize warnings, they are here to bug users

> until they fix the thing. So they should be slow :)

Yes, except when *everyone else* has faster warnings than you. Then you're
just giving people another reason to not use your stuff.

Granted, JRuby is now becoming far and away the fastest Ruby implementation
for exactly the reasons that stack traces are slow, but often making 90% of
the code 5x faster but 10% of the code 100x slower means people give up on
you without bothering.

Remember, I agree with you...but I also feel like this attitude has allowed
stack trace generation to avoid optimization effort for a long time, and
there *are* useful things you can do with an introspectable call stack.

- Charlie


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Charles Oliver Nutter  
View profile  
 More options Jul 7 2012, 6:58 pm
From: Charles Oliver Nutter <head...@headius.com>
Date: Sat, 7 Jul 2012 17:58:00 -0500
Subject: Re: Improving the performance of stacktrace generation

On Saturday, July 7, 2012, Alex Cruise wrote:
> Forgive me if this is a naive question, but as regards the
> caller/current-method question, since you have the complete AST in front of
> you, why not just synthesize extra arguments to any method that happens to
> use the caller or __method__ magic?

Again, one of JRuby's "features" gets in the way of a simple approach; we
have an interpreter too.

The dance goes sort of like this:

* JRuby is mixed-mode, and has an interpreted phase before code becomes JVM
bytecode
* Generating back traces (or otherwise inspecting the call stack) requires
that we splice together the "native" and interpreter traces
* As a result, any stack inspection requires a full Java stack trace plus
the splicing logic (and the stack trace is by far the most expensive part
of this)

It may be possible for us to pass it from the compiler and if it's not
present pull off interpreter backtrace frames, but it's obviously more
complicated. My ideal situation would be asking the JVM for the top N
frames and reducing the cost of stack inspection by M/N where M is the
normal full stack trace it generates for us.

>  If it's at most the immediate caller that the callee is interested in,
> well, that is a service the compiler can provide, no stack trace
> skullduggery required.

> And, of course, for errno, there is a middle ground between exceptions and
> magical in-band return values--in Scala-land we call it Either. When you're
> a whole-program compiler (or are allowed to pretend you are one until a
> monkey comes along) you can feel free to rewrite everyone's returns for
> your own optimization. Whether that turns out to be a win in the long run
> is, of course, a different question. :)

> Alas, for want of an effect system... :)

The idea of having a separate out-of-band mechanism isn't bad, but
unfortunately people have written a lot of Ruby code around the idea that
errnos are exceptions. So either we'd need to check return values for the
exceptional path and raise it ourselves, or they'd have to change all that
code.

For what it's worth, people *are* starting to realize (with great effort on
our part) that anything that triggers an exception or inspects the stack is
probably way more expensive than its worth. As JRuby increases in
performance over the other implementations, the benefits of avoiding stack
inspection should start to become apparent.

But in the short term...I'd like to understand what makes stack trace
generation in the JVM slow and help explore what we can do to improve it.

- Charlie


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rémi Forax  
View profile  
 More options Jul 7 2012, 7:57 pm
From: Rémi Forax <fo...@univ-mlv.fr>
Date: Sun, 08 Jul 2012 01:57:45 +0200
Local: Sat, Jul 7 2012 7:57 pm
Subject: Re: Improving the performance of stacktrace generation
On 07/08/2012 12:50 AM, Charles Oliver Nutter wrote:

Throwable.getStackTraceElement() let you walk the stack trace without
generating the whole stack trace,
in my opinion it's enough.

BTW, I also think you are responsible too for the cost of generating
stacktraces because the cost depends on the size of the stacktrace
and an AST interpreter is a big consumer of stack frames. I am guilty
too, PHP.reboot has the same issue and
I try to mitigate that by avoiding to have an helper methods in the
stacktrace of calls (not that easy without tailcall).

> - Charlie

Rémi

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jason Zaugg  
View profile  
 More options Jul 8 2012, 5:40 am
From: Jason Zaugg <jza...@gmail.com>
Date: Sun, 8 Jul 2012 02:40:21 -0700 (PDT)
Local: Sun, Jul 8 2012 5:40 am
Subject: Re: Improving the performance of stacktrace generation

That's also the approach used in Scala for non local returns from closures
and breaks [1] [2].

-jason

[1]
https://github.com/scala/scala/blob/master/src/library/scala/util/con...
[2]
https://github.com/scala/scala/blob/master/src/library/scala/util/con...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Bob Foster  
View profile   Translate to Translated (View Original)
 More options Jul 7 2012, 7:32 pm
From: Bob Foster <bobfos...@gmail.com>
Date: Sat, 7 Jul 2012 16:32:38 -0700
Local: Sat, Jul 7 2012 7:32 pm
Subject: Re: [jvm-l] Improving the performance of stacktrace generation

> Any thoughts on this? Does anyone else have need for

lighter-weight name/file/line inspection of the call stack?

Well, yes. Profilers do.

Recall Cliff Click bragging a couple of years ago at the JVM Language
Summit about how fast stack trace generation is in Azul Systems' OSs...and
knocking Hotspot for being so slow. It turns out that stack trace
generation is a very significant overhead in profiling Hotspot using JVMTI.
Even CPU sampling on 20 ms. intervals can add 3% or more to execution time,
almost entirely due to the delay in reaching a safe point (which also
guarantees the profile will be incorrect) and generating a stack trace for
each thread.

But 3% is peanuts compared to the cost of memory profiling, which can
require a stack trace on every new instance creation. In a profiler I wrote
using JVMTI, I discovered that it was faster to call into JNI code on every
method entry and exit (and exception catch), keeping a stack trace
dynamically than to call into JNI only when memory was allocated and
request a stack trace each time. The "fast" technique is about 3-10 times
slower than running without profiling. The Netbeans profiler doesn't use
this optimization, and its memory profiler when capturing every allocation,
as I did, is 2-3 ORDERS OF MAGNITUDE slower than normal (non-server)
execution.

Faster stack traces would benefit the entire Hotspot profiling community.

Bob

On Sat, Jul 7, 2012 at 3:03 PM, Charles Oliver Nutter
<head...@headius.com>wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »