Breakpad CFI draft on wiki

Jim Blandy

Sep 22, 2009, 6:01:34 PM

to google-br...@googlegroups.com

I've written up a draft description of how we could represent
DWARF-style call frame information in Breakpad symbol files. It's on
the wiki at:

http://code.google.com/p/google-breakpad/wiki/SymbolFiles

This is a more detailed writeup of what Ted Mielczarek, Neal Sidhwaney
and I talked about when we met at Mozilla in August.

Neal Sidhwaney

Sep 28, 2009, 7:05:54 PM

to google-br...@googlegroups.com, brd...@gmail.com

Thanks, Jim, for doing that writeup. A couple of questions:

You wrote:

"Exactly where the CFA points in the frame --- at the return address? below it? At some fixed point within the frame? --- is a question of definition that depends on the architecture and ABI in use. But by definition, the CFA remains constant throughout the lifetime of the frame. It's up to architecture-specific code to know what significance to assign the CFA, if any."

How important is this if we have rules to calculate the RA? From a minidump-stackwalking perspective, what we're most interested in is how to retrieve the calling address given any frame, correct?

As a step to simplify initial implementation, would it help to only support CFA & RA at the beginning?

Finally, how much of an increase in symbol file size do you think there might be? Right now Chrome is sitting right around 7.8-8.5 MB, and we're not sure about the processor performance/capabilities of handling something that might increase by 2x, for instance.

Thanks again for your help with this.

Neal

Jim Blandy

Sep 28, 2009, 7:50:20 PM

to google-br...@googlegroups.com, brd...@gmail.com

On Mon, Sep 28, 2009 at 4:05 PM, Neal Sidhwaney <nea...@gmail.com> wrote:
> Thanks, Jim, for doing that writeup. A couple of questions:
> You wrote:
> "Exactly where the CFA points in the frame --- at the return address? below
> it? At some fixed point within the frame? --- is a question of definition
> that depends on the architecture and ABI in use. But by definition, the CFA
> remains constant throughout the lifetime of the frame. It's up to
> architecture-specific code to know what significance to assign the CFA, if
> any."
>
> How important is this if we have rules to calculate the RA? From a
> minidump-stackwalking perspective, what we're most interested in is how to
> retrieve the calling address given any frame, correct?

In DWARF, it seems to be customary to derive the caller's SP from the
CFA if the CFI doesn't describe it explicitly. So the CFA does play a
semi-visible role.

Another point is that the CFA's rule is kind of special, in that it
gets computed first, and then made available for the other rules to
use. Suppose we have frame A that has called frame B; we've got the
values the registers had at some instruction X in B; and we've looked
up that instruction's CFI rules, describing how to recover A's
registers' values given B's. Rules can refer to register values; when
they do, they're referring to B's values, not A's --- not the value
described by some other rule for X. In other words, all register values
are computed "simultaneously". The CFA is the exception: we first
compute its value from B's registers, and then if X's other rules
refer to ".cfa", they get this new value.

The description ought to make all this clear; I'll fix that.
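To make the evaluation order concrete, here is a minimal sketch, not the draft format itself. It assumes a deliberately simplified, hypothetical rule shape ("caller's value = some callee register + constant offset"); the names Rule, RuleSet, Registers and RecoverCallerRegisters are made up for illustration, and real rules would also need to read stack memory. The point it shows is only the ordering: .cfa is computed first from B's registers, and every other rule then reads B's values (plus .cfa), never a value produced by a sibling rule.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified rule: caller's value = callee[base] + offset.
// "base" names a callee register (e.g. "$sp", "$lr") or ".cfa".
struct Rule {
    std::string base;
    int64_t offset;
};

using Registers = std::map<std::string, uint64_t>;
using RuleSet = std::map<std::string, Rule>;

// Recover the caller's (A's) registers from the callee's (B's) registers.
// Throws std::out_of_range if a rule names a register we have no value for,
// or if the rule set has no ".cfa" entry.
Registers RecoverCallerRegisters(const Registers& callee, const RuleSet& rules) {
    // Step 1: the CFA is special. Compute it first, from B's registers only.
    const Rule& cfa_rule = rules.at(".cfa");
    const uint64_t cfa =
        callee.at(cfa_rule.base) + static_cast<uint64_t>(cfa_rule.offset);

    // Step 2: the inputs visible to every other rule are B's registers
    // plus the freshly computed ".cfa". Nothing else is added to this set.
    Registers inputs = callee;
    inputs[".cfa"] = cfa;

    // Step 3: evaluate the remaining rules "simultaneously": each one reads
    // from `inputs`, never from `caller`, so rule order cannot matter.
    Registers caller;
    for (const auto& entry : rules) {
        if (entry.first == ".cfa") continue;
        const Rule& rule = entry.second;
        caller[entry.first] =
            inputs.at(rule.base) + static_cast<uint64_t>(rule.offset);
    }
    return caller;
}
```

With the rules ".cfa = $sp + 16", "$sp = .cfa", "$r4 = $r5" and "$r5 = $r4", and B's values $sp = 0x1000, $r4 = 1, $r5 = 2, this yields $sp = 0x1010, $r4 = 2 and $r5 = 1 for A. The swap only comes out right because both rules read B's values.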

> As a step to simplify initial implementation, would it help to only support
> CFA & RA at the beginning?

The rules for the CFA and RA can refer to other registers. For
example, on architectures where the "call" instruction simply saves
the PC in some general-purpose register --- ARM, for example, saves
the return address in the "LR" register --- the rule for .ra might be
"$lr". Non-leaf functions must save LR in their own stack frame
before making other calls; one could imagine the compiler moving LR
into some other callee-saves register, in which case the rule for .ra
might cite any such register. I don't know how often that second case
happens in practice.

> Finally, how much of an increase in symbol file size do you think there
> might be? Right now Chrome is sitting right around 7.8-8.5 MB, and we're not
> sure about the processor performance/capabilities of handling something that
> might increase by 2x, for instance.

Well, it seems like the .debug_frame section is typically around 30%
of the size of the .text section it describes. The draft CFI format,
being textual, would be, I'm guessing, 5x-10x that size. What does
that add up to for you? I felt like I ought to follow along with the
existing Breakpad precedent of favoring legibility and simplicity over
compactness.

If this turns out to be a problem, we could simply encode DWARF CFI
data as blobs of hex and copy it directly into the Breakpad symbol file.
Breakpad CFI data would then be almost exactly 2x the size of the
.debug_frame section from which it was derived (the headers differ),
or about 60% of the size of the .text section it describes.
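The arithmetic behind those estimates can be sketched as follows. This is only a back-of-the-envelope calculation using the ratios quoted in this message (30% of .text, 5x-10x for text, 2x for hex); the struct and function names are made up for illustration, and the ratios are rough guesses rather than measurements.

```cpp
#include <cstdint>

// Rough size estimates, all in bytes, derived from the size of .text.
struct CfiSizeEstimate {
    uint64_t debug_frame_bytes;   // DWARF .debug_frame, ~30% of .text
    uint64_t textual_low_bytes;   // textual CFI at 5x .debug_frame
    uint64_t textual_high_bytes;  // textual CFI at 10x .debug_frame
    uint64_t hex_blob_bytes;      // hex-encoded DWARF CFI, ~2x .debug_frame
};

// Apply the ratios from the message to a given .text size.
// Integer arithmetic is used on purpose; these are estimates only.
CfiSizeEstimate EstimateCfiSizes(uint64_t text_bytes) {
    CfiSizeEstimate e;
    e.debug_frame_bytes = text_bytes * 30 / 100;
    e.textual_low_bytes = e.debug_frame_bytes * 5;
    e.textual_high_bytes = e.debug_frame_bytes * 10;
    e.hex_blob_bytes = e.debug_frame_bytes * 2;
    return e;
}
```

For a .text section of 1,000,000 bytes this gives a 300,000-byte .debug_frame, 1,500,000 to 3,000,000 bytes of textual CFI, and 600,000 bytes for the hex-blob alternative.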

Another alternative would be to supply the original ELF symbol files
to the processor, and use libunwind for the unwinding.

Jim Blandy

Sep 28, 2009, 8:26:06 PM

to google-br...@googlegroups.com, brd...@gmail.com

On Mon, Sep 28, 2009 at 4:05 PM, Neal Sidhwaney <nea...@gmail.com> wrote:

> Finally, how much of an increase in symbol file size do you think there
> might be? Right now Chrome is sitting right around 7.8-8.5 MB, and we're not
> sure about the processor performance/capabilities of handling something that
> might increase by 2x, for instance.

I was wondering about this, so I mocked something up.

The largest .so in Firefox recently is libxpcom_core.so. The overall
file size (including debug info) is 25MiB. The .debug_frame section
is 251KiB. If the expansion factor is 10x, the Breakpad CFI would be
about 2.5MiB. Suppose the parsing is minimal: just parse the address
and stuff the data in a map, to be parsed in detail as needed. (That is
appropriate, since the vast majority of CFI records will never be used
in a given stack walk.) Then the simplest STL code takes a third of a
second to parse that 2.5MiB of data. By "simplest" I mean a
std::map<address, string>, the usual istream arithmetic extractors to
parse the address, and a getline call for the rest of the data.
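A minimal sketch of that kind of lazy loader is below. It is not the mock-up that was actually timed. It assumes each input line is a hex address followed by the rest of the record, and the names CfiMap, LoadCfiLazily and FindRecord are hypothetical. The lookup helper additionally assumes that a record applies from its address up to the next record's address, which is an assumption of this sketch rather than something the draft states.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Address -> unparsed remainder of the record.
using CfiMap = std::map<uint64_t, std::string>;

// Read "<hex address> <rest of line>" records. Only the address is parsed;
// the rest is stored verbatim, to be parsed in detail if ever needed.
CfiMap LoadCfiLazily(std::istream& in) {
    CfiMap records;
    uint64_t address = 0;
    std::string rest;
    // The istream extractor parses the address; getline takes the remainder.
    while (in >> std::hex >> address) {
        std::getline(in, rest);
        // Drop the separator space(s) left in front of the payload.
        rest.erase(0, rest.find_first_not_of(' '));
        records[address] = rest;
    }
    return records;
}

// Find the record with the greatest address <= pc, or nullptr if none.
// Assumes (for this sketch) that a record covers addresses up to the next one.
const std::string* FindRecord(const CfiMap& records, uint64_t pc) {
    CfiMap::const_iterator it = records.upper_bound(pc);
    if (it == records.begin()) return nullptr;
    --it;
    return &it->second;
}
```

A stack walker would call LoadCfiLazily once per symbol file and then FindRecord for each frame's PC, doing the detailed parse only on the handful of strings it actually gets back.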

I don't know what your constraints are; how does that sound?

Jim Blandy

Sep 28, 2009, 8:30:26 PM

to google-br...@googlegroups.com, brd...@gmail.com

On Mon, Sep 28, 2009 at 5:26 PM, Jim Blandy <ji...@red-bean.com> wrote:
> a
> third of a second

(on a 2.4GHz MacBook Pro)

Neal Sidhwaney

Oct 14, 2009, 2:34:24 PM

to google-br...@googlegroups.com

Inline, Jim, thanks. (Sorry for the delay; I've had to come up to speed on what our processing requirements are and how they're affected by symbol size, and I was out at a conference last week.)

On Mon, Sep 28, 2009 at 5:26 PM, Jim Blandy <ji...@red-bean.com> wrote:

> I don't know what your constraints are; how does that sound?

I think our concern is more about the impact on symbol storage size and how efficient the processor will be with the larger symbols. Since Socorro uses a cache of the processed symbols, I'd imagine it impacts you less? I'm currently discussing this with the crash2 team here, and we'll see whether using the cache would negate this concern for us as well.

Thanks,

Neal

Ted Mielczarek

Oct 15, 2009, 2:04:45 PM

to google-breakpad-dev

On Oct 14, 2:34 pm, Neal Sidhwaney <neal...@gmail.com> wrote:
> I think part of our concern is more on the impact on symbol size storage and
> how efficient the processor will be with the larger symbols. Since Socorro
> uses a cache of the processed symbols I'd imagine it impacts you less? I'm
> currently discussing this with the crash2 team here and we'll see if maybe
> using the cache would negate this concern for us, as well.

Note that we're not actually using my processor-symbol-cache work in
production yet. It probably bears more investigation.

-Ted
