http://code.google.com/p/google-breakpad/wiki/SymbolFiles
This is a more detailed writeup of what Ted Mielczarek, Neal Sidhwaney
and I talked about when we met at Mozilla in August.
In DWARF, it seems to be customary to derive the caller's SP from the
CFA if the CFI doesn't describe it explicitly. So the CFA does play a
semi-visible role.
Another point is that the CFA's rule is kind of special, in that it
gets computed first, and then made available for the other rules to
use. Suppose we have frame A that has called frame B; we've got the
values the registers had at some instruction X in B; and we've looked
up that instruction's CFI rules, describing how to recover A's
registers' values given B's. Rules can refer to register values; when
they do, they're referring to B's values, not A's --- not the value
computed by some other rule for X. In other words, all register values
are computed "simultaneously". The CFA is the exception: we first
compute its value from B's registers, and then if X's other rules
refer to ".cfa", they get this new value.
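To make the evaluation order concrete, here is a minimal C++ sketch. It assumes a hypothetical postfix rule syntax loosely modeled on the draft: `$name` reads a register, `.cfa` reads the CFA, `^` dereferences, and `+` and `-` do arithmetic. The names `EvalRule`, `RecoverCaller`, `RegisterSet`, `Memory` and `RuleSet` are illustrative only, not Breakpad's API. The CFA rule is evaluated first against B's registers. Every other rule then sees B's registers plus that CFA, never another rule's output. Following the DWARF custom mentioned above, the caller's `$sp` defaults to the CFA when no rule gives it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not Breakpad's API.
using RegisterSet = std::map<std::string, uint64_t>;  // "$sp" -> value
using Memory = std::map<uint64_t, uint64_t>;          // address -> word
using RuleSet = std::map<std::string, std::string>;   // target -> postfix rule

// Evaluate a postfix rule in an assumed syntax such as "$sp 16 +" or
// ".cfa 8 - ^". "$name" tokens always read the callee's (B's) registers;
// ".cfa" reads the already-computed CFA, or fails if there is none yet.
uint64_t EvalRule(const std::string& rule, const RegisterSet& callee,
                  const Memory& mem, const uint64_t* cfa) {
  std::vector<uint64_t> stack;
  std::istringstream tokens(rule);
  std::string tok;
  while (tokens >> tok) {
    if (tok == "+" || tok == "-") {
      if (stack.size() < 2) throw std::runtime_error("stack underflow");
      uint64_t rhs = stack.back();
      stack.pop_back();
      uint64_t lhs = stack.back();
      stack.pop_back();
      stack.push_back(tok == "+" ? lhs + rhs : lhs - rhs);
    } else if (tok == "^") {  // dereference: replace address with its word
      if (stack.empty()) throw std::runtime_error("stack underflow");
      stack.back() = mem.at(stack.back());
    } else if (tok == ".cfa") {
      if (cfa == nullptr) throw std::runtime_error(".cfa used in CFA rule");
      stack.push_back(*cfa);
    } else if (tok[0] == '$') {
      stack.push_back(callee.at(tok));
    } else {
      stack.push_back(std::stoull(tok, nullptr, 0));  // decimal or 0x...
    }
  }
  if (stack.size() != 1) throw std::runtime_error("malformed rule");
  return stack.back();
}

// Recover the caller's (A's) registers from the callee's (B's).
RegisterSet RecoverCaller(const RuleSet& rules, const RegisterSet& callee,
                          const Memory& mem) {
  // 1. The CFA is computed first, from B's registers alone.
  const uint64_t cfa = EvalRule(rules.at(".cfa"), callee, mem, nullptr);
  // 2. Every other rule sees B's registers plus the new CFA, never another
  //    rule's result, so visiting them in any order gives the same answer.
  RegisterSet caller;
  for (const auto& [target, rule] : rules) {
    if (target != ".cfa") caller[target] = EvalRule(rule, callee, mem, &cfa);
  }
  // 3. DWARF-style default: with no explicit rule, the caller's SP is the CFA.
  if (caller.count("$sp") == 0) caller["$sp"] = cfa;
  return caller;
}
```

Because each non-CFA rule reads only `callee` and `cfa`, a rule like `$x: $fp` gets B's `$fp` even when the same record also has a rule for `$fp`. That is what "simultaneous" means here.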
The description ought to make all this clear; I'll fix that.
> As a step to simplify initial implementation, would it help to only support
> CFA & RA at the beginning?
The rules for the CFA and RA can refer to other registers. For
example, on architectures where the "call" instruction simply saves
the PC in some general-purpose register --- ARM, for example, saves
the return address in the "LR" register --- the rule for .ra might be
"$lr". Non-leaf functions must save LR in their own stack frame
before making other calls; one could imagine the compiler moving LR
into some other callee-saves register, in which case the rule for .ra
might cite any such register. I don't know how often that second case
happens in practice.
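A small illustration of why CFA-and-RA-only support still needs the callee's general registers. The hypothetical helper `RegistersRead` lists the registers a postfix rule depends on, in the same assumed `$name` syntax, and `RegistersForCfaAndRa` combines the two rules. On ARM a leaf function's `.ra` rule might be `$lr`. For a non-leaf function whose compiler moved LR into a callee-saves register, the rule would name that register instead. `$r4` below is only an example, not taken from real compiler output.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: the callee registers a postfix rule reads, i.e. its
// "$name" tokens (assumed syntax, e.g. "$lr", ".cfa 4 - ^", "$sp 8 +").
std::set<std::string> RegistersRead(const std::string& rule) {
  std::set<std::string> regs;
  std::istringstream tokens(rule);
  std::string tok;
  while (tokens >> tok) {
    if (tok[0] == '$') regs.insert(tok);
  }
  return regs;
}

// Registers needed to evaluate just the .cfa and .ra rules of one record.
std::set<std::string> RegistersForCfaAndRa(const std::string& cfa_rule,
                                           const std::string& ra_rule) {
  std::set<std::string> regs = RegistersRead(cfa_rule);
  std::set<std::string> ra = RegistersRead(ra_rule);
  regs.insert(ra.begin(), ra.end());
  return regs;
}
```

Even in the minimal case, the unwinder must track whatever register the `.ra` rule names, not just SP and PC.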
> Finally, how much of an increase in symbol file size do you think there
> might be? Right now Chrome is sitting right around 7.8-8.5 MB, and we're not
> sure about the processor performance/capabilities of handling something that
> might increase by 2x, for instance.
Well, it seems like the .debug_frame section is typically around 30%
of the size of the .text section it describes. The draft CFI format,
being textual, would be, I'm guessing, 5x-10x that size. What does
that add up to for you? I felt like I ought to follow along with the
existing Breakpad precedent of favoring legibility and simplicity over
compactness.
If this turns out to be a problem, we could simply encode the DWARF CFI
data as blobs of hex and copy it directly into the Breakpad symbol
file. Breakpad CFI data would then be almost exactly 2x the size of
the .debug_frame section it was derived from (the headers would
differ), or about 60% of the size of the .text section it describes.
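The size arithmetic above, written out as a sketch. The 30% and 5x-10x ratios are the rough figures from this message, not measurements. `HexEncode`, `TextualCfiRatio` and `HexCfiRatio` are hypothetical helpers. Hex encoding turns every byte into two ASCII characters, which is where the 2x (and hence 2 x 30% = 60% of .text) comes from. By the same guesses, the textual format at 5x-10x would come to roughly 1.5x-3x the size of .text.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hex-encode a blob: each input byte becomes two ASCII characters, so the
// output is exactly twice the input size.
std::string HexEncode(const std::string& bytes) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  out.reserve(bytes.size() * 2);
  for (unsigned char c : bytes) {
    out.push_back(kDigits[c >> 4]);
    out.push_back(kDigits[c & 0xf]);
  }
  return out;
}

// Estimated CFI size as a fraction of .text. The ratios fed in are rough
// guesses from the discussion (e.g. .debug_frame ~= 0.30 of .text), not data.
double TextualCfiRatio(double debug_frame_ratio, double expansion) {
  return debug_frame_ratio * expansion;
}
double HexCfiRatio(double debug_frame_ratio) {
  return debug_frame_ratio * 2.0;  // hex doubles the byte count
}
```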
Another alternative would be to supply the original ELF symbol files
to the processor, and use libunwind for the unwinding.
I was wondering about this, so I mocked something up.
In recent Firefox builds, the largest .so is libxpcom_core.so. The overall
file size (including debug info) is 25MiB. The .debug_frame section
is 251kiB. If the expansion factor is 10x, the Breakpad CFI would be
about 2.5MiB. If the parsing is minimal (just parse each record's
address and stuff the rest of the data into a map, to be parsed in
detail as needed, which is appropriate since the vast majority of CFI
records will never be used in a given stack walk), then the simplest
STL code takes a third of a second to parse that 2.5MiB of data on a
2.4GHz MacBook Pro. By "simplest" I mean a std::map<address, string>,
the usual istream arithmetic extractors to parse the address, and a
getline call for the rest of the data.
I don't know what your constraints are; how does that sound?
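A minimal sketch of that lazy approach, under stated assumptions. Each record is assumed to be one line of the hypothetical form `<hex address> <rules>`, and `LazyCfiIndex` is a hypothetical name. Lookup picks the record with the greatest address not above the PC. A real format would likely also record each entry's address range, which this ignores. Only the address goes through an istream extractor; `std::getline` keeps the rest untouched for detailed parsing later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical index: one record per line, "<hex address> <rules...>".
// Only the address is parsed at load time; the rest of the line is kept as
// an unparsed string until a stack walk actually asks for that record.
class LazyCfiIndex {
 public:
  void Load(std::istream& in) {
    uint64_t address = 0;
    std::string rest;
    while (in >> std::hex >> address) {  // istream arithmetic extractor
      std::getline(in, rest);            // everything after the address
      if (!rest.empty() && rest[0] == ' ') rest.erase(0, 1);
      records_[address] = rest;
    }
  }

  // Raw rules of the record with the greatest start address <= pc.
  // Simplification: ignores any address-range size a real record may carry.
  std::optional<std::string> Find(uint64_t pc) const {
    auto it = records_.upper_bound(pc);
    if (it == records_.begin()) return std::nullopt;
    return std::prev(it)->second;
  }

  std::size_t size() const { return records_.size(); }

 private:
  std::map<uint64_t, std::string> records_;
};
```

Loading does one extractor call and one `getline` per record. A stack walk then parses only the handful of rule strings `Find` returns.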