However, Danny mentioned that the DWARF reader in breakpad has been
copied elsewhere and improved. I would rather not introduce
divergence if it would prevent reconciliation.
Danny, could we get a copy of the latest sources for your DWARF
reader? And can we put together a plan for keeping the two versions
in sync? If there's not very much work going on in this area, I guess
we could just swap patches around, but in any case we should get started.
Here are the technical details:
DWARF line number programs must end with a DW_LNE_end_sequence
instruction, which provides the "one-past" ending address of the
previous line's machine code. However, line number programs may also
contain DW_LNE_end_sequence in the middle of the program, to provide
line number info for discontiguous ranges of machine code. For
example:
The Directory Table:
/build/buildd/glibc-2.9/build-tree/i386-libc/csu
../sysdeps/generic
The File Name Table:
Entry Dir Time Size Name
1 1 0 0 crti.S
2 2 0 0 initfini.c
Line Number Statements:
Extended opcode 2: set Address to 0xd6d4
Advance Line by 14 to 15
Copy
Special opcode 20: advance Address by 1 to 0xd6d5 and Line by 1 to 16
Special opcode 34: advance Address by 2 to 0xd6d7 and Line by 1 to 17
Special opcode 20: advance Address by 1 to 0xd6d8 and Line by 1 to 18
Special opcode 48: advance Address by 3 to 0xd6db and Line by 1 to 19
Special opcode 77: advance Address by 5 to 0xd6e0 and Line by 2 to 21
Special opcode 20: advance Address by 1 to 0xd6e1 and Line by 1 to 22
Special opcode 90: advance Address by 6 to 0xd6e7 and Line by 1 to 23
Special opcode 90: advance Address by 6 to 0xd6ed and Line by 1 to 24
Special opcode 34: advance Address by 2 to 0xd6ef and Line by 1 to 25
Special opcode 34: advance Address by 2 to 0xd6f1 and Line by 1 to 26
Advance PC by 5 to 0xd6f6
Extended opcode 1: End of Sequence
Set File Name to entry 2 in the File Name Table
Extended opcode 2: set Address to 0x11ca78
Advance Line by 108 to 109
Copy
Special opcode 20: advance Address by 1 to 0x11ca79 and Line by 1 to 110
Special opcode 34: advance Address by 2 to 0x11ca7b and Line by 1 to 111
Special opcode 20: advance Address by 1 to 0x11ca7c and Line by 1 to 112
Special opcode 48: advance Address by 3 to 0x11ca7f and Line by 1 to 113
Special opcode 77: advance Address by 5 to 0x11ca84 and Line by 2 to 115
Special opcode 20: advance Address by 1 to 0x11ca85 and Line by 1 to 116
Advance PC by 6 to 0x11ca8b
Extended opcode 1: End of Sequence
Note the two occurrences of "Extended opcode 1: End of Sequence". The
first one indicates that crti.S:26 ends at 0xd6f6, and so does not
include the bytes from 0xd6f6 up to 0x11ca78.
The dwarf2reader::LineInfoHandler interface:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/mac/dwarf/dwarf2reader.h#149
includes an AddLine member function, but no EndSequence member
function. The reader reports DW_LNE_end_sequence instructions with an
additional call to AddLine; in the above example, the client has no
way to know about the gap.
I'd like to add an EndSequence member to the LineInfoHandler class
that the reader calls to report DW_LNE_end_sequence instructions; that
change alone would be source-code backwards compatible. However, I
think it's confusing to report the end sequence address via AddLine,
as we do now; I think we should report DW_LNE_end_sequence via
EndSequence, but not via AddLine. This change is not
backwards-compatible.
I've attached an untested sketch patch to show what I have in mind.
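Independently of the attached patch, here is a rough, self-contained sketch of the shape of the proposed interface. It is not the real dwarf2reader code: the class below is a stand-in that omits the handler's other members, the integer types are assumed, and Row, ReportRow and LoggingHandler are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the handler interface. Other members of the
// real class are omitted, and the integer types are assumptions.
class LineInfoHandler {
 public:
  virtual ~LineInfoHandler() {}

  // One call per line table row that starts a line's machine code.
  virtual void AddLine(uint64_t address, uint32_t file_num,
                       uint32_t line_num, uint32_t column_num) = 0;

  // Proposed addition: one call per DW_LNE_end_sequence. |address| is one
  // past the last byte covered by the most recent AddLine call.
  virtual void EndSequence(uint64_t address) = 0;
};

// Hypothetical row type standing in for the reader's state machine.
struct Row {
  uint64_t address;
  uint32_t file_num;
  uint32_t line_num;
  uint32_t column_num;
  bool end_sequence;
};

// How a reader could dispatch a completed row under the proposal:
// end-of-sequence rows go to EndSequence only, never to AddLine.
void ReportRow(const Row& row, LineInfoHandler* handler) {
  assert(handler != nullptr);
  if (row.end_sequence) {
    handler->EndSequence(row.address);
  } else {
    handler->AddLine(row.address, row.file_num, row.line_num,
                     row.column_num);
  }
}

// Example client that simply logs the calls it receives.
class LoggingHandler : public LineInfoHandler {
 public:
  void AddLine(uint64_t address, uint32_t file_num, uint32_t line_num,
               uint32_t column_num) override {
    char buf[96];
    std::snprintf(buf, sizeof(buf), "AddLine(0x%llx, %u, %u, %u)",
                  static_cast<unsigned long long>(address),
                  static_cast<unsigned>(file_num),
                  static_cast<unsigned>(line_num),
                  static_cast<unsigned>(column_num));
    log.push_back(buf);
  }

  void EndSequence(uint64_t address) override {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "EndSequence(0x%llx)",
                  static_cast<unsigned long long>(address));
    log.push_back(buf);
  }

  std::vector<std::string> log;
};
```

Making EndSequence pure virtual, as in this sketch, forces every existing handler to be updated; giving it an empty default body would keep existing handlers compiling, at the cost of letting them silently ignore the gap.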
I'd like to be able to accurately answer questions of the form, "What
source line, if any, covers this address?" It's wrong to answer
"crti.S:26" if I ask about address 0x10000, but a client of the current
interface has no way to know that.
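To make that concrete, here is a minimal sketch of an address lookup that can answer "no source" for an address in a gap, assuming the reader reported end-of-sequence addresses separately. AddressMap, SourceLine and Lookup are hypothetical names, not anything in breakpad.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

struct SourceLine {
  uint32_t file;
  uint32_t line;
};

// Hypothetical client-side table keyed by start address. Each entry
// either starts a source line or marks where coverage stops.
class AddressMap {
 public:
  void AddLine(uint64_t address, uint32_t file, uint32_t line) {
    Entry e = {true, {file, line}};
    starts_[address] = e;  // a line start replaces any end marker here
  }

  void EndSequence(uint64_t address) {
    Entry e = {false, {0, 0}};
    // insert() does not overwrite, so a line that starts exactly where
    // another sequence ends is kept.
    starts_.insert(std::make_pair(address, e));
  }

  // Returns true and fills |*result| if some line covers |address|.
  bool Lookup(uint64_t address, SourceLine* result) const {
    std::map<uint64_t, Entry>::const_iterator it =
        starts_.upper_bound(address);
    if (it == starts_.begin()) return false;  // below every entry
    const Entry& e = std::prev(it)->second;
    if (!e.covered) return false;             // inside a gap
    *result = e.source;
    return true;
  }

 private:
  struct Entry {
    bool covered;
    SourceLine source;
  };
  std::map<uint64_t, Entry> starts_;
};
```

Fed the calls from the example, such a table would map 0xd6f3 to line 26 of file 1 and report nothing for 0x10000, because the nearest entry at or below 0x10000 is the end marker at 0xd6f6.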
Or do you mean, in what circumstances is it harmful to answer
"crti.S:26" instead of "no source for that address"?
In the immediate situation, I need to produce breakpad symbol files,
whose source line records include both an address and a size. The
current dwarf2reader::LineInfo API provides no way to compute the size
of the last line, and produces incorrect sizes when there is a gap in
the line info.
The DWARF data I quoted earlier, with the current code, produces the
following sequence of calls to handler_->AddLine:
AddLine(0xd6d4, 1, 15, 0)
AddLine(0xd6d5, 1, 16, 0)
AddLine(0xd6d7, 1, 17, 0)
AddLine(0xd6d8, 1, 18, 0)
AddLine(0xd6db, 1, 19, 0)
AddLine(0xd6e0, 1, 21, 0)
AddLine(0xd6e1, 1, 22, 0)
AddLine(0xd6e7, 1, 23, 0)
AddLine(0xd6ed, 1, 24, 0)
AddLine(0xd6ef, 1, 25, 0)
AddLine(0xd6f1, 1, 26, 0)
AddLine(0xd6f6, 1, 26, 0)
AddLine(0x11ca78, 2, 109, 0)
AddLine(0x11ca79, 2, 110, 0)
AddLine(0x11ca7b, 2, 111, 0)
AddLine(0x11ca7c, 2, 112, 0)
AddLine(0x11ca7f, 2, 113, 0)
AddLine(0x11ca84, 2, 115, 0)
AddLine(0x11ca85, 2, 116, 0)
AddLine(0x11ca8b, 2, 116, 0)
If one constructs an address-to-source mapping from these calls, then
there's no indication that line 26 of file 1 ends at 0xd6f6 rather
than covering the whole range from 0xd6f1 to 0x11ca78.
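As an illustration of the problem, here is a small sketch of the only mapping a client can build from an AddLine-only stream: each line is assumed to run until the next reported address, and adjacent entries for the same file and line are merged. Call, Range and NaiveRanges are hypothetical names, and the sketch assumes the calls arrive in increasing address order, as they do in this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Call {
  uint64_t address;
  uint32_t file;
  uint32_t line;
};

struct Range {
  uint64_t start;
  uint64_t end;  // one past the last byte
  uint32_t file;
  uint32_t line;
};

// Builds ranges from AddLine calls alone. The final call contributes no
// range of its own, since nothing follows it to supply an end address.
std::vector<Range> NaiveRanges(const std::vector<Call>& calls) {
  std::vector<Range> ranges;
  for (std::size_t i = 0; i + 1 < calls.size(); ++i) {
    const Call& c = calls[i];
    const uint64_t end = calls[i + 1].address;
    if (!ranges.empty() && ranges.back().end == c.address &&
        ranges.back().file == c.file && ranges.back().line == c.line) {
      ranges.back().end = end;  // same line continues: extend it
    } else {
      Range r = {c.address, end, c.file, c.line};
      ranges.push_back(r);
    }
  }
  return ranges;
}
```

With the calls at 0xd6f1, 0xd6f6 and 0x11ca78 from the list above, the range for line 26 comes out as 0xd6f1 to 0x11ca78, a length of 0x10f387 bytes, because the end-of-sequence address is indistinguishable from an ordinary line start.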
The change I'm suggesting would give us the following sequence of calls:
AddLine(0xd6d4, 1, 15, 0)
AddLine(0xd6d5, 1, 16, 0)
AddLine(0xd6d7, 1, 17, 0)
AddLine(0xd6d8, 1, 18, 0)
AddLine(0xd6db, 1, 19, 0)
AddLine(0xd6e0, 1, 21, 0)
AddLine(0xd6e1, 1, 22, 0)
AddLine(0xd6e7, 1, 23, 0)
AddLine(0xd6ed, 1, 24, 0)
AddLine(0xd6ef, 1, 25, 0)
AddLine(0xd6f1, 1, 26, 0)
EndSequence(0xd6f6)
AddLine(0x11ca78, 2, 109, 0)
AddLine(0x11ca79, 2, 110, 0)
AddLine(0x11ca7b, 2, 111, 0)
AddLine(0x11ca7c, 2, 112, 0)
AddLine(0x11ca7f, 2, 113, 0)
AddLine(0x11ca84, 2, 115, 0)
AddLine(0x11ca85, 2, 116, 0)
EndSequence(0x11ca8b)
In this case it's clear the breakpad symbol file should have entries like:
...
d6f1 5 26 1
11ca78 1 109 2
...
That is, crti.S:26 contributes five bytes at 0xd6f1, and
initfini.c:109 contributes a byte at 0x11ca78, instead of the
following, which is what we would produce now:
...
d6f1 10f387 26 1
11ca78 1 109 2
...
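For comparison, here is a sketch of how a symbol dumper could derive line record sizes from the proposed AddLine/EndSequence stream. LineRecordBuilder and FormatRecord are hypothetical names; the output layout (hex address, hex size, decimal line, decimal file) is inferred from the records quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct LineRecord {
  uint64_t address;
  uint64_t size;
  uint32_t line;
  uint32_t file;
};

// Hypothetical handler-like class: each line stays open until the next
// AddLine or EndSequence supplies its end address.
class LineRecordBuilder {
 public:
  LineRecordBuilder() : pending_(), have_pending_(false) {}

  void AddLine(uint64_t address, uint32_t file, uint32_t line) {
    Close(address);
    LineRecord r = {address, 0, line, file};
    pending_ = r;
    have_pending_ = true;
  }

  void EndSequence(uint64_t address) { Close(address); }

  const std::vector<LineRecord>& records() const { return records_; }

 private:
  // Finishes the open line, if any, at |end| and drops empty lines.
  void Close(uint64_t end) {
    if (!have_pending_) return;
    assert(end >= pending_.address);  // addresses rise within a sequence
    pending_.size = end - pending_.address;
    if (pending_.size > 0) records_.push_back(pending_);
    have_pending_ = false;
  }

  LineRecord pending_;
  bool have_pending_;
  std::vector<LineRecord> records_;
};

// Formats one record as "address size line file".
std::string FormatRecord(const LineRecord& r) {
  char buf[96];
  std::snprintf(buf, sizeof(buf), "%llx %llx %u %u",
                static_cast<unsigned long long>(r.address),
                static_cast<unsigned long long>(r.size),
                static_cast<unsigned>(r.line),
                static_cast<unsigned>(r.file));
  return buf;
}
```

Because EndSequence closes the open line without opening a new one, the next AddLine at 0x11ca78 starts fresh, and the gap never ends up inside any record.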
Changing AddLine to supply a size as well as an address is fine with
me. I'm catching up on some more directly Mozilla-related stuff at
the moment (partly to give Neal a chance to review the patches), but
that's certainly something I can do when I get back to breakpad.
Are you referring to code presently in breakpad, or code in your other
copy of the reader? In breakpad, I've found functioninfo.h; are there
other high-level interfaces I haven't seen yet?
Awesomeness. (And --- you should know better than to fork!)
It was the breakpad guys who forked ;)
What is the "right" way? An external? Something else?
The portion of my patch queue I haven't posted includes some minor
changes to the DWARF reader, so however it works, I hope to be
contributing there as well.