Changing dwarf2reader::LineInfo interface to report end-of-sequence address correctly

Jim Blandy

Jul 21, 2009, 3:53:21 PM
to Neal Sidhwaney, Daniel Berlin, google-br...@googlegroups.com
The interface dwarf2reader::LineInfo offers to DWARF line number
information loses some reasonably important information present in the
DWARF; I explain in detail below. Fixing it while keeping the
interface straightforward requires making a change to the interface,
and will affect LineInfo's clients. It's not a big change. I'm
willing to do this work for everything in the breakpad tree; I can
test all the clients.

However, Danny mentioned that the DWARF reader in breakpad has been
copied elsewhere and improved. I would rather not introduce
divergence if it would prevent reconciliation.

Danny, could we get a copy of the latest sources for your DWARF
reader? And, can we put together a plan for keeping the two versions
in sync? If there's not very much work going on in this area, I guess we
could just swap patches around, but in any case we should get started.

Here are the technical details:

DWARF line number programs must end with a DW_LNE_end_sequence
instruction, which provides the "one-past" ending address of the
previous line's machine code. However, line number programs may also
contain DW_LNE_end_sequence in the middle of the program, to provide
line number info for discontiguous ranges of machine code. For
example:

The Directory Table:
/build/buildd/glibc-2.9/build-tree/i386-libc/csu
../sysdeps/generic

The File Name Table:
Entry Dir Time Size Name
1 1 0 0 crti.S
2 2 0 0 initfini.c

Line Number Statements:
Extended opcode 2: set Address to 0xd6d4
Advance Line by 14 to 15
Copy
Special opcode 20: advance Address by 1 to 0xd6d5 and Line by 1 to 16
Special opcode 34: advance Address by 2 to 0xd6d7 and Line by 1 to 17
Special opcode 20: advance Address by 1 to 0xd6d8 and Line by 1 to 18
Special opcode 48: advance Address by 3 to 0xd6db and Line by 1 to 19
Special opcode 77: advance Address by 5 to 0xd6e0 and Line by 2 to 21
Special opcode 20: advance Address by 1 to 0xd6e1 and Line by 1 to 22
Special opcode 90: advance Address by 6 to 0xd6e7 and Line by 1 to 23
Special opcode 90: advance Address by 6 to 0xd6ed and Line by 1 to 24
Special opcode 34: advance Address by 2 to 0xd6ef and Line by 1 to 25
Special opcode 34: advance Address by 2 to 0xd6f1 and Line by 1 to 26
Advance PC by 5 to 0xd6f6
Extended opcode 1: End of Sequence

Set File Name to entry 2 in the File Name Table
Extended opcode 2: set Address to 0x11ca78
Advance Line by 108 to 109
Copy
Special opcode 20: advance Address by 1 to 0x11ca79 and Line by 1 to 110
Special opcode 34: advance Address by 2 to 0x11ca7b and Line by 1 to 111
Special opcode 20: advance Address by 1 to 0x11ca7c and Line by 1 to 112
Special opcode 48: advance Address by 3 to 0x11ca7f and Line by 1 to 113
Special opcode 77: advance Address by 5 to 0x11ca84 and Line by 2 to 115
Special opcode 20: advance Address by 1 to 0x11ca85 and Line by 1 to 116
Advance PC by 6 to 0x11ca8b
Extended opcode 1: End of Sequence

Note the two occurrences of "Extended opcode 1: End of Sequence". The
first indicates that line crti.S:26 ends at 0xd6f6 and does not
include the bytes from 0xd6f6 up to 0x11ca78.

The dwarf2reader::LineInfoHandler interface:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/mac/dwarf/dwarf2reader.h#149

includes an AddLine member function, but no EndSequence member
function. The reader reports DW_LNE_end_sequence instructions with an
additional call to AddLine; in the above example, the client has no
way to know about the gap.

I'd like to add an EndSequence member to the LineInfoHandler class
that the reader calls to report DW_LNE_end_sequence instructions; that
change alone would be source-code backwards compatible. However, I
think it's confusing to report the end sequence address via AddLine,
as we do now; I think we should report DW_LNE_end_sequence via
EndSequence, but not via AddLine. This change is not
backwards-compatible.

I've attached an untested sketch patch to show what I have in mind.

dwarf-line-end.patch
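
The attachment is not reproduced in this thread. As a rough illustration only (this is not the contents of dwarf-line-end.patch), the proposal could look something like the sketch below. The names LineInfoHandlerSketch, RecordingHandler, Event, and ReportTailOfFirstSequence are hypothetical, as are the fixed-width integer types and the empty default method bodies; only the AddLine argument order (address, file, line, column) and the EndSequence(address) call follow the call sequences quoted elsewhere in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the handler interface discussed above.
class LineInfoHandlerSketch {
 public:
  virtual ~LineInfoHandlerSketch() {}

  // Called once per row of the line number table.
  virtual void AddLine(uint64_t /*address*/, uint32_t /*file_num*/,
                       uint32_t /*line_num*/, uint32_t /*column_num*/) {}

  // Proposed addition: called for DW_LNE_end_sequence with the address one
  // past the last byte of the sequence. The empty default body (an
  // assumption) is what lets existing subclasses compile unchanged.
  virtual void EndSequence(uint64_t /*address*/) {}
};

// One recorded callback; is_end distinguishes EndSequence from AddLine.
struct Event {
  bool is_end;
  uint64_t address;
  uint32_t file_num;
  uint32_t line_num;
};

// A client that just records what the reader tells it.
class RecordingHandler : public LineInfoHandlerSketch {
 public:
  void AddLine(uint64_t address, uint32_t file_num, uint32_t line_num,
               uint32_t /*column_num*/) override {
    events.push_back(Event{false, address, file_num, line_num});
  }
  void EndSequence(uint64_t address) override {
    events.push_back(Event{true, address, 0, 0});
  }
  std::vector<Event> events;
};

// Mimics what a reader would report around the gap in the example above:
// the last row of crti.S, the end of that sequence, then the first row of
// initfini.c.
inline void ReportTailOfFirstSequence(LineInfoHandlerSketch* handler) {
  assert(handler != nullptr);
  handler->AddLine(0xd6f1, 1, 26, 0);
  handler->EndSequence(0xd6f6);  // no longer reported through AddLine
  handler->AddLine(0x11ca78, 2, 109, 0);
}
```

Passing a RecordingHandler to ReportTailOfFirstSequence leaves three events, with the middle one marked as an end of sequence at 0xd6f6, so the client can see where the gap starts.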

Jim Blandy

Jul 24, 2009, 12:47:33 AM
to Daniel Berlin, Neal Sidhwaney, google-br...@googlegroups.com
On Thu, Jul 23, 2009 at 6:12 PM, Daniel Berlin<dbe...@dberlin.org> wrote:
> I'm still not clear on *why* though.
> You are talking about gaps in the ranges.  But the line info handler
> shouldn't report lines or ranges that don't exist.
> If you tell me why you want to know about the gaps, it would help immensely.

I'd like to be able to accurately answer questions of the form, "What
source line, if any, covers this address?" It's wrong to answer
"crti.S:26" if I ask about address 0x10000, but a client of the current
interface has no way to know that.

Or do you mean, in what circumstances is it harmful to answer
"crti.S:26" instead of "no source for that address"?

Jim Blandy

Jul 24, 2009, 1:33:30 AM
to Daniel Berlin, Neal Sidhwaney, google-br...@googlegroups.com

In the immediate situation, I need to produce breakpad symbol files,
whose source line records include both an address and a size. The
current dwarf2reader::LineInfo API provides no way to compute the size
of the last line, and produces incorrect sizes when there is a gap in
the line info.

Jim Blandy

Jul 24, 2009, 5:48:27 PM
to Daniel Berlin, Neal Sidhwaney, google-br...@googlegroups.com
On Fri, Jul 24, 2009 at 11:53 AM, Daniel Berlin<dbe...@dberlin.org> wrote:
> It has a mapping that says what lines go with what addresses.
> If there is nothing in the mapping, why would it report the closest,
> instead of "i don't know"?
> That's the part i don't understand.

The DWARF data I quoted earlier, with the current code, produces the
following sequence of calls to handler_->AddLine:

AddLine(0xd6d4, 1, 15, 0)
AddLine(0xd6d5, 1, 16, 0)
AddLine(0xd6d7, 1, 17, 0)
AddLine(0xd6d8, 1, 18, 0)
AddLine(0xd6db, 1, 19, 0)
AddLine(0xd6e0, 1, 21, 0)
AddLine(0xd6e1, 1, 22, 0)
AddLine(0xd6e7, 1, 23, 0)
AddLine(0xd6ed, 1, 24, 0)
AddLine(0xd6ef, 1, 25, 0)
AddLine(0xd6f1, 1, 26, 0)
AddLine(0xd6f6, 1, 26, 0)
AddLine(0x11ca78, 2, 109, 0)
AddLine(0x11ca79, 2, 110, 0)
AddLine(0x11ca7b, 2, 111, 0)
AddLine(0x11ca7c, 2, 112, 0)
AddLine(0x11ca7f, 2, 113, 0)
AddLine(0x11ca84, 2, 115, 0)
AddLine(0x11ca85, 2, 116, 0)
AddLine(0x11ca8b, 2, 116, 0)

If one constructs an address-to-source mapping from these calls, then
there's no indication that line 26 of file 1 ends at 0xd6f6 rather
than covering the whole range from 0xd6f1 to 0x11ca78.
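
To make the problem concrete, here is a minimal sketch of the kind of address-to-source table a client might build from the calls above. NaiveLineTable, SourceLine, and the helper functions are hypothetical names, and the "each line runs until the next reported address" rule is an assumption about how such a client would behave, not a description of any existing breakpad class.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct SourceLine {
  uint32_t file_num;
  uint32_t line_num;
};

// Records where each line starts and assumes a line extends up to the next
// recorded address, which is all the AddLine-only interface allows.
class NaiveLineTable {
 public:
  void AddLine(uint64_t address, uint32_t file_num, uint32_t line_num,
               uint32_t /*column_num*/) {
    starts_[address] = SourceLine{file_num, line_num};
  }

  // Returns the entry with the greatest start address <= address, or
  // nullptr if address precedes every entry.
  const SourceLine* Lookup(uint64_t address) const {
    auto it = starts_.upper_bound(address);
    if (it == starts_.begin()) return nullptr;
    --it;
    return &it->second;
  }

 private:
  std::map<uint64_t, SourceLine> starts_;
};

// Feeds the three calls surrounding the gap, as listed above.
inline NaiveLineTable BuildTableAroundGap() {
  NaiveLineTable table;
  table.AddLine(0xd6f1, 1, 26, 0);
  table.AddLine(0xd6f6, 1, 26, 0);  // really the end of the first sequence
  table.AddLine(0x11ca78, 2, 109, 0);
  return table;
}

// Usage: an address inside the gap is wrongly attributed to file 1, line 26.
inline void ExampleLookupInGap() {
  NaiveLineTable table = BuildTableAroundGap();
  const SourceLine* hit = table.Lookup(0x10000);
  assert(hit != nullptr && hit->file_num == 1 && hit->line_num == 26);
}
```

Nothing in the table distinguishes the entry at 0xd6f6 from an ordinary row, so the lookup has no basis for answering "no source for that address".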

The change I'm suggesting would give us the following sequence of calls:

AddLine(0xd6d4, 1, 15, 0)
AddLine(0xd6d5, 1, 16, 0)
AddLine(0xd6d7, 1, 17, 0)
AddLine(0xd6d8, 1, 18, 0)
AddLine(0xd6db, 1, 19, 0)
AddLine(0xd6e0, 1, 21, 0)
AddLine(0xd6e1, 1, 22, 0)
AddLine(0xd6e7, 1, 23, 0)
AddLine(0xd6ed, 1, 24, 0)
AddLine(0xd6ef, 1, 25, 0)
AddLine(0xd6f1, 1, 26, 0)
EndSequence(0xd6f6)
AddLine(0x11ca78, 2, 109, 0)
AddLine(0x11ca79, 2, 110, 0)
AddLine(0x11ca7b, 2, 111, 0)
AddLine(0x11ca7c, 2, 112, 0)
AddLine(0x11ca7f, 2, 113, 0)
AddLine(0x11ca84, 2, 115, 0)
AddLine(0x11ca85, 2, 116, 0)
EndSequence(0x11ca8b)

In this case it's clear the breakpad symbol file should have entries like:

...
d6f1 5 26 1
11ca78 1 109 2
...

That is, crti.S:26 contributes five bytes at 0xd6f1, and
initfini.c:109 contributes a byte at 0x11ca78, instead of the
following, which is what we would produce now:

...
d6f1 10f387 26 1
11ca78 1 109 2
...
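
The two candidate records for line 26 differ only in which address is taken as the end of the line. A small sketch of that arithmetic follows; Row and FormatRecord are hypothetical helpers, and the "address size line file" layout with hexadecimal address and size is read off the records shown above rather than taken from a format specification.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

struct Row {
  uint64_t address;
  uint32_t file_num;
  uint32_t line_num;
};

// Formats one line record as "address size line file", with address and
// size in hex, given the address at which the line's code ends.
inline std::string FormatRecord(const Row& row, uint64_t end_address) {
  assert(end_address >= row.address);
  char buffer[96];
  std::snprintf(buffer, sizeof(buffer), "%llx %llx %u %u",
                static_cast<unsigned long long>(row.address),
                static_cast<unsigned long long>(end_address - row.address),
                static_cast<unsigned>(row.line_num),
                static_cast<unsigned>(row.file_num));
  return std::string(buffer);
}

inline void ExampleRecordSizes() {
  const Row last_row{0xd6f1, 1, 26};
  // With EndSequence(0xd6f6), line 26 ends where the sequence ends.
  assert(FormatRecord(last_row, 0xd6f6) == "d6f1 5 26 1");
  // Without it, the only boundary available is the next row at 0x11ca78,
  // so the gap is folded into the size: 0x11ca78 - 0xd6f1 = 0x10f387.
  assert(FormatRecord(last_row, 0x11ca78) == "d6f1 10f387 26 1");
}
```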

Jim Blandy

Jul 24, 2009, 5:54:28 PM
to Daniel Berlin, Neal Sidhwaney, google-br...@googlegroups.com
Maybe the source of confusion is this: LineInfoHandler::AddLine reports
the address of the line, but not its size. If AddLine provided the
size, that would also be a workable interface, although it would
complicate the line info parser a bit. I personally think it's smart
to keep the DWARF parsing classes pretty direct, and leave it to
handlers to massage what they get into whatever form they find useful.
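
For comparison, here is a sketch of how a size-reporting interface could be layered on top of the direct one. SizingAdapter and SizedLine are hypothetical names, not existing breakpad classes; the idea is simply to hold each row back until the next AddLine or EndSequence supplies its end address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical high-level record: a line together with the bytes it covers.
struct SizedLine {
  uint64_t address;
  uint64_t size;
  uint32_t file_num;
  uint32_t line_num;
};

// Accepts low-level AddLine / EndSequence events and emits sized lines.
class SizingAdapter {
 public:
  void AddLine(uint64_t address, uint32_t file_num, uint32_t line_num,
               uint32_t /*column_num*/) {
    Flush(address);  // the previous row ends where this one starts
    pending_ = SizedLine{address, 0, file_num, line_num};
    have_pending_ = true;
  }

  void EndSequence(uint64_t address) {
    Flush(address);  // the last row ends at the one-past-the-end address
  }

  const std::vector<SizedLine>& lines() const { return lines_; }

 private:
  // Completes the pending row, if any, using end_address as its end.
  // Rows that share an address come out with size zero; a real client
  // might prefer to drop those.
  void Flush(uint64_t end_address) {
    if (!have_pending_) return;
    assert(end_address >= pending_.address);
    pending_.size = end_address - pending_.address;
    lines_.push_back(pending_);
    have_pending_ = false;
  }

  SizedLine pending_{};
  bool have_pending_ = false;
  std::vector<SizedLine> lines_;
};

// Usage: the rows on either side of the gap from the example data.
inline std::vector<SizedLine> ExampleAroundGap() {
  SizingAdapter adapter;
  adapter.AddLine(0xd6ef, 1, 25, 0);
  adapter.AddLine(0xd6f1, 1, 26, 0);
  adapter.EndSequence(0xd6f6);
  adapter.AddLine(0x11ca78, 2, 109, 0);
  adapter.AddLine(0x11ca79, 2, 110, 0);
  return adapter.lines();
}
```

In this sketch line 26 comes out with a size of 5 and line 109 with a size of 1, while the final row (line 110) stays pending until another AddLine or EndSequence arrives. Either layering needs the end-of-sequence address from the parser; the difference is only where the bookkeeping lives.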

Jim Blandy

Jul 27, 2009, 1:51:21 PM
to Daniel Berlin, Neal Sidhwaney, google-br...@googlegroups.com
On Mon, Jul 27, 2009 at 10:34 AM, Daniel Berlin<dbe...@dberlin.org> wrote:
> I'd rather see *the handler* have a simple AddLine method like it does
> now (with an added size) since that is what most consumers we have
> found, care about. They rarely, if ever, care about EndSequence.
> If you want a lower level interface, we should place an interface
> between what the current *handler* does and the parser.
> I can't imagine trying to explain to people who just want line numbers
> exactly what the heck dwarf means by a sequence ;)

Changing AddLine to supply a size as well as an address is fine with
me. I'm catching up on some more directly Mozilla-related stuff at
the moment (partly to give Neal a chance to review the patches), but
that's certainly something I can do when I get back to breakpad.

Jim Blandy

Jul 27, 2009, 1:53:37 PM
to Daniel Berlin, Neal Sidhwaney, google-br...@googlegroups.com
On Mon, Jul 27, 2009 at 10:36 AM, Daniel Berlin<dbe...@dberlin.org> wrote:
> The current handler is intended to be higher level and just provide
> info for people who want to get line numbers out of DWARF.
> If we want an interface that is closer to the metal (like the one you
> see in the reader code you have, but is actually not the highest level
> interface we provide to this code), we should put it between the high
> level interface and the parser, like we did for dwarf2reader

Are you referring to code presently in breakpad, or code in your other
copy of the reader? In breakpad, I've found functioninfo.h; are there
other high-level interfaces I haven't seen yet?

Jim Blandy

Jul 27, 2009, 2:11:04 PM
to Daniel Berlin, Neal Sidhwaney, google-br...@googlegroups.com
On Mon, Jul 27, 2009 at 10:56 AM, Daniel Berlin<dbe...@dberlin.org> wrote:
> Yup, there are a bunch.
> I'll try to open source them all ;)

Awesomeness. (And --- you should know better than to fork!)

Neal Sidhwaney

Jul 27, 2009, 2:22:55 PM
to Daniel Berlin, Jim Blandy, google-br...@googlegroups.com
I didn't expect so many changes to code that read DWARF :-) Most of our other dependencies are in fact referenced as SVN externals, but at that time the process for the 'right' way of open sourcing the DWARF code to its own Google Code project was not clear to me!

Neal

On Mon, Jul 27, 2009 at 11:16 AM, Daniel Berlin <dbe...@dberlin.org> wrote:
It was the breakpad guys who forked ;)

Jim Blandy

Jul 27, 2009, 4:51:44 PM
to google-br...@googlegroups.com, Daniel Berlin
On Mon, Jul 27, 2009 at 11:22 AM, Neal Sidhwaney<nea...@gmail.com> wrote:
> I didn't expect so many changes to code that read DWARF :-) Most of our
> other dependencies are in fact referenced as SVN externals but at that time
> the process for the 'right' way of open sourcing the DWARF code to its own
> Google Code project was not clear to me!

What is the "right" way? An external? Something else?

The portion of my patch queue I haven't posted includes some minor
changes to the DWARF reader, so however it works, I hope to be
contributing there as well.
