RFC: static line number information

Juergen Boemmels

Oct 7, 2002, 2:27:29 PM

to perl6-i...@perl.org

From TODO:
Metadata (source line number info, symbol table)

Currently the line number information in parrot is done via
special opcodes, namely setline/getline and setfile/getfile. This is a
good solution when you write an interpreter in parrot, and the line
number information is only known at runtime. But this approach is very
inefficient if you have a tight loop like this:

  $i = 0;
  while ($i < 1000) {
      $i++;
  }

With linenumber information enabled this would translate to something
like this

        setline 1
        set I0, 0
  LOOP: setline 2
        lt I0, 1000, DONE
        setline 3
        add I0, 1
        branch LOOP
  DONE: setline 5

This is inefficient, because two setline ops are executed on every
iteration of the loop.

A possible solution to this problem is to do it the way a C compiler
does: add an extra structure to the executable that translates the
current program counter to the source line. The advantage of this
approach is that the line number is only decoded when it is needed, so
only an application that uses the line number information pays a
runtime cost; the disadvantage is that the line number information
must be known at compile time (which I think is the common case).
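
A minimal sketch of the lookup side of this idea, in Python rather than Parrot's own C. The table layout, the addresses, and the name line_for_pc are assumptions made up for illustration; nothing here is an existing Parrot interface. The point is only that the table is searched when a line number is requested, so the loop itself carries no extra ops:

```python
import bisect

# Hypothetical table for the loop example: (first opcode address, source line).
# The addresses are invented for illustration; they are not real Parrot offsets.
LINE_TABLE = [(0, 1), (3, 2), (7, 3), (12, 5)]
ADDRESSES = [addr for addr, _ in LINE_TABLE]


def line_for_pc(pc):
    """Return the source line of the last table entry at or before pc."""
    index = bisect.bisect_right(ADDRESSES, pc) - 1
    if index < 0:
        raise ValueError("pc precedes the first table entry")
    return LINE_TABLE[index][1]
```

A debugger or an error handler would call line_for_pc(pc) on demand; code that never asks pays nothing at runtime.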

This can be implemented in 2 ways:
- Create our own debugging format
- Use an already existing one
The first way might be more fun, but I think the second one would be
better. IMHO we should use DWARF-2. The Mono Project does something
similar.

To get this working 3 things must happen.

1.) The packfile format must be extended to contain a section with
debugging information.

Changing the packfile is not an easy task, because many parts of
parrot depend on it. The ones I remember are packfile.c, assemble.pl,
and somewhere in imcc.

In principle the packfile is extensible in a backward-compatible
way. At the moment there are (according to parrotbyte.pod) 3 segments
(FIXUP, CONSTANT, BYTECODE) in exactly that order. This can easily be
extended by just adding a 4th one, DEBUG_LINE (or .debug_line or
.stabs). But adding further extensions (e.g. call frames,
language-dependent sections) by allocating numbers in a linear chain
will be painful.

Another extension scheme would be to make the 4th section a
directory section, in which all packfile extension sections can be
looked up by name. This is still a backward-compatible solution.

But why use the 4th section as the directory section? Naturally it
would be the first one. Since FIXUP is not used at the moment, this is
not such a drastic change as it first sounds.
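
A rough sketch of what a leading directory segment could look like, in Python. The byte layout (entry count, then a length-prefixed name plus offset and size per entry) and the function names are purely hypothetical and do not describe the real packfile format:

```python
import struct


def pack_with_directory(segments):
    """Pack named segments behind a leading directory (hypothetical layout)."""
    names = [name.encode("ascii") for name in segments]
    # 4 bytes for the entry count, then per entry: name length, name, offset, size.
    dir_size = 4 + sum(4 + len(n) + 8 for n in names)
    directory = struct.pack("<I", len(names))
    payload = b""
    for name, data in zip(names, segments.values()):
        offset = dir_size + len(payload)
        directory += struct.pack("<I", len(name)) + name
        directory += struct.pack("<II", offset, len(data))
        payload += data
    return directory + payload


def find_segment(blob, wanted):
    """Look a segment up by name; return None if it is absent."""
    (count,) = struct.unpack_from("<I", blob, 0)
    pos = 4
    for _ in range(count):
        (name_len,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        name = blob[pos:pos + name_len].decode("ascii")
        pos += name_len
        offset, size = struct.unpack_from("<II", blob, pos)
        pos += 8
        if name == wanted:
            return blob[offset:offset + size]
    return None
```

With a scheme like this, a reader that does not know about DEBUG_LINE simply never asks for it, and a reader that does gets None from files written without it.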

2.) The assembler must emit the debugging information.

Emitting the debugging information from pure assembly code is not
really complicated, because the address and linenumber are always
increasing, the address increment is defined only by the current line
and the basic blocks can be easily analyzed.
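
Because both columns only grow in pure assembly, the table can be stored as small differences. The following Python sketch assumes a plain list of (address delta, line delta) pairs; this is a deliberate simplification and not the DWARF-2 line number program, which uses a more elaborate state machine. The function names are made up:

```python
def encode_deltas(rows):
    """Turn absolute (address, line) rows into (address delta, line delta) pairs."""
    deltas = []
    prev_addr, prev_line = 0, 0
    for addr, line in rows:
        deltas.append((addr - prev_addr, line - prev_line))
        prev_addr, prev_line = addr, line
    return deltas


def decode_deltas(deltas):
    """Rebuild the absolute (address, line) rows from the delta pairs."""
    rows = []
    addr, line = 0, 0
    for d_addr, d_line in deltas:
        addr += d_addr
        line += d_line
        rows.append((addr, line))
    return rows
```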

But there must also be a way for higher-level languages to assign
line numbers. Maybe C-like
  #line 1 "foo.c"
directives are a solution, or dedicated assembler macros:
  .line
  .file
  (maybe) .column
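
A sketch of how an assembler might honour such macros, written in Python. The directive syntax (.file "name", .line N) is only the proposal above, not something an existing assembler accepts, and every instruction is assumed to occupy one slot, which real opcodes with operands would not:

```python
def assign_positions(asm_lines):
    """Attach (file, line) to each instruction, following .file/.line directives."""
    current_file, current_line = None, 0
    table = []  # rows of (instruction index, file, line)
    index = 0
    for text in asm_lines:
        words = text.split()
        if not words:
            continue  # skip blank lines
        if words[0] == ".file":
            current_file = words[1].strip('"')
        elif words[0] == ".line":
            current_line = int(words[1])
        else:
            # Anything that is not a directive counts as one instruction.
            table.append((index, current_file, current_line))
            index += 1
    return table
```

The directives themselves produce no opcodes; they only update the position recorded for the instructions that follow.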

3.) The debugger must read this information.

I have some ugly little code lying around that reads the line number
information out of an ELF binary. I can fix this up and integrate it,
but there is no point in doing the last step first.

Bonus point.) Teach the JIT-engine to translate the line number
information, so that you can debug a JITed program with gdb.

Comments?
b.
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47

Brent Dax

Oct 7, 2002, 3:01:29 PM

to boem...@physik.uni-kl.de, perl6-i...@perl.org

boem...@physik.uni-kl.de:
# $i = 0;
# while ($i < 1000) {
# $i++;
# }
#
# With linenumber information enabled this would translate to
# something like this
#
# setline 1
# set I0,0
# LOOP: setline 2
# lt I0, 1000, DONE
# setline 3
# add I0,1
# branch LOOP
# DONE: setline 5
#
# This is inefficient, because there are two setlines in the loops.

That example is quite oversimplified. A more correct one would be
something like:

        setfile "foo.pl"
        setline 1
        enter
        new P0, .PerlInt
        assign P0, 0
  LOOP:
        setline 2
        lt P0, 1000, DONE
        enter
        setline 3
        add P0, 1
        leave
        branch LOOP
  DONE:
        setline 5
        leave

When you include all the scoping operations and change the I to a P
(which is more realistic), it's not as big a deal, especially since a
vtable dispatch is much more expensive than an assignment to
interpreter->current_line or whatever. Still, I can see your point.

# This can be implemented in 2 ways:
# - Create our own debugging format
# - Use an already existing one
# The first way might be more fun, but I think the second one
# would be better. IMHO we should use DWARF-2. The Mono Project
# does something similar.

Can you justify these? Parrot may want to support file names in
different character sets, which I doubt much of anything else handles
correctly. And if we choose to use an existing format, why DWARF-2?

--Brent Dax <bren...@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

Wire telegraph is a kind of a very, very long cat. You pull his tail in
New York and his head is meowing in Los Angeles. And radio operates
exactly the same way. The only difference is that there is no cat.
--Albert Einstein (explaining radio)

Dan Sugalski

Oct 7, 2002, 2:52:05 PM

to Juergen Boemmels, perl6-i...@perl.org

At 8:27 PM +0200 10/7/02, Juergen Boemmels wrote:
>From TODO:
> Metadata (source line number info, symbol table)
>
>Currently the line number information in parrot is done via
>special opcodes, namely setline/getline and setfile/getfile. This is a
>good solution when you write an interpreter in parrot, and the line
>number information is only known at runtime. But this approach is very
>inefficient if you have a tight loop like this:

Right. We're moving the line number information out of band, into a
separate section of the bytecode. I've specs for it--I'd promise when
but I've not dug through the backlog of unmet promises from the last
perl 6 summary. :)
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Nicholas Clark

Oct 7, 2002, 4:04:36 PM

to Juergen Boemmels, perl6-i...@perl.org

On Mon, Oct 07, 2002 at 08:27:29PM +0200, Juergen Boemmels wrote:
> But there must also be a way the higher level languages can assign
> line numbers. Maybe C-like
> #line 1 "foo.c"
> directives are a solution.
> or create dedicated assembler macros
> .line
> .file
> (maybe) .column

ooh. nice. That's built in full debugging support for befunge, isn't it?

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

Leopold Toetsch

Oct 7, 2002, 4:27:07 PM

to Juergen Boemmels, perl6-i...@perl.org

Juergen Boemmels wrote:

> From TODO:
> Metadata (source line number info, symbol table)
>
> Currently the line number information in parrot is done via
> special opcodes, namely setline/getline and setfile/getfile. This is a
> good solution when you write an interpreter in parrot, and the line
> number information is only known at runtime. But this approach is very
> inefficient if you have a tight loop ...

Yep you are right. Actually one of my first patches WRT imcc was adding
a »-g« option to perl6 and passing on this setline/setfile info through
imcc to PASM.

But the biggest problem I encountered was getting these line numbers out
of the parser. Maybe someone more familiar with Parse::RecDescent can find
a solution to get line numbers for blocks, which are not finished parsing
until the closing bracket. Normal statements are no problem.

> Changing the packfile is not an easy task, because many parts of
> parrot depend on it. The ones I remember are packfile.c assemble.pl
> and somewhere in imcc.

Once there is a final decision on how to do it, I'll adjust packout.c,
which is the file imcc uses for writing PBC.

> ... directory section. Naturally it would
> be the first one. Since FIXUP is not used at the moment, this is not
> such a drastic change as it first sounds.

Yes, I would vote for an extensible, always backward compatible solution.

When changing the packfile, please consider the proposals for
"fingerprinting PBC files" (see the thread with this subject), so that
we would need just one change in PBC to get both features in.

> #line 1 "foo.c"

perl6 -g

imcc currently ignores these, because of block statements, but when this
is solved, I'll look for this old patch and update imcc.

leo

Juergen Boemmels

Oct 7, 2002, 5:40:26 PM

to perl6-i...@perl.org

"Brent Dax" <bren...@cpan.org> writes:

[...]

> # This can be implemented in 2 ways:
> # - Create our own debugging format
> # - Use an already existing one
> # The first way might be more fun, but I think the second one
> # would be better. IMHO we should use DWARF-2. The Mono Project
> # does something similar.
>
> Can you justify these?

Creating our own is reinventing the wheel. You should not do that
without a good reason. So far I see no reason not to use a standard
debugging format.

> Parrot may want to support file names in different character sets,
> which I doubt much of anything else handles correctly.

Ok, this is a reason. But is this really a problem? Parrot may also
want to open files in different character sets, but somehow it has to
pass the filename to the operating system. This call takes a plain
C string. It is just this NUL-terminated string that will be stored in
the debug section. You might say that this will depend on the default
character set of the machine creating the debugging information, but we
can store this information and, if necessary, transcode. The bytecode
format already converts between big and little endian.
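
A small Python sketch of that idea: keep a character set label next to the NUL-terminated bytes and transcode on the reading side. The record shape and both function names are assumptions for illustration, and the sketch assumes an encoding whose encoded names contain no NUL bytes (e.g. latin-1 or UTF-8):

```python
def store_filename(name, encoding):
    """Return (encoding label, NUL-terminated bytes) for a debug record."""
    return encoding, name.encode(encoding) + b"\x00"


def load_filename(record, target="utf-8"):
    """Decode a stored file name and re-encode it for the reading machine."""
    encoding, raw = record
    text = raw[:-1].decode(encoding)  # drop the trailing NUL before decoding
    return text.encode(target)
```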

> And if we choose to use an existing format, why DWARF-2?

I don't know very many debugging formats. All I know about them is
from the gdb manual (and the DWARF spec).
- stabs: One of the first debugging formats. But it feels like a hack
  to get around some limits in the underlying file format.
- COFF: (quoting the gdb manual) "The basic COFF definition includes
  debugging information. The level of support is minimal and
  non-extensible, and is not often used."
- DWARF-1: The debug format of the ELF file format, designed to be
  extensible and language independent, but superseded by an
  incompatible second version.
- DWARF-2: The current version of this debugging format.
- DWARF-3: The next generation, but as far as I can see compatible
  with DWARF-2, at least in the line number information.

I have not looked at MIPS and SOM.

So the reasons for DWARF-2 were: it's standardized, documented,
language independent, and I've already used it once. Ok, the last
reason is not valid for a design decision.

Bye

Sean O'Rourke

Oct 7, 2002, 6:28:23 PM

to Nicholas Clark, Juergen Boemmels, perl6-i...@perl.org

On Mon, 7 Oct 2002, Nicholas Clark wrote:

> On Mon, Oct 07, 2002 at 08:27:29PM +0200, Juergen Boemmels wrote:
> > But there must also be a way the higher level languages can assign
> > line numbers. Maybe C-like
> > #line 1 "foo.c"
> > directives are a solution.
> > or create dedicated assembler macros
> > .line
> > .file
> > (maybe) .column
>
> ooh. nice. That's built in full debugging support for befunge, isn't it?

We should probably add a ".plane" to support Trefunge, as well. Or just
make "source position" a vector, to generalize to scripting languages of
any dimension.

/s