Re: HLL Debug Segments

Leopold Toetsch

Nov 13, 2005, 7:48:35 PM

to Jonathan Worthington, perl6-i...@perl.org, perl6-c...@perl.org

On Nov 14, 2005, at 0:02, Jonathan Worthington wrote:

> Hi,

> .hll_debug file "something.pl"
> .hll_debug line 1

Just

#line 123
#line 789 "file.foo"

looks simpler and well known to me - the latter is already parsed. But
actually making it work is more important for me.

> Either an integer or a string constant from the constants table

Storing debug-only things in the constant table could complicate a TODO
'pbc_strip(1)' utility, but that's not a problem:
pbc_merge deals with such things already.

> Jonathan

leo

Joshua Hoblitt

Nov 13, 2005, 8:16:41 PM

to Leopold Toetsch, Jonathan Worthington, perl6-i...@perl.org, perl6-c...@perl.org

I think it would be better if we didn't overload the meaning of '\s*#.*'
in PIR.

-J


Joshua Isom

Nov 13, 2005, 8:19:35 PM

to perl6-i...@perl.org, perl6-c...@perl.org

I'm pretty sure it already is for when pir's compiled to pasm.
Joshua

Roger Browne

Nov 14, 2005, 5:22:23 AM

to perl6-i...@perl.org, perl6-c...@perl.org

Jonathan,

My highest priority requests (for use by the Amber compiler
and toolset) are:

1. To store away, for each part of the compiled program:

- the name of the HLL source filename
- the line and column numbers

2. For PIR error messages to be presented using the HLL source
location rather than the PIR source location.

3. To be able to retrieve the information programmatically, e.g.
whilst walking the call chain, so that Amber can provide useful
information when handling an exception.

At a lower priority is:

1. To be able to store and retrieve additional pieces of
information associated with any source location. These
should be extendible, not hardwired. There are two pieces
of information that I am currently interested in:

- The HLL language name (so that the HLL Amber debugger
would not try to handle pieces of the program that are
written in e.g. Python). As a fallback, I could look at
the suffix of the HLL source file, but that's not so
robust.

- The HLL compile options (because Amber scripts can be
compiled with or without runtime monitoring of preconditions
and postconditions, and debugging/exception-handling might
need to work differently in these cases).

No doubt over time, HLL authors will find many useful
and imaginative ways to store and use additional data.

Finally, the following would be "nice to have":

1. To be able to embed the entire HLL source code. Source code
compresses really well, and I think we should compress it.
If we can't find a suitable library, it would take only a
few lines of code to compress and decompress repeated blanks.

The zlib license looks unproblematic, if licensing
rather than dependency is the issue:
http://www.gzip.org/zlib/zlib_license.html
Compression could be made optional, but let's put the hooks
in for it, anyway.
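
A minimal Python sketch of the embed-and-compress idea, only to show the
round trip. It uses Python's standard zlib module; the function names
pack_source and unpack_source are made up for illustration and say nothing
about how Parrot would store the data.

```python
import zlib

def pack_source(text: str) -> bytes:
    """Compress HLL source text (encoded as UTF-8) for embedding."""
    return zlib.compress(text.encode("utf-8"), 9)

def unpack_source(blob: bytes) -> str:
    """Recover the original HLL source text from an embedded blob."""
    return zlib.decompress(blob).decode("utf-8")

# Indented, repetitive source is the typical case and compresses well.
source = "if x > 0 then\n        print(x)\nend\n" * 50
blob = pack_source(source)
assert unpack_source(blob) == source
print(len(source.encode("utf-8")), "bytes before,", len(blob), "bytes after")
```

Making compression optional would then only mean storing a flag next to the
blob that says whether unpack_source needs to be called.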

Jonathan wrote:
> Here are my current thoughts.
>
> * We shouldn't restrict this info to a fixed set of fields.

Agree.

> ... Having to specify the file and line every time you want
> to specify a column will bloat generated code massively

Yep. That's clearly out of the question.

> * I'm thinking of a PIR syntax along the lines of this:-
>
> .hll_debug file "something.pl"

I don't mind what syntax is used, provided it's compact. There are
going to be a LOT of hll debug lines generated.

> A special entry ...
> to indicate that all currently inherited data should be
> dispensed with, so it is possible to merge bytecode sanely.
> ... * Maybe there is a need for some PIR syntax...

I don't see the need for special syntax. Just reset everything
to defaults at the start of each new file. Within a file, the
usual syntax can be used (e.g. you could just set filename to "").

Thanks for your work on this.

Regards,
Roger Browne

Nick Glencross

Nov 14, 2005, 7:23:31 AM

to perl6-i...@perl.org

[Sorry if this doesn't thread in your reader]

Jonathan Worthington wrote:

> I'm looking to work
> on enabling Parrot to store away HLL debug info - that is, the file name,
> line number, columns etc in the high level language source code. This data
> can then be used to emit useful error messages that relate to the HLL source
> code rather than the generated PIR/PASM/whatever.

Does it make sense to have nestable structures?

e.g.

@push HLL "perl"

@push file "something.pl"
@push line 1
@push column 17
...
@pop column
@pop line

@push line 2
...
@push file "inlined.pl"
@push line 34
...
@pop line
@pop file
...
@pop line

@push line 4
@push pragma xyz
...
@pop pragma
@pop line
@pop file

@pop HLL

Ok, ok, the syntax itself isn't important. I'm just doing nesting.

Then for any point in the pbc file there would be a set of active
attributes/values which could be retrieved as a hash. Note that you
can express bytecode ranges where there isn't valid line number
information available (e.g. where extra PIR runtime glue is inserted
which doesn't come from the HLL). You can also push the language at
the top and pop it at the bottom.

The representation in the debug section would then just be something like:

1: push HLL "perl"
1: push file "something.pl"
1: push line 1
1: push column 17
5: pop column
5: pop line
5: push line 2
etc.

where the number is the bytecode offset. [Obviously there are
efficient ways to store this by using bit fields and sharing data]

To get the active attributes corresponding to the bytecode you'd need
to start at the beginning and push/pop attributes until you get to the
correct offset, but you wouldn't need to do this too often and so I
believe the extra processing to be worthwhile.
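
The replay described above can be sketched in a few lines of Python. This is
only an illustration under assumptions: the event tuples, the per-key stacks
and the name active_attributes are invented here, and a real debug segment
would use a packed binary form rather than a Python list.

```python
from typing import Dict, List, Optional, Tuple, Union

Value = Union[int, str]
# (bytecode offset, "push" or "pop", key, value); value is None for a pop.
Event = Tuple[int, str, str, Optional[Value]]

def active_attributes(events: List[Event], offset: int) -> Dict[str, Value]:
    """Replay events in order up to `offset` and return the active values."""
    stacks: Dict[str, List[Value]] = {}
    for at, op, key, value in events:
        if at > offset:
            break  # events are assumed to be sorted by bytecode offset
        if op == "push":
            stacks.setdefault(key, []).append(value)
        else:
            stacks[key].pop()
    # Only keys that still have something pushed are active.
    return {key: stack[-1] for key, stack in stacks.items() if stack}

events: List[Event] = [
    (1, "push", "HLL", "perl"),
    (1, "push", "file", "something.pl"),
    (1, "push", "line", 1),
    (1, "push", "column", 17),
    (5, "pop", "column", None),
    (5, "pop", "line", None),
    (5, "push", "line", 2),
]
print(active_attributes(events, 3))
print(active_attributes(events, 5))
```

At offset 5 the column has been popped and not pushed again, so the hash
simply has no "column" entry, which is how a range without valid column
information shows up.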

Cheers,

Nick

Nick Glencross

Nov 14, 2005, 8:03:01 AM

to perl6-i...@perl.org, Jonathan Worthington

[Disclaimer: I've only just started thinking about this in the last
hour, and don't want to appear all knowledgeable or anything!]

On 11/14/05, Jonathan Worthington <jona...@jwcs.net> wrote:
> My current thinking on this is that a HLL will define a sub that knows how
> to print errors for that HLL. That sub would be passed an array PMC, with
> element 0 representing the current sub, element 1 representing its caller,
> etc, so you can produce a backtrace. Each entry in the array would be a
> hash containing the HLL debug info. So a very simple handler could maybe
> look like:-
> ...

I agree, although I'm tempted to say that the PMCs in the backtrace
would encapsulate the backtrace, but wouldn't yet have looked things
up in the HLL debug info. It would be up to the code snippet that
you provided to do the actual lookup using an opcode.

Like you say, the handler can be as simple or as complicated as
required, and delegate to other more appropriate handlers if
necessary. It would also print sensible messages depending on what
information is available (e.g. column information would only be
printed if it is in the hash), and perhaps do nice word wrapping etc.
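
As a rough Python sketch of such a handler: each frame is a plain dict
standing in for the hash of HLL debug info, and a field is printed only when
it is present. The key names ("file", "line", "column") and the message
layout are assumptions for illustration, not an agreed format.

```python
def format_frame(info):
    """Describe one frame using only the fields that are actually present."""
    where = info.get("file", "<unknown file>")
    if "line" in info:
        where += f" line {info['line']}"
        if "column" in info:
            where += f", column {info['column']}"
    return where

def format_backtrace(message, frames):
    """frames[0] is the current sub, frames[1] its caller, and so on."""
    lines = [message]
    for depth, info in enumerate(frames):
        prefix = "  at" if depth == 0 else "  called from"
        lines.append(f"{prefix} {format_frame(info)}")
    return "\n".join(lines)

frames = [
    {"file": "something.pl", "line": 42, "column": 17},
    {"file": "something.pl", "line": 7},
    {},  # e.g. runtime glue with no HLL debug info at all
]
print(format_backtrace("Division by zero", frames))
```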

> > As one of the first "here's something extra I need", I need not only line
> > numbers for files, but line numbers of user defined subroutines and eval
> > blocks. (that is, the line *of the sub def* that the error occurs on, in
> > addition to each line as we go.)
> >
> Unless I'm missing something, that's fine with what I proposed; you can emit
> a ".hll_debug line 42" or similar without having to specify a filename. The
> line number means whatever you'd like it to mean - it doesn't have to be
> line number in a file.
>

Out of interest, are we able to associate HLL debug info with eval'd
code? Does it have a directory, or is it just the bytecode?

Cheers,

Nick

Roger Browne

Nov 14, 2005, 10:20:37 AM

to perl6-c...@perl.org, perl6-i...@perl.org

On Mon, 2005-11-14 at 12:31 +0000, Jonathan Worthington wrote:
> My current thinking on this is that a HLL will define a sub that knows how
> to print errors for that HLL...

The HLL could register a PMC or object class (instead of just a sub),
using the existing "Parrot_register_HLL_type" call (or a future
equivalent opcode).

Then, the system can use the existing "Parrot_get_ctx_HLL_type" call (or
a future equivalent opcode) to work out which error handler to call for
that HLL.

> But there is also an issue relating to what if some of the
> BT is from another HLL's code

One handler could use the "Parrot_get_ctx_HLL_type" call to find the
handler for nested code that's written using a different HLL.

For example, Amber allows methods of an Amber class to be written in
PIR. If that PIR fails, the PIR error handler should print the message
for its piece of code before finding and handing over to the Amber error
handler for the remainder.
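
The hand-over between handlers could look roughly like the Python sketch
below. The registry and the names register_handler and report are purely
hypothetical stand-ins; they are not the Parrot_register_HLL_type or
Parrot_get_ctx_HLL_type calls, and the frame fields and line numbers are
invented.

```python
from itertools import groupby

handlers = {}  # HLL name -> function taking a list of frames, returning lines

def register_handler(hll, handler):
    handlers[hll] = handler

def default_handler(frames):
    return [f"{f.get('sub', '?')} (no handler for this HLL)" for f in frames]

def report(frames):
    """Give each run of consecutive same-HLL frames to that HLL's handler."""
    lines = []
    for hll, run in groupby(frames, key=lambda f: f.get("hll")):
        handler = handlers.get(hll, default_handler)
        lines.extend(handler(list(run)))
    return lines

register_handler("PIR", lambda fs: [f"PIR sub {f['sub']}, line {f['line']}" for f in fs])
register_handler("Amber", lambda fs: [f"Amber routine {f['sub']}, line {f['line']}" for f in fs])

backtrace = [
    {"hll": "PIR", "sub": "raise", "line": 81},   # innermost frame, written in PIR
    {"hll": "Amber", "sub": "life", "line": 12},
    {"hll": "Amber", "sub": "make", "line": 3},
]
for line in report(backtrace):
    print(line)
```

The PIR handler reports its own frame first, and the remaining frames are
handed to the Amber handler.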

Regards,
Roger Browne

Nick Glencross

Nov 14, 2005, 11:43:24 AM

to perl6-i...@perl.org

On 11/14/05, Nick Glencross <nick.gl...@gmail.com> wrote:
> Jonathan Worthington wrote:
>
> > I'm looking to work
> > on enabling Parrot to store away HLL debug info - that is, the file name,
> > line number, columns etc in the high level language source code. This data
> > can then be used to emit useful error messages that relate to the HLL source
> > code rather than the generated PIR/PASM/whatever.
>
> Does it make sense to have nestable structures?

Actually the example notation looks quite different from what other
people are suggesting, so let me rephrase it as:

.hll_debug_begin HLL "perl5"
.hll_debug_begin copyright "Fred"

.hll_debug_begin file "something.pl"
.hll_debug_begin line 1
.hll_debug_begin column 17
...
.hll_debug_end column
.hll_debug_end line

.hll_debug_begin line 2
...
.hll_debug_begin file "inlined.pl"
.hll_debug_begin copyright "Jim"
.hll_debug_begin line 34
...
.hll_debug_end line
.hll_debug_end copyright
.hll_debug_end file
...
.hll_debug_end line

.hll_debug_begin line 4
.hll_debug_begin pragma xyz
...
.hll_debug_end pragma
.hll_debug_end line
.hll_debug_end file

.hll_debug_end copyright
.hll_debug_end HLL

It's perhaps not clear that these are sprinkled among a large number
of PIR instructions, even though it might look like this is the
majority of the code.

Another addition to what I suggested before is that the info section
could have a table of contents with entry points where the attribute
stack is empty. That way, instead of always starting from the beginning,
the most appropriate entry point can be easily located.
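
A small Python sketch of the table-of-contents idea, assuming the same kind
of (offset, op, key, value) event list as in a push/pop debug section. The
names build_toc and start_index are invented for illustration.

```python
from bisect import bisect_right

def build_toc(events):
    """Return [(offset, event index)] for every event that is reached
    with an empty attribute stack, i.e. a safe place to start a replay."""
    toc, depth = [], 0
    for index, (at, op, key, value) in enumerate(events):
        if depth == 0:
            toc.append((at, index))
        depth += 1 if op == "push" else -1
    return toc

def start_index(toc, offset):
    """Index of the event to start replaying from for bytecode `offset`."""
    pos = bisect_right([at for at, _ in toc], offset) - 1
    return toc[pos][1] if pos >= 0 else 0

events = [
    (0, "push", "file", "a.pl"), (0, "push", "line", 1),
    (4, "pop", "line", None), (4, "pop", "file", None),
    (4, "push", "file", "b.pl"), (4, "push", "line", 7),
    (9, "pop", "line", None), (9, "pop", "file", None),
]
toc = build_toc(events)
print(toc)
print(start_index(toc, 6))
```

A lookup for offset 6 can then skip everything belonging to a.pl and begin
at the event that pushes b.pl.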

Just some thoughts.

Cheers,

Nick

Roger Browne

Nov 14, 2005, 12:35:23 PM

to perl6-i...@perl.org

Nick Glencross wrote:

> > Does it make sense to have nestable structures?

Not always. Consider debug info that includes "line number" and
"statement number". You could have multiple statements per line, or
multiple lines per statement.

> Actually the example notation looks quite different from what other
> people are suggesting, so let me rephrase it as:
>
> .hll_debug_begin HLL "perl5"
> .hll_debug_begin copyright "Fred"...

I much prefer your previous, more compact, example syntax.

> .hll_debug_end line
> .hll_debug_begin line 2

I don't think the "end" directives add much. There's almost always going
to be an "end line" before a "begin line", so why not let 'begin line'
imply the end of any previously-declared line?

Regards,
Roger Browne

Jonathan Worthington

Nov 14, 2005, 1:31:53 PM

to perl6-i...@perl.org

"Roger Browne" <ro...@eiffel.demon.co.uk> wrote:
>> > Does it make sense to have nestable structures?
>
> Not always. Consider debug info that includes "line number" and
> "statement number". You could have multiple statements per line, or
> multiple lines per statement.
>
>> Actually the example notation looks quite different from what other
>> people are suggesting, so let me rephrase it as:
>>
>> .hll_debug_begin HLL "perl5"
>> .hll_debug_begin copyright "Fred"...
>
> I much prefer your previous, more compact, example syntax.
>

I think we'll end up with something more compact than .hll_debug - it's
quite long and as has been mentioned we'll generate quite a lot of them.
I'm wondering about .ann (for annotate), or maybe just .dbg (for debug).

>> .hll_debug_end line
>> .hll_debug_begin line 2
>
> I don't think the "end" directives add much. There's almost always going
> to be an "end line" before a "begin line", so why not let 'begin line'
> imply the end of any previously-declared line?
>

This was my take on things. Plus the fact that nesting doesn't always make
sense, as mentioned above.

Thanks,

Jonathan

Nick Glencross

Nov 14, 2005, 3:06:24 PM

to Roger Browne, Perl 6 Internals

Roger Browne wrote:

>Nick Glencross wrote:
>
>
>> .hll_debug_end line
>> .hll_debug_begin line 2
>>
>>
>
>I don't think the "end" directives add much. There's almost always going
>to be an "end line" before a "begin line", so why not let 'begin line'
>imply the end of any previously-declared line?
>

While nesting one begin/end line number directly inside another doesn't
make much sense, my reasoning for this is for inlining of code where you
nest a new filename/line/column and then these are popped to get back to
the original calling location.

I also see your point about statements/line numbers, but again
begin/ends can of course be arranged to model this too.

I can see both sides of the coin.

Nick

Leopold Toetsch

Nov 14, 2005, 4:15:03 PM

to Nick Glencross, Roger Browne, Perl 6 Internals

On Nov 14, 2005, at 21:06, Nick Glencross wrote:

> While nesting one begin/end line number directly inside another
> doesn't make much sense, my reasoning for this is for inlining of code
> where you nest a new filename/line/column and then these are popped to
> get back to the original calling location.

Either your compiler emits proper line/file directives for nested stuff
or parrot handles these, if there is an .include "file". I don't see
any reason to need some kind of end-directives.

So instead of

.end-foo 1
.begin-foo 2

a simple:

.foo 2

ought to be enough. Whenever foo changes, set a new value, done.
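
A minimal Python sketch of this "just set a new value" model, for comparison
with begin/end pairs. The class name, the method names and the sample keys
are assumptions made up for illustration; values are assumed to be recorded
in ascending bytecode order.

```python
from bisect import bisect_right

class Annotations:
    """Each key keeps a list of (offset, value) changes; no 'end' is stored."""

    def __init__(self):
        self._changes = {}  # key -> ([offsets], [values]), offsets ascending

    def record(self, offset, key, value):
        offsets, values = self._changes.setdefault(key, ([], []))
        offsets.append(offset)
        values.append(value)

    def at(self, offset):
        """Return the most recent value of every key at bytecode `offset`."""
        result = {}
        for key, (offsets, values) in self._changes.items():
            pos = bisect_right(offsets, offset) - 1
            if pos >= 0:
                result[key] = values[pos]
        return result

ann = Annotations()
ann.record(0, "file", "something.pl")
ann.record(0, "line", 1)
ann.record(5, "line", 2)        # implicitly ends line 1
ann.record(5, "statement", 1)
ann.record(8, "statement", 2)   # two statements on the same line
print(ann.at(6))
print(ann.at(9))
```

Because every key changes independently, overlapping things such as lines
and statements need no nesting at all.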

> I also see your point about statements/line numbers, but again
> begin/ends can of course be arranged to model this too.

That's overkill and code bloat to me - sorry.

> Nick

leo

Leopold Toetsch

Nov 14, 2005, 4:33:37 PM

to Jonathan Worthington, perl6-i...@perl.org, perl6-c...@perl.org

On Nov 14, 2005, at 0:02, Jonathan Worthington wrote:

> * I'm thinking of a PIR syntax along the lines of this:-

The discussion goes back and forth, like all the other discussions we
already had WRT syntax, months and years ago.

I'd much prefer that a compiler (amber anyone ;) just emits PIR
with debug syntax so that folks get a feeling for how it looks. E.g.

#_dbg file "foo" # file is a bit special but general syntax is:
#_dbg keyword rest # (\w+)\s+(.*)
#_dbg 1 # bare number defaults to line for brevity
#_dbg line 1 # same
#_dbg 1.4-8 # line.column-range

'file' and 'line' and maybe 'column' are special insofar as we might
need/invent a compact storage format for them (this is an optimization -
yes), that is we might have an 'optimized' file/line/column format and
a general format with 'key' => stuff mappings.
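
To get a feel for how little parsing the forms above need, here is a rough
Python sketch. It assumes the directive stands alone on its line (no
trailing explanatory comment), and the result keys "column" and
"column_end" are names invented for this illustration.

```python
import re

DIRECTIVE = re.compile(r"#_dbg\s+(.*\S)\s*$")
POSITION = re.compile(r"(\d+)(?:\.(\d+)(?:-(\d+))?)?$")   # 1, 1.4 or 1.4-8
KEYED = re.compile(r"(\w+)\s+(.*)$")                      # keyword rest

def parse_dbg(line):
    """Return a dict for a '#_dbg' line, or None for anything else."""
    found = DIRECTIVE.match(line.strip())
    if found is None:
        return None
    body = found.group(1)
    position = POSITION.match(body)
    if position:
        line_no, col, col_end = position.groups()
        result = {"line": int(line_no)}   # a bare number defaults to 'line'
        if col is not None:
            result["column"] = int(col)
        if col_end is not None:
            result["column_end"] = int(col_end)
        return result
    keyed = KEYED.match(body)
    if keyed is None:
        return None
    key, rest = keyed.groups()
    if rest.isdigit():
        return {key: int(rest)}
    if len(rest) >= 2 and rest[0] == rest[-1] == '"':
        return {key: rest[1:-1]}          # strip the quotes from a string
    return {key: rest}

for text in ['#_dbg file "foo"', "#_dbg 1", "#_dbg line 1", "#_dbg 1.4-8"]:
    print(text, "->", parse_dbg(text))
```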

Future improvements:

#_dbg begin_segment foo # create new segment type
#_dbg stuff bar # goes into segment foo
...
#_dbg end_segment # done with it

> Jonathan

leo

Roger Browne

Nov 14, 2005, 5:35:50 PM

to perl6-i...@perl.org, perl6-c...@perl.org

On Mon, 2005-11-14 at 22:33 +0100, Leopold Toetsch wrote:

> I'd much prefer that a compiler (amber anyone ;) just emits PIR
> with debug syntax so that folks get a feeling for how it looks.

Good idea. I'll do it tomorrow (off to bed now).

Regards,
Roger Browne

Leopold Toetsch

Nov 15, 2005, 3:45:17 AM

to Jonathan Worthington, perl6-i...@perl.org, perl6-c...@perl.org

On Nov 15, 2005, at 0:07, Jonathan Worthington wrote:

> What's the fascination with overloading comment syntax?

Because a compiler can emit it right now w/o any change to Parrot.

>
> Jonathan

leo

Leopold Toetsch

Nov 15, 2005, 4:25:07 AM

to Brent 'Dax' Royal-Gordon, perl6-i...@perl.org, Jonathan Worthington, perl6-c...@perl.org

On Nov 15, 2005, at 10:04, Brent 'Dax' Royal-Gordon wrote:

> Leopold Toetsch <l...@toetsch.at> wrote:
>> Because a compiler can emit it right now w/o any change to Parrot.
>
> That's an advantage for the week it takes to implement the feature.
> For the remaining age of the universe,

Err, I didn't say that this is the final syntax. I just want to have a
syntax to start with right now. Changing it from a comment to a
directive is easy, after parsing is implemented in Parrot.

leo

Joshua Hoblitt

Nov 15, 2005, 6:14:25 AM

to Leopold Toetsch, Brent 'Dax' Royal-Gordon, perl6-i...@perl.org, Jonathan Worthington, perl6-c...@perl.org

Why do I get the feeling that Parrot is going to end up stuck supporting
a syntax where:

#line

and

# line

mean two different things? If the only good thing that can be said
about using '#' for debug info is that compilers can emit it right now,
supported or not, then it's worth noting that compilers can also emit
'#.debug_hll ...' right now and the '#' can simply be removed when
support for it is ready.
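
The "remove the '#' later" step is mechanical, as this small Python sketch
suggests. The function name and the directive arguments in the sample are
hypothetical; only the '#.debug_hll' prefix comes from the text above.

```python
import re

# A commented-out directive: optional indent, '#', then '.debug_hll ...'.
COMMENTED = re.compile(r"^(\s*)#(\.debug_hll\b.*)$")

def enable_directives(pir_text):
    """Strip the leading '#' from '#.debug_hll' lines; leave real comments."""
    out = []
    for line in pir_text.splitlines():
        found = COMMENTED.match(line)
        out.append(found.group(1) + found.group(2) if found else line)
    return "\n".join(out)

sample = "#.debug_hll line 1\n# line 1 of an ordinary comment\n    #.debug_hll line 2"
print(enable_directives(sample))
```

An ordinary comment that happens to start with "line" is left alone because
it never matches the directive prefix.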

Cheers,

-J


Roger Browne

Nov 15, 2005, 9:28:02 AM

to perl6-i...@perl.org, perl6-c...@perl.org

Leopold Toetsch wrote:
> I'd much prefer that a compiler (amber anyone ;) just emits PIR
> with debug syntax so that folks get a feeling for how it looks...

OK, I've done this.

I have modified the Amber compiler to generate PIR code that contains
debug directives, so that we can discuss a real example.

You can access the generated PIR file from the link on this page:
http://xamber.org/temp/debug.html

There is also a link to the Amber source file from which the PIR is
generated (annotated with line numbers). This is for your interest only;
you don't need to look at it if you are only interested in the PIR debug
directives.

SYNTAX USED:

I have used a syntax that I found convenient to generate, but of course
I will change it to whatever the Parrot project wants to use. It's a
fairly minimal syntax for the most common cases, which are line numbers,
file names and column numbers.

I realise that starting these directives with a PIR comment character is
controversial, however in the short term this enables the PIR to remain
runnable, so I have made every directive start with "#.debug ".

If that is followed by a string, it represents the current filename,
e.g.:

#.debug "foo.am"

An integer represents a line number:

#.debug 27

A colon separates line numbers and column numbers:

#.debug 27:9

Other kinds of data are represented by a key (an identifier) followed by
a value (a string or integer). There are two samples of this in the
current PIR file. The first is a zone, which has the value "assertion"
within code that evaluates assertions, and the value "" elsewhere. The
second is a class_number, which is a distinct positive integer for each
class, or 0 for unknown. For example:

#.debug class_number 4
#.debug zone "assertion"
...
#.debug zone ""
...
#.debug class_number 0

SOME ISSUES FOR DISCUSSION:

Amber uses natural numbers for counting, so the first column of the first
line is 1:1. A line number or column number of 0 represents 'unknown'. If
PIR uses a different convention, I will adjust the code generation to match.

Amber sometimes generates multiple directives without any intervening
PIR code. Some of these may be 0:0, for constructs where Amber isn't yet
tracking the line/column number, but imcc should accept this. For
example:

#.debug 48:12
#.debug 0:0
#.debug 48:56
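
As sample input handling for whoever implements this, here is a rough Python
sketch that reads the directive forms described above and attaches the
current state to each following line. It is not imcc code. Two things are
assumptions of the sketch, not decisions: a bare line number resets the
column to 0 (unknown), and when several directives appear in a row the last
one simply wins. The "instruction A/B/C" strings are placeholders for real
PIR.

```python
import re

STRING = re.compile(r'"(.*)"$')                        # "foo.am"
POSITION = re.compile(r"(\d+)(?::(\d+))?$")            # 27 or 27:9
KEYED = re.compile(r'(\w+)\s+(?:"(.*)"|(\d+))$')       # key "str" or key 4

def apply_directive(state, line):
    """Update `state` from one '#.debug' line; return False for other lines."""
    line = line.strip()
    if not line.startswith("#.debug "):
        return False
    body = line[len("#.debug "):].strip()
    found = STRING.match(body)
    if found:
        state["file"] = found.group(1)
        return True
    found = POSITION.match(body)
    if found:
        state["line"] = int(found.group(1))
        # Assumption: no column given means column unknown (0).
        state["column"] = int(found.group(2)) if found.group(2) is not None else 0
        return True
    found = KEYED.match(body)
    if found:
        key, text, number = found.groups()
        state[key] = text if text is not None else int(number)
        return True
    raise ValueError(f"unrecognised debug directive: {line!r}")

def annotate(pir_lines):
    """Pair every non-directive line with a copy of the state in force."""
    state = {"file": "", "line": 0, "column": 0}
    out = []
    for text in pir_lines:
        if not apply_directive(state, text):
            out.append((dict(state), text))
    return out

listing = [
    '#.debug "foo.am"', "#.debug 48:12", "#.debug 0:0", "#.debug 48:56",
    "instruction A",
    '#.debug zone "assertion"', "#.debug class_number 4",
    "instruction B",
    "#.debug 27",
    "instruction C",
]
for state, text in annotate(listing):
    print(text, state)
```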

RUNNING THE PIR:

If you want to run this PIR, you will need revision 9911 or later of Parrot.
You have to first build the Amber PMCs:

cd languages/amber ; make pmcs

Then you can run it like this:

parrot life.pir

This runs a small text-mode display of Conway's game of life for 20
generations (takes only a few seconds). You can choose a different
number of generations by adding a command-line argument:

parrot life.pir 35

ERROR REPORTING:

I have inserted a trap so that an exception is raised after generation
42 has been displayed. So, if you run the program like this...

parrot life.pir 50

...then you will get the following message after generation 42:

This is an Amber exception raised to test error reporting.
current instr.: 'ANY :: raise' pc 208 (life.pir:81)
called from Sub 'ROOT_CLASS :: life' pc 1699 (life.pir:656)
called from Sub 'ROOT_CLASS :: make' pc 1235 (life.pir:495)
called from Sub '_root' pc 71 (life.pir:39)

In the future, we should be able to report the HLL line numbers instead
of (or in addition to) the PC and PIR counters.

I hope this example is useful for the purposes of discussion, and maybe
also as sample input data for whoever implements this. I will keep the
example updated according to any design decisions that are made.

Regards,
Roger Browne

Dave Whipp

Nov 15, 2005, 12:49:26 PM

to perl6-c...@perl.org, perl6-i...@perl.org

Will Coleda wrote:

> Right, the hard bit here was that I needed to specify something other
> than "file". Just agreeing that we need something other than just
> "file/line".

I'd have thought the onus is the other way: justify the use of
"file/line" as the primitive concept.

We're going to have a set of "parrot compiler tools", which represent
high level language and subsequent transformations as trees. If these
trees are available, then all that is needed for debug traceability is a
pointer/reference to nodes in the tree. If the node has a "get
file/line" method, then the node (attribute grammar?) can be responsible
for chaining the information back to the source code, even when things
like common-subexpression optimizations have been done (the method can
query the callstack, etc., to resolve this).
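
A toy Python sketch of that chaining, assuming each transformed node keeps a
reference to the node it was derived from. The Node class, its fields and
the node kinds are all invented for illustration, and the sketch ignores the
call-stack lookup mentioned above.

```python
class Node:
    """A tree node that may know its own source position or its origin."""

    def __init__(self, kind, origin=None, file=None, line=None):
        self.kind = kind
        self.origin = origin   # node in the previous tree this one came from
        self.file = file
        self.line = line

    def source_location(self):
        """Follow origin links until a node with a file/line is found."""
        node = self
        while node is not None:
            if node.file is not None and node.line is not None:
                return node.file, node.line
            node = node.origin
        return None

parsed = Node("add_expr", file="something.pl", line=42)
optimized = Node("shared_subexpr", origin=parsed)
emitted = Node("pir_op", origin=optimized)
glue = Node("runtime_glue")   # generated code with no HLL counterpart

print(emitted.source_location())
print(glue.source_location())
```

The bytecode would then only need to carry a reference to a node, with
file/line being one of several things a node can answer.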