> .hll_debug file "something.pl"
> .hll_debug line 1
#line 789 "file.foo"
looks simpler and well known to me - the latter is already parsed. But
actually making it work is more important for me.
> Either an integer or a string constant from the constants table
Storing debug-only things in the constant table could complicate a TODO
'pbc_strip(1)' utility but not a problem.
pbc_merge deals with such things already.
My highest priority requests (for use by the Amber compiler
and toolset) are:
1. To store away, for each part of the compiled program:
- the name of the HLL source filename
- the line and column numbers
2. For PIR error messages to be presented using the HLL source
location rather than the PIR source location.
3. To be able to retrieve the information programmatically, e.g.
whilst walking the call chain, so that Amber can provide useful
information when handling an exception.
At a lower priority is:
1. To be able to store and retrieve additional pieces of
information associated with any source location. These
should be extendible, not hardwired. There are two pieces
of information that I am currently interested in:
- The HLL language name (so that the HLL Amber debugger
would not try to handle pieces of the program that are
written in e.g. Python). As a fallback, I could look at
the suffix of the HLL source file, but that's not so
- The HLL compile options (because Amber scripts can be
compiled with or without runtime monitoring of preconditions
and postconditions, and debugging/exception-handling might
need to work differently in these cases).
No doubt over time, HLL authors will find many useful
and imaginative ways to store and use additional data.
Finally, the following would be "nice to have":
1. To be able to embed the entire HLL source code. Source code
compresses really well, and I think we should compress it.
If we can't find a suitable library, it would take only a
few lines of code to compress and decompress repeated blanks.
The zlib license looks unproblematic, if licensing
rather than dependency is the issue:
Compression could be made optional, but let's put the hooks
in for it, anyway.
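
As a rough illustration of the two options above (a few lines to squeeze repeated blanks, or zlib with compression optional), here is a minimal Python sketch. The function names, the NUL escape marker and the one-byte "Z"/"R" tag are all hypothetical choices made for this example; nothing here is an agreed Parrot format.

```python
import re
import zlib

MARK = "\x00"  # assumption: NUL never occurs in HLL source text


def squeeze_blanks(text):
    """Replace each run of 3..255 spaces with MARK plus a one-char count."""
    if MARK in text:
        raise ValueError("source already contains the escape marker")
    return re.sub(r" {3,255}", lambda m: MARK + chr(len(m.group(0))), text)


def expand_blanks(packed):
    """Undo squeeze_blanks()."""
    pattern = re.escape(MARK) + "(.)"
    return re.sub(pattern, lambda m: " " * ord(m.group(1)), packed,
                  flags=re.DOTALL)


def pack_source(text, compress=True):
    """Embed source as bytes; the first byte says whether zlib was used."""
    data = text.encode("utf-8")
    return b"Z" + zlib.compress(data) if compress else b"R" + data


def unpack_source(blob):
    """Recover the source text written by pack_source()."""
    tag, body = blob[:1], blob[1:]
    data = zlib.decompress(body) if tag == b"Z" else body
    return data.decode("utf-8")
```

Keeping the tag byte in front of the payload is one way to "put the hooks in" while leaving compression optional.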
> Here are my current thoughts.
> * We shouldn't restrict this info to a fixed set of fields.
> ... Having to specify the file and line every time you want
> to specify a column will bloat generated code massively
Yep. That's clearly out of the question.
> * I'm thinking of a PIR syntax along the lines of this:-
> .hll_debug file "something.pl"
I don't mind what syntax is used, provided it's compact. There are
going to be a LOT of hll debug lines generated.
> A special entry ...
> to indicate that all currently inherited data should be
> dispensed with, so it is possible to merge bytecode sanely.
> ... * Maybe there is a need for some PIR syntax...
I don't see the need for special syntax. Just reset everything
to defaults at the start of each new file. Within a file, the
usual syntax can be used (e.g. you could just set filename to "").
Thanks for your work on this.
Jonathan Worthington wrote:
> I'm looking to work
> on enabling Parrot to store away HLL debug info - that is, the file name,
> line number, columns etc in the high level language source code. This data
> can then be used to emit useful error messages that relate to the HLL source
> code rather than the generated PIR/PASM/whatever.
Does it make sense to have nestable structures?
@push HLL "perl"
@push file "something.pl"
@push line 1
@push column 17
@push line 2
@push file "inlined.pl"
@push line 34
@push line 4
@push pragma xyz
Ok, ok, the syntax itself isn't important. I'm just doing nesting.
Then for any point in the pbc file there would be a set of active
attributes/values which could be retrieved as a hash. Note that you
can express bytecode ranges where there isn't valid line number
information available (e.g. where extra PIR runtime glue is inserted
which doesn't come from the HLL). You can also push the language at
the top and pop it at the bottom.
The representation in the debug section would then just be something like:
1: push HLL "perl"
1: push file "something.pl"
1: push line 1
1: push column 17
5: pop column
5: pop line
5: push line 2
where the number is the bytecode offset. [Obviously there are
efficient ways to store this by using bit fields and sharing data]
To get the active attributes corresponding to the bytecode you'd need
to start at the beginning and push/pop attributes until you get to the
correct offset, but you wouldn't need to do this too often and so I
believe the extra processing to be worthwhile.
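
To make the replay idea concrete, here is a small Python sketch of walking the debug section up to a bytecode offset. The tuple layout and the choice of one stack per attribute name are assumptions made for illustration; the message above does not fix either.

```python
def active_attributes(events, offset):
    """Replay push/pop events up to `offset` and return the live attributes.

    `events` is a list of (bytecode_offset, op, key, value) tuples sorted by
    offset. One stack per key is an assumption of this sketch.
    """
    stacks = {}
    for at, op, key, value in events:
        if at > offset:
            break
        if op == "push":
            stacks.setdefault(key, []).append(value)
        elif op == "pop":
            stacks[key].pop()
    # The top of each non-empty stack is the active value for that key.
    return {key: stack[-1] for key, stack in stacks.items() if stack}


# The example debug section from the message above.
EVENTS = [
    (1, "push", "HLL", "perl"),
    (1, "push", "file", "something.pl"),
    (1, "push", "line", 1),
    (1, "push", "column", 17),
    (5, "pop", "column", None),
    (5, "pop", "line", None),
    (5, "push", "line", 2),
]
```

Offsets before the first event yield an empty hash, which is how a range with no valid line information would show up.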
On 11/14/05, Jonathan Worthington <jona...@jwcs.net> wrote:
> My current thinking on this is that a HLL will define a sub that knows how
> to print errors for that HLL. That sub would be passed an array PMC, with
> element 0 representing the current sub, element 1 representing its caller,
> etc, so you can produce a backtrace. Each entry in the array would be a
> hash containing the HLL debug info. So a very simple handler could maybe
> look like:-
I agree, although I'm tempted to say that the PMCs in the backtrace
would encapsulate the backtrace, but wouldn't yet have looked things
up in the HLL debug info. It would be up to the code snippet that
you provided to do the actual lookup using an opcode.
Like you say, the handler can be as simple or as complicated as
required, and delegate to other more appropriate handlers if
necessary. It would also print sensible messages depending on what
information is available (e.g. column information would only be
printed if it is in the hash), and perhaps do nice word wrapping etc.
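
A handler along these lines might look like the following Python sketch. Each frame is a plain dict standing in for the hash of HLL debug info; the key names ("sub", "file", "line", "column") and the output wording are assumptions for illustration only.

```python
def format_frame(info):
    """Build a location string from whatever debug info is present."""
    where = info.get("file", "<unknown file>")
    if "line" in info:
        where += f":{info['line']}"
        # Column is only meaningful (and only printed) alongside a line.
        if "column" in info:
            where += f":{info['column']}"
    return where


def format_backtrace(message, frames):
    """frames[0] is the current sub, frames[1] its caller, and so on."""
    lines = [message]
    for depth, info in enumerate(frames):
        prefix = "  at" if depth == 0 else "  called from"
        sub = info.get("sub", "<anonymous>")
        lines.append(f"{prefix} {sub} ({format_frame(info)})")
    return "\n".join(lines)
```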
> > As one of the first "here's something extra I need", I need not only line
> > numbers for files, but line numbers of user defined subroutines and eval
> > blocks. (that is, the line *of the sub def* that the error occurs on, in
> > addition to each line as we go.)
> Unless I'm missing something, that's fine with what I proposed; you can emit
> a ".hll_debug line 42" or similar without having to specify a filename. The
> line number means whatever you'd like it to mean - it doesn't have to be
> line number in a file.
Out of interest, are we able to associate HLL debug info with eval'd
code? Does it have a directory, or is it just the bytecode?
The HLL could register a PMC or object class (instead of just a sub),
using the existing "Parrot_register_HLL_type" call (or a future
Then, the system can use the existing "Parrot_get_ctx_HLL_type" call (or
a future equivalent opcode) to work out which error handler to call for
> But there is also an issue relating to what if some of the
> BT is from another HLL's code
One handler could use the "Parrot_get_ctx_HLL_type" call to find the
handler for nested code that's written using a different HLL.
For example, Amber allows methods of an Amber class to be written in
PIR. If that PIR fails, the PIR error handler should print the message
for its piece of code before finding and handing over to the Amber error
handler for the remainder.
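
The hand-over between handlers could work roughly as in this Python sketch. The registry dict and the function names are hypothetical stand-ins; they are not the "Parrot_register_HLL_type" or "Parrot_get_ctx_HLL_type" calls themselves, and the frame keys are assumed.

```python
from itertools import groupby

HANDLERS = {}  # hypothetical registry: HLL name -> handler function


def register_handler(hll, handler):
    HANDLERS[hll] = handler


def default_handler(frames):
    """Fallback for frames whose HLL has no registered handler."""
    return [f"{f.get('sub', '?')} (no HLL handler)" for f in frames]


def report(frames):
    """Split the backtrace into runs of same-HLL frames, innermost first,
    and let each HLL's handler describe its own run."""
    out = []
    for hll, run in groupby(frames, key=lambda f: f.get("HLL")):
        handler = HANDLERS.get(hll, default_handler)
        out.extend(handler(list(run)))
    return out


# Two toy handlers: PIR reports its own frames, then Amber takes over.
register_handler("PIR", lambda fs: [f"PIR: {f['sub']}" for f in fs])
register_handler("Amber",
                 lambda fs: [f"Amber: {f['sub']} line {f['line']}" for f in fs])
```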
Actually the example notation looks quite different from what other
people are suggesting, so let me rephrase it as:
.hll_debug_begin HLL "perl5"
.hll_debug_begin copyright "Fred"
.hll_debug_begin file "something.pl"
.hll_debug_begin line 1
.hll_debug_begin column 17
.hll_debug_begin line 2
.hll_debug_begin file "inlined.pl"
.hll_debug_begin copyright "Jim"
.hll_debug_begin line 34
.hll_debug_begin line 4
.hll_debug_begin pragma xyz
It's perhaps not clear that these are sprinkled among a large number
of PIR instructions, even though it might look like this is the
majority of the code.
Another addition to what I suggested before is that the info section
could have a table of contents with entry points where the attribute
stack is empty. That way, instead of always starting from the beginning,
the most appropriate entry point can be easily located.
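
Here is a sketch of that table of contents in Python, under the same assumed event layout as a push/pop debug section: (offset, op, key, value) tuples sorted by offset. The function names are made up for this example.

```python
import bisect


def build_toc(events):
    """Return [(offset, event_index)] for every event that is reached with
    an empty attribute stack, i.e. a safe place to start replaying."""
    toc, depth = [], 0
    for index, (at, op, _key, _value) in enumerate(events):
        if depth == 0:
            toc.append((at, index))
        depth += 1 if op == "push" else -1
    return toc


def lookup(events, toc, offset):
    """Replay from the nearest entry point at or before `offset`."""
    offsets = [at for at, _index in toc]
    pos = bisect.bisect_right(offsets, offset) - 1
    if pos < 0:
        return {}
    stacks = {}
    for at, op, key, value in events[toc[pos][1]:]:
        if at > offset:
            break
        if op == "push":
            stacks.setdefault(key, []).append(value)
        else:
            stacks[key].pop()
    return {key: stack[-1] for key, stack in stacks.items() if stack}
```

With entry points at sub or file boundaries, the replay cost is bounded by the size of one such region rather than the whole segment.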
Just some thoughts.
> > Does it make sense to have nestable structures?
Not always. Consider debug info that includes "line number" and
"statement number". You could have multiple statements per line, or
multiple lines per statement.
> Actually the example notation looks quite different from what other
> people are suggesting, so let me rephrase it as:
> .hll_debug_begin HLL "perl5"
> .hll_debug_begin copyright "Fred"...
I much prefer your previous, more compact, example syntax.
> .hll_debug_end line
> .hll_debug_begin line 2
I don't think the "end" directives add much. There's almost always going
to be an "end line" before a "begin line", so why not let 'begin line'
imply the end of any previously-declared line?
>> .hll_debug_end line
>> .hll_debug_begin line 2
> I don't think the "end" directives add much. There's almost always going
> to be an "end line" before a "begin line", so why not let 'begin line'
> imply the end of any previously-declared line?
This was my take on things. Plus the fact that nesting doesn't always make
sense, as mentioned above.
>Nick Glencross wrote:
>> .hll_debug_end line
>> .hll_debug_begin line 2
>I don't think the "end" directives add much. There's almost always going
>to be an "end line" before a "begin line", so why not let 'begin line'
>imply the end of any previously-declared line?
While nesting one begin/end line number directly inside another doesn't
make much sense, my reasoning for this is for inlining of code where you
nest a new filename/line/column and then these are popped to get back to
the original calling location.
I also see your point about statements/line numbers, but again
begin/ends can of course be arranged to model this too.
I can see both sides of the coin.
> While nesting one begin/end line number directly inside another
> doesn't make much sense, my reasoning for this is for inlining of code
> where you nest a new filename/line/column and then these are popped to
> get back to the original calling location.
Either your compiler emits proper line/file directives for nested stuff
or parrot handles these, if there is an .include "file". I don't see
any reason to need any kind of end-directives.
So instead of
ought to be enough. Whenever foo changes, set a new value, done.
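
The "set a new value whenever it changes" model can be sketched in Python as below. The class and method names are hypothetical; the point is only that a lookup becomes a binary search per key, with no replay and no end-directives. Returning from inlined code is expressed by simply setting file and line again.

```python
import bisect
from collections import defaultdict


class DebugTable:
    """Maps bytecode offsets to the most recently set value of each key."""

    def __init__(self):
        self._offsets = defaultdict(list)  # key -> ascending offsets
        self._values = defaultdict(list)   # key -> value set at each offset

    def set(self, offset, key, value):
        """Record that `key` has `value` from `offset` onwards.
        Calls are assumed to arrive in ascending offset order."""
        self._offsets[key].append(offset)
        self._values[key].append(value)

    def get(self, offset):
        """Return the value in force for every key at `offset`."""
        result = {}
        for key, offsets in self._offsets.items():
            pos = bisect.bisect_right(offsets, offset) - 1
            if pos >= 0:
                result[key] = self._values[key][pos]
        return result
```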
> I also see your point about statements/line numbers, but again
> begin/ends can of course be arranged to model this too.
That's overkill and code bloat to me - sorry.
> * I'm thinking of a PIR syntax along the lines of this:-
The discussion goes back and forth, like all the other discussions we
already had WRT syntax, months and years ago.
I'd much prefer that a compiler (amber anyone ;) just emits PIR
with debug syntax so that folks get a feeling for what it looks like. E.g.
#_dbg file "foo" # file is a bit special but general syntax is:
#_dbg keyword rest # (\w+)\s+(.*)
#_dbg 1 # bare number defaults to line for brevity
#_dbg line 1 # same
#_dbg 1.4-8 # line.column-range
'file' and 'line' and maybe 'column' are special insofar as we might
need/invent a compact storage format for them (this is an optimization -
yes), that is, we might have an 'optimized' file/line/column format and
a general format with 'key' => stuff mappings.
#_dbg begin_segment foo # create new segment type
#_dbg stuff bar # goes into segment foo
#_dbg end_segment # done with it
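
For a feel of how little parsing this needs, here is a Python sketch that reads one "#_dbg" line. It assumes the trailing "# ..." remarks in the examples above are explanation rather than part of the directive, and the "column_end" key name is invented for the range form.

```python
import re

_DBG = re.compile(r"^#_dbg\s+(.*\S)\s*$")
_POS = re.compile(r"^(\d+)(?:\.(\d+)(?:-(\d+))?)?$")   # 1, 1.4, 1.4-8
_KEYWORD = re.compile(r"^(\w+)(?:\s+(.*))?$")          # keyword [rest]


def parse_dbg(line):
    """Return a dict for a '#_dbg' directive, or None for any other line."""
    m = _DBG.match(line)
    if not m:
        return None
    body = m.group(1)
    pos = _POS.match(body)
    if pos:
        # A bare number defaults to a line number, for brevity.
        out = {"line": int(pos.group(1))}
        if pos.group(2) is not None:
            out["column"] = int(pos.group(2))
        if pos.group(3) is not None:
            out["column_end"] = int(pos.group(3))
        return out
    kw = _KEYWORD.match(body)
    if not kw:
        return None
    key, rest = kw.group(1), kw.group(2) or ""
    if rest.isdigit():
        return {key: int(rest)}
    return {key: rest.strip('"')}
```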
> I'd much prefer that a compiler (amber anyone ;) just emits PIR
> with debug syntax so that folks get a feeling for what it looks like.
Good idea. I'll do it tomorrow (off to bed now).
> What's the fascination with overloading comment syntax?
Because a compiler can emit it right now w/o any change to Parrot.
> Leopold Toetsch <l...@toetsch.at> wrote:
>> Because a compiler can emit it right now w/o any change to Parrot.
> That's an advantage for the week it takes to implement the feature.
> For the remaining age of the universe,
Err, I didn't say that this is the final syntax. I just want to have a
syntax to start with right now. Changing it from a comment to a
directive is easy, after parsing is implemented in Parrot.
Why do I get the feeling that Parrot is going to end up stuck supporting
a syntax where:
mean two different things? If the only good thing that can be said
about using '#' for debug info is that compilers can emit it right now,
supported or not, then it's worth noting that compilers can also emit
'#.debug_hll ...' right now and the '#' can simply be removed when
support for it is ready.
OK, I've done this.
I have modified the Amber compiler to generate PIR code that contains
debug directives, so that we can discuss a real example.
You can access the generated PIR file from the link on this page:
There is also a link to the Amber source file from which the PIR is
generated (annotated with line numbers). This is for your interest only;
you don't need to look at it if you are only interested in the PIR debug
I have used a syntax that I found convenient to generate, but of course
I will change it to whatever the Parrot project wants to use. It's a
fairly minimal syntax for the most common cases, which are line numbers,
file names and column numbers.
I realise that starting these directives with a PIR comment character is
controversial, however in the short term this enables the PIR to remain
runnable, so I have made every directive start with "#.debug ".
If that is followed by a string, it represents the current filename,
An integer represents a line number:
A colon separates line numbers and column numbers:
Other kinds of data are represented by a key (an identifier) followed by
a value (a string or integer). There are two samples of this in the
current PIR file. The first is a zone, which has the value "assertion"
within code that evaluates assertions, and the value "" elsewhere. The
second is a class_number, which is a distinct positive integer for each
class, or 0 for unknown. For example:
#.debug class_number 4
#.debug zone "assertion"
#.debug zone ""
#.debug class_number 0
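
For whoever implements the reader, the rules above could be parsed along these lines. This Python sketch follows the prose description only (a quoted string is the filename, an integer is a line, line:column uses a colon, anything else is key plus value); the function name and the returned dict keys are assumptions.

```python
import re

PREFIX = "#.debug "


def parse_debug(line):
    """Return a dict for a '#.debug' directive, or None for other lines."""
    if not line.startswith(PREFIX):
        return None
    body = line[len(PREFIX):].strip()

    # A lone quoted string is the current filename.
    if re.fullmatch(r'"[^"]*"', body):
        return {"file": body[1:-1]}

    # An integer is a line number; a colon adds a column number.
    m = re.fullmatch(r"(\d+)(?::(\d+))?", body)
    if m:
        out = {"line": int(m.group(1))}
        if m.group(2) is not None:
            out["column"] = int(m.group(2))
        return out

    # Anything else: an identifier followed by a string or integer value.
    m = re.fullmatch(r'(\w+)\s+(?:"([^"]*)"|(\d+))', body)
    if m:
        key, text, number = m.groups()
        return {key: text if text is not None else int(number)}

    raise ValueError(f"unrecognised debug directive: {line!r}")
```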
SOME ISSUES FOR DISCUSSION:
Amber uses natural numbers for counting, so the first column of the first
line is 1:1. A line number or column number of 0 represents 'unknown'. If
PIR uses a different convention, I will adjust the code generation to match.
Amber sometimes generates multiple directives without any intervening
PIR code. Some of these may be 0:0, for constructs where Amber isn't yet
tracking the line/column number, but imcc should accept this. For
RUNNING THE PIR:
If you want to run this PIR, you will need revision 9911 or later of Parrot.
You have to first build the Amber PMCs:
cd languages/amber ; make pmcs
Then you can run it like this:
This runs a small text-mode display of Conway's game of life for 20
generations (takes only a few seconds). You can choose a different
number of generations by adding a command-line argument:
parrot life.pir 35
I have inserted a trap so that an exception is raised after generation
42 has been displayed. So, if you run the program like this...
parrot life.pir 50
...then you will get the following message after generation 42:
This is an Amber exception raised to test error reporting.
current instr.: 'ANY :: raise' pc 208 (life.pir:81)
called from Sub 'ROOT_CLASS :: life' pc 1699 (life.pir:656)
called from Sub 'ROOT_CLASS :: make' pc 1235 (life.pir:495)
called from Sub '_root' pc 71 (life.pir:39)
In the future, we should be able to report the HLL line numbers instead
of (or in addition to) the PC and PIR counters.
I hope this example is useful for the purposes of discussion, and maybe
also as sample input data for whoever implements this. I will keep the
example updated according to any design decisions that are made.
> Right, the hard bit here was that I needed to specify something other
> than "file". Just agreeing that we need something other than just
I'd have thought the onus is the other way: justify the use of
"file/line" as the primitive concept.
We're going to have a set of "parrot compiler tools", which represent
high level language and subsequent transformations as trees. If these
trees are available, then all that is needed for debug traceability is a
pointer/reference to nodes in the tree. If the node has a "get
file/line" method, then the node (attribute grammar?) can be responsible
for chaining the information back to the source code, even when things
like common-subexpression optimizations have been done (the method can
query the callstack, etc., to resolve this).
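
A toy version of the node idea, in Python: each transformed node keeps a reference to the node it was derived from, and the "get file/line" method follows that chain back to the source. The Node class, its fields and the method name are all invented for illustration; the real compiler tools may look nothing like this.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    kind: str
    file: Optional[str] = None
    line: Optional[int] = None
    origin: Optional["Node"] = None  # node this one was derived from

    def source_location(self) -> Optional[Tuple[str, int]]:
        """Walk back through the transformations until some node knows
        its file and line; return None for pure runtime glue."""
        node = self
        while node is not None:
            if node.file is not None and node.line is not None:
                return (node.file, node.line)
            node = node.origin
        return None
```

The debug segment would then only need to store a reference to a node per bytecode range, leaving the interpretation to the node itself.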