Docstrings in Perl 6

Hinrik Sig

May 3, 2009, 12:03:39 AM

to perl6-l...@perl.org

I've been thinking lately about how Perl 6 might offer functionality
similar to Python's docstrings. That is, documentation which is tied
directly to a particular routine, class or module[1]. This is
something which would be very useful in a REPL, and for documentation
readers[2].
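For readers who have not used the Python feature cited in [1], here is a minimal sketch of what "documentation tied directly to a routine" means there. The `reverse` function is made up for illustration. The first string literal in a function body is stored on the function object as `__doc__`, which is the text `help()` shows in the interactive interpreter.

```python
def reverse(items):
    """Return a copy of items with the order of the elements reversed."""
    result = []
    for item in items:
        # Prepend each element, like the unshift-based reverse method in this thread.
        result.insert(0, item)
    return result


# The docstring lives on the function object, so tools can fetch it at runtime.
print(reverse.__doc__)
print(reverse([1, 2, 3]))
```

Typing `help(reverse)` at the interactive prompt displays the same text.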

Taking one case, the usual Perl 5 way to document methods is to put a
METHODS heading in the Pod and then list all the methods as
subheadings or list items underneath it. However, Synopsis 26 has this
notion of semantic blocks, and already defines some standard ones
including SYNOPSIS, so you can write something like this:

=begin SYNOPSIS
Bla bla
bla bla bla
=end SYNOPSIS

The formatter might then print out a "Synopsis" heading with the block
text beneath it, since the two are reliably tied together, while the
"=head1 SYNOPSIS + any number of paragraphs" combo doesn't tell the
formatter much about the structure.

Now, S26 also specifies METHOD and SUBROUTINE, but their use is not
discussed. As I understand it, Perl 6 will give you access to all Pod
blocks through $=BLOCKNAME. I suppose there might be @=BLOCKNAME if
there's more than one block by that name. If so, then using semantic
blocks seems like a shoo-in for providing docstring-like
functionality. We could have something like this:

=begin METHOD :name<reverse>
Acts on a L<List>. Returns a copy with the order of the elements reversed.
=end METHOD

method reverse() {
    my @result;
    for @.list {
        @result.unshift($_);
    }
    return @result;
}

Something similar could be done for MODULE, CLASS, GRAMMAR, ROLE,
TOKEN, and REGEX.

One advantage of using Pod blocks in place of actual strings à la
Python is that the documentation is still independent of the source
code, and need not be in the same file.

1. http://www.python.org/dev/peps/pep-0257/
2. E.g. http://svn.pugscode.org/pugs/docs/u4x/README

Damian Conway

May 3, 2009, 8:03:30 PM

to Hinrik Örn Sigurðsson, perl6-l...@perl.org

Hinrik Örn Sigurðsson wrote:

> I've been thinking lately about how Perl 6 might offer functionality
> similar to Python's docstrings. That is, documentation which is tied
> directly to a particular routine, class or module[1]. This is
> something which would be very useful in a REPL, and for documentation
> readers[2].

For the latest S26 proposal that I'm (very quietly) working on, I'm
considering two possible mechanisms to support tying docs to specific
components of a program.

The first is an C<is doc> trait:

method reverse (
    Bool $recursive is doc<Reverse any nested L<List>s too>
)
is doc<Returns a copy of the L<List> with the order of elements reversed.>

{
    my @result;
    for @.list {
        @result.unshift($_);
    }
    return @result;
}

The second is a generalized Pod comment form:

method reverse #={ Returns copy of L<List> with order of elems reversed. }
(
    Bool $recursive #={ reverse nested L<List>s too }
)

{
    my @result;
    for @.list {
        @result.unshift($_);
    }
    return @result;
}

Each approach has advantages and disadvantages.
Feedback via this forum would be most welcome.

> Something similar could be done for MODULE, CLASS, GRAMMAR, ROLE,
> TOKEN, and REGEX.

Indeed. And with both of the above alternatives that's also true.

> One advantage to using Pod blocks in place of actual strings a la
> Python, is that the documentation is still independent of the source
> code, and need not be in the same file.

That's certainly true of your proposal. However, many might argue that
one *disadvantage* of using Pod blocks plus :name<> that way is that the
documentation is independent of the source code, and need not be in the
same file. ;-)

Damian

Mark Overmeer

May 4, 2009, 3:17:54 AM

to Damian Conway, perl6-l...@perl.org

* Damian Conway (dam...@conway.org) [090504 00:03]:

> Hinrik Örn Sigurðsson wrote:
> > I've been thinking lately about how Perl 6 might offer functionality
> > similar to Python's docstrings. That is, documentation which is tied
> > directly to a particular routine, class or module[1]. This is
> > something which would be very useful in a REPL, and for documentation
> > readers[2].

We had long and fierce fights about it back in July 2007. Challenged
by Damian, I came up with
http://www.nntp.perl.org/group/perl.perl6.language/2007/07/msg27894.html
and have heard nothing about it since. (I know you are busy.)

> For the latest S26 proposal that I'm (very quietly) working on, I'm
> considering two possible mechanisms to support tying docs to specific
> components of a program.
>
> The first is an C<is doc> trait:
> method reverse (
> Bool $recursive is doc<Reverse any nested L<List>s too>
> )
> is doc<Returns a copy of the L<List> with the order of elements reversed.>
>

> The second is a generalized Pod comment form:
> method reverse #={ Returns copy of L<List> with order of elems reversed. }
> ( Bool $recursive #={ reverse nested L<List>s too }
> )

Does this mean that it now will be acceptable to interlink documentation
with information from the code? I would certainly appreciate that change.

> > One advantage to using Pod blocks in place of actual strings a la
> > Python, is that the documentation is still independent of the source
> > code, and need not be in the same file.
>
> That's certainly true of your proposal. However, many might argue that
> one *disadvantage* of using Pod blocks plus :name<> that way is that the
> documentation is independent of the source code, and need not be in the
> same file. ;-)

The "=begin METHOD" line is where the method's implementation starts
(docs are as important to programs as code is), so it is a big advantage
to be able to read the documented subject on that first line, certainly
if the documentation gets longer.

However, to my taste, the focus is drawn to the wrong parts of the
line. I would prefer (the original form on the left, mine on the right):

=begin METHOD :name<reverse>            =method reverse
Acts on a L<List>. Returns a copy       Acts on a LIST. Returns a copy
=end METHOD                             =cut

method reverse() {                      method reverse() {

(=end METHOD suggests that the method ends there, but of course the
code still has to come. That is confusing.)

Or, simpler (two alternative syntaxes):

method reverse() {               method reverse() {
    #= Acts on a LIST                ` Acts on a LIST.
    #= Returns a copy                ` Returns a copy.

Documentation and comments are alike: documentation is for external
module users, comments for internal module users. Therefore, I think
that '#= ' comes naturally next to '# '.

I suggest making a line that matches m/^\s*\`/ (optional leading
whitespace, then a backtick) an abbreviation for "is doc<>".
--
MarkOv

------------------------------------------------------------------------
Mark Overmeer MSc MARKOV Solutions
Ma...@Overmeer.net solu...@overmeer.net
http://Mark.Overmeer.net http://solutions.overmeer.net

Hinrik Sig

May 4, 2009, 6:45:07 AM

to Damian Conway, perl6-l...@perl.org

These two (generalized Pod comments and C<is doc>) are both better than
my suggestion. :) They require less typing, so documenting at a finer
granularity (e.g. object attributes) is easier. They would provide an
opportunity for very thorough auto-generated documentation as outlined
in Mark's 2007 post.

In Perl 5 I hate having Pod interleaved with code, since it is usually
mostly /user/ documentation and so just gets in the way. Having normal
Pod elsewhere (at end-of-file or in a different file) and keeping API
Pod docs inline (and having tools which might merge the two into nice
HTML if desired) would be the best of both worlds.

As for "#={ Pod }" vs "is doc", since I've been working on Perl 6
highlighting in vim, the first difference that comes to mind is that
highlighting the #={ Pod } form like other Pod (with L<inks> looking
distinct n' all) is easier and makes more sense since it's not just a
normal string (which might have come from anywhere). Highlighting the
other form would be a special case that might be hard to detect, and
arguably it shouldn't be special-cased at all, since it is just a string.

Mark Overmeer

May 4, 2009, 7:03:26 AM

to Hinrik Örn Sigurðsson, Damian Conway, perl6-l...@perl.org

* Hinrik Örn Sigurðsson (hinri...@gmail.com) [090504 10:45]:

> In Perl 5 I hate having Pod interleaved with code, since it is usually
> mostly /user/ documentation and so just gets in the way. Having normal
> Pod elsewhere (at end-of-file or in a different file) and keeping API
> Pod docs inline (and having tools which might merge the two into nice
> HTML if desired) would be the best of both worlds.

I also maintain old and often-used modules written by experienced
programmers, which had the POD at the end of the file or in
separate files. Transforming that into interleaved documentation
highlighted many mistakes: often, over time, the interface gets extended
or deprecated. When the documentation is far away from the code, people
forget to update it. That is a fact. Besides, user-doc and code-doc
have considerable overlap. By interleaving user-doc with the code, you need
less code-doc to achieve the same clarity.

On the other hand, you also need a global API description, which
demonstrates how the various functions/methods work together. So in my
personal favorite style, I document the nasty details of each method close
to the code of the method, and after all the code I add a long text on
all the main subjects of the code, on how to use the module as a whole.
In that large text, I do not wish to detail all the options because that
frustrates the learning process.
--
Regards,

Charles Bailey

May 4, 2009, 8:30:10 AM

to perl6-l...@perl.org

If the two can be made equally expressive (i.e. content is normal POD,
and multiple lines merged logically), I think I favor the pod-comment
form, since it allows one to place the doc close to the thing
documented - in particular, to the head of the function definition.
That's a convenience for the user puzzling out the signature, and a
reminder to the author to update the doc when the signature changes.

--
Regards,
Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu

Hinrik Sig

May 4, 2009, 10:37:20 AM

to Charles Bailey, perl6-l...@perl.org

On Mon, May 4, 2009 at 12:30 PM, Charles Bailey
<bailey....@gmail.com> wrote:
> If the two can be made equally expressive (i.e. content is normal POD,
> and multiple lines merged logically), I think I favor the pod-comment
> form, since it allows one to place the doc close to the thing
> documented - in particular, to the head of the function definition.
> That's a convenience for the user puzzling out the signature, and a
> reminder to the author to update the doc when the signature changes.
>

If the #={ } form (or #=<>, #=«», etc.) works like embedded comments
(#{ }), what you're describing would work just fine since they can
span multiple lines. See
http://perlcabal.org/syn/S02.html#Whitespace_and_Comments

Hinrik Sig

May 4, 2009, 11:06:48 AM

to Damian Conway, perl6-l...@perl.org

I suppose that #={ } would work like a normal Pod block which does not
allow implicit code. That way you could keep it properly indented by
default:

class Foo;
method bar ($x, $y) {
    #={Swaps $x and $y}
    ...
}

method baz ($a, @b) {
    #={
        Takes $a, applies it to the values in @b
        Bla bla bla
    }
    ...
}

I remember reading that #{} comments are not allowed at the beginning
of a line. Is that still true? Would that also apply to #={} ? If so,
it would be easy to make a mistake:

class Foo;
#={ This class does something neat } <-- ILLEGAL

method bar () { ...}

Another issue is that Pod directives must be at the beginning of a
line. That would mean having to break the indent in cases like:

method baz ($a, @b) {
    #={
        Takes $a, applies it to the values in @b
        Be mindful of the following:

=item This
=item That
=item And the other thing
    }
    ...
}

Could Perl allow Pod blocks to be indented in #={ } comments? Maybe
the "local" beginning-of-line in the comment could be considered to be
at the position where the #={ appeared, or at the position of the
first whitespace character after it. If so, then maybe the "no
implicit code blocks" restriction could be lifted, so the whole
comment behaves just like an indented =pod block:

method baz ($a, @b) {
    #={
        Takes $a, applies it to the values in @b
        Be mindful of the following:

        =item This
        =item That
        =item And the other thing

            This is rendered as code, since it is indented further
            than the initial #={
    }
    ...
}

Thoughts?

Hinrik Sig

May 4, 2009, 11:21:44 AM

to Damian Conway, perl6-l...@perl.org

The only reason that Pod directives have to be at the beginning of the
line is so that Pod parsers don't need to know Perl (or any other
language) in order to extract Pod from files, right? Allowing them to
be indented like I suggested would break that, but the tool would need
to parse Perl in order to find the #={ } comments anyway, and to tie
them to surrounding Perl structures.

Hinrik Sig

May 4, 2009, 11:28:10 AM

to Damian Conway, perl6-l...@perl.org

2009/5/4 Hinrik Örn Sigurðsson <hinri...@gmail.com>:

> Could Perl allow Pod blocks to be indented in #={ } comments? Maybe
> the "local" beginning-of-line in the comment could be considered to be
> at the position where the #={ appeared, or at the position of the
> first whitespace character after it.

I meant to write /non-whitespace character/ here, of course.

Jon Lang

May 4, 2009, 1:50:13 PM

to Hinrik Örn Sigurðsson, Damian Conway, perl6-l...@perl.org

This message deals strictly with the syntax of '#='-based POD; the
semantics is a separate issue.

--

I'd like '#=' to follow similar rules to what '#' follows, with the
caveat that a line beginning with '#' counts as a single-line comment
no matter what the second character is. Specifically, having the
second character be an = does not transform a full-line comment into a
single line of documentation. This preserves the ability to comment
out a series of lines by prepending a '#' to each of them without
having to worry about whether or not any given line will start doing
strange things. This means that '#=' can never appear at the start of
the line if you want it to denote documentation; but anywhere else is
fine. This should be a simple enough rule for a POD parser to handle
with minimal trouble.

There are additional complications that arise with '#=', such as:

say '#=';

In order to keep a POD parser from having to parse the code as well,
we'd want to say that the #= sequence initiates a POD block that
extends to the end of the line. IOW, the POD parser would end up
with:

CODE:
say '
POD:
';

But if that's the case, how would you ever actually print the
character sequence '#='? Conversely, if you say that the fact that
it's within a string literal means that it counts as string characters
rather than the start of some POD, the POD parser will need to know
how to identify string literals - which, as Perl illustrates, may not
be a simple task.

A possible middle ground might be to say that '#=' starts some POD,
but (e.g.) '\#=' doesn't: where a POD extractor would remove '#=' and
the following POD from the resulting code, it would replace '\#=' with
'#='. So to actually display the '#=' character sequence, you'd say:

say '\#=';

With this in play, you can place '#='-based POD literally anywhere
except at the beginning of a line.
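To make these rules concrete, here is a toy, line-based sketch in Python (the language is only for illustration; `split_line` is a hypothetical helper, not part of any Pod tool). It assumes "beginning of a line" means column 0, handles only the end-of-line form, and deliberately does not understand Perl string literals, so it reproduces the `say '#='` problem described above.

```python
from typing import Optional, Tuple


def split_line(line: str) -> Tuple[str, Optional[str]]:
    """Split one source line into (code, doc) under the proposed '#=' rules.

    Hypothetical toy model of the proposal; it does not parse Perl.
    """
    # A '#' in column 0 is always an ordinary comment, even if '=' follows,
    # so commenting out lines never turns them into documentation.
    if line.startswith("#"):
        return line, None

    code = []
    i = 0
    while i < len(line):
        if line.startswith("\\#=", i):
            # Escaped form: emit a literal '#=' into the code and keep going.
            code.append("#=")
            i += 3
        elif line.startswith("#=", i):
            # Unescaped '#=': everything after it, to end of line, is Pod.
            return "".join(code), line[i + 2:].strip()
        else:
            code.append(line[i])
            i += 1
    return "".join(code), None


# The string-literal case: the extractor splits it exactly as described.
print(split_line("say '#=';"))    # ("say '", "';")
print(split_line("say '\\#=';"))  # ("say '#=';", None)
```

With the escape, the Perl line `say '\#=';` comes back as the code `say '#=';` with no doc attached, while a line such as `has $.x; #= an attribute` yields the code `has $.x; ` and the doc `an attribute`.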

--

With this in mind, I'd propose two forms of '#=', based on what comes
after the '='. If it is followed by one or more '['s, you have
bracketed POD which is terminated by an equal number of ']'s;
otherwise, you have POD which is terminated at the end of the current
line. Note that I specifically said '[' rather than 'a bracketing
character'; this is for the same reason that 'C<code>' is POD markup,
but 'C{code}' isn't. As well, I chose '[' instead of '<' to minimize
the need to double or triple up on the encapsulating brackets whenever
inline POD markup is involved. Compare:

#=<<This text has I<italics>!>>
#=[This text has I<italics>!]

#=<<<C<<$x>5>>>>>
#=[C<<$x>5>>]

Conversely:

#=<$x[5] = 7>
#=[[$x[5] = 7]]

...which isn't too bad IMHO (and is pretty close to a worst-case scenario).
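The bracketed form can be sketched the same way. This hypothetical Python helper counts the `[`s after `#=` and stops at the first run of the same number of `]`s, without tracking nesting. Reading "terminated by an equal number of ']'s" that literally is an assumption, but it is the reading under which `#=[[$x[5] = 7]]` needs its doubled brackets.

```python
from typing import Tuple


def parse_bracketed(text: str) -> Tuple[str, str]:
    """Parse '#=' followed by N '['s; return (pod_body, remaining_source).

    Hypothetical helper: N opening '['s are closed by the first run of N ']'s.
    """
    if not text.startswith("#=["):
        raise ValueError("not a bracketed '#=' block")
    # Count the opening brackets that follow '#='.
    i = 2
    while i < len(text) and text[i] == "[":
        i += 1
    n = i - 2
    # Nesting is not tracked: the first run of n ']'s ends the Pod.
    end = text.find("]" * n, i)
    if end == -1:
        raise ValueError("unterminated bracketed Pod")
    return text[i:end], text[end + n:]


print(parse_bracketed("#=[C<<$x>5>>]"))    # ('C<<$x>5>>', '')
print(parse_bracketed("#=[[$x[5] = 7]]"))  # ('$x[5] = 7', '')
print(parse_bracketed("#=[$x[5] = 7]"))    # ('$x[5', ' = 7]')  closed too early
```

The last call shows why a single `[` is not enough when the Pod itself contains a `]`.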

Finally, I'd want bracketed POD to follow indentation rules similar to
what Hinrik suggested above: if the '#=' is preceded by nothing but
whitespace and every line within the bracketed POD starts with at
least as much whitespace, trim the shortest whitespace off of every
line within the POD.
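One possible reading of this indentation rule, again as a hypothetical Python sketch: `prefix` is whatever precedes `#=` on its line and `body` is the text between the brackets. Skipping whitespace-only lines when measuring (and emptying them in the output) is an assumption the message does not spell out, as is counting tabs and spaces alike.

```python
def trim_indent(prefix: str, body: str) -> str:
    """Apply the proposed dedent rule to the body of a bracketed '#=' block.

    prefix: text before '#=' on its line; body: text inside the brackets.
    Hypothetical helper; whitespace-only lines are skipped when measuring.
    """
    lines = body.split("\n")
    # The rule only applies when '#=' is preceded by nothing but whitespace.
    if prefix.strip():
        return body
    indents = [len(ln) - len(ln.lstrip()) for ln in lines if ln.strip()]
    # Every non-blank line must be indented at least as far as the '#='.
    if not indents or min(indents) < len(prefix):
        return body
    cut = min(indents)  # the shortest indentation is trimmed from every line
    return "\n".join(ln[cut:] if ln.strip() else "" for ln in lines)


body = "\n        Takes $a, applies it to the values in @b\n        Bla bla bla\n    "
print(trim_indent("    ", body))
```

If any line is indented less than the `#=` itself, or if code precedes the `#=`, the body is returned unchanged.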

--

Again, note that the above addresses only the syntax of '#='-based
POD, and not the semantics.

--
Jonathan "Dataweaver" Lang

Hinrik Sig

May 4, 2009, 2:24:03 PM

to Jon Lang, Damian Conway, perl6-l...@perl.org

2009/5/4 Jon Lang <dataw...@gmail.com>:

> With this in mind, I'd propose two forms of '#=', based on what comes
> after the '='. If it is followed by one or more '['s, you have
> bracketed POD which is terminated by an equal number of ']'s;
> otherwise, you have POD which is terminated at the end of the current
> line. Note that I specifically said '[' rather than 'a bracketing
> character'; this is for the same reason that 'C<code>' is POD markup,
> but 'C{code}' isn't. As well, I chose '[' instead of '<' to minimize
> the need to double or triple up on the encapsulating brackets whenever
> inline POD markup is involved.

The end-of-line Pod option would be nice. I assume you mean something like this?

has $.something; #= This is an important attribute
has $!other;

However, I'm not sure why you would want to restrict the multi-line
form to only allow square brackets. The syntactically similar embedded
comments (#{}) allow almost any valid bracketing character pair, and
even Pod allows C«» nowadays.

> I'd like '#=' to follow similar rules to what '#' follows, with the
> caveat that a line beginning with '#' counts as a single-line comment
> no matter what the second character is. Specifically, having the
> second character be an = does not transform a full-line comment into a
> single line of documentation. This preserves the ability to comment
> out a series of lines by prepending a '#' to each of them without
> having to worry about whether or not any given line will start doing
> strange things. This means that '#=' can never appear at the start of
> the line if you want it to denote documentation; but anywhere else is
> fine.

That makes sense. Embedded comments (#{}) aren't allowed at the
beginning of a line either for the same reason. I hadn't thought of
that when I wrote my earlier post, so ignore what I said about
changing that bit. :)

As for a Pod parser not knowing what to do in case of "say '#='" and
such, I think we should assume that the doc tool /can/ parse Perl
(likely using STD.pm). Extracting comments (be it #, #{}, #=, or #={})
reliably from Perl code without knowing Perl doesn't seem like a
viable option. How do we know that the #= isn't inside a multi-line
string or heredoc, etc? It's easy to do with Pod blocks since they
were designed for that (i.e. = at the beginning of a line is always a
Pod directive), but since the kind of docs we're talking about here
are supposed to be tied to the code they appear next to, it wouldn't
do much good to be able to extract all the comments without
understanding the Perl code around them anyway.

Leon Timmermans

May 6, 2009, 11:42:19 AM

to perl6-l...@perl.org

2009/5/4 Damian Conway <dam...@conway.org>:

>
> Each approach has advantages and disadvantages.
> Feedback via this forum would be most welcome.
>

Most people seem to lean towards the pod comments, though I disagree
on one simple ground: aesthetics. Python docstrings aren't just
useful, they are reasonably pretty too, #={} isn't. In fact I'd say
it's pretty fugly. I think that a feature that we intend to be used
often should be more aesthetically pleasing, if only to promote its
use.

Leon

Hinrik Sig

May 6, 2009, 12:53:05 PM

to Leon Timmermans, perl6-l...@perl.org

On Wed, May 6, 2009 at 3:42 PM, Leon Timmermans <faw...@gmail.com> wrote:
> Most people seem to lean towards the pod comments, though I disagree
> with it on a simple ground: aesthetics. Python docstrings aren't just
> useful, they are reasonably pretty too, #={} isn't. In fact I'd say
> it's pretty fugly. I think that a feature that we intend to be used
> often should be more aesthetically pleasing, if only to promote its
> use.
>
> Leon
>

You have a point there. Python's triple double-quote (sextuple?)
delimiters are neater.

I'm not sure what the end-of-line version (see earlier discussion of
#=) of that would look like, though. What about using #= for
end-of-line docstrings and #== for multiline? Just a suggestion.