Synopsis 26 - Documentation [alpha draft]

Damian Conway

Oct 8, 2006, 12:38:16 AM

to perl6-l...@perl.org

Before Christmas, as promised!

I have a 95% complete Perl 5 implementation of a parser for this, but it is
too large to fit in the margin. I may release the beta of that next week, once
I'm home from my travels.

Damian

-----cut----------cut----------cut----------cut----------cut-----

=for comment
This file is deliberately specified in Perl 6 Pod format
Clearly a Perl 6 -> Perl 5 documentation translator is a high priority ;-)

=head1 TITLE

[DRAFT] Synopsis 26 - Documentation

=head1 AUTHORS

Damian Conway <dam...@conway.org>

Ingy dE<ouml>t Net <in...@cpan.org>

=head1 VERSION

=for table
Maintainer: Damian Conway <dam...@conway.org>
Date: 9 Apr 2005
Last Modified: 7 Oct 2006

=head1 Perldoc

Perldoc is an easy-to-use markup language with a simple, consistent
underlying document object model. Perldoc can be used for writing the
documentation for Perl 5 and Perl 6, and for Perl programs and modules,
as well as for other types of document composition.

Perldoc allows for multiple syntactic I<dialects>, all of which map onto
the same set of standard document objects. The standard dialect is named
L<"Pod"|#The Pod Dialect>.

=head1 The Pod Dialect

I<Pod> is an evolution of Perl 5's Plain Ol' Documentation (POD) markup.
Compared to Perl 5 POD, Perldoc's Pod dialect is much more uniform,
somewhat more compact, and considerably more expressive.

=head2 General syntactic structure

Pod blocks are specified using I<directives>, which always start with an
C<=> in the first column. Every Pod block directive may be written in
any of three equivalent forms: I<delimited style>, I<paragraph style>,
or I<abbreviated style>.

=head3 Delimited blocks

Delimited blocks are bounded by C<=begin> and C<=end> markers, both of
which are followed by a valid identifierN<A valid identifier is a
sequence of alphanumerics and/or underscores, beginning with an
alphabetic or underscore>, which is the typename of the block. Typenames
that are entirely lowercase (for example: C<=begin head1>) or entirely
uppercase (for example: C<=begin SYNOPSIS>) are reserved.
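
As a rough illustration of the naming rules above, here is a minimal Python
sketch that classifies a typename. The function name C<classify_typename>
and its three return labels are hypothetical, and the sketch assumes
ASCII-only identifiers, which the text above does not actually specify.

```python
import re

# A valid identifier: alphanumerics and/or underscores, beginning with an
# alphabetic character or underscore (ASCII only, by assumption).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_typename(name: str) -> str:
    """Return 'invalid', 'reserved', or 'user' for a block typename."""
    if not _IDENTIFIER.fullmatch(name):
        return "invalid"
    # str.islower() and str.isupper() ignore digits and underscores, so
    # 'head1' counts as entirely lowercase and 'SYNOPSIS' as entirely uppercase.
    if name.islower() or name.isupper():
        return "reserved"
    return "user"
```

Under these assumptions C<head1> and C<SYNOPSIS> are reported as reserved,
while a mixed-case name such as C<Name> is left free for users.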

After the typename, the rest of the C<=begin> marker line is treated as
configuration information for the block. This information is used in
different ways by different types of blocks, and is specified using
Perl6ish C<:key{value}> or C<< key=>value >> pairs (which must, of
course, be constants since Perldoc is a specification language, not a
programming language).
See L<Synopsis 2|http://dev.perl.org/perl6/doc/design/syn/S02.html#Literals>
for a summary of the Perl 6 pair notation.

The configuration section may be extended over subsequent lines by
starting those lines with an C<=> in the first column followed by a
horizontal whitespace character.

The lines following the opening delimiter and configuration are the data
or contents of the block, which continue until the block's C<=end> marker
line. The general syntax is:

=begin code :allow< R >
=begin R<BLOCK_TYPE> R<OPTIONAL CONFIG INFO>
= R<OPTIONAL EXTRA CONFIG INFO>
R<BLOCK CONTENTS>
=end R<BLOCK_TYPE>
=end code

For example:

=begin table :title<Table of Contents>
Constants         1
Variables         10
Subroutines       33
Everything else   57
=end table

=begin Name :required
= :width(50)
The applicant's full name
=end Name

=begin Contact :optional
The applicant's contact details
=end Contact

Note that no blank lines are required around the directives, and blank
lines within the contents are always treated as part of the contents.

Note also that in the following specifications, a "blank line" is a line that
is either empty or that contains only whitespace characters. That is, a blank
line matches C</^\s*?$/>. Pod uses blank lines, rather than empty lines, as
delimiters (on the principle of least surprise).
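
The blank-line rule can be sketched in a few lines of Python. This is only
an illustration: the helper names C<is_blank> and C<split_on_blank_lines>
are hypothetical, and Python's C<\s> is assumed to be close enough to the
whitespace class intended above.

```python
import re

# A blank line is empty or contains only whitespace characters.
_BLANK = re.compile(r"\s*")


def is_blank(line: str) -> bool:
    """True for empty lines and for lines holding only whitespace."""
    return _BLANK.fullmatch(line) is not None


def split_on_blank_lines(text: str):
    """Group consecutive non-blank lines, using blank lines as delimiters."""
    chunks, current = [], []
    for line in text.splitlines():
        if is_blank(line):
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        chunks.append(current)
    return chunks
```

Because the test is "blank" rather than "empty", a line containing a stray
tab or trailing spaces delimits a block exactly as an empty line would.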

=head3 Paragraph blocks

Paragraph blocks are introduced by a C<=for> marker and terminated by
the next Pod directive or the first blank line (which is I<not>
considered to be part of the block's contents). The C<=for> marker is
followed by the name of the directive and optional
configuration information. The general syntax is:

=begin code :allow< R >
=for R<BLOCK_TYPE> R<OPTIONAL CONFIG INFO>
= R<OPTIONAL EXTRA CONFIG INFO>
R<BLOCK DATA>

=end code

For example:

=for table :title<Table of Contents>
Constants         1
Variables         10
Subroutines       33
Everything else   57

=for Name :required
= :width(50)
The applicant's full name

=for Contact :optional
The applicant's contact details

Once again, blank lines are not required around the directive (this is a
universal feature of Pod).

=head3 Abbreviated blocks

Abbreviated blocks are introduced by an C<'='> sign in the
first column, which is followed immediately by the typename of the
block. The rest of the line is treated as block data, rather than as
configuration. The content terminates at the next Pod directive or the
first blank line (which is not part of the block data). The general
syntax is:

=begin code :allow< R >
=R<BLOCK_TYPE> R<BLOCK DATA>
R<MORE BLOCK DATA>

=end code

For example:

=table
Constants         1
Variables         10
Subroutines       33
Everything else   57

=Name The applicant's full name
=Contact The applicant's contact details

=head3 Block equivalence

The three equivalent block specifications (delimited, paragraph, and
abbreviated) are treated identically by the underlying documentation
model, so you can use whichever form is most convenient for a particular
documentation task. In the descriptions that follow, the abbreviated form
will generally be used, but should be read as standing for all three
forms equally.

For example, although L<#Headings> shows only:

=head1 TOP LEVEL HEADING

this automatically implies that you could also write that block as:

=for head1
TOP LEVEL HEADING

or:

=begin head1
TOP LEVEL HEADING
=end head1
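
To make the equivalence concrete, the following Python sketch reduces a
single block in any of the three forms to the same
C<(typename, config, contents)> triple. It is a simplified illustration, not
the parser mentioned at the top of this message: C<parse_block> and its
helpers are hypothetical names, configuration is kept as an unparsed
string, and delimited blocks are assumed not to contain a nested block of
the same type.

```python
import re

_MARKER = re.compile(r"=(begin|for)\s+([A-Za-z_]\w*)[ \t]*(.*)")
_ABBREV = re.compile(r"=([A-Za-z_]\w*)[ \t]*(.*)")


def _until_terminator(lines):
    """Collect lines up to the first blank line or the next directive."""
    taken = []
    for line in lines:
        if line.strip() == "" or line.startswith("="):
            break
        taken.append(line)
    return taken


def parse_block(lines):
    """Parse the block starting at lines[0]; return (typename, config, contents)."""
    first, rest = lines[0], list(lines[1:])
    m = _MARKER.fullmatch(first)
    if m is None:
        # Abbreviated form: the rest of the directive line is data, not config.
        m = _ABBREV.fullmatch(first)
        if m is None:
            raise ValueError("not a Pod directive: " + first)
        typename, data = m.groups()
        contents = ([data] if data else []) + _until_terminator(rest)
        return typename, "", "\n".join(contents)
    style, typename, config = m.groups()
    # Config may continue on lines starting with '=' plus horizontal whitespace.
    while rest and re.match(r"=[ \t]", rest[0]):
        config += " " + rest.pop(0)[1:].strip()
    if style == "for":
        contents = _until_terminator(rest)
    else:
        end_marker = "=end " + typename
        contents = []
        for line in rest:
            if line.strip() == end_marker:
                break
            contents.append(line)  # blank lines are part of the contents
        else:
            raise ValueError("missing " + end_marker)
    return typename, config.strip(), "\n".join(contents)
```

Feeding the three C<head1> variants shown above to this sketch yields the
same triple each time, which is the property the document model relies on.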

=head3 Standard configuration options

Pod predefines a small number of standard configuration options that can be
applied uniformly to built-in block types. These include:

=begin item :term<C<:indented>>

This option specifies that the block is to be indented by a particular
amount. If the indentation amount includes a sign (i.e. C<+> or C<->) then
the indentation is relative to the indentation of the surrounding construct;
unsigned indentations are absolute offsets from the first column.

If a simple number is used (e.g. C<:indented(4)>) it indicates "columns" (for
fixed-width renderers) or "ems" (for variable-width renderers). You can also
specify a unit after the number. For example:

=for para :indented<1 tab>

=for para :indented<1 em>

=for para :indented<1 col>

=for para :indented<1 lvl>

=for para :indented<4 sp>

=end item

=begin item :term<C<:numbered>>

This option specifies that the block is to be numbered. The most common
use of this option is to create L<numbered headings|#Numbered headings> and
L<ordered lists|#Ordered lists> but it can be applied to any block.

It is up to individual renderers to decide how to display any numbering
associated with other types of blocks.

=end item

=for item :term<C<:bulleted>>
This option specifies that a list item has a bullet. See L<#Unordered lists>.

=for item :term<C<:term>>
This option specifies that a list item is the definition of a term.
See L<#Definition lists>.

=begin item :term<C<:formatted>>

This option specifies that the contents of the block should be treated as if
they had one or more L<formatting codes|#Formatting codes> placed around them.

For example, instead of:

=comment The next para is important, so emphasize it...
=begin para
B<I<
Warning: Do not immerse in water. Do not expose to bright light.
Do not feed after midnight.
>>
=end para

you can just write:

=comment The next para is important, so emphasize it...
=begin para :formatted<B I>
Warning: Do not immerse in water. Do not expose to bright light.
Do not feed after midnight.
=end para

Like all formatting codes, these are inherently cumulative. For example,
if the block itself is already inside a formatting code, that formatting
code will still apply, in addition to the extra bold and italic
formatting specified by C<:formatted>. It is also possible to
I<remove> formatting using a C<:formatted> option, by specifying the
formatting code(s) with a minus sign before them:

=comment The next para is less important, so de-emphasize it...
=begin para :formatted<-B>
Fire. The Untamed Element. Oldest of Man's Mysteries. Giver of
warmth. Destroyer of forests. Right now I<this> building is on fire.
Yes! The building is on fire! Leave the building! Enact the age-old
drama of self-preservation!
=end para

=end item

=for item :term<C<:like<R<typename>>>>
This option specifies that a block or config has the same formatting
properties as the type named by its value. This is useful for creating
related L<configurations|#Block pre-configuration>.

=for item :term<C<:allow>>
This option expects a list of formatting codes that are to be recognized
within any C<V<>> codes nested inside the current block. The option is
most often used on C<=code> blocks to allow mark-up within those
(otherwise verbatim) blocks, though it can be used in I<any> block that
contains verbatim text. See L<#Formatting within code blocks>.

=head2 Blocks

Pod offers notations for specifying a range of standard block types...

=head3 Headings

Pod provides an unlimited number of levels of heading, specified by the
T<=headR<N>> directive. For example:

=head1 A TOP LEVEL HEADING

=head2 A Second Level Heading

=head3 A third level heading

=head86 A "Missed it by I<that> much!" heading

While Pod parsers are required to recognize and distinguish all levels
of heading, Pod formatters are only required to provide distinct
I<renderings> of the first four levels of heading (though they may, of
course, provide more than that). Headings at levels without distinct
renderings would typically be rendered like the lowest distinctly
rendered level.

=head4 Numbered headings

You can specify that a heading is numbered using the C<:numbered> option. The
value of this option should be a sequence of characters containing a C<#>. If
the value is omitted (i.e. C<:numbered>), then it defaults to C<'#.'>.

The C<#> is replaced by the ordinal number of the heading block (within
its particular heading level):

=for head1 :numbered
The Problem

=for head1 :numbered
The Solution

=for head2 :numbered<#:>
Analysis

=for head3 :numbered<(#)>
Overview

=for head3 :numbered<(#)>
Details

=for head2 :numbered<#:>
Design

=for head1 :numbered
The Implementation

which would produce:

=begin indent :formatted
1. The Problem

2. The Solution

=begin indent
2.1: Analysis

=begin indent
(2.1.1) Overview

(2.1.2) Details
=end indent

2.2: Design
=end indent

3. The Implementation
=end indent

It is usually better to preset a numbering scheme for each heading
level, in a series of L<configuration blocks|#Block pre-configuration>:

=config head1 :numbered
=config head2 :numbered<#:>
=config head3 :numbered<(#)>

=head1 The Problem
=head1 The Solution
=head2 Analysis
=head3 Overview
=head3 Details
=head2 Design
=head1 The Implementation

Alternatively, as a short-hand, if the first whitespace-delimited word
in a heading consists of a single literal C<#> character, the C<#> is
removed and the heading is treated as if it had a C<:numbered> option:

=head1 # The Problem
=head1 # The Solution
=head2 # Analysis
=head3 # Overview
=head3 # Details
=head2 # Design
=head1 # The Implementation

Note that, even though renderers are not required to distinctly render
more than the first four levels of heading, they I<are> required to
correctly honour arbitrarily nested numberings. That is:

=head6 # The Rescue of the Kobayashi Maru

should produce something like:

=for para :indented
B<2.3.8.6.1.9. The Rescue of the Kobayashi Maru>
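
The numbering behaviour described in this section can be approximated with
one counter per heading level. The Python sketch below is an illustration
under stated assumptions: C<number_headings> is a hypothetical name, nested
ordinals are joined with dots (as in the sample output above), any level
without an explicit format falls back to the default C<'#.'>, and a skipped
level is padded with a zero, which the text above does not address.

```python
def number_headings(headings, formats=None):
    """Render numbered headings.

    headings: iterable of (level, title) pairs, level starting at 1.
    formats:  optional {level: format string containing '#'}.
    """
    formats = formats or {}
    counters = []  # counters[k] is the current ordinal at level k + 1
    rendered = []
    for level, title in headings:
        while len(counters) < level:
            counters.append(0)      # assumption: pad skipped levels with 0
        del counters[level:]        # deeper levels restart under a new parent
        counters[level - 1] += 1
        ordinal = ".".join(str(n) for n in counters)
        fmt = formats.get(level, "#.")
        rendered.append(fmt.replace("#", ordinal, 1) + " " + title)
    return rendered
```

Called with the formats C<{2: '#:', 3: '(#)'}> and the seven headings of the
example above, the sketch numbers "Details" as C<(2.1.2)> and "Design" as
C<2.2:>, because incrementing a level discards the counters beneath it.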

=head3 Ordinary paragraph blocks

Ordinary paragraph blocks consist of text that is to be formatted into
a document at the current level of nesting, with whitespace
squeezed, lines filled, and any special inline mark-up (see
L<#Formatting codes>) applied.

Ordinary paragraphs consist of one or more lines of text, each of which
starts with a non-whitespace character at column 1. The paragraph is
terminated by the first blank line or opening block directive. For example:

This is an ordinary paragraph.
Its text will be squeezed and
short lines filled. It is terminated by
the first blank line

This is another ordinary paragraph.
Its text will also be squeezed and
short lines filled. It is terminated by
the trailing directive on the next line
=head2 This is a heading block, not associated with the previous para

Within a C<=begin pod>/C<=end pod> block, ordinary paragraphs do not
require an explicit marker or delimiters, but there I<is> an explicit
C<para> marker available:

=para
This is an ordinary paragraph.
Its text will be squeezed and
short lines filled.

and likewise the longer C<=for> and C<=begin>/C<=end> forms. For example:

=begin para
This is an ordinary paragraph.
Its text will be squeezed and
short lines filled.
=end para

As the previous example implies, when any form of explicit C<para>
directive is used all whitespace at the start of each line is removed.
Hence the ordinary paragraph text no longer has to begin at column 1.

=head3 Code blocks

Code blocks are used to specify pre-formatted text, which should
be rendered without rejustification, without whitespace-squeezing, and without
recognizing any inline formatting codes. Typically these blocks are used
to show examples of code, data, or I/O, and are set using a fixed-width font.

A code block is specified as one or more lines of text, each of which
starts with a whitespace character. The block is terminated by a blank line.
For example:

This I<ordinary> paragraph introduces a following
B<code> block:

$this = 1 * code('block');
$that.is_specified(:by<indenting>);

There is also an explicit C<code> directive, which allows the contents
of code blocks to start at the first column, to start with whitespace
characters that are preserved exactly, and to contain blank lines:

The C<loud_update()> subroutine adds feedback:

=begin code

sub loud_update ($who, $status) {
say "$who -> $status.";

silent_update($who, $status);
}

=end code

The only limitation on the contents of a C<code> block is that they cannot
begin with an C<=> in the first column. If this is required, the leading C<=>
must be made L<verbatim|#Verbatim text>:

=begin code :allow<V>

V<=> in the first column is always a Perldoc directive

=end code

Renderers would normally indent the contents of any C<code> block (whether
it was implicitly or explicitly specified), but this can be overridden using
the C<:indented> option:

=comment
Indent this code block to the same column
as the surrounding text

=begin code :indented(+0)
sub demo {
say "Hello World";
}
=end code

=head4 Formatting within code blocks

Although C<=code> blocks automatically disregard all L<formatting
codes|#Formatting codes>, occasionally you may still need to use a
specific formatting code within a code block. For example, you may
wish to highlight a particular keyword in an example, by making it
bold. Or you might need to insert a non-ASCII character using the
C<E<>> entity code.

To do so, you can specify those formatting codes that should still be
recognized within any verbatim formatting inside a block, using
the C<:allow> option. The value of the C<:allow> option must be a
list of the names of one or more formatting codes. Those codes will
then remain active inside any implicit or explicit C<V<>> ("verbatim") code
within the block:

=begin code :allow<B I>
sub demo {
B<say> "Hello I<World>";
}
=end code

=head3 Lists

Lists in Pod are specified as a series of C<item> directives. No special
"container" directives or other delimiters are required to enclose the
entire list. For example:

The seven suspects are:

=item * Happy
=item * Dopey
=item * Sleepy
=item * Bashful
=item * Sneezy
=item * Grumpy
=item * Keyser Soze

Lists may be nested, using the C<=item1>, C<=item2>, C<=item3>, etc.
directives. Note that C<=item> is just an abbreviation for C<=item1>:

=item1 * Animal
=item2 * Vertebrate
=item2 * Invertebrate

=item1 * Mineral
=item2 * Solid
=item2 * Liquid
=item2 * Gas

which produces:

=begin indent
=item1 * Animal
=item2 - Vertebrate
=item2 - Invertebrate

=item1 * Mineral
=item2 - Solid
=item2 - Liquid
=item2 - Gas
=end indent

It is an error for a "level-N+1" C<item> directive (e.g. an C<=item2>,
C<=item3>, etc.) to appear anywhere except where there is a preceding
"level-N" C<item> directive. That is, an C<=item3> can only be specified if an
C<=item2> appears somewhere before it, and that C<=item2> can only appear if
there is a preceding C<=item1>.

Note that item blocks are not physically nested. That is, lower-level
items should I<not> be specified inside higher-level items:

=comment Wrong...
=begin item1
The choices are:
=item2 Liberty
=item2 Death
=item2 Beer
=end item1

=comment Correct...
=begin item1
The choices are:
=end item1
=item2 Liberty
=item2 Death
=item2 Beer

=head4 Multi-paragraph list items

Use the delimited form of the C<item> directive to specify items that
contain multiple paragraphs. For example:

Let's consider some common proverbs:

=begin item :bulleted
The rain in Spain falls mainly on the plain.

This is a common myth and an unconscionable slur on the Spanish
people, the majority of whom are extremely attractive.
=end item

=begin item :bulleted
The early bird gets the worm.

In deciding whether to become an early riser, it is worth
considering whether you would actually enjoy annelids
for breakfast.
=end item

As you can see, folk wisdom is often of dubious value.

which produces:

=begin indent
Let's consider some common proverbs:

=begin item :bulleted
The rain in Spain falls mainly on the plain.

This is a common myth and an unconscionable slur on the Spanish
people, the majority of whom are extremely attractive.
=end item

=begin item :bulleted
The early bird gets the worm.

In deciding whether to become an early riser, it is worth
considering whether you would actually enjoy annelids
for breakfast.
=end item

As you can see, folk wisdom is often of dubious value.
=end indent

=head4 Ordered lists

An item is part of an ordered list if the item has a C<:numbered>
configuration option:

=for item1 :numbered
Visito

=for item2 :numbered<[#]>
Veni

=for item2 :numbered<[#]>
Vidi

=for item2 :numbered<[#]>
Vici

This would produce:

=begin indent
1. Visito

=begin indent
[1.1] Veni

[1.2] Vidi

[1.3] Vici
=end indent
=end indent

Alternatively, if the first word of the item consists of a single C<#>
character, the item is treated as having a C<:numbered<#.>> option:

=item1 # Visito
=item2 # Veni
=item2 # Vidi
=item2 # Vici

To specify an I<unnumbered> list item that starts with a literal C<#>, either
make it verbatim:

=item V<#> introduces a comment

or explicitly mark the item itself as being unnumbered:

=for item :!numbered
# introduces a comment

The numbering of successive C<=item1> list items increments
automatically, but is reset to 1 whenever any other kind of Perldoc block
appears between two C<=item1> blocks. For example:

The options are:

=item1 # Liberty
=item1 # Death
=item1 # Beer

The tools are:

=item1 # Revolution
=item1 # Deep-fried peanut butter sandwich
=item1 # Keg

would produce:

=begin indent
The options are:

=item1 1. Liberty
=item1 2. Death
=item1 3. Beer

The tools are:

=item1 1. Revolution
=item1 2. Deep-fried peanut butter sandwich
=item1 3. Keg

=end indent

The numbering of nested items (C<=item2>, C<=item3>, etc.) only resets
(to 1) when the higher-level item's numbering either resets or increments.

To prevent an C<=item1> from resetting after a non-item block, you can
specify the C<:continued> option:

=item1
Start social networking website

=item1
Attract tens of thousands of naE<iuml>ve users

I<???>

=for item1 :continued
Profit!!!
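
The reset rules for item numbering can be sketched as follows. This Python
fragment is illustrative only: C<number_items> and the tuple encoding of
blocks are hypothetical, and every item is assumed to be numbered.

```python
def number_items(blocks):
    """Assign dotted ordinals to list items.

    blocks: sequence of ('item', level, continued) or ('other',) tuples.
    Returns a list with a dotted-number string per item and None otherwise.
    """
    counters = []        # counters[k] is the current ordinal at level k + 1
    interrupted = False  # a non-item block has appeared since the last =item1
    numbers = []
    for block in blocks:
        if block[0] != "item":
            interrupted = True
            numbers.append(None)
            continue
        _, level, continued = block
        if level == 1:
            # Only =item1 resets after an intervening block, unless :continued.
            if interrupted and not continued:
                counters = []
            interrupted = False
        if level > len(counters) + 1:
            raise ValueError("level-%d item without a level-%d item" % (level, level - 1))
        if level == len(counters) + 1:
            counters.append(0)
        del counters[level:]  # nested numbering restarts when a parent increments
        counters[level - 1] += 1
        numbers.append(".".join(str(n) for n in counters))
    return numbers
```

With this encoding, two runs of three C<=item1> blocks separated by an
ordinary paragraph are each numbered 1 to 3, whereas marking the first item
after the paragraph as continued carries the count on instead.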

=head4 Definition lists

To create term/definition lists, specify the term as a configuration value
of the item, and the definition as the item's contents:

=for item :term<MAD>
Affected with a high degree of intellectual independence.

=for item :term<MEEKNESS>
Uncommon patience in planning a revenge that is worth while.

=for item :term<MORAL>
Conforming to a local and mutable standard of right.
Having the quality of general expediency.

An item that's specified as a term can still be numbered or bulleted:

=for item :numbered :term<SELFISH>
Devoid of consideration for the selfishness of others.

=for item :numbered :term<SUCCESS>
The one unpardonable sin against one's fellows.

=head4 Unordered lists

To create unordered lists, specify a C<:bulleted> configuration option:

=for item1 :bulleted<*>
Reading

=for item2 :bulleted<->
Writing

=for item3 :bulleted<(+)>
'Rithmetic

A valueless C<:bulleted> defaults to C<< :bulleted<*> >>.

As a short-cut, you can just start the contents with a lone C<*> as the
first whitespace-delimited word of the item's contents:

=item1 * Reading
=item2 * Writing
=item3 * 'Rithmetic

Pod renderers are free to choose how they render short-cut bullets,
either as asterisks on every level:

=item1 V<*> Reading
=item2 V<*> Writing
=item3 V<*> 'Rithmetic

or with distinct bullets for each level:

=item1 V<*> Reading
=item2 - Writing
=item3 + 'Rithmetic

Once again, you can use a L<C<config> directive|#Block pre-configuration>
to ensure that your lists conform to consistent bulleting conventions:

=config item1 :bulleted
=config item2 :bulleted<->
=config item3 :bulleted<+>

=item1 Reading
=item2 Writing
=item3 'Rithmetic

To specify an I<unbulleted> list item that starts with an asterisk,
either specify the starting character(s) verbatim:

=item V<*> is a Perl 5 sigil

or explicitly mark the item itself as being unbulleted:

=for item :!bulleted
* is a Perl 5 sigil

=head3 Indented blocks

Any block can be indented by specifying an C<:indented> option on it:

=begin para :indented<+1 lvl>
We are all of us in the gutter,
but some of us are looking at the stars!
=end para

However, this quickly becomes tedious if there are many such paragraphs
in a sequence, or if multiple levels of nesting are required:

=begin para :indented<+1 lvl>
We are all of us in the gutter,
but some of us are looking at the stars!
=end para
=begin para :indented<+2 lvl>
-- Oscar Wilde
=end para

So Pod provides a nestable C<=indent> block that indents all its contents:

=begin indent
We are all of us in the gutter,
but some of us are looking at the stars!
=begin indent
-- Oscar Wilde
=end indent
=end indent

By default an C<=indent> block indents its contents by one extra "level"
(i.e. by whatever the formatter considers one extra level of indentation
to be) relative to the surrounding block. However, this default can be
changed, by L<preconfiguring|#Block pre-configuration> the block type
with the C<:indented> option:

=config indent :indented<+4 ems>

=head3 Tables

=for Conjecture
# Larry has previously indicated Perldoc shouldn't have a built-in
# table type, but there seems to be a considerable amount of general
# support and desire for this highly useful feature. This section is
# included here in case Larry should decide to invoke Rule 2. ;-)

Tables can be specified in Perldoc using the C<=table> directive.
The table may be given a name using the C<:title> option.

Columns are separated by whitespace, vertical lines (C<|>), or line
intersections (C<+>). Rows can be specified in one of two ways: either
one row per line, with no separators; or multiple lines per row with
explicit horizontal separators (whitespace, intersections (C<+>), or
horizontal lines: C<->, C<=>, C<_>) between I<every> row. Either style
can also have an explicitly separated header row at the top.

Each individual table cell is separately formatted, as if it were a
nested C<=para>.

This means you can create tables compactly, line-by-line:

=table
The Shoveller   Eddie Stevens     King Arthur's singing shovel
Blue Raja       Geoffrey Smith    Master of cutlery
Mr Furious      Roy Orson         Ticking time bomb of fury
The Bowler      Carol Pinnsler    Haunted bowling ball

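or line-by-line with multi-line headers:

=table
                Secret
Superhero       Identity          Superpower
=============   ===============   ===================
The Shoveller   Eddie Stevens     King Arthur's singing shovel
Blue Raja       Geoffrey Smith    Master of cutlery
Mr Furious      Roy Orson         Ticking time bomb of fury
The Bowler      Carol Pinnsler    Haunted bowling ball

or with multi-line headers I<and> multi-line data: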

=begin table :title('The Other Guys')

                Secret
Superhero       Identity          Superpower
=============   ===============   ===================
The Shoveller   Eddie Stevens     King Arthur's
                                  singing shovel

Blue Raja       Geoffrey Smith    Master of cutlery

Mr Furious      Roy Orson         Ticking time bomb
                                  of fury

The Bowler      Carol Pinnsler    Haunted bowling ball

=end table

=head3 Named blocks

Blocks whose names are not recognized as Pod built-ins are assumed to be
destined for specialized formatters or parser plug-ins. For example:

=for Xhtml
<object type="video/quicktime" data="onion.mov">

or:

=Image http://www.perlfoundation.org/images/perl_logo_32x104.png

Named blocks are converted by the Perldoc parser to block objects,
specifically, to a subclass of the standard C<Block> class. The
resulting object's C<.typename> method retrieves the name of the block
type: C<'Xhtml'>, C<'Image'>, etc. The object's C<.contents> method
retrieves a list of the block's (verbatim, unformatted) contents.

Note that all block names consisting entirely of lower-case or entirely of
upper-case letters are reserved.

=head3 Comments

Comments are Pod blocks that are never rendered by any formatter. They
are, of course, still included in any internal Perldoc representation,
and are accessible via the Perldoc APIs.

Comments are useful for meta-documentation (documenting the documentation):

=comment Add more here about the algorithm

and for temporarily removing parts of a document:

=item # Retreat to remote Himalayan monastery

=item # Learn the hidden mysteries of space and time

=item # Achieve enlightenment

=begin comment
=item # Prophet!
=end comment

Note that, since the Perl interpreter never executes embedded Perldoc
blocks, C<comment> blocks can also be used as (nestable!) block comments
in Perl 6:

# This is a Perl 5 style
# code comment
# spanning multiple lines

=begin comment
This is a Perl 6 style
delimited code comment
spanning multiple lines
=end comment

=head3 Other standard block types

All uppercase block typenames are reserved for specifying standard
documentation components. In particular, all the standard components of
Perl documentation have reserved uppercase typenames:

=NAME
=VERSION
=SYNOPSIS
=DESCRIPTION
=USAGE
=INTERFACE
=METHOD
=SUBROUTINE
=OPTION
=DIAGNOSTIC
=ERROR
=WARNING
=DEPENDENCY
=BUG
=SEEALSO
=ACKNOWLEDGEMENT
=AUTHOR
=COPYRIGHT
=DISCLAIMER
=LICENCE
=LICENSE
=SECTION
=CHAPTER
=APPENDIX

The plural forms of each of these keywords are also reserved, and are
aliases for the singular forms.

Most of these blocks would typically be used in their full delimited forms:

=begin SYNOPSIS
use Perldoc::Parser;

my Perldoc::Parser $parser .= new();

my $tree = $parser.parse($fh);
=end SYNOPSIS

The use of these reserved keywords is not required; you can still just write:

=head1 SYNOPSIS
=begin code
use Perldoc::Parser;

my Perldoc::Parser $parser .= new();

my $tree = $parser.parse($fh);
=end code

However, using the keywords adds semantic information to the
documentation, which may assist various formatters, summarizers,
coverage tools, and other utilities.

=head2 Formatting codes

Formatting codes provide a way to add inline mark-up to a piece of text
within the contents of (most types of) block. They are themselves a type
of block, and most of them may nest sequences of any other type of block
(most often, other formatting codes). Specifically, you can nest
comment blocks in the middle of a formatting code:

B

All Pod formatting codes consist of a single capital letter followed
immediately by a set of angle brackets. The brackets contain the text or
data to which the formatting code applies. You can use a set of single
angles (C«<...>»), a set of double angles (C<«...»>), or multiple
single-angles (C«<<<...>>>»).

Within the angles, sequences of angles that are the same as the delimiter
must be balanced. For example:

C<$foo<bar>>

C<< $foo<<bar>> >>

If you need an unbalanced angle, use different delimiters (or more
consecutive angles than your delimiter contains):

C«$foo < $bar»
C<<$foo < $bar>>

The Perl 5 heredoc syntax was: C« <<END_MARKER »
The Perl 5 heredoc syntax was: C<<< <<END_MARKER >>>

A formatting code ends at the matching closing angle bracket, or at the
end of the enclosing block or formatting code in which the opening angle
bracket was specified (whichever comes first). Pod parsers are required
to issue a warning whenever a formatting code is terminated by the end
of an outer block rather than by its own delimiter (unless the user
explicitly disables the warning).
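
The delimiter rules above can be illustrated with a small scanner. The
Python sketch below is one possible reading, not a definitive algorithm:
C<parse_formatting_code> is a hypothetical name, only ASCII angles are
handled (not C<«> and C<»>), the delimiter width is taken to be the full run
of C<< < >> characters after the letter, and shorter runs of angles inside
the code are treated as literal text.

```python
def parse_formatting_code(text, start=0):
    """Parse one formatting code beginning at text[start].

    Returns (letter, contents, end_index, terminated). terminated is False
    when the code ran to the end of the text without a closing delimiter,
    which is the case where a parser would be expected to warn.
    """
    if start >= len(text) or not ("A" <= text[start] <= "Z"):
        raise ValueError("formatting codes start with a capital letter")
    letter = text[start]
    width = 0
    while text.startswith("<", start + 1 + width):
        width += 1
    if width == 0:
        raise ValueError("missing opening angle bracket")
    opener, closer = "<" * width, ">" * width
    body_start = start + 1 + width
    pos, depth = body_start, 1
    while pos < len(text):
        if text.startswith(opener, pos):
            depth += 1              # nested delimiter of the same width
            pos += width
        elif text.startswith(closer, pos):
            depth -= 1
            if depth == 0:
                return letter, text[body_start:pos], pos + width, True
            pos += width
        else:
            pos += 1                # ordinary text, or a shorter run of angles
    return letter, text[body_start:], len(text), False
```

Under these assumptions a lone C<< < >> inside a double-angle code never
affects the nesting depth, which is why switching to wider delimiters is
enough to embed an unbalanced angle.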

=head3 Typesetting specifiers

The C<B<>> formatting code specifies that the contained text is
to be set in a B<bold style>.

The C<I<>> formatting code specifies that the contained text is
to be set in an I<italic style>.

The C<T<>> formatting code specifies that the contained text is
to be set in a T<typewriter style> (typically fixed width).

The C<C<>> formatting code specifies that the contained text is
to be set in a C<code style>, typically fixed width. The contents
of a C<C<>> code are always treated as L<verbatim | #Verbatim text> and
L<space-preserving | #Space-preserving text>.
Hence, the C<C<...>> code is usually just a short-hand for
C<T<S<V<...>>>> (though specific formatters are
always free to choose some other visual representation for code text).

The C<D<>> formatting code specifies that the contained text is
to be set in a "deleted" or "diff" style (typically strike-through).

The C<U<>> formatting code specifies that the contained text is
to be set in an underlined style.

The C<R<>> formatting code specifies that the contained text is a
replaceable item or a placeholder. It is used to indicate a component of a
syntax or specification that should be replaced by an actual value.
For example:

The C<link> command has the syntax:
C<link R<source_file> R<target_file>>

Typically replaceables are set in fixed-width italics.

These (and most other) formatting codes may be arbitrarily nested.
Formatters should endeavour to convey that nesting accurately, using
appropriate typesetting conventions. For example, something like:

I<So>, she thought, I<the I<Marie Celeste> mystery B<is> solved at last!>

should produce:

=indent
I<So>, she thought, I<the> Marie Celeste I<mystery B<is> solved at last!>

with the nested italics switching back to roman in the traditional manner.

=head3 Verbatim text

The C<V<>> formatting code disregards every apparent formatting code within
it, treating them as being verbatim text. For example:

The B<V< V<> >> formatting code disarms other codes
such as T<V< I<>, B<> and C<> >>.

The hash entry T<V< %LOAD<full> >> indicates whether the
load is full

Note, however, that the C<V<>> code only changes the way its
contents are parsed, I<not> the way they are rendered. That is, the
contents are still wrapped and formatted like plain text, and the
effects of any formatting codes surrounding the C<V<>> code
are still applied to its contents. For example, the previous example
is rendered:

=begin indent

The B<V< V<> >> formatting code disarms other codes
such as T<V< I<>, B<> and C<> >>.

The hash entry T<V< %LOAD<full> >> indicates whether the
load is full

=end indent


You can prespecify formatting codes that remain active within
a C<V<>> code, using the L<C<:allow>|#Formatting within code blocks>
option.

=head3 Comments

The C<Z<>> formatting code indicates that its contents constitute a
(zero-width) comment, and should not be rendered by any formatter.
For example:

The "exeunt" command Z<Think about renaming this command?> is used
to quit all applications.

Previously, the C<Z<>> code was widely used to break up text that would
otherwise be considered mark-up:

Previously, the T<ZZ<><>> code was widely used to break up text
that would otherwise be considered mark-up.

That still works, but is now better done with a verbatim formatting code:

Previously, the T<V<Z<>>> code was widely used to break up text
that would otherwise be considered mark-up.

Moreover, the C<C<>> code automatically treats its contents as being
verbatim, which often eliminates the need for the C<V<>> as well:

Previously, the C<Z<>> code was widely used to break up text
that would otherwise be considered mark-up.

The C<Z<>> formatting code is the inline equivalent of a C<=comment>
block.

=head3 Links

The C<L<>> code is used to specify all kinds of links, filenames,
and cross-references (both internal and external).

A link specification consists of a I<scheme specifier> terminated by a
colon, followed by an I<external address> (in the scheme's preferred
syntax), followed by an I<internal address> (again, in the scheme's syntax).
All three components are optional (though at least one must be present in
any link specification).

Usually, in schemes where an internal address makes sense, it will be
separated from the preceding external address by a C<#>, unless the
particular addressing scheme requires some other syntax. When new
addressing schemes are created specifically for Perldoc it is strongly
recommended that C<#> be used to mark the start of internal addresses.

Standard schemes include:

=begin item :term('C<http:> and C<https:>')
A standard URL. For example:

This module needs the LAME library
(available from L<http://www.mp3dev.org/mp3/>)

=end item

=begin item :term<C<file:>>

A filename on the local system. For example:

Next, edit the config file (L<file:~/.configrc>).

=end item

=begin item :term<C<man:>>

A link to the system man pages. For example:

This module implements the standard
Unix L<man:find(1)> facilities.

=end item

=begin item :term<C<doc:>>

A link to some other Perldoc documentation, typically a module or core
Perl documentation. For example:

You may wish to use L<doc:Data::Dumper> to
view the results. See also: L<doc:perldata>.

=end item

C<doc:> is the default link scheme, in that if the scheme specifier is
omitted in any link, it is assumed to be C<doc:>.

To refer to a specific section within a webpage, manpage, or Perldoc
document, add the name of that section after the main link, separated by
a C<#>. For example:

Also see: L<man:bash(1)#Compound Commands>,
L<doc:perlsyn#For Loops>, and
L<http://dev.perl.org/perl6/syn/S04.html#The_for_statement>

To refer to a section of the current document, omit the external address:

This mechanism is described under L<doc:#Special Features> below.

The scheme may also be omitted in that case:

This mechanism is described under L<#Special Features> below.

Normally a link is presented as some rendered version of the link
specification itself. However, you can specify an alternate
presentation by prefixing the link with the desired text and a
vertical bar. For example:

This module needs the L<LAME library|http://www.mp3dev.org/mp3/>.

You could also write the code
L<in Latin|doc:Lingua::Romana::Perligata>
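
As a rough illustration of how the components of a link specification
might be separated, here is a minimal Python sketch. It is not part of
the specification: the function name C<parse_link>, the returned tuple
layout, and the scheme pattern are all assumptions made for this example.

```python
import re

# Assumed scheme syntax: a letter, then letters, digits, '+', '.' or '-',
# ending in a single colon. A colon followed by another colon is rejected
# so that module names such as Data::Dumper are not read as schemes.
_SCHEME = re.compile(r'^([A-Za-z][A-Za-z0-9+.-]*):(?!:)')

def parse_link(content, default_scheme="doc"):
    """Split the contents of an L<> code (hypothetical helper).

    Returns (text, scheme, external, internal); absent parts are None.
    """
    text, bar, target = content.partition("|")
    if not bar:                       # no alternate presentation text
        text, target = None, content
    match = _SCHEME.match(target)
    if match:
        scheme, rest = match.group(1), target[match.end():]
    else:                             # scheme omitted: fall back to doc:
        scheme, rest = default_scheme, target
    external, hash_mark, internal = rest.partition("#")
    return (text, scheme, external or None,
            internal if hash_mark else None)
```

Under these assumptions, C<parse_link("#Special Features")> yields the
default C<doc> scheme, no external address, and the internal address
C<Special Features>.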

=head3 Placement links

A second kind of link--the C<P<>> or placement link--works in the
opposite direction. Instead of directing focus out to another document,
it allows you to draw the contents of another document into your own.

In other words, the C<P<>> formatting code takes a URL
and--if possible--places the contents of that document inline in place
of the code itself.

C<P<>> codes are handy for breaking out standard components of
your documentation set into reusable components that can then be
incorporated directly into multiple documents. For example:

=COPYRIGHT

P<file:/shared/docs/std_copyright.pod>

=DISCLAIMER

P<http://www.megagigatera.com/std/disclaimer.txt>

might produce:

=begin indent

B<COPYRIGHT>

B<DISCLAIMER>

ABSOLUTELY NO WARRANTY IS IMPLIED. NOT EVEN OF ANY KIND. WE HAVE SOLD
YOU THIS SOFTWARE WITH NO HINT OF A SUGGESTION THAT IT IS EITHER USEFUL
OR USABLE. AS FOR GUARANTEES OF CORRECTNESS...DON'T MAKE US LAUGH! AT
SOME TIME IN THE FUTURE WE MIGHT DEIGN TO SELL YOU UPGRADES THAT PURPORT
TO ADDRESS SOME OF THE APPLICATION'S MANY DEFICIENCIES, BUT NO PROMISES
THERE EITHER. WE HAVE MORE LAWYERS ON STAFF THAN YOU HAVE TOTAL
EMPLOYEES, SO DON'T EVEN *THINK* ABOUT SUING US. HAVE A NICE DAY.

=end indent

If a renderer cannot find or access the external data source for a
placement link, it must issue a warning and render the URL directly in
some form. For example:

=begin indent

B<COPYRIGHT>

See: /shared/docs/std_copyright.pod

B<DISCLAIMER>

See: http://www.megagigatera.com/std/disclaimer.txt

=end indent

=head3 Space-preserving text

Any text enclosed in an C<S<>> code is formatted normally, except that
every whitespace character in it--including any newline--is preserved.
These characters are also treated as being non-breaking (except for the
newlines, of course). For example:

The emergency signal is:
S<  dot dot dot   dash dash dash    dot dot dot>.

would be formatted like so:

=indent
The emergency signal is:
E<nbsp>E<nbsp>dotE<nbsp>dotE<nbsp>dotE<nbsp>E<nbsp>E<nbsp>dashE<nbsp>dashE<nbsp>dashE<nbsp>E<nbsp>E<nbsp>E<nbsp>dotE<nbsp>dotE<nbsp>dot.

rather than:

=indent
The emergency signal is: dot dot dot dash dash dash dot dot dot.
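
One simple way a renderer might achieve this effect is sketched below in
Python. This is only an illustration: the function name
C<preserve_spaces> is hypothetical, and handling only the ordinary space
character (leaving tabs and other whitespace untouched) is an assumption
of the sketch.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

def preserve_spaces(text):
    """Replace each ordinary space with a no-break space.

    Newlines are left alone, so they still break lines.
    """
    return text.replace(" ", NBSP)
```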

=head3 Entities

To include named Unicode or XML entities, use the C<E<>> code.

If the contents are not a number, they are interpreted as an upper-case
Unicode character name, or as a lower-case XML entity. For example:

Perl 6 makes considerable use of
E<LEFT-POINTING DOUBLE ANGLE QUOTATION MARK> and
E<RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK>.

or, equivalently:

Perl 6 makes considerable use of E<laquo> and E<raquo>.

If the contents of the C<E<>> are a number, that number is
treated as the decimal Unicode value for the desired codepoint.
For example:

Perl 6 makes considerable use of E<171> and E<187>.

You can also use explicit binary, octal, decimal, or hexadecimal numbers:

Perl 6 makes considerable use of E<0b10101011> and E<0b10111011>.
Perl 6 makes considerable use of E<0o253> and E<0o273>.
Perl 6 makes considerable use of E<0d171> and E<0d187>.
Perl 6 makes considerable use of E<0xAB> and E<0xBB>.

Multiple consecutive entities can be specified in a single C<E<>> code,
separated by semicolons:

Perl 6 makes considerable use of E<laquo;hellip;raquo>.

The C<E<>> formatting code is like any other in that it is disabled
inside a C<V<>>. In particular, it is not special inside the implicit
C<V<>> provided by a C<C<>> formatter or C<=code> block. To insert an
entity in an inlined code fragment, format that code with C<T<...E<>...>>
instead of C<C<...E<>...>>:

In Perl 6 the use of T<E<laquo>> and T<E<raquo>> as delimiters
implies shell-like interpolation.

To insert an entity in a code block, use the
L<C<:allow> option|#Formatting within code blocks> on that block:

=begin code :allow<E>

In Perl 6 the use of E«laquo» and E«raquo» as delimiters
implies shell-like interpolation.

=end code
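
The resolution rules for C<E<>> contents can be sketched in a few lines
of Python. This is an illustration under stated assumptions, not a
normative algorithm: the function names are hypothetical, lower-case
names are looked up in Python's table of HTML 4 entity names, and
anything written entirely in upper case is treated as a Unicode
character name.

```python
import unicodedata
from html.entities import name2codepoint

# Radix prefixes listed in the text above.
_RADIX = {"0b": 2, "0o": 8, "0d": 10, "0x": 16}

def resolve_entity(spec):
    """Map one entity specifier to the character it names."""
    spec = spec.strip()
    if spec[:2] in _RADIX and len(spec) > 2:
        return chr(int(spec[2:], _RADIX[spec[:2]]))  # explicit radix
    if spec.isdigit():
        return chr(int(spec))                        # plain decimal
    if spec.isupper():
        return unicodedata.lookup(spec)              # Unicode name
    return chr(name2codepoint[spec])                 # named entity

def resolve_entities(content):
    """Resolve a semicolon-separated list of entity specifiers."""
    return "".join(resolve_entity(part) for part in content.split(";"))
```

An unknown name raises C<KeyError> in this sketch; a real processor
would presumably report the problem in some friendlier way.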

=head3 Indexing terms

Anything enclosed in an C<X<>> code is an index entry. The contents
of the code are both formatted into the document and used as the
(case-insensitive) index entry:

An X<array> is an ordered list of scalars indexed by number,
starting with 0. A X<hash> is an unordered collection of scalar
values indexed by their associated string key.

You can specify an index entry where the indexed text and the index entry are
different, by separating the two with a vertical bar:

An X<array|arrays> is an ordered list of scalars indexed by number,
starting with 0. A X<hash|hashes> is an unordered collection of
scalar values indexed by their associated string key.

In the two-part form, the index entry comes after the bar and is
case-sensitive.

You can specify hierarchical index entries by separating indexing levels
with commas:

An X<array|arrays, definition of> is an ordered list of scalars
indexed by number, starting with 0. A X<hash|hashes, definition of>
is an unordered collection of scalar values indexed by their
associated string key.

You can specify two or more entries for a single indexed text, by separating
the entries with semicolons:

A X<hash|hashes, definition of; associative arrays>
is an unordered collection of scalar values indexed by their
associated string key.

The indexed text can be empty, creating a "zero-width" index entry:

X<|puns, bad>This is called the "Orcish Manoeuvre"
because you "OR" the "cache".
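
The forms described above can be summarized with a small Python sketch.
It is illustrative only: the function name C<parse_index_code> and the
list-of-levels representation are hypothetical, and folding the
one-part form to lower case is merely one assumed way of making that
entry case-insensitive.

```python
def parse_index_code(content):
    """Split the contents of an X<> code (hypothetical helper).

    Returns (displayed_text, entries), where each entry is a list of
    hierarchy levels.
    """
    text, bar, spec = content.partition("|")
    if not bar:
        # One-part form: the text itself is the (case-insensitive) entry.
        return text, [[text.strip().casefold()]]
    # Two-part form: semicolons separate entries, commas separate levels,
    # and case is kept as written.
    entries = [
        [level.strip() for level in entry.split(",")]
        for entry in spec.split(";")
    ]
    return text, entries
```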

=head3 Notes

Anything enclosed in an C<N<>> code is an inline annotation.
For example:

Use a C<for> loop instead.N<The Perl 6 C<for> loop is far more
powerful than its Perl 5 predecessor.>

Different formatters may render such annotations in a variety of
ways: as footnotes, as endnotes, as sidebars, as pop-ups, as
expandable tags, etc. They are never, however, rendered as
unmarked in-line text. So the previous example might be rendered as:

=indent
Use a C<for> loop instead.E<dagger>

and later:

=begin indent
B<Footnotes>

=for item :bulleted<E<dagger>>
The Perl 6 C<for> loop is far more powerful than its Perl 5 predecessor.
=end indent

=head3 User-defined formatting codes

Perldoc extensions and plug-ins can define their own formatting codes,
using the C<M<>> code. An C<M<>> code must start with a
colon-terminated scheme specifier. The rest of the enclosed text is
treated as the contents of the formatting code. For example:

=head1 Overview of the M<Metadata: $?CLASS.name > class

The C<M<>> formatting code is the inline equivalent of a
L<named block|#Named blocks>.

If the formatting code is unrecognized, the contents of the code (i.e.
everything after the first colon) would normally be treated as
ordinary text.

=head2 Encoding

By default, Perldoc assumes that documents are Unicode, encoded in one
of the three common schemes (UTF-8, UTF-16, or UTF-32). The particular
scheme a document uses is autodiscovered by examination of the first few
bytes of the file (where possible). If the autodiscovery fails, UTF-8 is
assumed, and parsers should treat any non-UTF-8 bytes later in the
document as fatal errors.

At any point in a document, you can explicitly set or change the encoding
of its content using the C<encoding> directive:

=encoding ShiftJIS

=encoding Macintosh

=encoding KOI8-R

The specified encoding is used from the start of the I<next> line in
the document. If a second C<=encoding> directive is encountered, the
current encoding changes again after that line. Note, however, that
the second encoding directive must itself be encoded using the first
encoding scheme.

This applies to an C<=encoding> directive at the very beginning of the
file as well: it must itself be encoded in UTF-8, -16, or -32. However,
as a special case, the autodiscovery mechanism will (as far as possible)
also attempt to recognize "self-encoded" C<=encoding> directives that
begin at the first byte of the file. For example, at the start of a
ShiftJIS-encoded file you can specify C<=encoding ShiftJIS> in the
ShiftJIS encoding.
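
The default autodiscovery step might look something like the following
Python sketch. It assumes that "examination of the first few bytes"
means checking for a Unicode byte-order mark, which is only one
plausible reading. The function names are hypothetical, and
"self-encoded" C<=encoding> directives are not handled here.

```python
import codecs

# Longer marks first: the UTF-32-LE mark begins with the UTF-16-LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(data):
    """Return (encoding_name, bom_length) for the start of a file."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return "utf-8", 0  # autodiscovery failed: assume UTF-8

def decode_document(data):
    """Decode a whole document; invalid bytes raise UnicodeDecodeError."""
    encoding, skip = sniff_encoding(data)
    return data[skip:].decode(encoding)  # strict decoding by default
```

Strict decoding is used so that stray non-UTF-8 bytes are reported as
errors rather than silently replaced.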

=head2 Modules

Perldoc provides a mechanism by which you can extend the syntax and semantics
of your documentation notation: the C<=use> directive.

Specifying a C<=use> causes a Perldoc processor to load the corresponding
Perldoc module at that point, or to throw an exception if it cannot.
Such modules can register new types of block directives and formatting
codes.

Note that a module loaded via a C<=use> statement can affect the
I<interpretation> of subsequent blocks, but not the initial parsing of
those blocks. The block directives themselves must still conform to the
syntax described in this document. Typically, a module will change the
way that renderers parse the I<contents> of specific blocks.

The general syntax is:

=for code :allow< R >
=use R<MODULE_NAME> R<OPTIONAL CONFIG DATA>
= R<OPTIONAL EXTRA CONFIG DATA>

For example:

=comment Install the Tree plugin to show pretty trees...
=use Perldoc::Plugin::Tree :autodetect

=begin Tree

=end Tree

The C<=use> statement causes the Perldoc processor immediately to look
for a module named C<Perldoc::Plugin::Tree> and to load it with the
specified import option (C<:autodetect>). For example, if the processor
were written in Perl 6, the C<=use> directive in the previous example
might cause it to execute:

require Perldoc::Plugin::Tree :autodetect
err die "=use failed ($!) at $LOCATION_IN_DOCUMENT\n";

You can use fully and partially specified module names (as with Perl 6
modules):

=use Perldoc::Plugin::XHTML-1.2.1-(*)

and pass any options you wish:

=use Perldoc::Plugin::Image :Jpeg prefix=>'http://dev.perl.org'

Note that C<=use> is a fundamental Perldoc directive, like C<=begin> or
C<=for>; it is not an instance of an L<abbreviated block|#Abbreviated
blocks>. Hence there is no paragraph or delimited form of the C<=use>
directive (just as there is no paragraph or delimited form of the
C<=begin> or C<=for> directives).

=head2 Block pre-configuration

The C<=config> directive allows you to prespecify standard configuration
information that is applied to every block of a particular type.

For example, to specify particular formatting for different levels of
heading, you could preconfigure all the heading directives with
appropriate formatting schemes:

=config head1 :formatted :numbered
=config head2 :like<head1> :formatted
=config head3 :formatted
=config head4 :like<head3> :formatted

The general syntax for configuration blocks is:

=for code :allow< R >
=config R<BLOCK_TYPE> R<CONFIG OPTIONS>
= R<OPTIONAL EXTRA CONFIG OPTIONS>

Like C<=use>, a C<=config> is a directive, not a block. Hence, there is no
paragraph or delimited form of the C<=config> directive.

Note that, if a particular block later specifies a configuration option
with the same key, that option overrides the pre-configured option. For
example, to specify a non-bold second-level heading:

=for head2 :formatted
Details

The C<:like> option is replaced by the complete formatting information
of the named block type (which must already have been preconfigured).
Any additional formatting specifications are subsequently added to
that config.

C<=config> specifications are lexically scoped to the block in which
they're specified.
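
To make the scoping and override rules concrete, here is a minimal
Python sketch of how a processor might track them. Everything in it is
an assumption made for illustration: the class name C<ConfigScopes>, its
methods, and in particular the choice to layer an inner C<=config> over
an outer one key by key.

```python
class ConfigScopes:
    """Track =config settings per lexical scope (hypothetical helper)."""

    def __init__(self):
        # One {block_type: options} dict per open scope; the first entry
        # is the document-level scope.
        self._scopes = [{}]

    def begin_block(self):
        self._scopes.append({})      # entering a =begin ... =end pair

    def end_block(self):
        self._scopes.pop()           # settings made inside are dropped

    def config(self, block_type, **options):
        self._scopes[-1].setdefault(block_type, {}).update(options)

    def options_for(self, block_type, **explicit):
        merged = {}
        for scope in self._scopes:   # outermost scope first
            merged.update(scope.get(block_type, {}))
        merged.update(explicit)      # a block's own options win
        return merged
```

With this sketch, a C<=config> issued inside a delimited block stops
applying once the matching C<=end> is reached, and an option given on an
individual block replaces the preconfigured value for the same key.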

You can also preconfigure L<formatting codes|#Formatting codes>, by naming
them with a pair of angles as a suffix. For example:

=comment Always allow E<> codes in any (implicit or explicit) V<> code...
=config V<> :allow<E>

=comment All code to be italiciized...
=config C<> :formatted<I>

Note that, even though the code is named using single-angles, the
preconfiguration applies regardless of the actual delimiters used
on subsequent instances of the code.

-----END----------END----------END----------END----------END-----

Daniel Hulme

Oct 8, 2006, 5:54:11 AM

to perl6-l...@perl.org

I liked it. Just one nit, near the end:

>You can also preconfigure L<formatting codes|#Formatting codes>, by
>naming them with a pair of angles as a suffix. For example:
>
> =comment Always allow E<> codes in any (implicit or explicit) V<> code...
> =config V<> :allow<E>
>
> =comment All code to be italiciized...
                                ^^
> =config C<> :formatted
>
>Note that, even though the code is named using single-angles, the
>preconfiguration applies regardless of the actual delimiters used
>on subsequent instances of the code.

s/italiciized/italicized/ in the marked place.

--
<Customer> Waiter, waiter! There's a fly in my soup!
<Waiter> That's not a bug, it's a feature.
http://surreal.istic.org/ It sounded right in my head.

Jonathan Lang

Oct 8, 2006, 3:40:32 PM

to dam...@conway.org, perl6-l...@perl.org

The only thing that I'd like to see changed would be to allow a more
flexible syntax for formatting codes - in particular, I'd rather use
something analogous to the 'embedded comments' described in S02,
replacing the leading # with an appropriate capital letter (as defined
by Unicode) and insisting on a word break just prior to it.

I'd also prefer a more Wiki-like dialect at some point (e.g.,
'__underlined text__', '_italicized text_' and '*bold*' instead of
'U<underlined text>', 'I<italicized text>' and 'B<bold>'); but that
can wait.

Otherwise, looks good.

--
Jonathan "Dataweaver" Lang

Dave Whipp

Oct 8, 2006, 3:53:57 PM

to perl6-l...@perl.org

Damian Conway wrote:
> Delimited blocks are bounded by C<=begin> and C<=end> markers...
> ...Typenames that are entirely lowercase (for example: C<=begin
> head1>) or entirely uppercase (for example: C<=begin SYNOPSIS>)
> are reserved.

I'm not a great fan of this concept of "reservation" when there is no
mechanism for its enforcement (and this is perl...). Typical programmers
ignore it, just as they ignore similar reservations of the type
"lower-case subroutine names are reserved".

If "use strict" will flag an error for their use, then perhaps "is
reserved" would become "must be predeclared" (imported via =use). Then
any module will be able to add its own typenames, without needing some
distinguishing "this is a core module" trait to enable the typename.
Reservation then simply becomes a note to module authors, not part of
the language specification.

Damian Conway

Oct 11, 2006, 3:24:16 PM

to Jonathan Lang, perl6-l...@perl.org

Jonathan Lang wrote:

> The only thing that I'd like to see changed would be to allow a more
> flexible syntax for formatting codes - in particular, I'd rather use
> something analogous to the 'embedded comments' described in S02,
> replacing the leading # with an appropriate capital letter (as defined
> by Unicode) and insisting on a word break just prior to it.

It was a deliberate decision to restrict the delimiters to angles. Unlike
embedded comments, formatting codes are predominantly embedded in text, not
code, so it's important to keep them easy-to-locate (i.e. with a consistent
delimiter) and not to allow too many syntaxes (which increases the chance of
unintended codes in normal text).

A leading word break is not really practical either, since documenters will
need to use codes in the middle of words:

PractI<ise> (and then practI<ice>) saying "GarE<ccedil>on!"

> I'd also prefer a more Wiki-like dialect at some point (e.g.,
> '__underlined text__', '_italicized text_' and '*bold*' instead of
> 'U<underlined text>', 'I<italicized text>' and 'B<bold>'); but that
> can wait.

That's Kwid. Which Ingy has proposed as a standard Perldoc dialect.
You'll be able to flip into kwid mode (for Perldoc parsers that support it) using:

=begin kwid

=end kwid

Damian

Damian Conway

Oct 12, 2006, 12:55:57 AM

to perl6-l...@perl.org

Dave Whipp wrote:

> I'm not a great fan of this concept of "reservation" when there is no
> mechanism for its enforcement (and this is perl...).

What makes you assume there will be no mechanism for enforcement? The standard
Pod parser (of which I have a 95% complete Perl 5 implementation) will
complain bitterly--as in cyanide--when unknown pure-upper or pure-lower block
names are used.

The whole point of reserving these namespaces is not to prevent users from
misusing them, but to ensure that when we eventually get around to using a
particular block name, and those same users start screaming about it, we can
mournfully point to the passage in the original spec and silently shake our
heads. ;-)

Damian

Tim Bunce

Oct 12, 2006, 6:38:18 PM

to Damian Conway, perl6-l...@perl.org

On Thu, Oct 12, 2006 at 02:55:57PM +1000, Damian Conway wrote:
> Dave Whipp wrote:
>
> >I'm not a great fan of this concept of "reservation" when there is no
> >mechanism for its enforcement (and this is perl...).
>
> What makes you assume there will be no mechanism for enforcement? The
> standard Pod parser (of which I have a 95% complete Perl 5 implementation)
> will complain bitterly--as in cyanide--when unknown pure-upper or
> pure-lower block names are used.

That's going to cause pain when people using older parsers try to read
docs written for newer ones. Would a loud warning plus some best-efforts
fail-safe parsing be possible?

Tim.

Jonathan Lang

Oct 12, 2006, 6:57:24 PM

to perl6language,

Tim Bunce wrote:
> Damian Conway wrote:
> > Dave Whipp wrote:
> > >I'm not a great fan of this concept of "reservation" when there is no
> > >mechanism for its enforcement (and this is perl...).
> >
> > What makes you assume there will be no mechanism for enforcement? The
> > standard Pod parser (of which I have a 95% complete Perl 5 implementation)
> > will complain bitterly--as in cyanide--when unknown pure-upper or
> > pure-lower block names are used.
>
> That's going to cause pain when people using older parsers try to read
> docs written for newer ones.

If I understand you correctly, the pain to which you're referring
would come from the possibility of a name that's reserved by the newer
version of Pod, but not by the older version. Wouldn't the simplest
solution be to let a Pod document announce its own version, much like
Perl can?

--
Jonathan "Dataweaver" Lang

Tim Bunce

Oct 13, 2006, 1:53:52 AM

to Jonathan Lang, Tim Bunce

On Thu, Oct 12, 2006 at 03:57:01PM -0700, Jonathan Lang wrote:
> Tim Bunce wrote:
> >Damian Conway wrote:
> >> Dave Whipp wrote:
> >> >I'm not a great fan of this concept of "reservation" when there is no
> >> >mechanism for its enforcement (and this is perl...).
> >>
> >> What makes you assume there will be no mechanism for enforcement? The
> >> standard Pod parser (of which I have a 95% complete Perl 5
> >implementation)
> >> will complain bitterly--as in cyanide--when unknown pure-upper or
> >> pure-lower block names are used.
> >
> >That's going to cause pain when people using older parsers try to read
> >docs written for newer ones.
>

> If I understand you correctly, the pain to which you're referring
> would come from the possibility of a name that's reserved by the newer
> version of Pod, but not by the older version.

Yes.

> Wouldn't the simplest solution be to let a Pod document announce its
> own version, much like Perl can?

How would that actually help? The old parser still wouldn't know what
new keywords have been added or how to parse them.

Tim.

Damian Conway

Oct 13, 2006, 5:48:40 AM

to perl6-l...@perl.org

Tim Bunce wrote:

> That's going to cause pain when people using older parsers try to read
> docs written for newer ones. Would a loud warning plus some best-efforts
> fail-safe parsing be possible?

Indeed. And that's an important use-case.

But best-effort is difficult when you're talking about future-compatibility
of core constructs, which these are supposed to be. I guess best-effort
for uppercase (semantic) mark-up is just to map:

=begin UNKNOWN
mumble mumble mumble
=end UNKNOWN

to:

=head1 UNKNOWN

=begin para
mumble mumble mumble
=end para

But it's harder to see how to cope with unknown all-lower directives:

=begin frobnication
...
=end frobnication

=for franistat

=wassname

Especially the last of those, since it might be either an abbreviated
block or a pure directive. I suspect that these should either still be
fatal, or they should warn-and-ignore.

Damian

Damian Conway

Oct 13, 2006, 6:27:47 AM

to perl6language,

Jonathan Lang wrote:

> If I understand you correctly, the pain to which you're referring
> would come from the possibility of a name that's reserved by the newer
> version of Pod, but not by the older version. Wouldn't the simplest
> solution be to let a Pod document announce its own version, much like
> Perl can?

That would presumably be:

=use 6.0.2

Though it's not quite an exact analogy. If a Perl interpreter isn't recent
enough, it can't really fall back on "best attempt" to execute a program.
Code is either valid or unusable.

For documentation, even if you don't know how to interpret a particular
mark-up, you can always just display it as raw text and the reader can
still get most of the benefit of it.

It's hard to imagine a circumstance in which a refusal to render Pod:

Perldoc v6.0.2 required--this is only v6.0.1, stopped at S26.pod, line 1

would be preferable to actually rendering that Pod, no matter how badly.

Damian

Brent 'Dax' Royal-Gordon

Oct 13, 2006, 8:06:16 PM

to dam...@conway.org, perl6-l...@perl.org

On 10/7/06, Damian Conway <dam...@conway.org> wrote:
> The C<I<>> formatting code specifies that the contained text is
> to be set in an I<italic style>

I've probably been hanging around Web standards nazis for too long,
but can we get a separate code to mark the title of a document that
can't be linked to (say, a book) along the lines of HTML's <cite> tag?

> =begin item :term('C<http:> and C<https:>')
> A standard URL. For example:
>
> This module needs the LAME library
> (available from L<http://www.mp3dev.org/mp3/>)
>
> =end item
>
> =begin item :term<C<file:>>
>
> A filename on the local system. For example:
>
> Next, edit the config file (L<file:~/.configrc>).
>
> =end item
>
> =begin item :term<C<man:>>
>
> A link to the system man pages. For example:
>
> This module implements the standard
> Unix L<man:find(1)> facilities.
>
> =end item
>
> =begin item :term<C<doc:>>
>
> A link to some other Perldoc documentation, typically a module or core
> Perl documentation. For example:
>
> You may wish to use L<doc:Data::Dumper> to
> view the results. See also: L<doc:perldata>.
>
> =end item

Actually, a couple more link schemes could probably handle my previous request:

L<Perl 6 and Parrot Essentials|urn:isbn:059600737X>
L<Parrot Magic Cookies in The Perl Review|urn:issn:1553667X/3/0#11>

> If a renderer cannot find or access the external data source for a
> placement link, it must issue a warning and render the URL directly in
> some form. For example:
>
> =begin indent
>
> B<COPYRIGHT>
>
> See: /shared/docs/std_copyright.pod
>
> B<DISCLAIMER>
>
> See: http://www.megagigatera.com/std/disclaimer.txt
>
> =end indent

Oooh, transclusion--shiny. Perhaps the pipe character can be used to
provide alternative text:

P<See standard copyright terms in the
distribution.|file:/shared/docs/std_copyright.pod>

Also, what about non-textual files? If I type
P<http://www.perlfoundation.org/images/onion_64x64.png>, will an onion
appear in my Pod document? That would obviate custom =Image
directives.

> Perldoc provides a mechanism by which you can extend the syntax and semantics
> of your documentation notation: the C<=use> directive.

Um...how can this be made to work? Are renderers going to have to
know about every possible plugin? Are plugins going to have to know
about every possible renderer? Will dogs and cats be living together?

> C<=config> specifications are lexically scoped to the block in which
> they're specified.

=config head3 :numbered
=cut

method foo($bar, $baz) {
...
}

=head3 C<foo(>R<bar>C<, >R<baz>C<)>
...

Is that =head3 numbered, or is it in a different lexical scope?

(Actually, I don't see any reference to =cut in this spec. Is it
still there or not?)

--
Brent 'Dax' Royal-Gordon <br...@brentdax.com>
Perl and Parrot hacker

Damian Conway

Oct 14, 2006, 8:14:26 AM

to perl6-l...@perl.org

Brent wrote:

> I've probably been hanging around Web standards nazis for too long,
> but can we get a separate code to mark the title of a document that
> can't be linked to (say, a book) along the lines of HTML's <cite> tag?

Hmmmmmm. Maybe. Care to nominate a letter for that? C<>, I<>, T<>, and E<> are
all taken already. ;-)

> Actually, a couple more link schemes could probably handle my previous
> request:
>
> L<Perl 6 and Parrot Essentials|urn:isbn:059600737X>
> L<Parrot Magic Cookies in The Perl Review|urn:issn:1553667X/3/0#11>

Why wouldn't that just be:

L<Perl 6 and Parrot Essentials|isbn:059600737X>
L<Parrot Magic Cookies in The Perl Review|issn:1553667X/3/0#11>

????

> Oooh, transclusion--shiny. Perhaps the pipe character can be used to
> provide alternative text:
>
> P<See standard copyright terms in the distribution.
> |file:/shared/docs/std_copyright.pod>

I like it. The only concern would be that, everywhere else that a pipe is
valid, the LHS is rendered instead of the RHS. Here it would be reversed.
Arguably, by that measure, it ought to be:

P<file:/shared/docs/std_copyright.pod
|See standard copyright terms in the distribution.>

Of course, you could always argue that the LHS is the "text side" and the RHS
the "URL side". Hmmmmmmmmm. I need to think about that a little more.

> Also, what about non-textual files? If I type
> P<http://www.perlfoundation.org/images/onion_64x64.png>, will an onion
> appear in my Pod document? That would obviate custom =Image
> directives.

That would depend on the renderer. The parser will certainly accept it. I'd
expect that renderers that can render images would probably do so.

>> Perldoc provides a mechanism by which you can extend the syntax and
>> semantics of your documentation notation: the C<=use> directive.
>
> Um...how can this be made to work? Are renderers going to have to
> know about every possible plugin? Are plugins going to have to know
> about every possible renderer? Will dogs and cats be living together?

To answer your questions in order: Easy. No. No. Hell no!

The parser doesn't change when you extend syntax and semantics. Plugins can
only change the syntax of the *contents* of a new block type, not the way the
parser parses those blocks. For example, to get Markdown syntax and semantics,
you write:

=use Perldoc::Plugin::Markdown

=begin Markdown

*Markdown* syntax and semantics _in this block_

=end Markdown

The parser would still parse the Markdown block to create a
Perldoc::Block::Markdown object, even if you hadn't C<=use>'d the module. The
C<=use> merely allows the parser and/or renderer to load the class definition
of the Perldoc::Block::Markdown class, so that the object can be constructed
correctly (and, presumably, the contents of the block can be interpreted
meaningfully).

So, in other words, the syntax of Pod blocks is invariant, allowing the parser
to reduce Pod to a standard internal object stream, which each renderer (and
any plug-in extension) can do with as it will.

I obviously need to make those points clearer in the synopsis. Thanks.

>> C<=config> specifications are lexically scoped to the block in which
>> they're specified.
>
> =config head3 :numbered
> =cut

There is no C<=cut> in Perl 6. And in your example it wasn't needed, BTW,
since Pod reverts to ambient code after each block unless you're nested inside
a =begin...=end pair.

> method foo($bar, $baz) {
> ...
> }
>
> =head3 C<foo(>R<bar>C<, >R<baz>C<)>
> ...
>
>
> Is that =head3 numbered, or is it in a different lexical scope?

Assuming the =cut wasn't there, the =head3 would be numbered, since you'd be
in the same lexical scope. Lexical scopes are defined by =begin..=end pairs,
not by the "chunking" of Pod within ambient code.

> (Actually, I don't see any reference to =cut in this spec. Is it
> still there or not?)

Not. :-)

Damian

Smylers

Oct 16, 2006, 4:51:43 PM

to perl6-l...@perl.org

On October 7th Damian Conway wrote:

> Before Christmas, as promised!

>
> [DRAFT] Synopsis 26 - Documentation

Thank you for that, Damian! Apologies for taking a while to respond,
but I wanted to leave reading the document until I had a sufficient
chunk of time to do it justice. And I was very impressed.

One quibble:

> To include named Unicode or XML entities, use the C<E<>> code.
>
> If the contents are not a number, they are interpreted as an upper-case
> Unicode character name, or as a lower-case XML entity. For example:
>

> Perl 6 makes considerable use of E<laquo> and E<raquo>.

I think the only standard XML entities are C<&lt;>, C<&gt;>, and
C<&amp;>. Particular XML languages can define further entities which
use that syntax, but they aren't included by default. However, the
examples you give are HTML entities, defined in the HTML 4 spec:

http://www.w3.org/TR/REC-html40/sgml/entities.html

Smylers

Danny Brian

Oct 16, 2006, 6:26:58 PM

to perl6-l...@perl.org, Smylers

On Oct 16, 2006, at 2:51 PM, Smylers wrote:
...

>> Perl 6 makes considerable use of E<laquo> and E<raquo>.
>
> I think the only standard XML entities are C<&lt;>, C<&gt;>, and
> C<&amp;>. Particular XML languages can define further entities which
> use that syntax, but they aren't included by default.

The default entities are C<&lt;>, C<&gt;>, C<&amp;>, C<&apos;>, and
C<&quot;>.

So glad I could contribute that.

- Danny

Damian Conway

Oct 16, 2006, 8:45:37 PM

to perl6-l...@perl.org

Smylers pointed out (and Danny Brian confirmed):

> The default entities are C<&lt;>, C<&gt;>, C<&amp;>, C<&apos;>, and
> C<&quot;>.

I *knew* there was a good reason I shun XML! ;-)

Clearly five entities is I<not> going to suffice. The synopsis now reads:

To include named Unicode or XHTML entities, use the C<E<>> code.

If the contents are not a number, they are interpreted as an upper-case
Unicode character name, or as a lower-case XHTML entity. For example:

Thanks for that.

Damian

Christopher J. Madsen

Oct 23, 2006, 12:19:25 PM

to perl6-l...@perl.org

On October 16th Damian Conway wrote:
> If the contents are not a number, they are interpreted as an upper-case
> Unicode character name, or as a lower-case XHTML entity. For example:

One more problem: not all XHTML entities are lower-case. For example:

&ETH; &THORN; &Eacute; &Theta;

For a complete list, see:

http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_xhtml_character_entities

I was thinking that we could distinguish them because Unicode character
names are always multiple words, but a quick search turned up ANGLE
(U+2220), so that won't work.

We could special-case ETH and THORN (the only all-uppercase entities)
and require translators to recognize them as entities.

We could allow an ampersand to indicate that it's an entity reference:
E<&ETH> and E<&THORN>. The ampersand would be optional if the entity
name contains lowercase: either E<&Eacute> or E<Eacute> would be ok.

We could disallow E<ETH> & E<THORN> and require the Unicode names:
E<LATIN CAPITAL LETTER ETH> & E<LATIN CAPITAL LETTER THORN>.

--
Chris Madsen c...@pobox.com
------------------ http://www.pobox.com/~cjm ------------------
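
The lookup order discussed in this message, together with the optional-ampersand proposal, might be sketched in Python as follows. This is only an illustration under stated assumptions: `resolve_entity` is a hypothetical name, numbers are assumed to be plain decimal codepoints, and Python's HTML 4 table (`html.entities.name2codepoint`) stands in for the XHTML entity set. The `E<&NAME>` form is one of the proposals above, not part of the synopsis.

```python
import unicodedata
from html.entities import name2codepoint


def resolve_entity(contents):
    """Resolve the contents of an E<...> code to a character (sketch)."""
    if contents.isdigit():
        # Assumption: a number is a decimal codepoint.
        return chr(int(contents))
    if contents.startswith("&"):
        # Proposed explicit form: E<&ETH> always means an entity.
        return chr(name2codepoint[contents[1:]])
    if any(ch.islower() for ch in contents):
        # Any lower-case letter marks an entity name (laquo, Eacute, Theta).
        return chr(name2codepoint[contents])
    # Otherwise treat the contents as a Unicode character name.
    return unicodedata.lookup(contents)


for code in ["laquo", "171", "Eacute", "&ETH", "&THORN", "ANGLE",
             "LATIN CAPITAL LETTER ETH"]:
    print(code, "->", resolve_entity(code))
```

Under this scheme a bare E<ETH> or E<THORN> falls through to the Unicode-name lookup, because neither name contains a lower-case letter. That is the ambiguity the ampersand form (or the full Unicode name) is meant to resolve.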