Hi Dan,
Thanks for starting the process.
The idea of developing the specification under the auspices of an
independent entity is a very good one. It will help to make the
process obviously open and independent and allow it to have a life
beyond its initial creators. I have lots of detailed queries about
that, which I will address in another email, because I want to query
something very fundamental that your mail raises and I don't want it
buried in the details.
You mention the TCK, and I agree an unambiguous and preferably
automated way of allowing implementations to demonstrate compliance is
a good thing, but...
It raises the rather basic question of "what are we standardising?"
1. Asciidoc source format, definitely, but how do you test syntax
acceptance in a markup language? In a markup language very little is
illegal: if it's not recognised as markup it's just text, so nearly
any document content is likely to be accepted by the standard, unlike
programming language syntax.
2. Asciidoc markup semantics, definitely, but at what level? E.g. is
*foo* specified as "must be styled bold", as emphasised (in DocBook
speak), as strong (in HTML speak), or some other wording that would
tend to indicate that, or is it just a grouping of characters to be
styled in "some" way? In particular, statements like "must be bold"
are presentational and not semantic, and would also mean any styling
customisation is "not standard".
3. Asciidoc output, I see that as having several problems:
3a. Which output? HTML4, XHTML, HTML5, DocBook, PDF, EPUB, man, etc.
Any output used in the specification means an implementation must
provide that format to demonstrate compliance, and if the standard
tried to specify all outputs it would prevent new targets from being
"standards compliant". For example, an implementation that produces
only wonderful PDF books should not have to artificially produce HTML
as well, just to be able to demonstrate compliance and so be allowed
to say it's compliant.
The CommonMark spec uses HTML, and has just this issue: it simply
doesn't address anything else. Being less formal than a
specification with trademarks managed by an entity like the Eclipse
Foundation, that may not matter for it. But a formal entity like a
foundation may find it hard to accept as "compliant and therefore
allowed to use the trademark" something that doesn't pass an automated
test, and that makes it hard to include outputs that are not
standardised.
3b. Computer language specifications like Java, C and C++ don't
specify the machine code compilers must produce, and an ARM compiler
doesn't have to work for x86 as well, just to claim compliance.
3c. I'm not convinced any of the current implementations have any
output that would be classed as "best practice" for that output, and
so be worthy of being standardised. (I am relying on Dan's GitHub
comments about Asciidoctor output; I haven't examined it in detail,
but understand it is similar to Asciidoc Python, which isn't terribly
"best practice".)
And designing a new "standard output" during the standardisation
process without an implementation is risky.
3d. Even for an output that was specified, that output may be used in
different environments that impose constraints, e.g. HTML could render
to a blog or as part of a site managed by a framework, places where
there may be requirements or limitations on how the HTML is
structured. It doesn't seem sensible to prevent such uses from being
able to claim standards compliance if they accept the whole language.
If such an implementation cannot be standards compliant, there is no
incentive to implement all of Asciidoc, and no incentive not to add
some new markup just to suit their use-case. That way just leads to
fragmentation.
3e. Implementations that use follow-up toolchains may not have the
level of control over the output to exactly match examples in the
specification. Even if they accept the full Asciidoc source, are those
to be condemned as not standards compliant?
3f. What about generated content that is not a direct transcode of
input, such as tables of contents, indexes and section numbers? Which
organisation of those is to be standardised?
To me the point of the standardisation process is to ensure that
markup in a document is interpreted in the same way in all
implementations, and the semantics of that markup are the same, not
its presentation. That's the core of the "Asciidoc is a semantic
markup, not a presentational markup" statement.
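To make points 1 and 2 concrete, here is a toy sketch in Python. It
is not the real Asciidoc grammar and not any implementation's API:
the regex, the node names ("text", "strong") and the two renderers
are all made up for illustration. It shows that a markup parser has
no "syntax error" path (unrecognised markup simply stays text), and
that one semantic node can map to different concrete outputs.

```python
import re

# Hypothetical toy rule: *word(s)* marks a "strong" span.
# Anything that does not match is kept as plain text, never rejected.
STRONG = re.compile(r"\*([^*\s](?:[^*]*[^*\s])?)\*")

def parse(line):
    """Return a list of (kind, text) nodes for one line of input."""
    nodes, pos = [], 0
    for m in STRONG.finditer(line):
        if m.start() > pos:
            nodes.append(("text", line[pos:m.start()]))
        nodes.append(("strong", m.group(1)))
        pos = m.end()
    if pos < len(line):
        nodes.append(("text", line[pos:]))
    return nodes

# Two possible renderings of the same nodes (no escaping, toy only).
def to_html(nodes):
    return "".join(
        f"<strong>{t}</strong>" if k == "strong" else t for k, t in nodes
    )

def to_docbook(nodes):
    return "".join(
        f'<emphasis role="strong">{t}</emphasis>' if k == "strong" else t
        for k, t in nodes
    )

if __name__ == "__main__":
    for sample in ["a *foo* b", "*unclosed", "2 * 3 = 6"]:
        nodes = parse(sample)
        print(nodes, to_html(nodes), to_docbook(nodes), sep=" | ")
```

Every sample is accepted, so "does it parse?" tells us nothing. The
node list is identical whichever renderer is used, which is the
semantic/presentational split I mean.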
So it seems to me that we need to standardise the syntax of Asciidoc
markup and to some extent the semantics, but not the output. That
unfortunately makes it difficult to generate automated tests; however,
I'm happy to hear solutions.
I guess we better get these ducks in a row[1] before we propose
anything to an organisation like Eclipse.
Cheers
Lex
[1]
https://dictionary.cambridge.org/dictionary/english/get-have-your-ducks-in-a-row