Hi all,
I have been trying to solve a particular problem, and have come up with a solution that involves adding a new extension point to the base OT. Further, one plugin that I have developed to use this extension point may well be worth absorbing into the base OT as built-in functionality (see (4) below). I wanted to discuss this here prior to firing off a pull request.
Feedback please!
Thanks,
Tom
My implementation is here:
1. The problem I needed to solve
I wanted to allow wiki-style references (primarily conkeyrefs) in text nodes of any DITA XML file, written as text like "[[SomeKey]]". From here on, I will call these "wikilinks". The wikilinks are transformed as appropriate into XML like "<ph conkeyref="SomeKey"/>". The content inside [[ ]] can be richer than just a key name; I'm simplifying here. The transformation is implemented using XSLT. The overall format of the source files remains fully DITA compliant.
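To make the substitution concrete, here is a minimal Java sketch of the simplest case only (a bare key name inside [[ ]]). It is not my XSLT implementation: the class name WikilinkSketch, the method expand, and the restriction of key names to letters, digits, '.', '_' and '-' are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WikilinkSketch {
    // Assumption: a key is one or more letters, digits, '.', '_' or '-'.
    private static final Pattern WIKILINK =
            Pattern.compile("\\[\\[([A-Za-z0-9._-]+)\\]\\]");

    /** Replaces every [[Key]] in the given text with <ph conkeyref="Key"/>. */
    public static String expand(String text) {
        Matcher m = WIKILINK.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "<ph conkeyref=\"" + m.group(1) + "\"/>";
            // quoteReplacement guards against '$' and '\' being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("See [[SomeKey]] for details."));
    }
}
```

This works on a plain string; a real filter would apply the same idea to text nodes only, so that markup and attribute values are left alone.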
2. Available extension points
Because the wikilinks transform creates conkeyrefs, it needs to run before gen-list and debug-filter. The original DITA source files are not copied to the temp directory before gen-list/debug-filter run, so a new module that rewrites files is not an option: it would have to modify the original source files rather than the (not yet copied) temp files. The only candidate extension point I could find was dita.parser, but it was not clear how to apply it to all DITA files, as it appears to have been designed for supporting additional non-DITA formats (e.g., Markdown). I also wanted the ability to chain several filters.
3. Solution
To solve this, I added a new extension point that I've called "dita.preprocess.prefilter". This adds XMLFilter objects to the start of the pipeline (in GenMapAndTopicListModule.getProcessingPipe(...) and DebugAndFilterModule.getProcessingPipe(...)). Each XMLFilter can be specified either as a Java class name or as an XSLT file name. Plugin dependency order is respected, so the transforms are added to the pipe in the expected order.
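For readers less familiar with SAX filter chains, the following is a rough sketch of how several XMLFilter objects can be linked in front of a parser using only the standard org.xml.sax API. It does not reproduce the DITA-OT getProcessingPipe(...) code; PrefilterChainSketch, TracingFilter, chain and run are hypothetical names, and the convention that the first filter in the list sits closest to the parser is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class PrefilterChainSketch {

    /** Hypothetical prefilter: records its name whenever an element starts. */
    static final class TracingFilter extends XMLFilterImpl {
        private final String name;
        private final List<String> trace;

        TracingFilter(String name, List<String> trace) {
            this.name = name;
            this.trace = trace;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            trace.add(name);
            super.startElement(uri, localName, qName, atts); // pass the event on
        }
    }

    /** Links the filters so that the first one in the list sees events first. */
    public static XMLReader chain(XMLReader parser, List<XMLFilter> filters) {
        XMLReader head = parser;
        for (XMLFilter filter : filters) {
            filter.setParent(head);
            head = filter;
        }
        return head;
    }

    /** Parses the XML through two tracing filters and returns the visit order. */
    public static List<String> run(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            List<String> trace = new ArrayList<>();
            List<XMLFilter> filters = new ArrayList<>();
            filters.add(new TracingFilter("first", trace));
            filters.add(new TracingFilter("second", trace));
            XMLReader pipe = chain(parser, filters);
            pipe.setContentHandler(new DefaultHandler());
            pipe.parse(new InputSource(new StringReader(xml)));
            return trace;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<topic/>"));
    }
}
```

Each filter only needs to know its parent, which is what makes it straightforward to assemble the chain from a list ordered by plugin dependencies.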
4. XSLT/DOCTYPE issue
There is a further XSLT/DOCTYPE issue. The wikilinks transform needs to handle XML documents of various doctypes, and emit documents with the same doctype as the input. As far as I can tell from my research, this is not possible without giving XSLT some help. So I implemented an XMLFilter that uses org.xml.sax.ext.LexicalHandler to examine DTD events during parsing, and emits processing instructions that expose the doctype's public and system identifiers. These can then be used by an XSLT 2.0 transform to handle input documents with an arbitrary doctype. It is this injection of the doctype-public and doctype-system processing instructions that I think might be worth absorbing into the base OT. The implementation of this XMLFilter is not on GitHub (it's currently one of our in-house plugins). If it's deemed desirable to have it in base OT, I'll add it to my dita-ot fork.
The XMLFilter adds PIs like:
<?doctype-public -//OASIS//DTD DITA Composite//EN?>
<?doctype-system ditabase.dtd?>
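Since the in-house filter is not published, here is a rough sketch of the technique rather than the actual implementation. It assumes the PIs are emitted when the DTD ends, and it uses a stub entity resolver in the demo so that no real DTD file is needed. DoctypePiFilterSketch and collectPis are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class DoctypePiFilterSketch extends XMLFilterImpl implements LexicalHandler {
    private static final String LEXICAL_HANDLER =
            "http://xml.org/sax/properties/lexical-handler";
    private String publicId;
    private String systemId;

    @Override
    public void parse(InputSource input) throws SAXException, IOException {
        // Register with the parent reader so that DTD events reach this filter.
        getParent().setProperty(LEXICAL_HANDLER, this);
        super.parse(input);
    }

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        this.publicId = publicId;
        this.systemId = systemId;
    }

    @Override
    public void endDTD() throws SAXException {
        // Assumption: emit the PIs once the DTD ends, before the root element.
        if (publicId != null) {
            super.processingInstruction("doctype-public", publicId);
        }
        if (systemId != null) {
            super.processingInstruction("doctype-system", systemId);
        }
    }

    // The remaining lexical events are not needed for this sketch.
    @Override public void startEntity(String name) { }
    @Override public void endEntity(String name) { }
    @Override public void startCDATA() { }
    @Override public void endCDATA() { }
    @Override public void comment(char[] ch, int start, int length) { }

    /** Demo helper: parses the XML and returns each PI seen as "target data". */
    public static List<String> collectPis(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            DoctypePiFilterSketch filter = new DoctypePiFilterSketch();
            filter.setParent(factory.newSAXParser().getXMLReader());
            // Stub resolver: supply an empty DTD so no file lookup happens.
            filter.setEntityResolver((pub, sys) -> new InputSource(new StringReader("")));
            List<String> seen = new ArrayList<>();
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void processingInstruction(String target, String data) {
                    seen.add(target + " " + data);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectPis(
            "<!DOCTYPE dita PUBLIC \"-//OASIS//DTD DITA Composite//EN\" \"ditabase.dtd\"><dita/>"));
    }
}
```

A document without a DOCTYPE declaration produces no DTD events, so this sketch emits no PIs for it.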
An XSLT transform can then be written like:
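<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy every node and attribute unchanged, in any mode. -->
  <xsl:template match="@*|node()" mode="#all">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>
  <!-- Re-emit the input doctype using the injected processing instructions. -->
  <xsl:template match="/">
    <xsl:result-document
        doctype-public="{processing-instruction('doctype-public')}"
        doctype-system="{processing-instruction('doctype-system')}">
      <xsl:apply-templates/>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>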