Fwd: dtdanalyzer parameter entities


Chris Maloney

Jan 15, 2014, 3:23:54 PM
to dtdan...@googlegroups.com
Forwarding this thread to the group list ...




---------- Forwarded message ----------
From: Heidrich, Andrea <Andrea....@thieme.de>
Date: Wed, Jan 15, 2014 at 3:36 AM
Subject: AW: dtdanalyzer parameter entities
To: Chris Maloney <vold...@gmail.com>


Hi, Chris,

 

thanks a lot for the quick response and the detailed information.

Please move the conversation to the google group.

 

Thanks and regards,

Andrea

 

Andrea Heidrich

Studium und Lehre | Content Management

 

Georg Thieme Verlag KG

Rüdigerstraße 14 | 70469 Stuttgart

 

Fon +49[0]711/8931-962

Fax +49[0]711/8931-870

andrea....@thieme.de

www.thieme.de

 

Georg Thieme Verlag KG | Rüdigerstr. 14 | 70469 Stuttgart

Rechtsform: KG | Sitz und Handelsregister: Stuttgart, HRA 3499

 

From: Chris Maloney [mailto:vold...@gmail.com]
Sent: Wednesday, January 15, 2014 03:32
To: Heidrich, Andrea
Subject: Re: dtdanalyzer parameter entities

 

Hi, Andrea,

 

First, do you mind if I move this conversation to the dtdanalyzer google group

(https://groups.google.com/forum/#!forum/dtdanalyzer)?  There are a few others on that

list that might have an interest.

 

I've gotten started digesting your problem, but haven't had a chance to look at your

DTD yet.  You are definitely on the bleeding edge here -- you're the first one to try

to use this tool so thoroughly.  I apologize that the linking and escaping mechanism 

for pandoc is a bit hacky -- we really didn't have time to try to make it robust at 

the time we wrote it.

 

Anyway, from an initial perusal, I still think pandoc might be a good choice.  I've

answered point-by-point below.

 

> We run it from the split-example directory using pandoc like this:

> ..\..\dtdanalyzer.bat --system split-example.dtd split-mockup.daz.xml -m

> ..\..\dtddocumentor.bat -d split-instance.xml -m

> 

> &fleegle-pic; is a link referencing the page for the general entity, but even following 

> the link, you never get to the picture. That’s not what is intended, is it?

 

That's a good point, and I hadn't considered it.  By default, the documentor does not produce

documentation for general and parameter entities.  That was a design decision based on the

idea that most *users* of the DTD wouldn't be interested in them, that they'd really be for

designers of customizations.  But the problem is that it is faithfully rendering the markup

of the home page, which includes a link to the entity page. If you run it with

  ..\..\dtddocumentor.bat -d split-instance.xml -m -e

instead, it should create documentation without any broken links.

 

> With pandoc switched off, we still get a “not valid XML”-error message.

 

Right, the "split" example documentation is written in Markdown format, so it requires

pandoc.  To get the split example to work without pandoc, you'd have to rewrite it in 

XHTML.  So, for example, instead of 

 

> Here is how you make links to other documentation pages:

> * Element tags must be preceded with a backtick:  \`\<split> -> `<split>.

> ...

 

you'd have:

 

> <p>Here is how you make links to other documentation pages:</p>

> <ul>

>   <li>Element tags must be preceded with a backtick:  \`\&lt;split> -> `&lt;split>.</li>

> ...

 

But I think you probably already know this ... it sounds like you've gotten this far.

 

> We’d like to avoid pandoc. The reason for this is that we have rather complex 

> example sections, and, as far as we can see, pandoc doesn’t open up the 

> opportunity to include comments, processing-instructions, simple text ...

 

I'm not sure what you mean by "include comments, processing instructions, ...". You can

certainly include quoted text, including markup, using pandoc.  As I checked this, I do 

see that there's a problem including quoted comments, since a comment inside a DTD can't

contain the string "--".  So the following *almost* works as desired:

 

> Here is some quoted markup:

> ```xml

> <?processing-instruction foo="bar"?>

> <!-\- comment comment comment -\->

> <splits>fun</splits>

> ```

 

But the backslashes separating the dashes in the comment delimiters remain, instead of being

removed.  That should be an easy fix, though.  The "xml" introducing the text block produces

a syntax-highlighted block of preformatted text.  To get the syntax highlighting to show up,

you have to use a css file.  For example, save the default one from pandoc as `syntax.css` in the

`split-example` directory, and then have it loaded in the documentation by adding the 

`--include ../syntax.css` argument.
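
On the comment-delimiter problem: the missing step is just collapsing the escaped `-\-` back to `--` once the text has been pulled out of the DTD comment. Here is a minimal sketch in Python, purely for illustration. The documentor itself is not written this way, and the function name `unescape_comment_dashes` is made up for this example.

```python
def unescape_comment_dashes(text):
    """Turn the escaped form '-\\-' back into '--'.

    Inside a DTD comment the string '--' is not allowed, so quoted
    comment delimiters have to be written as '<!-\\-' and '-\\->'.
    """
    return text.replace("-\\-", "--")


sample = "<!-\\- comment comment comment -\\->"
print(unescape_comment_dashes(sample))  # <!-- comment comment comment -->
```

This only handles a single backslash between two dashes, which is the form used in the example above.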

 

> ... or even 

> to use color and emphasis within the examples section. If we’re wrong here, 

> please let us know. 

> First we thought about trying another pandoc flavor, but 

> couldn’t find out how to make extensions work. We think if the extensions are 

> available, pandoc still is no option for us, however.

 

One thing you might not know is that pandoc lets you mix in XHTML pretty freely.  For example,

you could write most of your text in Markdown (which is much more readable than XHTML, IMO)

and switch into XHTML for the complex examples, something like this:

 

> Here is an example, written in XHTML, that has color and emphasis:

> <pre>&lt;split>

>   <span style='color: red;'>&lt;banana instrument='guitar'>Fleegle&lt;/banana></span>

>   <strong><em>&lt;banana instrument='drums'>Bingo&lt;/banana></em></strong>

> &lt;/split></pre>

 

This *almost* works perfectly, except that the document processor is getting confused by the

`&lt;`, thinking that they are references to entities declared in the DTD, and hyperlinking them.  I see that this is

exactly what you're complaining about with reference to the numeric character references like

`&#252;`, and that your fix to the dtddocumentor.xsl should take care of that.
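
To make the distinction concrete, here is a minimal sketch in Python of the check the documentor would need before hyperlinking a reference. It is an illustration only, not the actual dtddocumentor.xsl logic, and the names `PREDEFINED`, `should_link`, and `linkable_entities` are invented for this example. The assumption is that only references that could name an entity declared in the DTD should become links, while the five predefined XML entities and numeric character references are left alone.

```python
import re

# The five entities predefined by the XML specification.
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Matches &name; as well as decimal (&#252;) and hex (&#xFC;) character references.
ENTITY_REF = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][\w.:-]*);")


def should_link(ref_body):
    """Return True only for references that could name a DTD-declared entity."""
    if ref_body.startswith("#"):
        return False  # numeric character reference, never a DTD entity
    return ref_body not in PREDEFINED


def linkable_entities(text):
    """List the entity names in text that are candidates for hyperlinking."""
    return [m.group(1) for m in ENTITY_REF.finditer(text) if should_link(m.group(1))]


print(linkable_entities("&lt;banana> &#252; &#xFC; &fleegle-pic; &amp;"))
# ['fleegle-pic']
```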

 

That is as far as I've gotten.  I will take a look at your DTD tomorrow.

 

Best,

Chris

 

On Tue, Jan 14, 2014 at 9:57 AM, Chris Maloney <vold...@gmail.com> wrote:

Hi, I will take a look at this today, but I have a lot of other things on my plate -- might not respond till this evening (my time).  

 

Chris

 

On Tue, Jan 14, 2014 at 7:14 AM, Heidrich, Andrea <Andrea....@thieme.de> wrote:

Dear Chris,

 

We've tried the split.example.dtd in the DtdAnalyzer-0.4 directory, which seems to be identical to the one in the master directory except for one opening bracket in banana.ent:

 

<!--~~ banana.ent
This module defines the `<banana> element, and all the slippery things associated
with it.
~~-->

 

Master directory: “&lt;banana>” instead

 

We run it from the split-example directory using pandoc like this:

..\..\dtdanalyzer.bat --system split-example.dtd split-mockup.daz.xml -m

..\..\dtddocumentor.bat -d split-instance.xml -m

 

&fleegle-pic; is a link referencing the page for the general entity, but even following the link, you never get to the picture. That’s not what is intended, is it?

 

With pandoc switched off, we still get a “not valid XML”-error message.

 

We’d like to avoid pandoc. The reason for this is that we have rather complex example sections, and, as far as we can see, pandoc doesn’t open up the opportunity to include comments, processing-instructions, simple text or even to use color and emphasis within the examples section. If we’re wrong here, please let us know. First we thought about trying another pandoc flavor, but couldn’t find out how to make extensions work. We think if the extensions are available, pandoc still is no option for us, however.

 

Our DTD documentation is written in German, which means that it includes characters from Latin-1 and we don’t want a link to be created for e.g. &#252; (Latin small letter u with diaeresis). Moreover our notes sections contain opening/closing/empty tags, for which we don’t want a link in this place. Therefore we modified the dtddocumentor.xsl, this is the preliminary version:

 

<!-- Redo the hyperlink with a new href, but preserve other attributes and content -->
    <xsl:choose>
      <xsl:when test="@href[starts-with(.,'#p=ge-')]">
        <xsl:value-of select="."></xsl:value-of>
      </xsl:when>
      <xsl:otherwise>
        <a>
          <xsl:apply-templates select='@* except @href' mode='content'/>
          <xsl:attribute name='href'>
            <xsl:text>#p=</xsl:text>
            <xsl:call-template name="makeSlug">
              <xsl:with-param name='name' select='$name'/>
              <xsl:with-param name="type" select='$type'/>
            </xsl:call-template>
          </xsl:attribute>
          <xsl:apply-templates select='node()' mode='content'/>
        </a>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

 

Before making this modification, we tried our own DTD with a general entity with pandoc switched off and got a “not valid XML” message, too. So, general entities don’t seem to work with pandoc switched off. At the moment, it looks like we don’t necessarily need general entities that need to be declared, but we’re not sure, yet. Our first intention was using general-entities (like &fleegle-pic;) to include images in our notes or examples section, but including an image-element directly seems to work fine in this case.

 

You'll find a stripped-down version of our DTD attached. If you write &fleegle; or &fish; somewhere in the notes section, it will fail with pandoc switched off.

 

Thanks for helping. We might have a bunch of follow-up questions about the attributes section and attribute pages.

 

Regards,

Andrea

 

 


Maloney, Christopher (NIH/NLM/NCBI) [C]

Jan 15, 2014, 10:24:50 PM
to dtdan...@googlegroups.com, Heidrich, Andrea

Hi, Andrea

 

I got a chance to look at your stripped-down DTD, and I think I got it working the way you would like, and am attaching a new version, along with generated documentation, in a zip file.

 

Here’s what I came up with:

 

    <!--~~ <excursus>

    ~~ notes

    Der Exkurs ist eine \&lt;/erweiterte\&gt; \&#60;Sonderumgebung/\&gt;, die eingeführt ...

 

    Here's a reference to \&amp;fleegle;

 

    text

 

    <div>

    <img src="..\fish.jpg" title="fish fish fish" alt="a fish"/>

    </div>

 

    Here's a preformatted example section that has a lot of weirdness:

 

    <pre>\&lt;splits>

      <em><strong>\&lt;fleegle/></strong></em>

      <span style='color: green;'>\&lt;drooper/></span>

      <span style='color: red;'>Rhababerbarbarabarbarbarenbärte</span>

    \&lt;/splits></pre>

 

Notes:

 

·         As mentioned in the documentation on the wiki (https://github.com/NCBITools/DtdAnalyzer/wiki/Overview-of-DTD-annotations), in order to disable auto-linking to entities, you just precede them with a backslash.  The documentor should be smart enough to know that &lt;, &gt;, etc., and numeric character refs are not DTD entities, but it is not.  So for now, precede those with a backslash.  I wrote a new issue for this, https://github.com/NCBITools/DtdAnalyzer/issues/45, and hopefully someone will be able to get to it soon.

·         Rather than using numeric character references like “&#252;”, if your DTD is encoded in UTF-8, you could just type the “ü” directly.  But if that doesn’t work, go ahead and use the NCR, but just remember to precede it with a backslash:  “\&#252;”.

·         If you want the actual text “&fleegle;” to show up in the documentation, you have to xml-escape the ampersand.  Remember, when you turn off Markdown, the text is interpreted as XHTML, and “&fleegle;”, unescaped, is not valid.  That’s probably the problem you described in your email.
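
To illustrate the last point, here is a small check using Python's standard library. It is only a stand-in for what happens when the documentor parses the comment text as XHTML, and `is_well_formed` is a helper name made up for this example. An unescaped reference to an entity the parser does not know about makes the fragment ill-formed, while the escaped form parses fine.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(fragment):
    """Return True if the fragment parses as a standalone XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


raw = "Here's a reference to &fleegle;"

print(is_well_formed("<p>" + raw + "</p>"))          # False: undefined entity
print(is_well_formed("<p>" + escape(raw) + "</p>"))  # True
print(escape(raw))  # Here's a reference to &amp;fleegle;
```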

 

Let me know if (when) you have more questions.

 

 

 

Chris Maloney

NIH/NLM/NCBI (Contractor)

Building 45, 5AN.24D-22

301-594-2842

 


fish.zip

Chris Maloney

Jan 17, 2014, 9:11:01 AM
to dtdan...@googlegroups.com, Heidrich, Andrea
Hi, again, Andrea,

I did a bit of work on this, and fixed some things so that you do not need so many backslashes.  See the issue:  https://github.com/NCBITools/DtdAnalyzer/issues/45, and this commit:  https://github.com/NCBITools/DtdAnalyzer/commit/e4e0cf1729c7b79a040c4d5f1a0885121552772f.

Now, you never need backslashes before the general entities &lt;, &apos;, &gt;, &quot;, or &amp;.  Also, if you're generating the documentation without the "-e" flag, then you should never need those backslashes.

So hopefully this will make it much easier to write the documentation you want in XHTML inside comments.  I'm not closing the issue, because I still think it could be improved, but anyway it is better than it was.

Let me know how it is going.

Thanks.
Chris