DOCTYPE Syntax


Allen Barnett

Nov 3, 2010, 3:57:25 PM
to FoX-discuss
Hi: We have a lot of XML documents with this structure:

<!DOCTYPE my-root-element>
<my-root-element>
...
</my-root-element>

The FoX 4.1.0 SAX parser says that these documents have an "Invalid
document name". The problem appears to arise in
m_sax_parser::sax_parse in that the fx%token passed to checkQName at
line 1178 (or checkName at line 1180) has the '>' character at the
end. So, in my example above, checkQName gets 'my-root-element>' as
the name argument.

I checked a few sites on the web to see if this is valid syntax. It's
hard to say. FoX 3.2 accepts it, as does Xerces, the two XML libraries
we usually use, but of course that doesn't make it correct.
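
An independent cross-check is possible with Python's standard library, whose minidom module is built on the expat parser. This is only a sketch: the element content is elided above, so an empty root element is assumed as a stand-in, and acceptance by one more parser is a data point rather than proof of correctness.

```python
from xml.dom import minidom

# Stand-in document: the real element content is elided in the post,
# so an empty root element is assumed here.
doc = minidom.parseString("<!DOCTYPE my-root-element>\n<my-root-element/>")

# If the DOCTYPE were rejected, parseString would raise before this point.
print(doc.doctype.name)
print(doc.documentElement.tagName)
```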

Thanks,
Allen

Andrew Walker

Nov 3, 2010, 4:48:49 PM
to fox-d...@googlegroups.com
Hi Allen,

I've had a quick look at the spec and I think you've found a bug in FoX.
The relevant bit of the BNF grammar is listed as [28] at
http://www.w3.org/TR/REC-xml/#NT-doctypedecl and, if I'm reading that
correctly, there is no need for a space between the end of the name and
the end of the doctype declaration. Presumably this works if you add a
space after my-root-element in the doctype?
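
For reference, production [28] reads:

doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'

The whitespace before the closing '>' is the optional "S?". A rough Python sketch of the bare form follows. It assumes an ASCII-only Name and leaves out the ExternalID and the internal subset, so it is a simplification of the spec, not a full implementation; the names S, NAME, BARE_DOCTYPE and doctype_name are made up for the illustration.

```python
import re

# Simplified from production [28] of the XML 1.0 spec.
# Assumptions: Name is restricted to ASCII, and the optional ExternalID
# and internal subset are omitted, so only the bare form is covered.
S = r"[ \t\r\n]+"
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"
BARE_DOCTYPE = re.compile(rf"<!DOCTYPE{S}({NAME})(?:{S})?>")


def doctype_name(decl):
    """Return the declared name, or None if decl is not a bare DOCTYPE."""
    m = BARE_DOCTYPE.fullmatch(decl)
    return m.group(1) if m else None


print(doctype_name("<!DOCTYPE my-root-element>"))   # no space before '>'
print(doctype_name("<!DOCTYPE my-root-element >"))  # space before '>'
print(doctype_name("<!DOCTYPEmy-root-element>"))    # missing mandatory S
```

Under these assumptions the first two forms yield the same name and only the third is rejected, because the whitespace after '<!DOCTYPE' is mandatory while the whitespace before '>' is not.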

I suspect fixing this needs a relatively minor change to the block of code
starting at line 475 of m_sax_tokenizer.F90 ... but as yet I don't see
what the fix is.

Cheers,

Andrew


Allen Barnett

Nov 3, 2010, 4:54:07 PM
to fox-d...@googlegroups.com
Hi Andrew: It does indeed work if there is a space between the end of
the element name and the closing >.
Thanks,
Allen

Andrew Walker

Nov 3, 2010, 6:22:15 PM
to fox-d...@googlegroups.com
Hi Allen,

I think I have a fix for this. We need to teach the tokenizer to treat a
">" at the end of a name as ending the name and not being part of the
name, and at the same time ending the doctype declaration. I think this
can be done by modifying the elseif block at line 511 of
m_sax_tokenizer.F90. At the moment this reads:

    elseif (q==" ".and.verify(c, XML_WHITESPACE)==0) then
      call push_chars(fb, c)
      fx%tokenType = TOK_NAME
    else

changing this to:

    elseif (q==" ".and.verify(c, XML_WHITESPACE//">")==0) then
      if (c==">") then
        fx%nextTokenType = TOK_END_TAG
      else
        call push_chars(fb, c)
      endif
      fx%tokenType = TOK_NAME
    else

seems to fix things for me: I no longer get an error for the no-space case
and the case with the space is also fine. However, this change could use much
more testing. I don't know if this messes up the parsing of more complex
DTDs, for example.
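
The idea of letting ">" end the name without becoming part of it can be shown in isolation. The following Python toy is not FoX's tokenizer and says nothing about how the change behaves on more complex DTDs; read_name is a hypothetical helper, and XML_WHITESPACE is assumed to be the four XML whitespace characters.

```python
XML_WHITESPACE = " \t\r\n"  # assumed: space, tab, carriage return, newline


def read_name(text, pos):
    """Return (name, next_pos); the name ends at whitespace or '>'.

    The terminating character is not consumed, so the caller still
    sees the '>' that closes the declaration.
    """
    start = pos
    while pos < len(text) and text[pos] not in XML_WHITESPACE + ">":
        pos += 1
    return text[start:pos], pos


decl = "<!DOCTYPE my-root-element>"
name, pos = read_name(decl, len("<!DOCTYPE "))
print(repr(name), repr(decl[pos]))
```

Here the name comes back without the trailing ">", and the character left at the returned position is the ">" itself, ready to be handled as the end of the declaration.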

Cheers,

Andrew

Allen Barnett

Nov 3, 2010, 6:38:41 PM
to fox-d...@googlegroups.com
Hi Andrew: None of our XML files have inline DTDs to test. But I'll try
to help test if you can give me some guidance.
Thanks,
Allen

Andrew Walker

Nov 4, 2010, 8:14:54 AM
to fox-d...@googlegroups.com
Hi Allen,

It turns out that last night's fix does break the parsing of more complex DTDs. I've committed a different patch (see https://github.com/andreww/fox/commit/ef47e5ceb435e05b03131de2442b4f64c7231287 ) with a fix that seems to work correctly.

Cheers,

Andrew

--

Andrew Walker <andrew...@bris.ac.uk>

Department of Earth Sciences,
University of Bristol,
Wills Memorial Building,
Queen’s Road,
Bristol, BS8 1RJ, UK


Allen Barnett

Nov 8, 2010, 8:48:33 AM
to fox-d...@googlegroups.com
Hi Andrew: I just wanted to say Thanks! for patching my difficulty. FoX
4.1.0 now passes all our regression tests. I had a bit of trouble
because WXML in 3.2 apparently splits an element tag with a long string
of attributes at column 80. In 4.1 it's all on one line; but, anyway,
the content is the same. So, excellent work!

Thanks,
Allen
