To achieve higher spec compliance, DTD validation should perhaps be
reintroduced for XHTML 1.1, DTBook, and NCX. EpubCheck uses SAX to
parse the docs, so this shouldn't be a very big deal.
On the other hand (and this is where I am essentially in agreement
with Peter's approach of using RNG), IMO the basic issue boils down to
our approach to namespace support. In today's fully namespaced XML
world, a schema type that does not support the W3C namespace spec
(read: DTDs) is fundamentally flawed and should be avoided. Another
option for us could be to push harder to come up with formalized
RNGs for all epub fileset members, and have a revision drop the
reference to the (soon to be deprecated anyway) DTDs. I think DAISY
would be fine with this re NCX and DTBook, as we are entering a
revision phase of Z39.86 in the autumn.
I'll have a look at the files uploaded as soon as I can find some time.
It should be noted that Jing's messaging isn't the most helpful at all
times. Sun MSV (another RelaxNG implementation) is oftentimes more
friendly: as an example, instead of saying "unfinished content model"
it will say "unfinished content model: expecting element X, Y or Z",
and the same for attribute nodes/values.
hth, /markus
> In 'liza-orig.epub', the NCX is certainly malformed. First, it is not
> valid to the NCX DTD (it contains no DOCTYPE as required by the spec).
> Second, it obviously namespaces everything to death, which of course
> would not make it valid per the NCX DTD.
Assuming you're using the same epub file that you mentioned in your
personal email to me, it does validate according to epubcheck 0.93:
$ java -jar test/epubcheck.jar dist/A-Tale-of-Two-Cities_Charles-Dickens.epub
No errors or warnings detected
As Markus says, the problem here is around DTDs and namespaces, not
the file format in the package, which is correct in the
namespace-aware world.
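
To make the DTD side of this concrete, here is a minimal sketch using Python's standard xml.sax module (an assumption for illustration; EpubCheck itself is Java and this is not its code). With namespace processing switched off, which is the xml.sax default, the parser reports raw element names, and those literal names are what a DTD content model is matched against. The helper names RawNameCollector and raw_element_names are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RawNameCollector(ContentHandler):
    """Collect element names exactly as written, prefixes included."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called when namespace processing is off (the xml.sax default).
        self.names.append(name)


def raw_element_names(xml_bytes):
    collector = RawNameCollector()
    xml.sax.parseString(xml_bytes, collector)
    return collector.names


NCX_NS = b"http://www.daisy.org/z3986/2005/ncx/"
unprefixed = b'<ncx xmlns="' + NCX_NS + b'" />'
prefixed = b'<ncx:ncx xmlns:ncx="' + NCX_NS + b'" />'

print(raw_element_names(unprefixed))  # ['ncx']
print(raw_element_names(prefixed))    # ['ncx:ncx']
```

A DTD that declares an element called ncx has no declaration for something called ncx:ncx, which is why the prefixed form cannot validate against it even though both forms name the same element to a namespace-aware parser.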
It's certainly no problem to generate a namespace-free version that
does include the DOCTYPE. Making those two changes to the file causes
it to validate against the DTD, so it isn't otherwise malformed. I
agree with Markus that it is a step backwards; what do other people
think?
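
As a rough illustration of the DOCTYPE half of that check, the sketch below uses Python's standard xml.parsers.expat module to report whether a document carries a DOCTYPE declaration at all. It does not validate against the DTD, the function name find_doctype is hypothetical, and the system identifier "example-ncx.dtd" is a placeholder rather than the identifier the NCX spec requires.

```python
import xml.parsers.expat


def find_doctype(xml_bytes):
    """Return (name, system_id, public_id) for the DOCTYPE, or None if absent."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, system_id, public_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return found[0] if found else None


# Placeholder system identifier, for illustration only.
with_doctype = b'<!DOCTYPE ncx SYSTEM "example-ncx.dtd"><ncx/>'
without_doctype = b'<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/"/>'

print(find_doctype(with_doctype))
print(find_doctype(without_doctype))  # None
```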
(For reference, here's the epub document:
http://www.threepress.org/static/epub/A-Tale-of-Two-Cities_Charles-Dickens.epub)
--Liza
That's just semantics, though. From an XML perspective this:
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" />
and this:
<ncx:ncx xmlns:ncx="http://www.daisy.org/z3986/2005/ncx/" />
are equivalent. (At the parsing stage the prefix no longer even "exists".)
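
A quick way to see this, sketched with Python's standard xml.etree.ElementTree (chosen here only for illustration): a namespace-aware parser stores each element name as the namespace URI plus the local name, so both spellings come out identical.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

default_form = ET.fromstring(f'<ncx xmlns="{NCX_NS}" />')
prefixed_form = ET.fromstring(f'<ncx:ncx xmlns:ncx="{NCX_NS}" />')

# ElementTree represents names as "{namespace-uri}local-name"; no prefix survives.
print(default_form.tag)                       # {http://www.daisy.org/z3986/2005/ncx/}ncx
print(default_form.tag == prefixed_form.tag)  # True
```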
I generated my documents with explicit namespaces out of habit, mostly
because I've gotten bitten by default-namespace-related bugs too many
times.
It would be strange, in an epub context, to generate some XML (e.g.
NCX) without explicit namespaces, when other epub documents (e.g. the
OPF file) absolutely require them.
--Liza
Dave
--
Dave Cramer
Technical Lead
TexTech, Inc.
70 Landmark Hill Drive
Brattleboro, VT 05301
802.254.6073 x127
d.cr...@textechinc.us
If you can fix it, I definitely think you should. It will keep your
content on the safe side interoperability-wise and in line with the
official epub integrity checker; that's worth a lot on its own IMO.
My earlier comment was not so much about suggesting that invalid content
is OK, but rather about a longer-term strategy to improve a
heritage-laden situation.
/markus