Is it in the scope of epubcheck to check XML for text encoding conformance?


Jon

Nov 3, 2009, 10:54:29 AM
to epubcheck
Everyone,

Based on some TeleRead comments on a new ePub plug-in for Firefox,
developed by Michael Volz, I am concerned about text encoding issues,
especially for the NCX.

First, is it within the scope of epubcheck to check all XML documents
(including content documents, the package, and NCX) for proper
encoding?

Second, as far as I can see, neither the OPF spec nor the DTBook spec
describing NCX requires the NCX to be UTF-8/16 encoded -- it only
needs to be valid XML, which means it may be in any encoding so long
as a non-UTF-8/16 encoding is properly declared in the XML prolog.
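
To illustrate the prolog point, here is a minimal Python sketch of how a
checker might work out which encoding an XML document claims. The helper
names (sniff_xml_encoding, is_utf8_or_utf16) are hypothetical and this is
not how epubcheck itself does it. It is also simplified: it does not
distinguish UTF-32 byte order marks or handle UTF-16 without a BOM.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Return the encoding an XML document claims (hypothetical helper)."""
    # A byte order mark takes priority over the declaration.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise look for an encoding name in the XML declaration.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declared encoding: XML defaults to UTF-8.
    return "utf-8"


def is_utf8_or_utf16(data: bytes) -> bool:
    """True if the document claims UTF-8 or UTF-16 (hypothetical helper)."""
    return sniff_xml_encoding(data) in {"utf-8", "utf-16"}
```

For example, an NCX starting with
<?xml version="1.0" encoding="ISO-8859-1"?> is still well-formed XML, but
is_utf8_or_utf16 returns False for it, which is the case a UTF-8/16-only
rule would flag.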

I will add a note to the ePub Maintenance database on the second point
above.

Jon

Peter Sorotokin

Nov 3, 2009, 11:46:47 AM
to epub...@googlegroups.com
My take is that EPUB requires all XML documents to be UTF-8/16 encoded. Epubcheck is supposed to check this, but I do not know whether it checks the NCX or whether this check is in the build (I know it is in the mainline).

Peter

Liza Daly

Nov 3, 2009, 11:51:41 AM
to epub...@googlegroups.com
I believe the file I submitted as a test case only had a non-UTF-8/16
OPS document, not a non-UTF-8/16 NCX:

http://code.google.com/p/epubcheck/issues/detail?id=34&can=1&q=encoding

I agree this is something that should be clarified in the spec (I'd
like to see the encoding mandate apply to the whole epub file too).

Liza

Jon

Nov 3, 2009, 12:04:12 PM
to epubcheck
As noted in the submission of an item to the ePub Maintenance group,
my study of the OPS/OPF/DTBook specs indicated that we made no
statement regarding the encoding of the NCX. It is possible I missed
an explicit statement to the contrary.

Regarding the intent, I'm not sure we ever discussed limiting the
encoding for NCX to only UTF-8/16. It may have fallen through the
cracks.

So, it is something I hope the Maintenance group takes up and
resolves.

Regarding epubcheck, I do hope it already checks for character
encoding of all the XML documents...

Stuart A. Yeates

Nov 3, 2009, 2:23:37 PM
to epub...@googlegroups.com
> -----Original Message-----
> From: epub...@googlegroups.com [mailto:epub...@googlegroups.com] On Behalf Of Jon
> Sent: Tuesday, November 03, 2009 7:54 AM
> To: epubcheck
> Subject: Is it in the scope of epubcheck to check XML for text encoding conformance?
>
>
> Everyone,
>
> Based on some TeleRead comments on a new ePub plug-in for Firefox,
> developed by Michael Volz, I am concerned about text encoding issues,
> especially for the NCX.

I've posted a review of this, with a handful of bugs at:
https://addons.mozilla.org/en-US/firefox/addon/45281

Despite the number of issues in the review, I think the software as a
whole shows promise, particularly since this is the first public
release.

cheers
stuart
