I'm trying to clarify how backslashes are used in DTD files (so that we
can ensure we handle them consistently when we convert to PO files and
do other l10n things).
According to the W3C XML standard
(http://www.w3.org/TR/2004/REC-xml-20040204/#sec-entity-decl), an entity
declaration contains the entity name and an EntityDef, which is an
EntityValue in the Mozilla DTD case. An EntityValue is defined thus:
EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
| "'" ([^%&'] | PEReference | Reference)* "'"
Ignoring PEReference and Reference then (which allow %Name;, &Name;
and &#x1A3;), this implies that backslashes have no special escaping
meaning within XML entity declarations.
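
One way to check this reading of the spec is to run a declaration through a conforming parser. The sketch below uses Python's standard xml.etree.ElementTree (which sits on expat). The entity name "sample" and its value are invented for the test and do not come from any Mozilla file.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity: the name and value are made up for this check.
# The raw string keeps the backslashes exactly as they would sit in a DTD file.
DOC = r"""<!DOCTYPE root [
<!ENTITY sample "one\ntwo \' three">
]>
<root>&sample;</root>"""


def expanded_text(document: str) -> str:
    """Parse the document and return the root element's text,
    with internal entities expanded by the parser."""
    return ET.fromstring(document).text


text = expanded_text(DOC)
# The backslashes survive as ordinary characters; no newline is produced.
print(repr(text))
```

If the parser treated \n or \' as escapes, the expanded text would contain a line break or lose a backslash. It should do neither.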
In Mozilla DTD files, I have found the following uses of backslashes
(searching the Firefox 1.5 branch of the main source and l10n tree):
Main source code:

  file:        calendar/resources/locale/en-US/prefs.dtd
  entity name: pref.categories.overwrite
  description: \n implying newline

  file:        extensions/p3p/resources/locale/en-US/p3p.dtd
  entity name: p3p.individualanalysis.init
  description: \' escaping apostrophe within double-quoted string

  file:        mail/locales/en-US/chrome/messenger/messenger.dtd
  entity name: collapseAllThreadsCmd.key
  description: "\" - single character string representing the backslash key

  file:        mailnews/base/resources/locale/en-US/messenger.dtd
  entity name: collapseAllThreadsCmd.key
  description: "\" - single character string representing the backslash key

So in the main source code, there are just two places where \ is used in
a manner implying escaping - both in extensions. It seems more
consistent to me to leave the apostrophe unescaped, and to replace the
\n with a <br/> or some other, more XMLish way of representing a
newline. Any comments here?
l10n trees:

  locales:     ca,el,gu-IN,mn,nn-NO,pa-IN,sq
  file:        dom/chrome/netError.dtd
  entity name: malformedURI.longDesc
  description: contains \ representing backslash key

  locales:     ja,ja-JP-mac
  file:        ja/browser/chrome/overrides/netError.dtd
  entity name: malformedURI.longDesc
  description: contains \ representing backslash key

  locales:     bg
  file:        editor/ui/chrome/composer/pref-publish.dtd
  entity name: adjustDesc.label,saveDesc.label
  description: \t representing tabs

  locales:     da
  file:        mail/chrome/messenger/credits.dtd
  entity name: credit.title
  description: \' escaping apostrophe within double-quoted string

  locales:     lt
  file:        toolkit/chrome/mozapps/preferences/removemp.dtd
  entity name: removePassword.title,...
  description: \u017e etc - escaped unicode

  locales:     tr
  file:        other-licenses/branding/thunderbird/brand.dtd
  entity name: sidebarName
  description: \u00E& etc - escaped unicode

Should it be recommended to replace the \t, \' etc? And is \uNNNN supported as a means of representing unicode within DTDs?
Regards
David
<br> is a no-go. That's HTML. Could you verify that the calendar stuff
isn't crap? The p3p stuff is totally unmaintained; I'd like to know if
that is non-crap, too.
Looking at the XML spec myself, a '\' is a '\', period.
There's no problem with just having a newline in a DTD value, though.
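
To illustrate the newline point, here is a small Python sketch using the standard xml.etree.ElementTree parser. The entity names "literal" and "viaref" and their values are made up. The second entity uses the character reference &#10;, which is one of the Reference forms the EntityValue production allows.

```python
import xml.etree.ElementTree as ET

# Hypothetical entities: one with a literal line break inside the quoted
# value, one using the character reference &#10; for the same character.
DOC = """<!DOCTYPE root [
<!ENTITY literal "first line
second line">
<!ENTITY viaref "first line&#10;second line">
]>
<root><a>&literal;</a><b>&viaref;</b></root>"""

root = ET.fromstring(DOC)
literal_text = root.find("a").text
viaref_text = root.find("b").text

# Both entities should expand to text containing a real newline character.
print(repr(literal_text))
print(repr(viaref_text))
```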
> l10n trees:
>
>   locales:     ca,el,gu-IN,mn,nn-NO,pa-IN,sq
>   file:        dom/chrome/netError.dtd
>   entity name: malformedURI.longDesc
>   description: contains \ representing backslash key
>
>   locales:     ja,ja-JP-mac
>   file:        ja/browser/chrome/overrides/netError.dtd
>   entity name: malformedURI.longDesc
>   description: contains \ representing backslash key
>
>   locales:     bg
>   file:        editor/ui/chrome/composer/pref-publish.dtd
>   entity name: adjustDesc.label,saveDesc.label
>   description: \t representing tabs
>
>   locales:     da
>   file:        mail/chrome/messenger/credits.dtd
>   entity name: credit.title
>   description: \' escaping apostrophe within double-quoted string
>
>   locales:     lt
>   file:        toolkit/chrome/mozapps/preferences/removemp.dtd
>   entity name: removePassword.title,...
>   description: \u017e etc - escaped unicode
>
>   locales:     tr
>   file:        other-licenses/branding/thunderbird/brand.dtd
>   entity name: sidebarName
>   description: \u00E& etc - escaped unicode
>
> Should it be recommended to replace the \t, \' etc? And is \uNNNN supported as a means of representing unicode within DTDs?
Same here: any escaping-foo of \ is a bug, AFAICT. I'd be surprised to
see it do anything halfway useful.
The \u00E scares the heck out of me; that looks like properties
encoding. It shouldn't pop up in DTDs.
Axel
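
If the \uNNNN sequences really are properties-style escapes that leaked into DTD files, one possible repair is to replace each with the character it names. This is only a sketch under that assumption. The function name and the sample value are hypothetical, it assumes the DTD file's encoding can hold the resulting character, and it deliberately leaves malformed sequences alone so a human can look at them.

```python
import re

# Matches a backslash, 'u', and exactly four hex digits (properties style).
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_unicode_escapes(value: str) -> str:
    r"""Replace each well-formed \uNNNN sequence with the character it names.

    Assumption: the translator meant the escaped character, not the
    literal six-character sequence. Anything that is not exactly four
    hex digits after \u is left untouched.
    """
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


# Invented sample value; \u017e names the character "ž".
fixed = decode_unicode_escapes(r"\u017eodis")
# A sequence without four hex digits is not changed.
untouched = decode_unicode_escapes(r"\u00E&")
```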
But for me the main thing is now clear: we don't need to try to
understand these funny uses of backslash, we just need to fix them.
Thanks!
David
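
For finding the entries that need fixing, a rough scan along these lines might help. It is a sketch, not part of any existing tool: the regular expressions, the function name, and the sample values are assumptions. The sample entity names come from the lists earlier in the thread, but the values are invented. It only looks at simple general entity declarations and ignores parameter entities and comments.

```python
import re

# A simple general entity declaration: <!ENTITY name "value"> or 'value'.
ENTITY_DECL = re.compile(r"<!ENTITY\s+([\w.\-]+)\s+(\"[^\"]*\"|'[^']*')\s*>")

# Backslash sequences that look like attempted escapes. A lone backslash
# (such as the "\" key binding) is deliberately not matched.
SUSPECT = re.compile(r"\\(?:u[0-9A-Fa-f]{0,4}|[ntr'\"])")


def find_suspect_backslashes(dtd_text: str) -> list:
    """Return (entity name, suspicious sequence) pairs found in dtd_text."""
    hits = []
    for decl in ENTITY_DECL.finditer(dtd_text):
        name = decl.group(1)
        value = decl.group(2)[1:-1]  # strip the surrounding quotes
        for seq in SUSPECT.finditer(value):
            hits.append((name, seq.group(0)))
    return hits


# Invented values, used only to exercise the scan.
SAMPLE = r"""<!ENTITY credit.title "Don\'t panic">
<!ENTITY collapseAllThreadsCmd.key "\">
<!ENTITY removePassword.title "\u017eodis">
<!ENTITY adjustDesc.label "left\tright">"""

hits = find_suspect_backslashes(SAMPLE)
for name, seq in hits:
    print(name, seq)
```

The lone-backslash key binding is not reported, since that use is a legitimate single character rather than an attempted escape.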