
Escaping and Backslashes in DTD files


David Fraser

Jun 5, 2006, 11:30:48 AM
to dev-...@lists.mozilla.org
Hi all

I'm trying to clarify how backslashes are used in DTD files (so that we
can ensure we handle them consistently when we convert to PO files and
do other l10n things).

According to the W3C XML standard
(http://www.w3.org/TR/2004/REC-xml-20040204/#sec-entity-decl), an entity
declaration contains the entity name and an EntityDef, which in the
Mozilla DTD case is an EntityValue. An EntityValue is defined thus:

EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
             |  "'" ([^%&'] | PEReference | Reference)* "'"


Setting aside PEReference and Reference (which allow %Name;, &Name;
and &#x1A3;), this implies that backslashes have no special escaping
meaning within XML entity declarations.
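One way to check this reading of the grammar is to feed such a declaration to a conforming parser. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree (which parses with expat); the entity name demo and its value are made up for illustration. Only the character reference is expanded; every backslash comes through as an ordinary character.

```python
import xml.etree.ElementTree as ET

# A tiny document whose internal DTD subset declares one entity, "demo"
# (a made-up name). Its value mixes backslash sequences seen in Mozilla
# DTD files with one real XML character reference, &#x1A3;.
# A raw string keeps Python itself from interpreting the backslashes.
doc = r'''<!DOCTYPE root [
<!ENTITY demo "line\nbreak, tab\t, it\'s, \u017e, &#x1A3;">
]>
<root>&demo;</root>'''

# expat expands &demo; while parsing; only &#x1A3; is treated specially.
text = ET.fromstring(doc).text
print(text)  # line\nbreak, tab\t, it\'s, \u017e, ƣ
```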

In Mozilla DTD files, I have found the following uses of backslashes
(searching the Firefox 1.5 branch of the main source and l10n trees):

Main source code:

file | entity name | description
calendar/resources/locale/en-US/prefs.dtd | pref.categories.overwrite | \n, implying a newline
extensions/p3p/resources/locale/en-US/p3p.dtd | p3p.individualanalysis.init | \', escaping an apostrophe within a double-quoted string
mail/locales/en-US/chrome/messenger/messenger.dtd | collapseAllThreadsCmd.key | "\", a single-character string representing the backslash key
mailnews/base/resources/locale/en-US/messenger.dtd | collapseAllThreadsCmd.key | "\", a single-character string representing the backslash key
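A survey like this could be approximated with a small regular-expression scan. This is a hypothetical sketch, not the method actually used to build the table: scan_dtd and its labels are made up, it assumes simple <!ENTITY name "value"> declarations, and the sample entity names and values are invented.

```python
import re

# Hypothetical survey helper, for illustration only. It assumes plain
# <!ENTITY name "value"> or <!ENTITY name 'value'> declarations and does
# not handle parameter entities, comments, or external entities.
ENTITY_RE = re.compile(r"""<!ENTITY\s+([^\s%]+)\s+(["'])(.*?)\2\s*>""", re.DOTALL)

# Backslash sequences that look like escapes borrowed from other formats.
# This is a heuristic: a literal path such as C:\new would also match "\n".
SUSPICIOUS = [
    ("escaped unicode", re.compile(r"\\u[0-9A-Fa-f]{1,4}")),
    ("newline escape", re.compile(r"\\n")),
    ("tab escape", re.compile(r"\\t")),
    ("escaped apostrophe", re.compile(r"\\'")),
]


def scan_dtd(source):
    """Yield (entity name, description) for every entity value containing '\\'."""
    for match in ENTITY_RE.finditer(source):
        name, value = match.group(1), match.group(3)
        if "\\" not in value:
            continue
        kinds = [label for label, rx in SUSPICIOUS if rx.search(value)]
        yield name, ", ".join(kinds) if kinds else "literal backslash"


# Made-up sample declarations covering the kinds of cases in the survey.
sample = r"""
<!ENTITY backslashKey "\">
<!ENTITY apostrophe.label "Don\'t ask again">
<!ENTITY newline.label "First line\nSecond line">
<!ENTITY plain.label "No backslash here">
"""

results = list(scan_dtd(sample))
for name, kind in results:
    print(name, "->", kind)
```

A lone backslash (the backslash-key case) is reported as a literal backslash rather than an escape, which matches the reading of the XML grammar.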

So in the main source code, there are just two places where \ is used in
a manner implying escaping, both in extensions. It seems more
consistent to me to leave the apostrophe unescaped, and to replace the
\n with a <br/> or some other, more XML-like way of representing a
newline. Any comments here?

l10n trees:

locales | file | entity name(s) | description
ca, el, gu-IN, mn, nn-NO, pa-IN, sq | dom/chrome/netError.dtd | malformedURI.longDesc | contains \ representing the backslash key
ja, ja-JP-mac | ja/browser/chrome/overrides/netError.dtd | malformedURI.longDesc | contains \ representing the backslash key
bg | editor/ui/chrome/composer/pref-publish.dtd | adjustDesc.label, saveDesc.label | \t representing tabs
da | mail/chrome/messenger/credits.dtd | credit.title | \', escaping an apostrophe within a double-quoted string
lt | toolkit/chrome/mozapps/preferences/removemp.dtd | removePassword.title, ... | \u017e etc., escaped Unicode
tr | other-licenses/branding/thunderbird/brand.dtd | sidebarName | \u00E& etc., escaped Unicode

Should we recommend replacing the \t, \', etc.? And is \uNNNN supported as a means of representing Unicode within DTDs?
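If the answer is to replace them, the mechanical part of the clean-up is small. The sketch below (a hypothetical helper, to_xml_value, not an existing tool) rewrites \n, \t, \' and four-digit \uNNNN sequences in a single entity value into plain XML, assuming the value sits inside double quotes. Anything it does not recognise, such as a lone backslash for the backslash key or the malformed \u00E&, is left untouched for a human to review.

```python
import re

# Hypothetical clean-up helper (not an existing Mozilla or l10n tool).
# It rewrites legacy backslash escapes in one entity value into plain XML,
# assuming the result will be written back inside double quotes.
LEGACY_ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|n|t|')")

# Characters that cannot appear raw in a double-quoted EntityValue, or
# that would turn into markup when the entity is expanded in content.
SPECIAL = {"&": "&amp;", "<": "&lt;", '"': "&quot;", "%": "&#37;"}


def to_xml_value(value):
    def replace(match):
        seq = match.group(1)
        if seq == "n":
            return "&#10;"  # newline; a literal line break is also valid XML
        if seq == "t":
            return "&#9;"  # tab as a character reference
        if seq == "'":
            return "'"  # an apostrophe needs no escaping inside "..."
        ch = chr(int(seq[1:], 16))  # \uNNNN -> the character itself
        return SPECIAL.get(ch, ch)

    return LEGACY_ESCAPE.sub(replace, value)


# Made-up sample values; the last two are deliberately left alone.
for legacy in [r"Tab\there", r"Don\'t", r"line\nbreak", "\\", r"\u00E&"]:
    print(repr(legacy), "->", repr(to_xml_value(legacy)))
```

Writing the decoded character directly (rather than as &#x...;) assumes the DTD file is saved in an encoding that can represent it, such as UTF-8.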

Regards
David

Axel Hecht

Jun 5, 2006, 8:46:23 PM

<br> is a no-go. That's HTML. Could you verify that the calendar stuff
isn't crap? The p3p stuff is totally unmaintained; I'd like to know if
that is non-crap, too.

Looking at the XML spec myself, a '\' is a '\', period.
There's no problem with just having a newline in a DTD value, though.
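This can be checked with a short sketch using Python's standard-library ElementTree (expat-based); the entity name msg and the attribute name label are made up. A real line break in the entity value survives in element content. One caveat comes from the XML 1.0 spec itself (section 3.3.3, attribute-value normalization): if the same entity is referenced inside an attribute value, the line break is normalized to a space.

```python
import xml.etree.ElementTree as ET

# Made-up entity "msg" whose value contains a real line break (not "\n").
doc = """<!DOCTYPE root [
<!ENTITY msg "first line
second line">
]>
<root label="&msg;">&msg;</root>"""

root = ET.fromstring(doc)

# In element content the line break survives entity expansion...
content = root.text
# ...but attribute-value normalization turns it into a single space.
attribute = root.get("label")

print(repr(content))    # 'first line\nsecond line'
print(repr(attribute))  # 'first line second line'
```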

> l10n trees:
>
> file entity name description
> ca,el,gu-IN,mn,nn-NO,pa-IN,sq dom/chrome/netError.dtd malformedURI.longDesc contains \ representing backslash key
> ja,ja-JP-mac ja/browser/chrome/overrides/netError.dtd malformedURI.longDesc contains \ representing backslash key
> bg editor/ui/chrome/composer/pref-publish.dtd adjustDesc.label,saveDesc.label \t representing tabs
> da mail/chrome/messenger/credits.dtd credit.title \' escaping apostrophe within double-quoted string
> lt toolkit/chrome/mozapps/preferences/removemp.dtd removePassword.title,... \u017e etc - escaped unicode
> tr other-licenses/branding/thunderbird/brand.dtd sidebarName \u00E& etc - escaped unicode
>
> Should it be recommended to replace the \t, \' etc? And is \uNNNN supported as a means of representing unicode within DTDs?

Same here: any escaping-foo of \ is a bug, AFAICT; I'd be surprised to
see it do anything halfway useful.

The \u00E scares the heck out of me; that looks like .properties
encoding. It shouldn't pop up in DTDs.

Axel

David Fraser

Jun 6, 2006, 2:15:52 PM
to Axel Hecht, dev-...@lists.mozilla.org
Axel Hecht wrote:
> David Fraser wrote:
>> Hi all
>>
>> I'm trying to clarify how backslashes are used in DTD files (so that we
>> can ensure we handle them consistently when we convert to PO files and
>> do other l10n things)
>>
>> [snip]

>>
>> So in the main source code, there are just two places where \ is used in
>> a manner implying escaping - both in extensions. It seems more
>> consistent to me to leave the apostrophe unescaped, and to replace the
>> \n with a <br/> or other way of representing a newline that is more
>> XMLish. Any comments here?
> <br> is a no-go. That's HTML. Could you verify that the calendar stuff
> isn't crap? The p3p stuff is totally unmaintained; I'd like to know if
> that is non-crap, too.
>
> Looking at the XML spec myself, a '\' is a '\', period.
> There's no problem with just having a newline in a DTD value, though.
Great, that's how I see it. I'll try to follow up on the calendar and p3p
stuff and get them sorted out.

>> l10n trees:
>>
>> file entity name description
>> ca,el,gu-IN,mn,nn-NO,pa-IN,sq dom/chrome/netError.dtd malformedURI.longDesc contains \ representing backslash key
>> ja,ja-JP-mac ja/browser/chrome/overrides/netError.dtd malformedURI.longDesc contains \ representing backslash key
>> bg editor/ui/chrome/composer/pref-publish.dtd adjustDesc.label,saveDesc.label \t representing tabs
>> da mail/chrome/messenger/credits.dtd credit.title \' escaping apostrophe within double-quoted string
>> lt toolkit/chrome/mozapps/preferences/removemp.dtd removePassword.title,... \u017e etc - escaped unicode
>> tr other-licenses/branding/thunderbird/brand.dtd sidebarName \u00E& etc - escaped unicode
>>
>> Should it be recommended to replace the \t, \' etc? And is \uNNNN
>> supported as a means of representing unicode within DTDs?
> Same here: any escaping-foo of \ is a bug, AFAICT; I'd be surprised to
> see it do anything halfway useful.
>
> The \u00E scares the heck out of me; that looks like .properties
> encoding. It shouldn't pop up in DTDs.
Indeed, I'll try to encourage the teams to fix these.

But for me the main thing is now clear: we don't need to try to
understand these funny uses of backslash; we just need to fix them. So,
thanks!
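Treating the backslash as literal also settles the PO side of the conversion: PO strings use C-style escapes, so a literal backslash from a DTD value has to be written as \\ in the PO file. A minimal sketch, assuming hypothetical helpers po_quote and po_unquote (not taken from any existing converter) that handle only backslash, double quote, tab and newline:

```python
import re

# Hypothetical DTD -> PO helpers (not taken from any existing converter).
# A DTD value is literal text, so every backslash in it is just a character;
# PO strings use C-style escapes, so that backslash must be doubled.


def po_quote(text):
    """Quote literal DTD text as a PO string."""
    escaped = (
        text.replace("\\", "\\\\")  # first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def po_unquote(quoted):
    """Reverse po_quote for the escapes it produces."""
    unescape = {"n": "\n", "t": "\t"}
    return re.sub(r"\\(.)", lambda m: unescape.get(m.group(1), m.group(1)), quoted[1:-1])


# The backslash-key entity's value is the single character "\".
print("msgid " + po_quote("\\"))  # msgid "\\"
```

Because the DTD side never interprets backslashes, the only escaping rules in play are the PO ones, and the round trip through po_unquote gives back exactly the original DTD text.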

David
