
Doubts about htmlparser.properties (bug 482921)


flod

Nov 2, 2011, 2:30:03 AM
to dev-...@lists.mozilla.org
I was going to write in the bug, but this started to grow a bit too much
for Bugzilla. Here are some thoughts/doubts on that file.

First of all: landing this file one week before string freeze is madness.

> errAlmostStandardsDoctype=Almost standards mode doctype. Expected
> \u201C<!DOCTYPE html>\u201D.
Is "Almost standards mode" a mode like "Quirks mode", or is the mode
"Standards mode"? Is that what is called "Standards compliance mode" in
pageInfo.properties? If that's the case, the inconsistency is bad.

> errAstralNonCharacter=Character reference expands to an astral
> non-character.
What is an "astral non-character"?

> errBogusComment=Bogus comment.
(and others similar) Can someone give an example of a "bogus comment"?
Bogus should be "fake", but I can't imagine a bogus comment. Isn't that
a bit too colloquial? Same question for "garbage" and "stray".
Excluding "garbage" as in "garbage collector", I can't find references
in other technical glossaries (e.g. Microsoft).

> errStrayStartTag=Stray end tag \u201C%1$S\u201D.
> errStrayEndTag=Stray end tag \u201C%1$S\u201D.
First string is wrong, and I wouldn't change it without changing its name.


"end of file inside x", "end of file reached...", "end of file
occurred...", "end of file seen...", "saw end of file" (and I read only
half the file). Is that variety wanted?

Probably more to come.

Francesco.

Jonathan Kew

Nov 2, 2011, 3:23:23 AM
to flod, dev-...@lists.mozilla.org, Henri Sivonen
On 2 Nov 2011, at 06:30, flod wrote:

>> errAstralNonCharacter=Character reference expands to an astral non-character.
> What is an "astral non-character"?

I assume (hope Henri will correct me if wrong!) that "astral" refers to a character code outside the Unicode "basic multilingual plane" (i.e., what's sometimes known as a supplementary-plane character), and that "non-character" means a code value that is defined to NOT represent a valid character - i.e. one of the last two codes in each "plane", at codepoint 0x??FFF[EF].

However, IMO the use of "astral" here is (somewhat colloquial) jargon, and "non-character" is also a specialized technical term that may not be appropriate for Firefox UI usage, or at least should be a secondary clarification of the primary message, which is that the character code provided was not valid.

ISTM that something along the lines of

Invalid character code (non-character) in character reference.

may be all that's needed. The fact that it's an "astral" rather than BMP code value is irrelevant.

Similar messages for other errors might be

Invalid character code (outside permissible Unicode range) in character reference.
Invalid character code (control character) in character reference.
Invalid character code (surrogate codepoint) in character reference.
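
To make these cases concrete, here is a minimal Python sketch of how the code point of a numeric character reference could be sorted into them. The function name and category labels are illustrative assumptions, not the parser's actual code, and the control-character test is simplified (the HTML spec treats some values, such as NUL and the Windows-1252 range, specially).

```python
def classify_code_point(cp):
    """Return a rough category for the value of a numeric character reference."""
    if cp < 0 or cp > 0x10FFFF:
        return "outside Unicode range"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane
    # (U+xxFFFE and U+xxFFFF). Those above the BMP are the "astral" ones.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "astral non-character" if cp > 0xFFFF else "non-character"
    # Simplified: C0 controls other than tab/LF/FF/CR, DEL, and C1 controls.
    if (cp <= 0x1F and cp not in (0x09, 0x0A, 0x0C, 0x0D)) or 0x7F <= cp <= 0x9F:
        return "control character"
    return "ok"
```

With this sketch, `&#x1FFFE;` falls into the "astral non-character" bucket while `&#xFFFF;` is a plain BMP non-character, which shows that the plane distinction adds nothing a user needs to know.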

JK

flod

Nov 2, 2011, 4:26:37 AM
to dev-...@lists.mozilla.org
Thanks Jonathan for your answer ;-)
> errEndTagBr=End tag \u201Cbr\u201D.
When is this message displayed?
> errBadStartTagInHead=Bad start tag in \u201C%1$S\u201D in \u201Chead\u201D.
An example to understand what "%1$S" is?
> errNonSpaceInTable=Misplaced non-space characters insided a table.
Typo
> errNonSpaceInTrailer=Non-space character in page trailer.
What is a page trailer?
> errUnquotedAttributeQuote=Quote in an unquoted attribute value.
> Probable causes: Attributes running together or a URL query string in
> an unquoted attribute value.
> errUnquotedAttributeEquals=\u201C=\u201D in an unquoted attribute
> value. Probable causes: Attributes running together or a URL query
> string in an unquoted attribute value.
Explanation of "attributes running together"?
> errQuirkyDoctype=Quirky doctype. Expected \u201C<!DOCTYPE html>\u201D.
Can you explain "quirky"?

Francesco


Henri Sivonen

Nov 2, 2011, 4:45:19 AM
to dev-...@lists.mozilla.org
flod wrote:
> First of all: landing this file one week before string freeze is madness.

Sorry. I thought the rapid release model allowed stuff to land on
trunk at any time. If landing this at this time is a serious problem,
we could turn it off on Aurora immediately after the uplift.

>> errAlmostStandardsDoctype=Almost standards mode doctype. Expected
>> \u201C<!DOCTYPE html>\u201D.
>
> Is "Almost standards mode" a mode like "Quirks mode", or is the mode
> "Standards mode"? Is that what is called "Standards compliance mode" in
> pageInfo.properties? If that's the case, the inconsistency is bad.

There are three HTML modes. In the HTML5 specification and the DOM4
specification, they are called:
quirks mode
limited-quirks mode
no-quirks mode

In common usage, they are instead called:
quirks mode
almost standards mode
standards mode

"Standards compliance mode" is neither a specification term nor a term
in common usage but it means the same as "no-quirks mode" or
"standards mode".

>> errBogusComment=Bogus comment.
>
> (and others similar) Can someone give an example of a "bogus comment"?

<!foo> and <!DOCTYP html> (note the lack of "E") are examples of bogus comments.

> Bogus should be "fake", but I can't imagine a bogus comment. Isn't that
> a bit too colloquial?

The term "bogus comment" comes from the HTML5 specification:
http://www.whatwg.org/specs/web-apps/current-work/#bogus-comment-state

Bogus comment means a part of syntax that parses into a comment node
in the DOM but does not meet the syntax rules for valid comments.
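
As a rough illustration of that rule, the following Python sketch classifies a construct that starts with "<!" the way the tokenizer's decision is described above. It is a simplified assumption-laden sketch, not the parser's code: the function name is hypothetical and CDATA sections (which only matter in foreign content) are ignored.

```python
def classify_declaration(markup):
    """Classify markup starting with '<!' as comment, doctype or bogus comment."""
    if not markup.startswith("<!"):
        raise ValueError("expected markup starting with '<!'")
    rest = markup[2:]
    if rest.startswith("--"):
        return "comment"          # a real comment: <!-- ... -->
    if rest[:7].lower() == "doctype":
        return "doctype"          # the keyword is matched case-insensitively
    # Anything else still ends up as a comment node in the DOM,
    # but it is reported as a parse error.
    return "bogus comment"
```

Under these assumptions `classify_declaration("<!DOCTYP html>")` yields "bogus comment", because the misspelled keyword never matches "doctype".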

>> errStrayStartTag=Stray end tag \u201C%1$S\u201D.
>> errStrayEndTag=Stray end tag \u201C%1$S\u201D.
>
> First string is wrong, and I wouldn't change it without changing its name.

Indeed, the first string is wrong.
https://bugzilla.mozilla.org/show_bug.cgi?id=698935

What are the rules for changing the string keys when fixing obvious
problems in the en-US string?

> "end of file inside x", "end of file reached...", "end of file
> occurred...", "end of file seen...", "saw end of file" (and I read only
> half the file). Is that variety wanted?

Probably the phrasings should be more similar to each other in en-US, yeah.

On Wed, Nov 2, 2011 at 9:23 AM, Jonathan Kew <jfkt...@googlemail.com> wrote:
> On 2 Nov 2011, at 06:30, flod wrote:
>
>>> errAstralNonCharacter=Character reference expands to an astral non-character.
>> What is an "astral non-character"?
>
> I assume (hope Henri will correct me if wrong!) that "astral" refers to a character code outside the Unicode "basic multilingual plane" (i.e., what's sometimes known as a supplementary-plane character), and that "non-character" means a code value that is defined to NOT represent a valid character - i.e. one of the last two codes in each "plane", at codepoint 0x??FFF[EF].

Indeed.

> However, IMO the use of "astral" here is (somewhat colloquial) jargon, and "non-character" is also a specialized technical term that may not be appropriate for Firefox UI usage, or at least should be a secondary clarification of the primary message, which is that the character code provided was not valid.
>
> ISTM that something along the lines of
>
>  Invalid character code (non-character) in character reference.
>
> may be all that's needed. The fact that it's an "astral" rather than BMP code value is irrelevant.
>
> Similar messages for other errors might be
>
>  Invalid character code (outside permissible Unicode range) in character reference.
>  Invalid character code (control character) in character reference.
>  Invalid character code (surrogate codepoint) in character reference.

Seems reasonable.

On Wed, Nov 2, 2011 at 10:26 AM, flod <fl...@lodolo.net> wrote:
> Thanks Jonathan for your answer ;-)
>>
>> errEndTagBr=End tag \u201Cbr\u201D.
>
> When is this message displayed?

If the source contains </br>

>> errBadStartTagInHead=Bad start tag in \u201C%1$S\u201D in
>> \u201Chead\u201D.
>
> An example to understand what "%1$S" is?

Any tag name that isn't one of html, link, basefont, bgsound, meta,
style, noframes, head or noscript.
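
A small Python sketch of how the placeholder gets filled in, assuming the check is a simple membership test against the list above. The constant and function names are hypothetical, the message text is the string exactly as quoted at this point in the thread, and where the parser actually performs the check is not modelled.

```python
# Tag names taken from the list above; everything else triggers the message.
ALLOWED_START_TAGS = frozenset({
    "html", "link", "basefont", "bgsound", "meta",
    "style", "noframes", "head", "noscript",
})

# The en-US string as quoted in this thread; %1$S is the placeholder.
MESSAGE_TEMPLATE = "Bad start tag in \u201C%1$S\u201D in \u201Chead\u201D."


def bad_start_tag_message(tag_name):
    """Return the error text for a start tag, or None if the tag is allowed."""
    name = tag_name.lower()
    if name in ALLOWED_START_TAGS:
        return None
    # %1$S is replaced by the offending tag name.
    return MESSAGE_TEMPLATE.replace("%1$S", name)
```

So for a stray <div> the placeholder would expand to "div", while <meta> produces no message at all.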

>> errNonSpaceInTable=Misplaced non-space characters insided a table.
>
> Typo

https://bugzilla.mozilla.org/show_bug.cgi?id=698866

>> errNonSpaceInTrailer=Non-space character in page trailer.
>
> What is a page trailer?

The part after the </html> tag.
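
For illustration only, a Python sketch that looks for non-space characters in that trailing part. It is a text-level approximation with a hypothetical function name, not how the tree builder works, and it would also flag comments after </html>, which the real parser accepts.

```python
import re

HTML_END_TAG = re.compile(r"</html\s*>", re.IGNORECASE)


def has_non_space_in_trailer(source):
    """True if anything other than HTML whitespace follows the last </html>."""
    matches = list(HTML_END_TAG.finditer(source))
    if not matches:
        return False  # no </html> end tag, so no trailer to inspect
    trailer = source[matches[-1].end():]
    # HTML whitespace: space, tab, line feed, form feed, carriage return.
    return trailer.strip(" \t\n\f\r") != ""
```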

>> errUnquotedAttributeQuote=Quote in an unquoted attribute value. Probable
>> causes: Attributes running together or a URL query string in an unquoted
>> attribute value.
>> errUnquotedAttributeEquals=\u201C=\u201D in an unquoted attribute value.
>> Probable causes: Attributes running together or a URL query string in an
>> unquoted attribute value.
>
> Explanation of "attributes running together"?

Lacking spaces between. E.g.
<a class=foohref="http://example.com">
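
A minimal Python sketch of how such a value can be spotted: it extracts attribute values from a single tag and reports quote or "=" characters found inside unquoted ones. The regular expression and function name are assumptions made for illustration; a real tokenizer is a state machine and handles many more cases.

```python
import re

# name = "double-quoted" | 'single-quoted' | unquoted (up to whitespace or '>')
ATTRIBUTE = re.compile(r"""([^\s=/>]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""")


def unquoted_value_problems(tag):
    """Return (attribute name, offending characters) for unquoted values."""
    problems = []
    for match in ATTRIBUTE.finditer(tag):
        name, unquoted = match.group(1), match.group(4)
        if unquoted is None:
            continue  # the value was quoted, nothing to report
        offending = sorted({ch for ch in unquoted if ch in "\"'="})
        if offending:
            problems.append((name, "".join(offending)))
    return problems
```

For the example above, the whole run foohref="http://example.com" is read as the value of class, so both the "=" and the quote are reported for that attribute. An unquoted URL with a query string, such as href=http://example.com/?a=b, is flagged for its "=" in the same way.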

>> errQuirkyDoctype=Quirky doctype. Expected \u201C<!DOCTYPE html>\u201D.
>
> Can you explain "quirky"?

A doctype that triggers the quirks mode.

--
Henri Sivonen
hsiv...@iki.fi
http://hsivonen.iki.fi/

flod

Nov 2, 2011, 4:57:25 AM
to dev-...@lists.mozilla.org
On 02/11/11 09:45, Henri Sivonen wrote:
> Sorry. I thought the rapid release model allowed stuff to land on
> trunk at any time. If landing this at this time is a serious problem,
> we could turn it off on Aurora immediately after the uplift.
Yes, but it means you have less than one week to find and fix errors in
your 107 strings, since strings are not supposed to change on Aurora.
Localizers will have a complete Aurora cycle to work on these.
> Indeed, the first string is wrong.
> https://bugzilla.mozilla.org/show_bug.cgi?id=698935
> What are the rules for changing the string keys when fixing obvious
> problems in the en-US string?
The rule is that you should also change the key name, unless Axel thinks
it's not necessary in this case since we're discussing it here.
https://developer.mozilla.org/en/Making_String_Changes

Also, involving localizers before the commit would have been useful (see
the typos and consistency comments that could have been addressed before
landing in mozilla-central).

Thanks for all the explanations.

Francesco

Rimas Kudelis

Nov 2, 2011, 9:34:49 AM
to dev-...@lists.mozilla.org
In addition to everything else, I also have a small rant regarding these
(and many other) strings:

>> errStrayStartTag=Stray end tag \u201C%1$S\u201D.
>> errStrayEndTag=Stray end tag \u201C%1$S\u201D.

the rant is: do we really need to escape every non-ASCII character in
.properties files? Please don't do that unless there's a really good
reason to. Quote characters (in this case) are typographical features
that should be localized together with the string. IMO, by escaping
them, the original author not only makes them less likely to be
localized, but also makes the string harder to read and parse in a plain
text editor. And for what?

Rimas

Henri Sivonen

Nov 2, 2011, 9:51:50 AM
to dev-...@lists.mozilla.org
On Wed, Nov 2, 2011 at 3:34 PM, Rimas Kudelis <r...@rq.lt> wrote:
> the rant is: do we really need to escape every non-ASCII character in
> .properties files? Please don't do that unless there's a really good
> reason to. Quote characters (in this case) are typographical features
> that should be localized together with the string. IMO, by escaping
> them, the original author not only makes them less likely to be
> localized, but also makes the string harder to read and parse in a plain
> text editor. And for what?

.properties files as originally implemented in the JDK are locked to
ISO-8859-1 (terrible design decision) and require everything else to
be escaped, hence I escaped the non-ISO-8859-1 characters.

Are .properties files in Gecko UTF-8?

Robert Kaiser

Nov 2, 2011, 9:58:19 AM
Henri Sivonen wrote:
>>> errStrayStartTag=Stray end tag \u201C%1$S\u201D.
>>> errStrayEndTag=Stray end tag \u201C%1$S\u201D.
>>
>> First string is wrong, and I wouldn't change it without changing its name.
>
> Indeed, the first string is wrong.
> https://bugzilla.mozilla.org/show_bug.cgi?id=698935
>
> What are the rules for changing the string keys when fixing obvious
> problems in the en-US string?

You need to change the string ID any time you change the string itself
in a way that requires localizers to re-view and possibly change their
localization (i.e. an obvious typo fix doesn't need it, this one does).

>>> errEndTagBr=End tag \u201Cbr\u201D.
>>
>> When is this message displayed?
>
> If the source contains </br>

For one thing, our .properties files are always expected to be UTF-8, so
no need for those ugly \u sequences there. For the other, it would be
good if you would add localization notes (comments) in this file to
explain things where the localizer might not know what's up.


>>> errBadStartTagInHead=Bad start tag in \u201C%1$S\u201D in
>>> \u201Chead\u201D.
>>
>> An example to understand what "%1$S" is?
>
> Any tag name that isn't one of html, link, basefont, bgsound, meta,
> style, noframes, head or noscript.

Again, it would be nice to remove the \u - and please explain things
like that in a localization note comment in the file.

Robert Kaiser

--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community should think about. And most of the
time, I even appreciate irony and fun! :)

Robert Kaiser

Nov 2, 2011, 10:00:52 AM
flod wrote:
> On 02/11/11 09:45, Henri Sivonen wrote:
>> Sorry. I thought the rapid release model allowed stuff to land on
>> trunk at any time. If landing this at this time is a serious problem,
>> we could turn it off on Aurora immediately after the uplift.
> Yes, but it means you have less than one week to find and fix errors in
> your 107 strings, since strings are not supposed to change on Aurora.
> Localizers will have a complete Aurora cycle to work on these.

OK, right, he only has that short time to fix problems. I was already
wondering why you complained about landing so shortly before the freeze,
as we have 6 weeks for L10n _after_ that anyhow, but the comments on
fixing the original are of course valid. I think, though, that this just
means that the review of the L10n file was not good, as usually all those
things should be caught in review and not land with such problems at all.

Robert Kaiser

Nov 2, 2011, 10:02:03 AM
Henri Sivonen wrote:
> .properties files as originally implemented in the JDK are locked to
> ISO-8859-1 (terrible design decision) and require everything else to
> be escaped, hence I escaped the non-ISO-8859-1 characters.

Our files are not JDK files. ;-)

> Are .properties files in Gecko UTF-8?

Yes, we made that change ages ago.

Rimas Kudelis

Nov 2, 2011, 1:41:26 PM
Robert already said they are. I'll just add that unescaping them is not
really a semantic string change, so it would not mandate changing the
string identifiers. ;)
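
For illustration, a minimal Python sketch of such an unescaping pass over one line of the file. The helper name is hypothetical, and it only handles plain \uXXXX escapes (no surrogate pairs, no escaped backslashes), so treat it as a sketch rather than a finished tool.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_unicode(line):
    """Replace \\uXXXX escapes with the characters they stand for."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), line)


escaped = r"errEndTagBr=End tag \u201Cbr\u201D."
print(unescape_unicode(escaped))
```

The key and the displayed text stay the same; only the way the quotation marks are stored in the file changes, which is why no new string identifier is needed.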

Rimas

Ricardo Palomares Martínez

Nov 2, 2011, 7:46:57 PM
On 02/11/11 14:58, Robert Kaiser wrote:
> Henri Sivonen wrote:
>>>> errStrayStartTag=Stray end tag \u201C%1$S\u201D.
>>>> errStrayEndTag=Stray end tag \u201C%1$S\u201D.
>>>
>>> First string is wrong, and I wouldn't change it without changing
>>> its name.
>>
>> Indeed, the first string is wrong.
>> https://bugzilla.mozilla.org/show_bug.cgi?id=698935
>>
>> What are the rules for changing the string keys when fixing obvious
>> problems in the en-US string?
>
> You need to change the string ID any time you change the string itself
> in a way that requires localizers to re-view and possibly change their
> localization (i.e. an obvious typo fix doesn't need it, this one does).


As the reporter of that bug and the one who provided a patch that
didn't change the key name, I have to say that this case seems to be a
clear typo for any localizer and thus shouldn't have required a key
name change. Yes, it is true that replacing "end" with "start" may
qualify as a semantic change, but the key name itself and the
fact that the string was immediately followed by another identical one
with a complementary key name should have led every localizer to
translate the string from the intended original value.

--
Ricardo Palomares (RickieES)
http://www.mozilla-hispano.org/
http://www.proyectonave.es/
https://diasp.eu/u/rickiees


flod

Nov 3, 2011, 1:55:11 AM
to dev-...@lists.mozilla.org
On 03/11/11 00:46, Ricardo Palomares Martínez wrote:
> As the reporter of that bug and the one who provided a patch that
> didn't change the key name, I have to say that this case seems to be a
> clear typo for any localizer and thus shouldn't have required a key
> name change.
Hi Ricardo,
I don't agree with you: I don't usually check key names when I localize,
so I realized that this string was wrong by chance (something like "wait
a second, I've already translated this string", so I checked back).
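
A check like the following Python sketch could catch this kind of copy-and-paste slip before it reaches localizers: it lists values that are shared by more than one key. The function name is hypothetical and the parsing is deliberately naive (one key=value pair per line, no line continuations).

```python
from collections import defaultdict

SAMPLE = r"""
errStrayStartTag=Stray end tag \u201C%1$S\u201D.
errStrayEndTag=Stray end tag \u201C%1$S\u201D.
errEndTagBr=End tag \u201Cbr\u201D.
"""


def keys_sharing_a_value(text):
    """Map each value used by several keys to the list of those keys."""
    keys_by_value = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        keys_by_value[value.strip()].append(key.strip())
    return {v: ks for v, ks in keys_by_value.items() if len(ks) > 1}


print(keys_sharing_a_value(SAMPLE))
```

Identical values are not always wrong, so the output is a list of candidates to review, not a list of errors.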

Francesco

Robert Kaiser

Nov 3, 2011, 9:56:09 AM
Ricardo Palomares Martínez wrote:
> As the reporter of that bug and the one who provided a patch that
> didn't change the key name, I have to say that this case seems to be a
> clear typo for any localizer and thus shouldn't have required a key
> name change.

Wrong. A lot of people don't really look at the string ID and then they
will never realize that there might be a problem.

Robert Kaiser

Rimas Kudelis

Nov 3, 2011, 12:58:12 PM
to dev-...@lists.mozilla.org
By the way, errNcrControlChar seems to be defined twice in that
.properties file.
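
Duplicates like this are easy to find mechanically. Below is a minimal Python sketch, assuming one key=value pair per line; the function name is hypothetical and the two errNcrControlChar values in the sample are placeholders, not the real strings from the file.

```python
from collections import Counter

SAMPLE = """\
errNcrControlChar=first definition (placeholder text)
errEndTagBr=End tag \u201Cbr\u201D.
errNcrControlChar=second definition (placeholder text)
"""


def duplicate_keys(text):
    """Return the sorted list of keys that are defined more than once."""
    keys = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        keys.append(line.split("=", 1)[0].strip())
    return sorted(key for key, count in Counter(keys).items() if count > 1)


print(duplicate_keys(SAMPLE))
```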

Rimas

Rimas Kudelis

Nov 3, 2011, 5:00:43 PM
2011.11.03 18:58, Rimas Kudelis wrote:
> by the way, errNcrControlChar seems to be defined twice in that
> .properties file.

Also:
errBadStartTagInHead=Bad start tag in \u201C%1$S\u201D in \u201Chead\u201D.
- what does this mean, and does the first 'in' really belong here?

errNoSelectInTableScope=No \u201Cselect\u201D in table scope.
- what does this mean? Does HTML5 say something special about <select>
elements inside <table> elements?

Rimas

Henri Sivonen

Nov 4, 2011, 7:17:29 AM
to Rimas Kudelis, dev-...@lists.mozilla.org
On Thu, Nov 3, 2011 at 11:00 PM, Rimas Kudelis <r...@rq.lt> wrote:
> 2011.11.03 18:58, Rimas Kudelis wrote:
>> by the way, errNcrControlChar seems to be defined twice in that
>> .properties file.

Thanks. https://bugzilla.mozilla.org/show_bug.cgi?id=699753

> Also:
> errBadStartTagInHead=Bad start tag in \u201C%1$S\u201D in \u201Chead\u201D.
> - what does this mean, and does the first 'in' really belong here?

No. The first "in" shouldn't be there.
https://bugzilla.mozilla.org/show_bug.cgi?id=699752

Thanks.

> errNoSelectInTableScope=No \u201Cselect\u201D in table scope.
> - what does this mean? Does HTML5 say something special about <select>
> elements inside <table> elements?

It means that a select element is open but another <select> start tag
cannot close it, because the first select is not in the current table
scope. This check is required by the spec. The message is rather
spec-oriented and might not be ideal. There's currently no way to
trigger this error in Firefox, because the error can only occur in the
fragment case. I included all error messages anyway: leaving out the
ones that aren't triggerable right now would be error-prone, because it
would be harder to track when they become triggerable later.
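
For readers unfamiliar with the term, here is a rough Python sketch of the "has an element in table scope" check. It assumes the stack of open elements is a plain list of tag names with the outermost element first and uses html, table and template as scope boundaries, which is my reading of the current spec text; the function name is illustrative and this is not the parser's code.

```python
# Elements that end the search: anything outside them is out of table scope.
TABLE_SCOPE_BOUNDARIES = frozenset({"html", "table", "template"})


def has_element_in_table_scope(open_elements, target):
    """Walk the stack from the innermost element outwards, looking for target."""
    for name in reversed(open_elements):
        if name == target:
            return True
        if name in TABLE_SCOPE_BOUNDARIES:
            return False  # hit a boundary before finding the target
    return False
```

In the fragment case the stack holds only the root html element, so `has_element_in_table_scope(["html"], "select")` is False and the error would be reported. In a normally parsed document the open select is always found first, which is why the message cannot currently be triggered.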

smo

Nov 6, 2011, 12:08:26 PM
> hsivo...@iki.fi http://hsivonen.iki.fi/

This is, by the way, the reason OmegaT will screw up any UTF-8 .properties
file, which yours truly has been getting by the crateload out of Mozilla repos.

Well, some people are strict and some are stricter (sigh) and some
don't give a d*n (ssssigh).

Btw - no need to flame OmegaT: choose key=value file type and
add *.properties to the lineup, with UTF8 for source and target
character encoding.

Have you tried OmegaT lately? No? See you at MozCamp Berlin
- check my talks.

smo