Making the encoding selection UI easier to use

Matthew Thomas

Dec 16, 2000, 6:43:50 AM
to mozill...@mozilla.org, mozil...@mozilla.org
A bug is open <http://bugzilla.mozilla.org/show_bug.cgi?id=10999> on
making the UI for selecting an encoding easier to use. Katsuhiko Momoi
has suggested I discuss it in n.p.m.i18n before proceeding any further,
so that's what I'm doing now.

What follows is a brief background to text encodings for those of you in
the n.p.m.ui group. Members of the n.p.m.i18n group are welcome to jump
on me if I'm wrong in any of the details.

A document transmitted by a Web server may use HTTP's `Content-Type'
header (or an equivalent META element if it is an HTML document) to
specify that a document has been sent in one of a number of
*encodings*. An encoding is a particular way of determining which
byte (or pattern of bytes) sent by the Web server represents which
character of text. (An equivalent mechanism is used for specifying
the encoding of e-mail and Usenet messages, but for simplicity's
sake I won't cover that here, as it does not affect the user
interface requirements.) From now on, I'll refer to the encoding
specified using HTTP or META as the `document-specified encoding'.
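To make that background concrete, here is a small sketch (the header, META element, and byte values are all hypothetical, chosen only for illustration) of why the declared encoding matters: the same raw bytes decode to different text under different encodings.

```python
# A server might declare the encoding in an HTTP header:
#     Content-Type: text/html; charset=ISO-8859-1
# or in an equivalent META element inside an HTML document:
#     <meta http-equiv="Content-Type"
#           content="text/html; charset=ISO-8859-1">
#
# The declaration matters because raw bytes alone are ambiguous.
raw = b"caf\xe9"  # hypothetical bytes sent by a server

as_latin1 = raw.decode("iso-8859-1")  # byte 0xE9 -> 'é'
as_koi8r = raw.decode("koi8-r")       # the same byte -> a Cyrillic letter

print(as_latin1)  # café
print(as_koi8r)   # a different, wrong-looking string
```

If the user agent assumes the wrong encoding, every non-ASCII byte in the page is rendered as the wrong character, which is exactly the failure this thread's UI is meant to let users recover from.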

The user agent (e.g. the Web browser) uses the document-specified
encoding information to decide which *character set* to use to
display the text of the page. In modern operating systems, all
encodings are usually displayed using various parts of the Unicode
character set; but the character sets used to display text in particular
encodings may vary from platform to platform.

For example, while modern OSes will usually use the Unicode
character set to display text in the ISO-8859-1 encoding, older
computers often use some strain of the ANSI character set, and IBM
mainframes used the EBCDIC character set, to display that same
encoding.

However, sometimes Web servers get it wrong -- they send the wrong
encoding information with a Web page, or no encoding information at
all. In this case the user agent has to choose an encoding itself in
order to display the page. It typically does this through a
combination of two approaches: firstly by trying to guess the
encoding using an `auto-detection module' (which works by looking
for particular patterns of bytes which are common in particular
encodings), and secondly by assuming that the page has used a
particular encoding which is specified by the user.
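The two fallback approaches described above can be sketched roughly as follows. This is a toy heuristic, not Mozilla's actual auto-detection code; the function name and the `user_default` parameter are invented for illustration.

```python
def choose_fallback_encoding(raw: bytes,
                             user_default: str = "iso-8859-1") -> str:
    """Toy sketch: try a byte-pattern guess first, then fall back to
    the encoding the user has configured as their default."""
    # A real auto-detection module looks for byte patterns common in
    # particular encodings. One cheap pattern check: valid multi-byte
    # UTF-8 sequences rarely occur in text by accident.
    try:
        raw.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # No recognizable pattern: assume the user-specified encoding.
    return user_default
```

A real detector would score many candidate encodings against byte-frequency statistics rather than testing a single one, but the shape of the logic (guess from patterns, then fall back to the user's choice) is the same.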

This discussion is about what the user interface for specifying
that particular encoding, and/or specifying an auto-detection
module, should look like.

The current spec for this
<http://mozilla.org/projects/intl/uidocs/browsercharmenu.html> is
impressively detailed in specifying the user interface for selecting
encodings and auto-detection modules. Katsuhiko Momoi has said in bug
10999 that the spec needs updating to reflect some of the changes that
have been made, so my comments below reflect current behavior in Mozilla
rather than behavior described in the spec (where the two are different).

From using the `Character Coding' submenu on pages written in a variety
of writing systems (Latin, Japanese kana, Chinese kanji, Korean,
Cyrillic), I can see a number of things in the submenu which make it
unnecessarily difficult to use. Working at an Internet cafe where people
from many different countries (especially Asian countries) view Web
pages written in their own languages, I am confident that they would
have a lot more trouble using the encoding selection UI in Mozilla than
they do using the equivalent UI in Internet Explorer 5.0. (That is not
to say that the encoding UI in Internet Explorer is perfect -- far from
it. But I would like the Mozilla UI to be better than that in Internet
Explorer, not worse.)

The problems with Mozilla's UI for encoding selection are as follows.

1. The title of the submenu is `Character Coding'. Many users may not
know what a `character' is. They are far more likely to know what
`text' is, so I suggest that the submenu be called `Text Encoding'
instead.

2. Making a selection from the menu will override the
document-specified encoding (if there was one), but there is no
obvious way to switch back to the document-specified encoding again.
This leaves the user unable to backtrack on their action -- the
equivalent of a word processor without an Undo command.

The spec says:
|
| An override will not persist if a new document is loaded or if the
| current document is reloaded by Super-Reloading, which will
| refresh the document by going to the source page again. If the
| user wants to undo the overrride that has been committed, then the
| user can super-reload the document.

This approach is completely unacceptable for three reasons. Firstly,
super-reloading is to all intents and purposes a hidden feature, so
it is not at all obvious to the user how to reset the encoding of
the page. Secondly, current methods for super-reloading are
practically impossible for disabled users, as I have described
elsewhere <http://bugzilla.mozilla.org/show_bug.cgi?id=46845>. And
thirdly, super-reloading will (and ordinary reloading should)
clear any form data entered in the page; so using this to reset the
encoding can result in data loss, as I have also described elsewhere
<http://bugzilla.mozilla.org/show_bug.cgi?id=17889>.

3. The `Character Coding' menu item has *three* levels of submenus
-- e.g. `View' > `Character Coding' > `More' > `SE & SW Asian' >
`Vietnamese (Windows-1258)'. Menus are the most difficult of all
common UI controls for users to manipulate, and as the level of
nesting increases linearly, the level of difficulty increases
geometrically. In any professionally-designed software app, having
more than *one* level of submenus is inexcusable.

/The Microsoft Windows user experience/ guidelines say:
|
| [...] Be aware that cascading menus add complexity to the menu
| interface by requiring the user to navigate further through the
| menu structure to get to a particular choice. The navigation also
| requires more effort on the part of the user. In light of these
| design trade-offs, use cascading menus sparingly. Minimize the
| number of levels for any given menu item, ideally limiting your
| design to a single submenu. Avoid using cascading menus for
| frequent, repetitive commands.
|
<http://msdn.microsoft.com/library/books/winguide/ch08b.htm>

Similarly, the /Macintosh Human Interface Guidelines/ say:
|
| Never use more than *one level* of submenus. A submenu at the
| second level would be buried too deep in the interface and would
| unnecessarily create another level of complexity. Also, it takes
| more time for the user to use and peruse a hierarchical menu than
| a pull-down menu. It is physically difficult to use a second level
| of submenus without slipping off the first submenu. Figure 4-40
| shows an example of a technique to avoid using with submenus.
|
| Figure 4-40 Avoid more than one level of submenus
|
| [picture of a submenu which is two levels deep (one level *less*
| nested than the `Character Coding' submenu), with the caption:
| `Don't even *think* of doing this.']
|
<http://developer.apple.com/techpubs/mac/HIGuidelines/HIGuidelines-90.html>

4. The division of encodings in the `More' submenu into what Momoi
describes as `geo-linguistic classifications fairly well-accepted in
linguistic typology'
<http://bugzilla.mozilla.org/show_bug.cgi?id=10999> may be elegant
from a linguistic point of view, but from a user's point of view
it is just an unnecessary level of complication.

Usually, the situation which requires the use of this UI is that of
a page written in a language other than English, and designed for a
non-Latin encoding, being rendered as if it was in a Latin encoding.
In this case, when I see users use the equivalent submenu in IE,
they open the submenu and immediately look for their chosen
*language*. Not the world region in which their chosen language
originated, or is dominant, or whatever, but the name of the
language itself.

When presented with Mozilla's nested submenu of encoding categories,
none of which are actual languages, and most of which have
similarly-shaped names, I suggest that users will not pause (holding
the submenu open) while they calculate which category their desired
language fits into; instead, in an attempt to find the item they
want quickly, they will begin a depth-first traversal of the
submenus looking for the desired encoding, wasting time in the long
run.

5. The use of linguistic region submenus to subcategorize encodings
requires newly-installed encodings to specify which of Mozilla's
UI-specific regions they belong to, something which they may do
incorrectly, or not do at all
<http://forbes.co.il/dev/bidi-mozilla.gif>.

6. The use of third-level submenus to subcategorize the encodings
available implies that selections in each of the submenus are
independent from each other, when they are not. This is especially
confusing since the second-level submenus `Auto-Detect' and `More'
are *not* independent from each other; if I choose `Auto-Detect' >
`Auto-Detect (Chinese)' on a document intended to be displayed
using ISO-8859-1, the Auto-Detection module works correctly and
uses ISO-8859-1, but this is not indicated in the `More' submenu.
(So both `View' > `Character Coding' > `Auto-Detect' > `Auto-Detect
(Chinese)' and `View' > `Character Coding' > `Western (ISO-8859-1)'
are checked at the same time, but `View' > `Character Coding' >
`More' > `Western' > `Western (ISO-8859-1)' is not also checked.)

In any group of radio buttons, one item must be selected at all
times. Similarly, in any radio group of menu items, one item must be
checked at all times. If that is not possible because there are so
many options that they need to be split over several submenus, then
you shouldn't be using menus to present the options at all.

7. The submenus make a distinction between individual encodings and
auto-detection modules. From the user's point of view such a
distinction is completely irrelevant -- they don't care how clever
the programmers were in writing code to distinguish between multiple
encodings, they just want their pages to look right. If there is a
choice available in the `Text Encoding' menu which is able to work
out which of the variety of Japanese encodings is being used on a
particular Japanese Web page, that's great, but the user shouldn't
have to go hunting in the `Auto-Detect' submenu for such an item,
otherwise she won't use it at all. Auto-detect modules should be
available right next to the items for the individual encodings which
they cover.

8. The five most recently-used encodings are selectable from the
first-level submenu. However, it is also possible (using the
misleadingly-named `Customize Character Coding' dialog) to specify
encodings which will always appear in the first-level submenu,
regardless of whether they have been recently used or not. These
encodings are mixed with the recently-used encodings in the
first-level submenu, without any visual distinction being made
between them.

As I said in bug 10999, this seems about as redundant as allowing
the user to specify which files will appear in their `Recent Files'
submenu -- it is only likely to be of any use to a tiny minority of
users who regularly apply more than five different encodings (which
would mean they would probably need to know at least three
languages). For the vast majority of users, their list of
recently-used encodings will be so good as an approximation of the
list of encodings which they want to switch to quickly, that having
an extra dialog for this is just not worth the confusion and bloat
generated by its existence.

I suggest that this feature be ditched. It is only present because
the nested submenus are unnecessarily difficult to use in the first
place, and I believe it will cause more confusion than it allays.

9. When the user then visits an uncached page with a different
document-specified encoding from that which was present in the page
where the encoding was last manually specified, Mozilla appears to
continue using the user-specified encoding. (Internet Explorer
appears to do this as well.) This seems rather unintelligent --
since if you go to a page which has a different document-specified
encoding, it is highly likely you have gone to a different Web site,
so Mozilla should return to its default assumption that the Web page
is able to specify its own encoding correctly.

Therefore, I would suggest that the most reliable behavior would be
to revert to using the document-specified encoding when the user
first visits a page which has a different document-specified
encoding from that which was present when the user last overrode the
encoding. (In most cases, this should not affect the encoding used
to show documents navigated to via the Back button, because the
encoding used should be stored in the cache along with each cached
document.)

As a solution to these problems (except the last one above, which
requires back-end work), I suggest that the current set of nested
submenus be replaced by a single submenu and a dialog.

The submenu would look like this:
_________________________________
Te_xt Encoding > |/ _Automatic |
|---------------------------------|
|* Western (ISO-8859-1) |
| Central European (ISO-8859-2) |
| Chinese (Simplified) |
| Japanese (Auto-Detect) |
| _Other ... |
|---------------------------------|
|* _Left to Right |
| _Right to Left |
`"""""""""""""""""""""""""""""""""'

`Automatic' would specify whether or not the document-specified encoding
was being used. The four items following that would be the four most
recently-used encodings *or* auto-detection modules. These four items
would be listed in alphabetical order (rather than order of most use or
order of most recent use), with the exception of ISO-8859-1, to maximize
the stability of the arrangement of the submenu.

`Other ...' would open this dialog:

+--------------------------------------------+
| Text Encoding :::::::::::::::::::::::::::::|
+--------------------------------------------+
| (*) _Automatic (use the encoding |
| specified by the document) |
| |
| ( ) Use a different _encoding: |
| +----------------------------------+-+ |
| |Western (ISO-8859-1) |A| |
| |Armenian (ARMSCII-8) |:| |
| |Baltic (ISO-8859-4) |:| |
| |Cyrillic (Auto-Detect) |:| |
| |Cyrillic (ISO-8859-5) |:| |
| |Cyrillic (KOI8-R) |:| |
| |Cyrillic (Windows-1251) |:| |
| |Cyrillic (ISO-IR-111) |V| |
| +----------------------------------+-+ |
| Direction: |
| (*) _left to right ( ) _right to left |
| |
|     ( Cancel )             (( Ok ))        |
+--------------------------------------------+

The placement of auto-detection modules in the same list as individual
encodings in the UI has the corollary that auto-detection modules should
be named with the writing system for which they offer the greatest
support first (e.g. `Japanese and East Asian'), so that users can find
them easily in the list.

--
Matthew `mpt' Thomas, Mozilla user interface QA
Mozilla UI decisions made within 48 hours, or the next one is free

Xslf

Dec 16, 2000, 2:37:06 PM
to mozill...@mozilla.org, mozil...@mozilla.org, "Matthew Thomas"
Another problem I have found with the current menu system is that it is
not clear how I (the user) can change the encoding in a *single* frame
in a frameset without changing the encoding in the rest of the frames.

In IE, changing the encoding from the menu (or toolbar button/menu) changes
the encoding for the full frameset, while changing the encoding from the
context menu changes the encoding only in the frame that has focus at the
moment.

About the "revert to assuming the site has the correct encoding
specified" behavior -- I would not want Mozilla to behave that way,
just for the sheer number of (in my case Hebrew) web sites which have
the incorrect encoding specified.

I am afraid that if such a behavior is implemented, I would spend half my
surfing time changing encoding manually.

I think that the behavior that assumes the default from the previous site
makes sense, since most users I know spend most of their time using just one
or two languages, so most pages are in the encoding of the previous page.

Just IMHO.

Shoshannah Forbes
--
Hebrew services for site owners http://www.bool.co.il
Moderator, site builder forum http://forums.nana.co.il/forum.asp?id=17
Personal site: http://www.forbes.co.il

Roozbeh Pournader

Dec 16, 2000, 3:42:27 PM
to Xslf, mozil...@mozilla.org, mozill...@mozilla.org, "Matthew Thomas"

On 16 Dec 2000, Xslf wrote:

> I think that the behavior that assumes the default from the previous site
> makes sense, since most users I know spend most of their time using just one
> or two languages, so most pages are in the encoding of the previous page.

This should of course be optional, since Persian is not encoded in one
wrong way only, but many!!

--roozbeh

Katsuhiko Momoi

Dec 16, 2000, 7:12:07 PM
to m...@mailandnews.com, mozill...@mozilla.org, mozil...@mozilla.org, mozill...@mozilla.org, b...@netscape.com
[Missed Mozilla L10n newsgroup and so re-sending it with the corrected
address. -KM]

Matthew, I appreciate you taking the time to raise some interesting
issues. Below I have done my best to explain what the current specs are
and ways to solve the problems you mention by means other than the ones
you suggest.


Matthew Thomas wrote:
>
> A bug is open <http://bugzilla.mozilla.org/show_bug.cgi?id=10999> on
> making the UI for selecting an encoding easier to use. Katsuhiko Momoi
> has suggested I discuss it in n.p.m.i18n before proceeding any further,
> so that's what I'm doing now.
>
> What follows is a brief background to text encodings for those of you in
> the n.p.m.ui group. Members of the n.p.m.i18n group are welcome to jump
> on me if I'm wrong in any of the details.
>
> A document transmitted by a Web server may use HTTP's `Content-Type'
> header (or an equivalent META element if it is an HTML document) to
> specify that a document has been sent in one of a number of
> *encodings*. An encoding is a particular way of determining which
> byte (or pattern of bytes) sent by the Web server represents which
> character of text. (An equivalent mechanism is used for specifying
> the encoding of e-mail and Usenet messages, but for simplicity's
> sake I won't cover that here, as it does not affect the user
> interface requirements.) From now on, I'll refer to the encoding
> specified using HTTP or META as the `document-specified encoding'.
>
> The user agent (e.g. the Web browser) uses the document-specified
> encoding information to decide which *character set* to use to
> display the text of the page.

You mean which "font" corresponds to the encoding used?

> In modern operating systems, all
> encodings are usually displayed using various parts of the Unicode
> character; but the character sets used to display text in particular
> encodings may vary from platform to platform.

What is the "Unicode character" you mention here?
Also, what do you really mean by "but the character sets used to
display text in particular encodings may vary from platform to
platform"?

These may be nit-picky points, but this paragraph makes no sense.


>
> For example, while modern OSes will usually use the Unicode
> character set to display text in the ISO-8859-1 encoding,

Again, what does this mean? Unicode is a character encoding scheme for
characters and text.


> older computers often use some strain of the ANSI character set, and IBM
> mainframes used the EBCDIC character set, to display that same
> encoding.

This is not correct. ISO-8859-1 is both an encoding method and a
character set, but the so-called ANSI and the EBCDIC are 2 of the many
ASCII coding methods for the ASCII set. ISO-8859-1 is a superset of
ASCII and thus this paragraph makes little sense.

>
> However, sometimes Web servers get it wrong -- they send the wrong
> encoding information with a Web page, or no encoding information at
> all. In this case the user agent has to choose an encoding itself in
> order to display the page. It typically does this through a
> combination of two approaches: firstly by trying to guess the
> encoding using an `auto-detection module' (which works by looking
> for particular patterns of bytes which are common in particular
> encodings), and secondly by assuming that the page has used a
> particular encoding which is specified by the user.

This is not what Mozilla does. Auto-detection and/or a user-specified
default encoding is used as a fallback only if the server or document
offers no encoding information. We don't use an "encoding which is
specified by the user" except indirectly, in the case of inheritance as
described below.

>
> This discussion is about what the user interface for specifying
> that particular encoding, and/or specifying an auto-detection
> module, should look like.
>
> The current spec for this
> <http://mozilla.org/projects/intl/uidocs/browsercharmenu.html> is
> impressively detailed in specifying the user interface for selecting
> encodings and auto-detection modules. Katsuhiko Momoi has said in bug
> 10999 that the spec needs updating to reflect some of the changes that
> have been made, so my comments below reflect current behavior in Mozilla
> rather than behavior described in the spec (where the two are different).
>
> From using the `Character Coding' submenu on pages written in a variety
> of writing systems (Latin, Japanese kana, Chinese kanji, Korean,
> Cyrillic), I can see a number of things in the submenu which make it
> unnecessarily difficult to use. Working at an Internet cafe where people
> from many different countries (especially Asian countries) view Web
> pages written in their own languages, I am confident that they would
> have a lot more trouble using the encoding selection UI in Mozilla than
> they do using the equivalent UI in Internet Explorer 5.0. (That is not
> to say that the encoding UI in Internet Explorer is perfect -- far from
> it. But I would like the Mozilla UI to be better than that in Internet
> Explorer, not worse.)

I said this in the above bug, but the experience of users at an
Internet cafe should not be the driving force for this part of the UI
spec. Internet cafes need to serve users of different language
backgrounds with one program and one profile. The reason IE seems
easier to use has little to do with the menu UI -- rather it has to do
with the existence of the universal auto-detection module it uses, and
with the fact that a sizable number of web sites send HTTP charset or
document-based charset info. Netscape 6 has a *proprietary* universal
auto-detector in it. It needs to be improved, but when it is ON, it may
help such customers by making it unnecessary to use this menu. This is
largely why IE feels better in this regard, i.e. users don't have to
use this menu much. That should be our goal, too. Note that there is
much that is unknown about universal auto-detection. Some studies have
shown that IE's universal auto-detection has less than a 50% success
rate when matched against pages from diverse languages and web sites.
Mozilla developers might want to offer their own version of a universal
detector.

Our main focus should be taking care of the scenario where a single
monolingual user uses Mozilla on a personal machine.

>
> The problems with Mozilla's UI for encoding selection are as follows.
>
> 1. The title of the submenu is `Character Coding'. Many users may not
> know what a `character' is. They are far more likely to know what
> `text' is, so I suggest that the submenu be called `Text Encoding'
> instead.
>

This menu under Communicator was named "Character Set". Users may have
difficulty with the "Set" part, but I doubt that they had much
difficulty with the "Character" part. We thought it would be easier for
users to use the familiar term "Character". One reason we did not use
"encoding" is that the items that come under this menu are not all
classifiable as "encodings". For example, auto-detection modules are
not encodings. They are tools to detect the encoding of a document. (It
is essential to keep them apart, as you will see below.) On the other
hand, "Coding" is a neutral term free of the meaning associated with
"encoding". This is one main reason why I and others suggested this
term. If readers of an earlier thread of discussion on this topic can
offer other reasons why this term was chosen during a discussion in the
i18n newsgroups, I would appreciate hearing about them again.

> 2. Making a selection from the menu will override the
> document-specified encoding (if there was one), but there is no
> obvious way to switch back to the document-specified encoding again.
> This leaves the user unable to backtrack on their action -- the
> equivalent of a word processor without an Undo command.
>
> The spec says:
> |
> | An override will not persist if a new document is loaded or if the
> | current document is reloaded by Super-Reloading, which will
> | refresh the document by going to the source page again. If the
> | user wants to undo the overrride that has been committed, then the
> | user can super-reload the document.
>
> This approach is completely unacceptable for three reasons. Firstly,
> super-reloading is to all intents and purposes a hidden feature, so
> it is not at all obvious to the user how to reset the encoding of
> the page. Secondly, current methods for super-reloading are
> practically impossible for disabled users, as I have described
> elsewhere <http://bugzilla.mozilla.org/show_bug.cgi?id=46845>. And
> thirdly, super-reloading will (and ordinary reloading should)
> clear any form data entered in the page; so using this to reset the
> encoding can result in data loss, as I have also described elsewhere
> <http://bugzilla.mozilla.org/show_bug.cgi?id=17889>.


I think the comments and suggestions here are based on a
misunderstanding of the charset override function. Users will not
override an existing HTTP
charset or document charset unless there is a problem with display. In a
majority of web pages, pages display correctly with 1) HTTP charset, 2)
document charset info, or 3) auto-detection (if chosen and applicable),
or 4) the default encoding chosen by the user via "Edit |Prefs |
Navigator | Character Coding" in case 1), 2), and 3) are unavailable.
Overriding a document-based charset is needed only in those cases where
the charset info provided is wrong and the page is displaying
incorrectly. In such a case, Mozilla currently offers a single override.
You just corrected a wrong display to the right one with the menu action
of choosing another encoding. Why do you want to go back to the
incorrect display? If your 1st attempt at correcting does not give you
the right result, then you try another menu item to correct it. Let's
say that it is finally displaying correctly after 2 tries. Why would we
want to go back to incorrect display?
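The precedence described here -- HTTP charset, then document charset, then auto-detection, then the user's configured default -- amounts to a simple fallback chain. A minimal sketch, with the function and parameter names invented purely for illustration:

```python
def effective_encoding(http_charset=None, doc_charset=None,
                       detected=None, user_default="iso-8859-1"):
    """Illustrative fallback chain mirroring the order described above:
    1) HTTP charset, 2) document (META) charset, 3) auto-detection
    result (if a detector is enabled and succeeds), 4) user default."""
    for candidate in (http_charset, doc_charset, detected):
        if candidate:
            return candidate
    return user_default
```

A manual override from the menu would then simply replace the result of this chain for the current document, which is why it is only ever needed when one of the first two sources is wrong.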

With regard to the form example and data loss, you want the page to
display correctly in the first place before inputting any data. No user
will attempt to enter form data on a page which is incorrectly
displaying due to a wrong HTTP or document charset. So, you do an
override, now the page displays correctly, and then you begin entering
the data for the form. Why would anyone want to super-reload the
correctly displaying page, lose data and then to boot end up in the
original incorrect display? That is not the user scenario we should be
covering with this function.


>
> 3. The `Character Coding' menu item has *three* levels of submenus
> -- e.g. `View' > `Character Coding' > `More' > `SE & SW Asian' >
> `Vietnamese (Windows-1258)'. Menus are the most difficult of all
> common UI controls for users to manipulate, and as the level of
> nesting increases linearly, the level of difficulty increases
> geometrically. In any professionally-designed software app, having
> more than *one* level of submenus is inexcusable.

I am aware of these UI considerations. But shoving these items into the
dialog does not solve the problem we are trying to address.
Average users' needs are covered by the following default settings:

1. Default Browser/Editor character coding: Edit | Prefs | Languages |
Character Coding
2. Default Mail Viewing character coding: Edit | Prefs | Mail &
Newsgroup | Message Display | Character Coding
3. Default Mail Send character coding: Edit | Prefs | Mail & Newsgroup |
Message Composition | Composing Messages | Character Coding
4. (optionally for languages with auto-detection modules). An
appropriate auto-detection module set.

These 3 (or 4 with auto-detection), once they are set correctly, serve
almost all average users. The key is in improving the default-setting
mechanism. Currently, localizers will set them correctly for their own
locale. Thus, the people who have to learn to set these themselves are
a small set of users, like "Internet cafe" users whose admins have not
yet learned how to set the defaults. And since Internet cafe admins
cannot predict the preference of each user ahead of time, they need an
easier way to set the defaults.
Setting the defaults more easily will solve the problem Matthew is
trying to solve.

Sub menus thus are there mainly for advanced users who need more
flexibility or access to many encodings. For these rarely used items,
overhauling the sub menus seems like too much wasted effort. It works
OK now for the segment of users we are targeting. Putting them in a
dialog only makes access harder for the advanced users who might want
to see them in the menu. Now, if enough of the target people dislike
geo-linguistic sub-groupings, then one way to make a change is to put
them all under one sub menu called "Others...". If we can include
separators between different language or encoding groups, then that
would make it easier.
I would say that I have received some positive comments about the
sub-grouping of encoding menu items from advanced users.


(MS documentation skipped...)

>
> 4. The division of encodings in the `More' submenu into what Momoi
> describes as `geo-linguistic classifications fairly well-accepted in
> linguistic typology'
> <http://bugzilla.mozilla.org/show_bug.cgi?id=10999> may be elegant
> from a linguistic point of view, but from a user's point of view
> it is just an unnecessary level of complication.
>
> Usually, the situation which requires the use of this UI is that of
> a page written in a language other than English, and designed for a
> non-Latin encoding, being rendered as if it was in a Latin encoding.
> In this case, when I see users use the equivalent submenu in IE,
> they open the submenu and immediately look for their chosen
> *language*. Not the world region in which their chosen language
> originated, or is dominant, or whatever, but the name of the
> language itself.

See my comments above. If enough people dislike sub-grouping of encoding
families, it would be easy to switch to MS style single sub-menu.
However, let me also say that Mozilla covers many more encodings than MS
and our fear was that with that many more items, some sub-groupings will
be helpful.
Remember that we are likely to add some more items to this menu so that
eventually the number might reach 80-90. Is a single sub-menu easier for
advanced users? Or the sub-groups? (I am assuming that average users
will not be using this sub-menu at all or very rarely.) Let me hear your
opinions.


>
> When presented with Mozilla's nested submenu of encoding categories,
> none of which are actual languages, and most of which have
> similarly-shaped names, I suggest that users will not pause (holding
> the submenu open) while they calculate which category their desired
> language fits into; instead, in an attempt to find the item they
> want quickly, they will begin a depth-first traversal of the
> submenus looking for the desired encoding, wasting time in the long
> run.

I think there is some truth in this comment. But let me also suggest
that the hesitation would occur only the first time. Given
sub-categorizations like East Asian, I think users will quickly get
used to the fact that Chinese, Japanese and Korean belong there. I doubt
very much that they will keep on having this problem. Also, if you are a
heavy user of certain encodings, you should put them in the permanent
or static list on the 1st tier via the "Customize..." menu. Again, for
the target users we have in mind, I don't see much difference. A long
menu with 80 or more items will be much worse, I fear.

>
> 5. The use of linguistic region submenus to subcategorize encodings
> requires newly-installed encodings to specify which of Mozilla's
> UI-specific regions they belong to, something which they may do
> incorrectly, or not do at all
> <http://forbes.co.il/dev/bidi-mozilla.gif>.

The example image has to do with Arabic and Hebrew, which will
hopefully get full support in the near future. I think we will be
classing them under Middle Eastern when the support for bi-di becomes
finalized. Thus, this will not be a user-level decision. The engineers
involved will put them in the right place. This is a non-issue and
something we can resolve via a bug.

>
> 6. The use of third-level submenus to subcategorize the encodings
> available implies that selections in each of the submenus are
> independent from each other, when they are not.

They are independent from each other. You're simply misunderstanding how
these menu items work.

> This is especially
> confusing since the second-level submenus `Auto-Detect' and `More'
> are *not* independent from each other; if I choose `Auto-Detect' >
> `Auto-Detect (Chinese)' on a document intended to be displayed
> using ISO-8859-1, the Auto-Detection module works correctly and
> uses ISO-8859-1, but this is not indicated in the `More' submenu.

Auto-detection choice is independent of the encoding menu items. This is
why they cannot be under the same sub-group. Once you choose an
auto-detection module, that detection module is ON at all times for
every page until auto-detection is turned OFF. (Maybe we should mark
this fact by changing the bullet to a checkmark for the auto-detection
modules.)

> (So both `View' > `Character Coding' > `Auto-Detect' > `Auto-Detect
> (Chinese)' and `View' > `Character Coding' > `Western (ISO-8859-1)'
> are checked at the same time, but `View' > `Character Coding' >
> `More' > `Western' > `Western (ISO-8859-1)' is not also checked.)

Yes. That is how it appears on the surface. Ideally we would like an
item in the sub-menu to disappear when it moves to the top tier, because
it is now in the cache. We haven't had time to implement that yet. It
should resolve the confusion. Let me state this again so that there is
no misunderstanding of the current spec:

A. Auto-detect modules are always ON if chosen and always participate in
determining the document coding if the HTTP charset or document-based
charset is not available.
B. The bullet mark you see under `View' > `Character Coding' does NOT
indicate what you chose as the default encoding. (The default encoding
is set in the Preferences.) The bullet shows feedback from Mozilla
indicating what Mozilla considers the current encoding of the document
to be. That is all the bullet means -- what Mozilla has determined the
current encoding to be, based on the HTTP charset, document-based
charset, auto-detection, or the Pref-based default encoding.

The bullet mark is a feedback mechanism -- not the default choice by the
user.

Thus choosing another encoding item is an instruction to Mozilla to
reload the document assuming that it is in the chosen encoding. This
manual action will override the HTTP charset, document-based charset
info, or the auto-detection result. Otherwise (i.e. when none of the
three applies), it will simply reload the document under the new
charset.

>
> In any group of radio buttons, one item must be selected at all
> times. Similarly, in any radio group of menu items, one item must be
> checked at all times. If that is not possible because there are so
> many options that they need to be split over several submenus, then
> you shouldn't be using menus to present the options at all.

Again, this is a bug. We should temporarily remove the item from its
sub-menu while that item has cached status.

>
> 7. The submenus make a distinction between individual encodings and
> auto-detection modules. From the user's point of view such a
> distinction is completely irrelevant -- they don't care how clever
> the programmers were in writing code to distinguish between multiple
> encodings, they just want their pages to look right. If there is a
> choice available in the `Text Encoding' menu which is able to work
> out which of the variety of Japanese encodings is being used on a
> particular Japanese Web page, that's great, but the user shouldn't
> have to go hunting in the `Auto-Detect' submenu for such an item,
> otherwise she won't use it at all. Auto-detect modules should be
> available right next to the items for the individual encodings which
> they cover.

First, auto-detection is not an encoding and should not be mixed in with
encodings. This becomes even clearer with more discussion below.
The user does not usually have to hunt for the auto-detection menu. Each
localizer will have set that item appropriately for the locale. Thus
the default option for auto-detection is ON if it is available for the
user's locale/language. What you are not taking into account is that we
cannot generalize from the behavior of Mozilla users who are mostly
using an English UI for their languages. These are a minority of people
-- developers, testers, Mozilla enthusiasts. Average users will get the
UI in their own language, with the default encoding and auto-detection
set properly for that locale. It is the localizer's job to set them
correctly in the first place. Mozilla is designed with these defaults so
that average users will not have to worry about the issues you're
talking about.

>
> 8. The five most recently-used encodings are selectable from the
> first-level submenu. However, it is also possible (using the
> misleadingly-named `Customize Character Coding' dialog) to specify
> encodings which will always appear in the first-level submenu,
> regardless of whether they have been recently used or not. These
> encodings are mixed with the recently-used encodings in the
> first-level submenu, without any visual distinction being made
> between them.

This is how the IE menu is designed as well. IE has 1 or more static
menu items which are there at all times, plus the cached items, without
any separator. If enough people want it, we can probably put in a
separator.

>
> As I said in bug 10999, this seems about as redundant as allowing
> the user to specify which files will appear in their `Recent Files'
> submenu -- it is only likely to be of any use to a tiny minority of
> users who regularly apply more than five different encodings (which
> would mean they would probably need to know at least three
> languages). For the vast majority of users, their list of
> recently-used encodings will be so good as an approximation of the
> list of encodings which they want to switch to quickly, that having
> an extra dialog for this is just not worth the confusion and bloat
> generated by its existence.
>
> I suggest that this feature be ditched. It is only present because
> the nested submenus are unnecessarily difficult to use in the first
> place, and I believe it will cause more confusion than it allays.

That is not so. The Character Coding Customize dialog is there because,
for advanced users, there is no way to predict which sets of encodings
they would like to have there permanently. We want that to be flexible
and left up to the users who need it. A vast majority of people will get
their defaults set by localizers. Note also that we can keep the static
menu even when the cache is lost. It is also wrong to assume that users
will not need more than 5 items. If you want to do Chinese and Japanese,
you need a minimum of 5, plus Western, which is needed by just about
everyone. That is 6. Some languages, like Russian, may have more than 5
encodings in use. Yes, to some extent this is overkill, but I believe it
is better to provide this flexibility to advanced users.

>
> 9. When the user then visits an uncached page with a different
> document-specified encoding from that which was present in the page
> where the encoding was last manually specified, Mozilla appears to
> continue using the user-specified encoding. (Internet Explorer
> appears to do this as well.) This seems rather unintelligent --
> since if you go to a page which has a different document-specified
> encoding, it is highly likely you have gone to a different Web site,
> so Mozilla should return to its default assumption that the Web page
> is able to specify its own encoding correctly.

I think you're misunderstanding how this works in Mozilla currently.
If you go to a new uncached page, it will go through the following
determination hierarchy process again:

1) Honor HTTP charset if it exists
2) Honor document-based charset if it exists
3) Try auto-detection if one is chosen
4) If 1-3 all fail, then display the page with the default set in Prefs
| Navigator | Languages | Character Coding.
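That hierarchy can be sketched in a few lines (an illustration only --
the names are mine, not Mozilla's):

```python
def determine_encoding(http_charset=None, doc_charset=None,
                       autodetect_result=None,
                       pref_default="ISO-8859-1"):
    """Pick the encoding for a newly loaded, uncached page."""
    if http_charset:          # 1) HTTP charset wins if present
        return http_charset
    if doc_charset:           # 2) then the document-based (META) charset
        return doc_charset
    if autodetect_result:     # 3) then the chosen auto-detection module
        return autodetect_result
    return pref_default       # 4) finally the default set in Prefs
```

Each step is consulted only when every step above it yields nothing.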

These catch a huge number of cases already, unless the pages are
mislabeled at step 1) or 2). Many hours of testing indicate this to be
true. We don't pay attention to the manually specified encoding of the
last page. There is one exception to this rule: when the user opens a
new navigator window from an existing page. In that case, it is best to
inherit whatever encoding the previous page is using.

>
> Therefore, I would suggest that the most reliable behavior would be
> to revert to using the document-specified encoding when the user
> first visits a page which has a different document-specified
> encoding from that which was present when the user last overrode the
> encoding.

If there is a document-specified encoding and Mozilla is not honoring
it, that is simply a bug, not the spec. Please file a bug and provide a
specific instance.

> (In most cases, this should not affect the encoding used
> to show documents navigated to via the Back button, because the
> encoding used should be stored in the cache along with each cached
> document.)
>
> As a solution to these problems (except the last one above, which
> requires back-end work), I suggest that the current set of nested
> submenus be replaced by a single submenu and a dialog.
>

Since your analysis is so fraught with problems and misunderstanding, I
suggest that we don't adopt these proposals.

> The submenu would look like this:
> _________________________________
> Te_xt Encoding > |/ _Automatic |
> |---------------------------------|
> |* Western (ISO-8859-1) |
> | Central European (ISO-8859-2) |
> | Chinese (Simplified) |
> | Japanese (Auto-Detect) |
> | _Other ... |
> |---------------------------------|
> |* _Left to Right |
> | _Right to Left |
> `"""""""""""""""""""""""""""""""""'
>
> `Automatic' would specify whether or not the document-specified encoding
> was being used.

Remember that the bullet indicates what the current document encoding
is, NOT what you have chosen as the default. So this menu is incoherent.
A bullet mark, as feedback, should be placed only next to a true
encoding, not next to an auto-detection module. Even IE does NOT do
that. If you look closely at IE's behavior, you will notice that even if
you choose an auto-detection module like Japanese, when you go to a
page, the bullet is placed next to one of the 3 Japanese encodings. This
is very confusing! That is because they don't distinguish the
"auto-detect" module, which is ALWAYS ON unless you choose something
else from the real encodings. That is why, even when you choose an
auto-detection module, you will never see that module bulleted again in
IE. This IE-style UI should not be imitated. It also makes no sense
given what the Mozilla backend does.

We improved on this confusing UI by separating auto-detection (always ON
if chosen) from true encodings. A bullet means only one thing under
Mozilla, i.e. the current encoding of the document as determined by
Mozilla. One improvement would be to make the auto-detection choice a
radio menu item, or mark it with a check mark, to separate it
cognitively from a bullet.

Changing the menu means only one thing: Reload the document assuming
that it is in the chosen encoding.

> The four items following that would be the four most
> recently-used encodings *or* auto-detection modules. These four items
> would be listed in alphabetical order (rather than order of most use or
> order of most recent use), with the exception of ISO-8859-1, to maximize
> the stability of the arrangement of the submenu.

I think these are too few for Japanese, Chinese or Cyrillic users.
Japanese uses at minimum 3 encodings plus Western. If you add Chinese to
the mix, which some Japanese users will do, for example, the number
jumps to 6. For Russian, Chinese and other (advanced) language users,
dictating the permanent set "indirectly" by way of the cache is a big
mistake. If there is one thing we have learned over the years, it is
that language needs cannot be easily categorized for a sizable number of
international users -- though they may be in the minority. There are
many languages which use both ISO- and Windows-type encodings, and
possibly a Mac type as well. If you mix 3 of these languages, the number
may easily exceed 5 or 6. In my opinion, it is a mistake to limit the
number in the way you suggest via the cache. That is why we offered the
Customize menu, and I still think it is a good idea for the advanced
users we serve.

How do you distinguish auto-detection, which is always ON, from an
encoding item, which only indicates what the current document is in?
This mixing is incoherent and will not make sense given what our backend
does.

What you have not taken into account at all is that Mozilla already has
default settings via the Pref menu. You're confusing the default setting
with the list of encodings which can potentially be used to mark the
current encoding.

Here is my overall assessment of your proposal. As it stands, the
proposal makes little sense due to misunderstandings of how the
underlying mechanisms work, and it does nothing to improve the lot of
people who use Mozilla at Internet cafes or of average users, who will
normally be using Mozilla in their own language UI with all the defaults
set properly. (Try the Japanese version of Mozilla 0.6 and see how well
the defaults are set for average Japanese users.)

The solution for Internet cafe customers, Mozilla developers, and other
minority users of this type lies in a better way to set all the defaults
correctly with one action once Mozilla knows the preferred language of
the user. Currently, you have to change 2 to 4 settings (if you include
Mail) to get this done. We know that correct defaults make a huge
difference.

Our efforts should be focused on this area rather than on the sub-menus,
which mostly target advanced users. I envision an option which asks what
the user's preferred language is and, once given that value, sets 1) the
default Browser/Editor character coding, 2) the default Mail display
character coding, 3) the default Mail send character coding, and 4) an
auto-detection module, all at once. This type of option will help a very
large number of users.

It is possible to change the sub-menus into one long sub-menu with
separators. We might do that, but let's also remember that Mozilla
covers many more languages/encodings than IE, and that the sub-menu idea
should be evaluated by real target users, not by UI pros only.

Frankly, in my opinion putting menu items into a dialog will
disadvantage the target advanced users we have in mind.

My suggestions are then as follows:

1. The suggested proposal should be rejected.
2. However, Matthew's article points to some real problems. Let's solve
them via bugs. I believe they are fixable.
3. Let's try to improve the default settings -- a proposal to derive
them from the preferred-language value will be forthcoming. This is
where our focus should be, not on features which target a small set of
advanced users.
4. If you are concerned about Internet cafe users, just increase the
number of permanent menu items via the Customize menu to match your
customer base, so that all the encodings your customers need will be
visible on the 1st tier. Some Internet cafes may cater mostly to
Chinese, Japanese, and Korean users; in that case, customize the menu to
place the 7 items needed for these 3 languages, etc. The Customize
dialog offers this type of flexibility for unpredictable
language-preference environments.

Finally, I would like to stress that testing and making adjustments so
that both the UI and the backend work well for many different types of
users and languages is not a trivial task. Netscape internal QA and
other volunteers have spent many, many hours resolving problems and
inconsistencies in the current implementation. In my opinion, any
overhaul of this menu requires a serious long-term commitment from the
same set of people across a large number of test cases and testing
hours. Please believe me when I say that we need consistent and
committed support for testing. For this reason, too, I am not inclined
to agree with changes which do not benefit the main target users (i.e.
average users) in a big way. If it is an incremental bug fix within the
current framework, let's do it one by one. And let's try to build the
dialog that sets all the language prefs in one action. (This of course
means that we need to come up with the set of defaults for all the
languages we support -- not a trivial task, but it can be done
eventually.)

If it helps people who are not necessarily in the i18n field, I would be
happy to update the current spec document and also include introductory
remarks to explain these issues in a less technical way.

- Kat

Katsuhiko Momoi

Dec 16, 2000, 7:10:41 PM
to m...@mailandnews.com, mozill...@mozilla.org, mozil...@mozilla.org, mozill...@netscape.com, b...@netscape.com
Matthew, I appreciate you taking the time to raise some interesting
issues. Below I have done my best to explain what the current specs are
and ways to solve the problems you mention by means other than the ones
you suggest.


Matthew Thomas wrote:
>
> A bug is open <http://bugzilla.mozilla.org/show_bug.cgi?id=10999> on
> making the UI for selecting an encoding easier to use. Katsuhiko Momoi
> has suggested I discuss it in n.p.m.i18n before proceeding any further,
> so that's what I'm doing now.
>
> What follows is a brief background to text encodings for those of you in
> the n.p.m.ui group. Members of the n.p.m.i18n group are welcome to jump
> on me if I'm wrong in any of the details.
>
> A document transmitted by a Web server may use HTTP's `Content-Type'
> header (or an equivalent META element if it is an HTML document) to
> specify that a document has been sent in one of a number of
> *encodings*. An encoding is a particular way of determining which
> byte (or pattern of bytes) sent by the Web server represents which
> character of text. (An equivalent mechanism is used for specifying
> the encoding of e-mail and Usenet messages, but for simplicity's
> sake I won't cover that here, as it does not affect the user
> interface requirements.) From now on, I'll refer to the encoding
> specified using HTTP or META as the `document-specified encoding'.
>
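For readers who have never seen these two mechanisms side by side, here
is roughly what they look like (the charset value is purely
illustrative):

```
HTTP/1.0 200 OK
Content-Type: text/html; charset=Shift_JIS

<html>
<head>
<!-- the equivalent declaration inside the document itself: -->
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
```

Either one tells the browser how to map the bytes that follow into
characters.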
> The user agent (e.g. the Web browser) uses the document-specified
> encoding information to decide which *character set* to use to
> display the text of the page.

You mean which "font" corresponds to the encoding used?

> In modern operating systems, all
> encodings are usually displayed using various parts of the Unicode
> character; but the character sets used to display text in particular
> encodings may vary from platform to platform.

What is the "Unicode character" you mention here?
Also, what do you really mean by "but the character sets used to display
text in particular encodings may vary from platform to platform"?

These may be nit-picky points, but this paragraph makes no sense.


>

> For example, while modern OSes will usually use the Unicode
> character set to display text in the ISO-8859-1 encoding,

Again, what does this mean? Unicode is a character encoding scheme for
characters and text.


> older computers often use some strain of the ANSI character set, and IBM
> mainframes used the EBCDIC character set, to display that same
> encoding.

This is not correct. ISO-8859-1 is both an encoding method and a
character set, but the so-called ANSI and the EBCDIC are 2 of the many
coding methods for the ASCII set. ISO-8859-1 is a superset of ASCII, and
thus this paragraph makes little sense.

>

> However, sometimes Web servers get it wrong -- they send the wrong
> encoding information with a Web page, or no encoding information at
> all. In this case the user agent has to choose an encoding itself in
> order to display the page. It typically does this through a
> combination of two approaches: firstly by trying to guess the
> encoding using an `auto-detection module' (which works by looking
> for particular patterns of bytes which are common in particular
> encodings), and secondly by assuming that the page has used a
> particular encoding which is specified by the user.

This is not what Mozilla does. Auto-detection and/or the user-specified
default encoding is used as a fallback only if the server or document
offers no encoding information. We don't use an "encoding which is
specified by the user" except indirectly, in the case of inheritance as
described below.
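As an aside, the byte-pattern guessing mentioned in the quoted paragraph
can be illustrated with a toy detector (purely illustrative -- Mozilla's
real detection modules are far more sophisticated):

```python
def toy_detect(data: bytes) -> str:
    """Guess an encoding from byte patterns alone."""
    # ESC sequences such as ESC $ B are characteristic of ISO-2022-JP.
    if b"\x1b$" in data or b"\x1b(" in data:
        return "ISO-2022-JP"
    try:
        data.decode("utf-8")   # strict UTF-8 validity is a strong signal
        return "UTF-8"
    except UnicodeDecodeError:
        return "windows-1252"  # fall back to a common 8-bit encoding
```

A real module weighs many such cues statistically instead of stopping at
the first match.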

>

> This discussion is about what the user interface for specifying
> that particular encoding, and/or specifying an auto-detection
> module, should look like.
>
> The current spec for this
> <http://mozilla.org/projects/intl/uidocs/browsercharmenu.html> is
> impressively detailed in specifying the user interface for selecting
> encodings and auto-detection modules. Katsuhiko Momoi has said in bug
> 10999 that the spec needs updating to reflect some of the changes that
> have been made, so my comments below reflect current behavior in Mozilla
> rather than behavior described in the spec (where the two are different).
>
> From using the `Character Coding' submenu on pages written in a variety
> of writing systems (Latin, Japanese kana, Chinese kanji, Korean,
> Cyrillic), I can see a number of things in the submenu which make it
> unnecessarily difficult to use. Working at an Internet cafe where people
> from many different countries (especially Asian countries) view Web
> pages written in their own languages, I am confident that they would
> have a lot more trouble using the encoding selection UI in Mozilla than
> they do using the equivalent UI in Internet Explorer 5.0. (That is not
> to say that the encoding UI in Internet Explorer is perfect -- far from
> it. But I would like the Mozilla UI to be better than that in Internet
> Explorer, not worse.)

I said this in the above bug, but the experience of users at an
Internet cafe should not be the driving force for this part of the UI
spec. Internet cafes need to serve users of different language
backgrounds with one program and one profile. The reason IE seems easier
to use has little to do with the menu UI -- rather, it has to do with
the universal auto-detection module they use and the fact that a sizable
number of web sites send HTTP charset or document-based charset info.
Netscape 6 has a *proprietary* universal auto-detector in it. It needs
to be improved, but when it is ON, it may help such customers by making
it unnecessary to use this menu. This is largely why IE feels better in
this regard, i.e. users don't have to use this menu much. That should be
our goal, too. Note that there is a lot unknown about universal
auto-detection. Some studies have shown that IE's universal
auto-detection has less than a 50% success rate when matched against
pages from diverse languages and web sites. Mozilla developers might
want to offer their own version of a universal detector.

Our main focus should be taking care of the scenario where a single
monolingual user uses Mozilla on a personal machine.

>

> The problems with Mozilla's UI for encoding selection are as follows.
>
> 1. The title of the submenu is `Character Coding'. Many users may not
> know what a `character' is. They are far more likely to know what
> `text' is, so I suggest that the submenu be called `Text Encoding'
> instead.
>

This menu under Communicator was named "Character Set". Users may have
had difficulty with the "Set" part, but I doubt that they had much
difficulty with the "Character" part. We thought it would be easier for
users to see the familiar term "Character". One reason we did not use
"encoding" is that the items that come under this menu are not all
classifiable as "encodings". For example, auto-detection modules are not
encodings; they are tools to detect the encoding of the document. (It is
essential to keep them apart, as you will see below.) On the other hand,
"Coding" is a neutral term, free of the meaning associated with
"encoding". This is one main reason why I and others suggested the term.
If readers of an earlier thread on this topic can offer other reasons
why this term was chosen during the discussion in the i18n newsgroups, I
would appreciate hearing about them again.

> 2. Making a selection from the menu will override the
> document-specified encoding (if there was one), but there is no
> obvious way to switch back to the document-specified encoding again.
> This leaves the user unable to backtrack on their action -- the
> equivalent of a word processor without an Undo command.
>
> The spec says:
> |
> | An override will not persist if a new document is loaded or if the
> | current document is reloaded by Super-Reloading, which will
> | refresh the document by going to the source page again. If the
| user wants to undo the override that has been committed, then the
> | user can super-reload the document.
>
> This approach is completely unacceptable for three reasons. Firstly,
> super-reloading is to all intents and purposes a hidden feature, so
> it is not at all obvious to the user how to reset the encoding of
> the page. Secondly, current methods for super-reloading are
> practically impossible for disabled users, as I have described
> elsewhere <http://bugzilla.mozilla.org/show_bug.cgi?id=46845>. And
> thirdly, super-reloading will (and ordinary reloading should)
> clear any form data entered in the page; so using this to reset the
> encoding can result in data loss, as I have also described elsewhere
> <http://bugzilla.mozilla.org/show_bug.cgi?id=17889>.


>

> 3. The `Character Coding' menu item has *three* levels of submenus
> -- e.g. `View' > `Character Coding' > `More' > `SE & SW Asian' >
> `Vietnamese (Windows-1258)'. Menus are the most difficult of all
> common UI controls for users to manipulate, and as the level of
> nesting increases linearly, the level of difficulty increases
> geometrically. In any professionally-designed software app, having
> more than *one* level of submenus is inexcusable.

I am aware of these UI considerations. But shoving these items into the


(MS documentation skipped...)

>

> 4. The division of encodings in the `More' submenu into what Momoi
> describes as `geo-linguistic classifications fairly well-accepted in
> linguistic typology'
> <http://bugzilla.mozilla.org/show_bug.cgi?id=10999> may be elegant
> from a linguistic point of view, but from a user's point of view
> it is just an unnecessary level of complication.
>
> Usually, the situation which requires the use of this UI is that of
> a page written in a language other than English, and designed for a
> non-Latin encoding, being rendered as if it was in a Latin encoding.
> In this case, when I see users use the equivalent submenu in IE,
> they open the submenu and immediately look for their chosen
> *language*. Not the world region in which their chosen language
> originated, or is dominant, or whatever, but the name of the
> language itself.

See my comments above. If enough people dislike sub-grouping of encoding


families, it would be easy to switch to MS style single sub-menu.
However, let me also say that Mozilla covers many more encodings than MS
and our fear was that with that many more items, some sub-groupings will
be helpful.
Remember that we are likely to add some more items to this menu so that
eventually the number might reach 80-90. Is a single sub-menu easier for
advanced users? Or the sub-groups? (I am assuming that average users
will not be using this sub-menu at all or very rarely.) Let me hear your
opinions.


>

> When presented with Mozilla's nested submenu of encoding categories,
> none of which are actual languages, and most of which have
> similarly-shaped names, I suggest that users will not pause (holding
> the submenu open) while they calculate which category their desired
> language fits into; instead, in an attempt to find the item they
> want quickly, they will begin a depth-first traversal of the
> submenus looking for the desired encoding, wasting time in the long
> run.

I think there is some truth in this comment. But let me also suggest


that that hesitation would be for the first time. Given the
sub-categorizations like East Asian, I think the user will quickly get
used to the fact that Chinese, Japanese and Korean belong there. I doubt
very much that they will keep on having this problem. Also, if you are
heavy users of certain encodings, you should put them in the permanent
or static list on the 1st tier via the "Customize..." menu. Again, for
the target users we have in mind, I don't see much difference. A long
menu with 80 or more items will be much worse, I fear.

>

> 5. The use of linguistic region submenus to subcategorize encodings
> requires newly-installed encodings to specify which of Mozilla's
> UI-specific regions they belong to, something which they may do
> incorrectly, or not do at all
> <http://forbes.co.il/dev/bidi-mozilla.gif>.

The example image has to do with Arabic and Hebrew, which hopefully get


a full support in the near future. I think we will be classing them into
Middle Eastern when the support for Bi-di becomes finalized. Thus, this
will not be a user-level decision. The engineers involved will put them
correctly. This is a non-issue and something we can resolve via a bug.

>

> 6. The use of third-level submenus to subcategorize the encodings
> available implies that selections in each of the submenus are
> independent from each other, when they are not.

They are independent from each other. You're simply misunderstanding how
these menu items work.

> This is especially


> confusing since the second-level submenus `Auto-Detect' and `More'
> are *not* independent from each other; if I choose `Auto-Detect' >
> `Auto-Detect (Chinese)' on a document intended to be displayed
> using ISO-8859-1, the Auto-Detection module works correctly and
> uses ISO-8859-1, but this is not indicated in the `More' submenu.

Auto-detection choice is independent of the encoding menu items. This is


why they cannot be under the same sub-group. Once you choose an
auto-detection, that detection module is ON at all times for every page
until auto-detection is tuned OFF. (Maybe we should mark this fact by
changing a bullet to a checkmark for the Auto-detection modules.

> (So both `View' > `Character Coding' > `Auto-Detect' > `Auto-Detect


> (Chinese)' and `View' > `Character Coding' > `Western (ISO-8859-1)'
> are checked at the same time, but `View' > `Character Coding' >
> `More' > `Western' > `Western (ISO-8859-1)' is not also checked.)

Yes, that is what you observe. Ideally we would like items in the
sub-menu to disappear when they move to the top tier, because they are
now in the cache. We haven't had time to implement that yet. It should
resolve the confusion. Let me say this again so that there is no
misunderstanding of the current spec:

A. Auto-detect modules are always ON if chosen and always participate in
determining the document coding if the HTTP charset or document-based
charset is not available.
B. The bullet mark you see under `View' > `Character Coding' does NOT
indicate what you chose as the default encoding. (The default encoding
is set in the Preferences.) The bullet shows feedback from Mozilla
indicating what Mozilla considers the current encoding of the document
to be. That is all the bullet means -- what Mozilla has determined the
current encoding to be, based on the HTTP charset, document-based
charset, auto-detection, or the Pref-based default encoding.

The bullet mark is a feedback mechanism -- not the default choice by the
user.

Thus choosing another encoding item is an instruction to Mozilla to
reload that document assuming that it is in the chosen encoding. This
manual action will override HTTP, document-based charset info, or the
auto-detection result. Otherwise (i.e. when none of the 3 obtains), it
will simply reload the document under a new charset.


> In any group of radio buttons, one item must be selected at all
> times. Similarly, in any radio group of menu items, one item must be
> checked at all times. If that is not possible because there are so
> many options that they need to be split over several submenus, then
> you shouldn't be using menus to present the options at all.

Again, there is a bug. We should temporarily remove an item from its
sub-menu when that item gains cached status.

> 7. The submenus make a distinction between individual encodings and
> auto-detection modules. From the user's point of view such a
> distinction is completely irrelevant -- they don't care how clever
> the programmers were in writing code to distinguish between multiple
> encodings, they just want their pages to look right. If there is a
> choice available in the `Text Encoding' menu which is able to work
> out which of the variety of Japanese encodings is being used on a
> particular Japanese Web page, that's great, but the user shouldn't
> have to go hunting in the `Auto-Detect' submenu for such an item,
> otherwise she won't use it at all. Auto-detect modules should be
> available right next to the items for the individual encodings which
> they cover.

First, Auto-detection is not an encoding and should not be mixed with
encodings. This becomes even clearer with more discussion below.
The user does not usually have to hunt for the auto-detection menu. Each
localizer will have set that item appropriately for that locale. Thus
the default option for auto-detection is ON if it is available for the
user's locale/language. What you are not taking into account is that we
cannot generalize from the behavior of Mozilla users who are mostly
using an English UI for their languages. These are a minority of people
-- developers, testers, Mozilla enthusiasts. Average users will get the
UI in their language, with the default encoding and auto-detection set
properly for that locale. It is the localizer's job to set them
correctly in the first place. Mozilla is designed with these defaults so
that average users will not have to worry about the issues you're
talking about.


> 8. The five most recently-used encodings are selectable from the
> first-level submenu. However, it is also possible (using the
> misleadingly-named `Customize Character Coding' dialog) to specify
> encodings which will always appear in the first-level submenu,
> regardless of whether they have been recently used or not. These
> encodings are mixed with the recently-used encodings in the
> first-level submenu, without any visual distinction being made
> between them.

This is how the IE menu is designed also. IE has 1 or more static menu
items which are there at all times, plus the cached items, without any
separator. If enough people want it, we can probably put in a separator.


> As I said in bug 10999, this seems about as redundant as allowing
> the user to specify which files will appear in their `Recent Files'
> submenu -- it is only likely to be of any use to a tiny minority of
> users who regularly apply more than five different encodings (which
> would mean they would probably need to know at least three
> languages). For the vast majority of users, their list of
> recently-used encodings will be so good as an approximation of the
> list of encodings which they want to switch to quickly, that having
> an extra dialog for this is just not worth the confusion and bloat
> generated by its existence.
>
> I suggest that this feature be ditched. It is only present because
> the nested submenus are unnecessarily difficult to use in the first
> place, and I believe it will cause more confusion than it allays.

That is not so. The Character Coding Customize dialog is there because,
for advanced users, there is no way to predict which sets of encodings
they would like to have there permanently. We want that to be flexible
and left up to the users who need it. A vast majority of people will get
their defaults set by localizers. Note also that we can keep the static
menu even when the cache is lost. It is also wrong to assume that users
will not need more than 5 items. If you want to do Chinese and Japanese,
you need a minimum of 5, plus Western, which just about everyone needs.
That is 6. Some languages, like Russian, may have more than 5 encodings
in use. Yes, to some extent this is overkill, but I believe it is better
to provide this flexibility to advanced users.


> 9. When the user then visits an uncached page with a different
> document-specified encoding from that which was present in the page
> where the encoding was last manually specified, Mozilla appears to
> continue using the user-specified encoding. (Internet Explorer
> appears to do this as well.) This seems rather unintelligent --
> since if you go to a page which has a different document-specified
> encoding, it is highly likely you have gone to a different Web site,
> so Mozilla should return to its default assumption that the Web page
> is able to specify its own encoding correctly.

I think you're misunderstanding how this works in Mozilla currently.

If you go to a new uncached page, it will go through the following
determination hierarchy process again:

1) Honor HTTP charset if it exists
2) Honor document-based charset if it exists
3) Try auto-detection if one is chosen
4) If 1-3 all fail, then display the page with the default set in Prefs
| Navigator | Languages | Character Coding.

These steps catch a huge number of cases already, unless pages are
mislabeled at step 1) or 2). Many hours of testing indicate this to be
true. We do not carry over the manually specified encoding from the
last page. There is one exception to this rule: when the user opens a
new navigator window from an existing page. In that case, it is best to
inherit whatever encoding is in use on the previous page.
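As an illustration only (this is not Mozilla's code, and every name here
is made up for the sketch), the 4-step determination hierarchy above can
be expressed as a simple fallback chain:

```python
# Hedged sketch of the 4-step hierarchy described above.
# All names are illustrative; this is not Mozilla's actual implementation.

def determine_encoding(http_charset=None, doc_charset=None,
                       detector=None, raw_bytes=b"",
                       pref_default="ISO-8859-1"):
    """Pick the encoding for an uncached page via the 4-step hierarchy."""
    if http_charset:                 # 1) honor the HTTP charset
        return http_charset
    if doc_charset:                  # 2) honor the document-based charset
        return doc_charset
    if detector:                     # 3) try auto-detection, if chosen
        guess = detector(raw_bytes)
        if guess:
            return guess
    return pref_default              # 4) fall back to the Prefs default

print(determine_encoding(doc_charset="Shift_JIS"))  # prints Shift_JIS
```

In this sketch, a manual menu choice would simply bypass the whole chain
for a one-time reload, which matches the override behavior described
earlier in the thread.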


> Therefore, I would suggest that the most reliable behavior would be
> to revert to using the document-specified encoding when the user
> first visits a page which has a different document-specified
> encoding from that which was present when the user last overrode the
> encoding.

If there is a document-specified encoding and Mozilla is not honoring
it, that is simply a bug, not the spec. Please file a bug and provide a
specific instance.

> (In most cases, this should not affect the encoding used
> to show documents navigated to via the Back button, because the
> encoding used should be stored in the cache along with each cached
> document.)
>
> As a solution to these problems (except the last one above, which
> requires back-end work), I suggest that the current set of nested
> submenus be replaced by a single submenu and a dialog.

Since your analysis is so fraught with problems and misunderstandings, I
suggest that we not adopt these proposals.

> The submenu would look like this:
> _________________________________
> Te_xt Encoding > |/ _Automatic |
> |---------------------------------|
> |* Western (ISO-8859-1) |
> | Central European (ISO-8859-2) |
> | Chinese (Simplified) |
> | Japanese (Auto-Detect) |
> | _Other ... |
> |---------------------------------|
> |* _Left to Right |
> | _Right to Left |
> `"""""""""""""""""""""""""""""""""'
>
> `Automatic' would specify whether or not the document-specified encoding
> was being used.

Remember that the bullet indicates what the current document encoding
is, NOT what you have chosen as the default. So this menu is incoherent.
A bullet mark, as feedback, should be placed only next to a true
encoding, not next to an auto-detection module. Even IE does NOT do
that. If you look closely at IE behavior, you will notice that even if
you choose an auto-detection module like Japanese, when you go to a page
the bullet is placed next to one of the 3 Japanese encodings. This is
very confusing! That is because they don't distinguish the "auto-detect"
module, which is ALWAYS ON unless you choose something else, from real
encodings. That is why, even when you choose an auto-detection module,
you will never see that module bulleted again in IE. This IE-style UI
should not be imitated. It also makes no sense given what the Mozilla
backend does.

We improved on this confusing UI by separating auto-detection (always ON
if chosen) from true encodings. A bullet means only one thing under
Mozilla, i.e. the current encoding of document as determined by Mozilla.
One improvement would be to present the auto-detection choice as a radio
menu item, or mark it with a check mark to separate it cognitively from
a bullet.

Changing the menu means only one thing: Reload the document assuming
that it is in the chosen encoding.

> The four items following that would be the four most
> recently-used encodings *or* auto-detection modules. These four items
> would be listed in alphabetical order (rather than order of most use or
> order of most recent use), with the exception of ISO-8859-1, to maximize
> the stability of the arrangement of the submenu.

I think these are too few for Japanese, Chinese or Cyrillic users.
Japanese uses at minimum 3 encodings plus Western. If you add Chinese to
the mix, which some Japanese users will do, the number jumps to 6. For
Russian, Chinese and other (advanced) language users, dictating the
permanent set "indirectly" by way of the cache is a big mistake. If
there is one thing we learned over the years, it is that language needs
cannot be easily categorized for a sizable number of international users
-- though they may be in the minority. There are many languages which
use both ISO- and Windows-type encodings, and possibly a Mac type as
well. If you mix 3 of these languages, the number may easily have to
exceed 5 or 6. In my opinion, it is a mistake to limit the number in the
way you suggest via the cache. That is why we offered the Customize
menu, and I still think it is a good idea for the advanced users we
serve.


How do you distinguish auto-detection, which is always ON, from an
encoding, which only indicates what the current document is in? This
mixing is incoherent and will not make sense given what our backend
does.
Katsuhiko Momoi

Dec 16, 2000, 7:37:43 PM12/16/00
to mozill...@mozilla.org, mozil...@mozilla.org, Xslf, Matthew Thomas
Katsuhiko Momoi wrote:

>
> Xslf wrote:
>
> >
> >
> > About the "revert to assuming the site has the correct encoding specified"
> > behavior- I would not want mozilla to behave that way, just for the sheer
> > number of (in my case Hebrew) web sites which have incorrect encoding
> > specified.
> >
> > I am afraid that if such a behavior is implemented, I would spend half my
> > surfing time changing encoding manually.
>
> I agree with this assessment, but there are also other complications. I will
> touch on this problem later when I comment on the entire proposal by Matthew.

I wrote my reply to Matthew but did not find a place in there to address
this issue. So, let me address it in this message. Our main objective in
implementing charset override was to cover sporadic cases of HTTP or
document charset failure. This is the main reason why it is one-time
only.

>> I think that the behavior that assumes the default from the previous site
>> makes sense, since most users I know spend most of their time using just one
>> or two languages, so most pages are in the encoding of the previous page.

This may benefit users whose pages often have wrong charset labels but
will not benefit users whose pages are incorrectly marked only some of
the time. If pages are marked wrong, let's get them corrected by the web
site designers. In the meantime, we might consider an option which lets
users override all HTTP or document charsets all the time -- until that
option is de-selected. That would probably serve your case better than
assuming the charset of the last visited page. I will try to file a bug
on it soon.


>
> - Kat

Andy

Dec 16, 2000, 10:05:31 PM12/16/00
to Katsuhiko Momoi
Katsuhiko Momoi wrote:

Um, Arial is a font, Unicode is a character set, UTF-8 is a character encoding.

e.g.
A web page may specify UTF-8 or UTF-7 as its encoding; both of these map
to the Unicode character set. The operating system will use one or more
fonts to render the glyphs representing the characters on screen or to
the printer.

If the user has a font which covers every character in the Unicode set,
that font can be used; otherwise glyphs from several fonts might have to
be combined if the text contains a mix of languages.
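A short illustration of that distinction (Python used purely as a
demonstration tool here; nothing in it is specific to Mozilla): the same
Unicode characters can be serialized as different byte patterns by
different encodings, while fonts are a separate, rendering-level
concern.

```python
# Same abstract Unicode characters, two different byte serializations.
text = "café"                        # Unicode characters
utf8 = text.encode("utf-8")          # b'caf\xc3\xa9'
utf7 = text.encode("utf-7")          # a different byte pattern entirely
assert utf8 != utf7                  # different bytes on the wire...
assert utf8.decode("utf-8") == utf7.decode("utf-7") == text
# ...but both decode back to the same characters.
```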


> > In modern operating systems, all
> > encodings are usually displayed using various parts of the Unicode
> > character; but the character sets used to display text in particular
> > encodings may vary from platform to platform.
>
> What is the "Unicode character" you mention here?
> Also what do you really mean by "but the character sets used to display
> text in particular encodings may vary from platform to platform."
>
> These may be nit picky points but this paragraph makes no sense.

Of course, he meant Unicode "characters", plural. A little confusing
maybe, but the spirit is clear. If one has a piece of UTF-8 encoded text
containing both Cyrillic and Western characters the operating system may
have to convert those characters into 2 different native encodings and
use 2 fonts to render them.

e.g. A Mac might have 2 separate fonts capable of displaying Western and
Cyrillic characters between them, but no single font capable of
displaying both. Therefore the UTF-8 encoded Unicode characters must be
converted to MacRoman and MacCyrillic character codes (encodings) before
they can be displayed in those fonts.

Contrast a Windows machine with Microsoft's Arial (Unicode) font - it
may be able to simply render the full range of UTF-8 encoded data
without converting any of the character codes?
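The Mac example can be mimicked with a quick sketch (Python, illustrative
only; the codec names are Python's, not the Mac OS APIs): a string mixing
accented Western and Cyrillic characters survives UTF-8, but a single
single-script legacy encoding cannot hold all of it.

```python
# Mixed-script text: fine in UTF-8, impossible in one legacy encoding.
mixed = "café Привет"                 # accented Latin + Cyrillic
assert mixed.encode("utf-8").decode("utf-8") == mixed  # UTF-8 covers both
for legacy in ("mac_roman", "mac_cyrillic"):
    try:
        mixed.encode(legacy)
        print(legacy, "covered everything")
    except UnicodeEncodeError as e:
        print(legacy, "cannot represent", repr(e.object[e.start]))
```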

All a bit irrelevant as it happens below the level of Mozilla...



> >
> > For example, while modern OSes will usually use the Unicode
> > character set to display text in the ISO-8859-1 encoding,
>
> Again, what does this mean? Unicode is a character encoding scheme for
> character and text.
>
> > older computers often use some strain of the ANSI character set, and IBM
> > mainframes used the EBCDIC character set, to display that same
> > encoding.
>
> This is not correct. ISO-8859-1 is both an encoding method and character
> set but the so-called ANSI and the EBCDIC are 2 of the many ASCII coding
> methods for the ASCII set. ISO-8859-1 is a superset of ASCII and thus
> this paragraph makes little sense.

Yup. He's got the details wrong, but since you demonstrate that you
understand the difference between an encoding (ANSI) and a character set
(ASCII) here, you actually reinforce why the above paragraph makes
sense, in spirit.

> >
> > The problems with Mozilla's UI for encoding selection are as follows.
> >
> > 1. The title of the submenu is `Character Coding'. Many users may not
> > know what a `character' is. They are far more likely to know what
> > `text' is, so I suggest that the submenu be called `Text Encoding'
> > instead.
> >
>
> This menu under Communicator was named "Character Set". Users may have
> difficulty with "Set" part but I doubt that they had much difficulty
> with the "Character" part. We thought it would be easier for users to
> use a familiar term "Character". One reason we did not use "encoding" is
> that items that come under this menu are not all classifiable as
> "encoding". For example, auto-detection modules are not encodings. They
> are tools to detect an encoding of the document. (It is essential to
> keep them apart as you will see below.) On the other hand, "Coding" is a
> neutral term free of the meaning associated with "encoding". This is one
> main reason why I and others suggested this term. If readers of an
> earlier thread of discussion on this topic can offer other reasons why
> this term was chosen during a discussion in the i18n newsgroups, I would
> appreciate hearing about them again.


Character is not a common word outside of computing circles. Ordinary
people say "letter". Auto detection modules are no more Character Sets
than they are "encodings", so to cite that as a reason to choose "set"
seems odd to me. I do see your point about "encoding" being worth
avoiding though. Text is a better choice, if one can find a word to pair
it with "Text Coding" doesn't really mean anything, "Character Coding"
has the words Character and Coding in it, neither of which mean anything
to a lay person. I'm almost tempted to say go back to Character Set -
its no worse than any of the others, and its at leasts its consistent
with 4.X and IE.


> > 2. Making a selection from the menu will override the
> > document-specified encoding (if there was one), but there is no
> > obvious way to switch back to the document-specified encoding again.
> > This leaves the user unable to backtrack on their action -- the
> > equivalent of a word processor without an Undo command.

[snip]

> incorrect display? If your 1st attempt at correcting does not give you
> the right result, then you try another menu item to correct it. Let's
> say that it is finally displaying correctly after 2 tries. Why would we
> want to go back to incorrect display?

Fear. I sometimes go to pages which come out garbled. Now, I don't have
every possible font installed, so is it garbled because my operating
system can't render it, or has Mozilla picked the wrong encoding? So I
try the other likely encodings; let's say I think it's Japanese. Nothing
works :-( Probably my fonts, then.

Ok, so what's my goal now? I want to undo everything I just did and go
back to whatever Mozilla was doing before. Maybe I'm dimly aware that
Mozilla does auto-detection (I certainly understand it better after
these 2 messages than I ever did from looking at the current UI). I'm
now worried that by choosing one of the encodings I've somehow turned
auto-detection off permanently, and every Japanese page I visit will be
wrong, because Mozilla will keep on using my choice. So I want a reset
switch.

Please be aware - this is not a rhetorical example. Maybe I'm not
typical, but when I read what Matthew said I thought "yeah - that's
exactly what worries me about all this".

Before you answer that Mozilla doesn't permanently alter the encoding
used because of the action I took, stop and think: what clues does the
current UI give that this is the case?

Bingo. You hit the nail on the head here, and it's now clearer to me
what the problems currently are.

Neither of the 2 solutions proposed:

* more submenus

or

* one long menu with separators

is adequate. Both completely ignore menu guidelines, which talk about
avoiding submenus and keeping menus *short*, maybe 8-10 choices at most.

Now guidelines are meant to be broken where absolutely necessary, but
this isn't one of those times.

You said it yourself - only advanced users will be interested in the
full breadth of the functionality. They probably don't use it all that
often (especially given your caching of the most recently used
encodings).

To repeat a bit of what you said:

> Putting them in the
> dialog only makes the access harder for the advanced users who might
> want to see them in the menu.

Why might advanced users "want" to see them in a menu - you seem
convinced a menu is the right solution. Surely people want/need the
solution that is easiest to use and understand.

Think what a well designed dialog might achieve (if you really don't
like Matthew's design, then forget it for the moment and let your
imagination go).

The reason you're having problems is because you're abusing menus in
several ways:

They're supposed to be short sequences of commands. Actions. Things the
user can *do*.

Changing the character set/encoding for this document is indeed an
action, but you've stuffed a whole bunch of state into the menus too -
they represent not only actions the user can take, but a record of the
actions the user *and* Mozilla have taken. They're also too long and too
deep by every respected set of guidelines.

With a well designed dialog you could

* tell the user the action they're about to take is temporary - that
they're picking the encoding for the current page, and that to alter the
defaults they should access the preferences. I had *NO* idea that was
the case until today - because there's simply nowhere to explain this in
a menu.

* use the full gamut of controls (checkboxes, radio buttons, list views)
along with appropriate spacing, dimming, highlighting and layout to show
how the auto-detect and character set options interact. You could group
them or separate them in a way which truly indicates their relationship
or lack thereof.

I've never understood how auto-detection interacts with what the site is
specifying and what I choose in the menu until you explained it today -
and this must be at least in part because the current UI - a menu - has
no way of communicating this information to me, because its layout and
available controls (tick marks + check boxes) are so limited.

Why compress my options into layers of nested submenus with tiny gaps I
must traverse and limited explanatory labels or supplementary
information? Use my screen, show me a dialog I can understand.

Absolutely, keep the current cache of most recently used items, and keep
the most often used items in the menu.

I have the greatest respect for the work your team has put in, and the
fact you have far more resources to do testing. But I'm one very
confused user... You can have the best implementation in the world - I
can see you've put lots of thought into it - but the current menu is not
communicating how it works to me. If I don't understand it, and we don't
assume I'm more stupid than most people :-), its value is limited.

>
> >
> > When presented with Mozilla's nested submenu of encoding categories,
> > none of which are actual languages, and most of which have
> > similarly-shaped names, I suggest that users will not pause (holding
> > the submenu open) while they calculate which category their desired
> > language fits into; instead, in an attempt to find the item they
> > want quickly, they will begin a depth-first traversal of the
> > submenus looking for the desired encoding, wasting time in the long
> > run.
>
> I think there is some truth in this comment. But let me also suggest
> that that hesitation would be for the first time. Given the
> sub-categorizations like East Asian, I think the user will quickly get
> used to the fact that Chinese, Japanese and Korean belong there. I doubt
> very much that they will keep on having this problem. Also, if you are
> heavy users of certain encodings, you should put them in the permanent
> or static list on the 1st tier via the "Customize..." menu. Again, for
> the target users we have in mind, I don't see much difference. A long
> menu with 80 or more items will be much worse, I fear.

Absolutely, 80 items would be horrible. Please never even consider a
menu that long!
Matthew's point is more that the current options look very, very similar
wherever they occur: Japanese (XXXXXX) Japanese (XXXXXX) Japanese
(XXXXXXX) is what you'll see if you glance - after all, people know what
Japanese is, but Shift-JIS is a technical term. I think most people
will be forced to learn the positions. Not sure there's a good solution
- but please see the comment I make about positioning below:



>
> Auto-detection choice is independent of the encoding menu items. This is
> why they cannot be under the same sub-group. Once you choose an
> auto-detection, that detection module is ON at all times for every page
> until auto-detection is turned OFF. (Maybe we should mark this fact by
> changing a bullet to a checkmark for the Auto-detection modules.


Something needs to be done, sure. If what you say is true, why can't I
have more than one auto-detection module active at the same time? Radio
button menu items seem to imply they are exclusive, yet a check mark
would imply a toggle, and that I could enable more than one. Again, this
would be so much cleaner in a dialog - where a radio button is a radio
button and a checkbox is a checkbox. Every OS implements radio and
checkbox menu items differently... and often not consistently. The only
way I've found to work out what is going on is to try selecting one of
the items then re-opening the menu to see if my change has behaved like
a radio button or a check box. And if the menu is several levels deep,
this is really frustrating. A menu is the wrong tool!



> > (So both `View' > `Character Coding' > `Auto-Detect' > `Auto-Detect
> > (Chinese)' and `View' > `Character Coding' > `Western (ISO-8859-1)'
> > are checked at the same time, but `View' > `Character Coding' >
> > `More' > `Western' > `Western (ISO-8859-1)' is not also checked.)
>
> Yes. That is true phenomenally. Ideally we would like the items in the
> sub-menu to disappear when it moves to the top tier because it now is in
> the cache. We haven't had time to implement it yet. That should solve
> the confusion. I should say this now again so that there is no
> misunderstanding of the current spec:

No. Please don't do that. That means what's in each of the menus will
change each time I go to them. This is counter to the basic tenet of
muscle memory - that people find what they want by remembering its
position. Real world objects don't move and disappear by themselves;
neither should computer controls - they should stay where they were.
Remember my point about all of the Japanese encoding names looking the
same? Most people will eventually stop reading the labels and start
selecting "the second one up from the bottom".

If you move things around (remove them) then you're destroying people's
ability to learn by position. Hence Matthew's comment about sorting
things alphabetically.


> B. The bullet mark you see under "`View' > `Character Coding'" does NOT
> indicate what you choose as the default encoding. (The default encoding
> is set in the Preferences.). The bullet show the feedback from Mozilla
> indicating what Mozilla considers the current encoding of the document
> is. That is all that the bullet means -- what Mozilla has determined
> its current encoding based on HTTP charset, document-based charset,
> Auto-detection, or Pref-based default encoding.
>
> The bullet mark is a feedback mechanism -- not the default choice by the
> user.

But I never understood this. One moment the checkmark is Mozilla's best
guess; the next it's a record of what I did. How can I tell the
difference? Do I have to remember whether I changed it for this page?
Again, a menu is not expressive enough. How do I know the change I make
will not affect my permanent preferences? A menu can't teach me this.


>
> Remember that the bullet indicates what the current document encoding
> is, NOT what you have chosen as the default. So this menu is incoherent.
> A bullet mark as a feedback should be placed only to a true encoding,
> not to an auto-detection module. Even IE does NOT do that. If you look
> closely at IE behavior, you will notice that even if you choose an
> auto-detection module like Japanese, when you go to a page, the bullet
> is placed next to one of the 3 Japanese encodings. This is very
> confusing! That is because they don't distinguish "auto-detect" module
> which is ALWAYS ON unless you choose something else from real encodings.
> That is why even when you choose an auto-detection module, you will
> never see that module bulleted again IE. This IE-style UI should not be
> imitated. It also makes no sense given what Mozilla backend does.

> We improved on this confusing UI by separating auto-detection (always ON
> if chosen) from true encodings. A bullet means only one thing under
> Mozilla, i.e. the current encoding of document as determined by Mozilla.
> One improvement would be to make auto-detection choice like a radio menu
> or mark with a check mark to separate it cognitively from a bullet.

I totally understand what you've done and why from your explanation.
It does make sense. But I never grasped that from the UI itself. :(


> How do you distinguish auto-detection which is always ON from an
> encoding which only indicates what the current document is in? This
> mixing is incoherent and will not make sense given what our backend
> does.
>
> What you have not taken into account at all is that Mozilla already has
> default settings via the Pref menu. You're confusing the default setting
> from a list of encodings which can be potentially used to mark the
> current encoding.


Yup, I see your point. But again, I dunno how I was ever supposed to
understand this. I'm almost tempted to say you should do away with the
whole idea and simply have the menu (or dialog) as discussed set a
permanent preference. When I was talking about what I feared above, it
was because I'd always assumed this menu was a permanent thing, since
there's no indication that it isn't. View->Apply Theme is permanent, is
it not? How am I to know View->Character Coding is temporary?

Assuming such a radical step is out of the question, I've already stated
what I believe - that using a dialog would be clearer, and from a dialog
I might have a chance of understanding what is going on - both because
you could use a richer set of controls to show me what Mozilla and I
have done, but also because you can simply include a label which makes
it clear the changes aren't permanent.

>
> Frankly, in my opinion putting menu items into a dialog will
> disadvantage the target advanced users we have in mind.

I disagree with this:

* Advanced users would still benefit from an easier to understand
dialog. They still suffer from menus which are too long or too deeply
nested. Just because they're advanced and can manipulate a mouse as well
as an able-bodied person can doesn't mean they *should have to* navigate
difficult to understand structures.

* As stated, a dialog can communicate information about how the backend
works. A menu can't, in my opinion.

* You would still keep the most recently used cache, and the localisers
would still make their choices, so common operations would still be fast.



> My suggestions are then as follows:
>
> 1. The suggested proposal should be rejected.
> 2. However, Matthew's article points to some problems. Let's solve them
> via bugs. I believe they are fixable.
> 3. Let's try to improve the default settings -- a proposal to do this
> from the preferred language value will be forthcoming. This is where our
> focus should be, not on features which target small set of advanced
> users.
> 4. If you are concerned about Internet Cafe users, just increase the
> number of permanent menu items via the Customize menu to match your
> customer base so that all the encodings your customers need will be
> visible on the 1st tier. Some Internet Cafe may cater to mostly Chinese,
> Japanese, and Korean users. In that case, customize the menu to place 7
> items needed for these 3 languages, etc. Customize dialog offers this
> type of flexibility to unpredictable language preference environments.

All very good points.


> Finally, I would like to stress that testing and making adjustments to
> have both the UI and backend work well for many different types of users
> and languages is not a trivial task. Netscape internal QAs and other
> volunteers have spent many many hours to resolve problems and
> inconsistencies in the current implementation. In my opinion, any
> overhaul of this menu requires serious long-term commitment from the
> same set of people for a large number of test cases and testing hours.
> Please believe me when I say that we need consistent and committed
> support for testing. For this reason, too, I am not inclined to agree
> with changes which do not benefit the main target users (i.e. average
> users) in a big way. If it is an incremental bug fix within the current
> framework, let's do it one by one. And let's try to do the dialog to
> set all the lang prefs in one action. (This of course means that we need
> to come up with the set of defaults for all the languages we support --
> not a trivial task but can be done eventually.)
>

Absolutely. We're all just talking here, and it costs time and money to
do real testing. I get passionate, but I never claim to be smarter -
it's all just "in my opinion".

I guess I'm saying that I believe the effort of the team would be best
directed into thinking of a solution which doesn't rely solely on menus,
because I personally believe that menus are not sufficiently expressive
to explain how the backend works to a user. The current schemes adopted
by Mozilla (and IE) have left me, and it seems Matthew too,
confused. We're both savvy and experienced with computers but we never
"got it". We've made assumptions about how it works that are
fundamentally flawed. This points to the possibility that many people
will be equally confused.

Personally, I think your basic focus on localisation folks customising
stuff and on making the common case fast is clearly the right thing to
do, and I commend you on it. However I also believe you should seriously
consider revisiting the advanced user part of the UI and think beyond
trying to do it all in menus.

Thanks for your time.

AndyT

Katsuhiko Momoi

Dec 17, 2000, 5:12:06 AM12/17/00
to m...@mailandnews.com, mozill...@mozilla.org, mozil...@mozilla.org, mozill...@mozilla.org, b...@netscape.com
With apology to everyone who read my posting on this topic earlier, let me make
one correction regarding the algorithm used to determine the current encoding of a
document. I don't believe that this will change the points I was trying to make in
the original post but nonetheless I feel it is important for us to base our
discussions on accurate information.

Katsuhiko Momoi wrote:

>
> > 9. When the user then visits an uncached page with a different
> > document-specified encoding from that which was present in the page
> > where the encoding was last manually specified, Mozilla appears to
> > continue using the user-specified encoding. (Internet Explorer
> > appears to do this as well.) This seems rather unintelligent --
> > since if you go to a page which has a different document-specified
> > encoding, it is highly likely you have gone to a different Web site,
> > so Mozilla should return to its default assumption that the Web page
> > is able to specify its own encoding correctly.
>
> I think you're misunderstanding how this works in Mozilla currently.
> If you go to a new uncached page, it will go through the following
> determination hierarchy process again:
>
> 1) Honor HTTP charset if it exists
> 2) Honor document-based charset if it exists
> 3) Try auto-detection if one is chosen
> 4) If 1-3 all fail, then display the page with the default set in Prefs
> | Navigator | Languages | Character Coding.

I have apparently missed the last change we made in this regard. The
default/fallback encoding is the last in the hierarchy of determining the current
charset but there is another factor between 3) and 4) above. This is actually in
the published document at:

http://www.mozilla.org/projects/intl/uidocs/browsercharmenu.html#definitions

So the correct determining factors in descending order of application are:

1) Honor HTTP charset if it exists
2) Honor document-based charset if it exists
3) Try auto-detection if one is chosen

4) Last current encoding if available.
5) If 1-4 all fail, e.g. when you start Mozilla on a web page with no
document-based charset info, then display the page with the default set in
Prefs | Navigator | Languages | Character Coding.

4) was added because it was thought that if you go to a web site which does not
offer charset info on one page, then it is likely that it won't offer charset
info on other pages, and so 4) would make navigation on such a web site easier.
This means that the pref default setting will be used when the user starts using
Mozilla for the first time to set the current encoding on a page which has failed
steps 1-3 above. Thereafter, the last current encoding (often the same as the
pref default) will be consulted before the pref default value.

Every new (non-cached) page will go through this determination process. Thus the
last current encoding will not determine the encoding unless 1-3 do not apply.
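
In rough Python pseudocode, the determination hierarchy above amounts to
something like this (the function and argument names are hypothetical,
for illustration only -- this is not the actual Mozilla code):

```python
# Sketch of the charset determination hierarchy described above.
# All names here are hypothetical; this is not Mozilla's actual code.

def pick_charset(http_charset, doc_charset, detected, last_current,
                 pref_default):
    """Return the encoding for a new (non-cached) page, trying, in
    descending order: 1) HTTP charset, 2) document-based charset,
    3) auto-detection result, 4) last current encoding, and finally
    5) the default from Prefs | Navigator | Languages."""
    for candidate in (http_charset, doc_charset, detected, last_current):
        if candidate is not None:
            return candidate
    return pref_default

# An HTTP charset always wins:
print(pick_charset("Shift_JIS", "EUC-JP", None, "EUC-JP", "ISO-8859-1"))
# First page of a session, no charset info anywhere: the pref default:
print(pick_charset(None, None, None, None, "ISO-8859-1"))
```

The point of step 4) is visible in the sketch: once the "last current"
value has been set by an earlier page, it is consulted before the pref
default, which is why the two can differ within a session.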

Currently there might be a bug in 4) above such that the last current encoding
value is remembered from session to session. This in effect makes it the default
value. It would be better to clear out this value at the end of a session so that
the new session will begin with the user's default encoding (5 above) again in
case the 2 values are different.

By the way, the value in 5) is also used for initializing a new HTML Editor
document. Since the web page default and the editor document default are the
same for all the languages we know of, the pref default is useful in this
regard also.

- Kat


Matthew Thomas

Dec 17, 2000, 10:36:02 AM12/17/00
to mozill...@mozilla.org, mozil...@mozilla.org, mozill...@mozilla.org, b...@netscape.com
Katsuhiko Momoi wrote:
>...
> Matthew Thomas wrote:
>...

> > The user agent (e.g. the Web browser) uses the
> > document-specified encoding information to decide which
> > *character set* to use to display the text of the page.
>
> You mean which "font" that corresponds to the encoding used?

Not necessarily -- though multiple fonts are one (rather kludgy) way of
providing characters in multiple character sets. On other systems, there
may be a font available which provides characters from all the character
sets required.

> > In modern operating systems, all
> > encodings are usually displayed using various parts of the
> > Unicode character; but the character sets used to display text
> > in particular encodings may vary from platform to platform.
>
> What is the "Unicode character" you mention here?

I'm sorry, that was a typo. I meant the Unicode character *set*.

> Also what do you really mean by "but the character sets used to
> display text in particular encodings may vary from platform to
> platform."

The Web author (and Web server) does not (and should not) care which
character set, or font, or whatever, is used by a user agent to render
characters in a particular encoding -- just as long as the user agent
does its job in finding the appropriate character. This preserves
platform-independence; the user does not have to use the same mechanisms
as the author in order to see the same characters.

Let's take a rather far-fetched example. In the Unicode character set
there is (for reasons best known to the Unicode Consortium) a section
which consists of all the characters in the Zapf Dingbats font
<http://myfonts.com/CharacterMap1106.html>. When the author specifies
these characters, his authoring tool may have a copy of the Zapf
Dingbats font on hand with which to display them.

But when someone else views this person's page, her user agent does not
have to do the same thing. It might use the Zapf Dingbats font, or it
might use a mapping of characters from the various Wingdings fonts to
Zapf Dingbats to produce close approximations to the appropriate
characters, or the OS might be lucky enough to have fonts (such as
Lucida Sans Unicode) which cover that section of the Unicode character
set already <http://www.hclrss.demon.co.uk/unicode/dingbats.html>.

>...


> > For example, while modern OSes will usually use the Unicode
> > character set to display text in the ISO-8859-1 encoding,
>
> Again, what does this mean? Unicode is a character encoding scheme for
> character and text.

Where fonts are available which cover the entire Unicode character set,
those can be used to display everything which is a subset of Unicode
(such as ISO-8859-1). But where this is not the case, the user agent may
have to assemble characters from various places in order to show
characters in the desired encoding.

> > older computers often use some strain of the ANSI character set,
> > and IBM mainframes used the EBCDIC character set, to display
> > that same encoding.
>
> This is not correct. ISO-8859-1 is both an encoding method and
> character set but the so-called ANSI and the EBCDIC are 2 of the many
> ASCII coding methods for the ASCII set.

No, they are not.

The ANSI character set is a superset of the ASCII character set,
containing twice as many characters as ASCII does (256 as opposed to
128, though some of those are non-printing characters).

And the EBCDIC character set is not a `coding method' for ASCII, but a
completely different character set. The fact that it contains most of
the same characters as ASCII is purely accidental
<http://www.natural-innovations.com/boo/asciiebcdic.html>. (ASCII does
not contain the `logical not' character, for example, and EBCDIC does
not contain the tilde character.)
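
The disagreement between the two character sets is easy to demonstrate
with any modern codec library; for instance in Python, using its cp037
codec as a stand-in for one common EBCDIC variant:

```python
# The same byte means different characters in different character sets:
# 0x5F is the underscore in ASCII, but the logical-not sign in EBCDIC
# (here represented by Python's cp037 codec, one common EBCDIC variant).
b = bytes([0x5F])
print(b.decode("ascii"))    # '_'
print(b.decode("cp037"))    # the logical-not sign, U+00AC

# And the same character maps to different bytes:
print("A".encode("ascii"))  # b'A', i.e. 0x41
print("A".encode("cp037"))  # 0xC1
```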

> ISO-8859-1 is a superset of
> ASCII

Right. So if you read a Web page sent in ISO-8859-1 on an old computer
which used either ASCII or EBCDIC to display it, you'd end up with
problems (in the form of black rectangles, or question marks, or missing
symbols, or whatever) whenever a character in the second half of the
ISO-8859-1 encoding was encountered. But since these characters are
rarely used in Web pages (when compared to the first half of the
ISO-8859-1 set, that is), it would work OK most of the time.

Similarly, if you read a Web page sent in the ISO-8859-1 encoding with a
user agent which uses a character set slightly different from
ISO-8859-1 as if it *was* ISO-8859-1, or if the author uses a character
set slightly different from ISO-8859-1 and then sends the page as
ISO-8859-1, you would have a few problems with things not appearing as
they should. The most common examples of this are curly quotes and em
dashes, appearing as question marks or rectangular boxes on systems
which use a different character set for the ISO-8859-1 encoding from
that used by the author.
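
The curly-quote case can be reproduced in a few lines; here is a Python
sketch, with Windows-1252 standing in for the author's `slightly
different' character set:

```python
# An author types curly quotes in Windows-1252, but the page is
# labelled -- and therefore decoded -- as ISO-8859-1.
authored = "\u201cHello\u201d"          # "Hello" in curly quotes
sent = authored.encode("windows-1252")  # the bytes on the wire

# In ISO-8859-1 the quote bytes (0x93 and 0x94) are non-printing
# control characters, so the reader sees boxes or question marks.
received = sent.decode("iso-8859-1")
print(received == authored)             # False: the quotes are mangled
print(hex(sent[0]))                     # 0x93
```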

>...


> > However, sometimes Web servers get it wrong -- they send the
> > wrong encoding information with a Web page, or no encoding
> > information at all. In this case the user agent has to choose an
> > encoding itself in order to display the page. It typically does
> > this through a combination of two approaches: firstly by trying
> > to guess the encoding using an `auto-detection module' (which
> > works by looking for particular patterns of bytes which are
> > common in particular encodings), and secondly by assuming that
> > the page has used a particular encoding which is specified by
> > the user.
>
> This is not what Mozilla does. Auto-detection and/or user-specified
> default encoding is used as a fallback only if the server or document

                                ^^^^^^^^


> offers no encoding information. We don't use "encoding which is
> specified by the user" except indirectly in the case of inheritance as
> described below.

I wasn't referring only to defaults. If the server sends incorrect (as
opposed to no) encoding information, then Mozilla must provide means for
the user to specify the correct encoding.
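
To illustrate the byte-pattern idea from my background summary: a toy
detector might look like the following Python sketch. This is nothing
like Mozilla's real detector (which is statistical); it only shows the
general shape of the approach.

```python
# Toy byte-pattern auto-detection. Real detectors (Mozilla's, IE's)
# use statistical models; this only illustrates the general idea.
def guess_encoding(data: bytes) -> str:
    # UTF-8 has a strict structural pattern, so an invalid sequence
    # rules it out immediately (and pure ASCII passes as UTF-8).
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # Shift_JIS lead bytes commonly fall in 0x81-0x9F, a range
    # ISO-8859-1 reserves for non-printing control characters.
    if any(0x81 <= b <= 0x9F for b in data):
        return "Shift_JIS (guess)"
    return "ISO-8859-1 (fallback)"

print(guess_encoding("日本語".encode("utf-8")))
print(guess_encoding("日本語".encode("shift_jis")))
```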

>...


> I said this in the above bug but the experience of users at Internet
> Cafe should not be the driving force for this part of the UI spec.
> Internet cafes need to serve users of different language background
> with one program with one profile. The reason IE seems easier to use
> has little to do with the menu UI -- rather it has to do with the
> existence of universal auto-detection module they use and the fact
> that a sizable number of web sites send HTTP charset or document-based
> charset info. Netscape 6 has a *proprietary* universal auto-detector
> in it. It needs to be improved but when this is ON, it may help such
> customers by making it unnecessary to use this menu. This is largely
> how IE feels better in this regard, i.e. users don't have to use this
> menu much. That should be our goal, too. Note that there is a lot of
> unknown about universal auto-detection. Some studies have shown that
> IE universal auto-detection has less than 50% success rate when
> matched against pages from diverse languages and web sites. Mozilla
> developers might want to offer their own version of universal
> detector.

Sure, but all of that discussion of defaults and auto-detectors is
almost completely irrelevant to this discussion. I have no idea how
often Internet Explorer guesses the correct encoding, because when it
does the customers don't ask me for help.

If it was unnecessary to have the encoding submenu because the
auto-detection was 100-percent reliable, we could just get rid of the
submenu altogether. But since we live in a world where Web servers are
misconfigured often enough that we need to be able to specify the
encoding ourselves, that UI does have to exist; and since it does, it
should be as easy to use as possible.

If anything, the improvement of defaults and auto-detection makes it
*more* important, not less, that the encoding selection UI is easy to
use. If the UI will be accessed rarely, users will not have so much of a
chance to practise using it and to learn how it works. So it needs to be
easy to use, even at the expense of some speed in using it -- speed
doesn't matter so much if you're not going to use the UI very often.

> Our main focus should be taking care of the scenario where a single
> monolingual user uses Mozilla on a personal machine.

| Many developers of Windows-based applications unintentionally limit
| their potential audience by overlooking user scenarios that vary from
| their own. However, it often takes only a little extra effort to
| support the widest range of users. Time invested here can pay
| dividends in the form of increasing your application's potential
| market or meeting governmental requirements. It also provides
| consistency of operation for all your users.
|
<http://msdn.microsoft.com/library/books/winguide/ch01e.htm>

>...


> > The problems with Mozilla's UI for encoding selection are as
> > follows.
> >
> > 1. The title of the submenu is `Character Coding'. Many users may
> > not know what a `character' is. They are far more likely to know
> > what `text' is, so I suggest that the submenu be called `Text
> > Encoding' instead.

>...


> This menu under Communicator was named "Character Set". Users may have
> difficulty with "Set" part but I doubt that they had much difficulty
> with the "Character" part. We thought it would be easier for users to
> use a familiar term "Character".

As Andy said, the word `character' is *not* a familiar term -- it
appears very rarely (with the same meaning) outside the world of
computers. The word `text' is much more common.

> One reason we did not use "encoding"
> is that items that come under this menu are not all classifiable as
> "encoding". For example, auto-detection modules are not encodings.
> They are tools to detect an encoding of the document.

That's not a reason to use `coding' rather than `encoding'. An
auto-detection module is not a `coding' any more than it is an `encoding'.

>...


> > 2. Making a selection from the menu will override the
> > document-specified encoding (if there was one), but there is no
> > obvious way to switch back to the document-specified encoding
> > again. This leaves the user unable to backtrack on their action
> > -- the equivalent of a word processor without an Undo command.

>...


> Overriding a document-based charset is needed only
> in those cases where the charset info provided is wrong and the page
> is displaying incorrectly. In such a case, Mozilla currently offers a
> single override. You just corrected a wrong display to the right one
> with the menu action of choosing another encoding. Why do you want to
> go back to the incorrect display?

Any number of reasons.

* You have no idea what this `Character Coding' submenu system is for --
but it sounds exciting, like FBI profiling or something, so you decide
to play with it and see if it is as cool as the `Use Stylesheet'
submenu or the `Apply Theme' submenu.

It turns your Web page to gibberish. But that doesn't matter, since
you can always get it back to how it was (like you could with the
`Use Stylesheet' submenu, or the `Text Size' submenu, or the `Apply
Theme' submenu), right? Wrong. Because to get it back how it was, you
have to do a super-reload -- and you don't even know that there is
such a thing as a super-reload, let alone how to perform one.

* You want to check that Mozilla is using the document's specified
encoding, but because submenus are so difficult to use (and you happen
to be using Windows, where they're worst of all), you accidentally
choose an encoding by mistake. (Or the family cat comes to see what
you're doing and steps on the keyboard at just the wrong time, hitting
the mnemonic for one of the encodings.) Now you want to get it back
how it was. But you can't. Because you have to do a super-reload --
and you don't even know that there is such a thing as a super-reload,
let alone how to perform one.

* A page shows up as gibberish, and you suspect the encoding is at
fault. After some experimentation, you conclude that it's not the
encoding which is the problem after all, but something else that is
wrong with the Web site. You want to return to the original appearance
of the page, so you can e-mail the site author a screenshot of how it
looks and ask her what the problem is. But you can't. Because you have
to do a super-reload -- and you don't even know that there is such a
thing as a super-reload, let alone how to perform one.

Back to the /Macintosh human interface guidelines/ again:
|
| You can encourage people to explore your application by building in
| forgiveness. Forgiveness means that actions on the computer are
| generally reversible. People need to feel that they can try things
| without damaging the system; create safety nets for people so that
| they feel comfortable learning and using your product.
|
<http://developer.apple.com/techpubs/mac/HIGuidelines/HIGuidelines-24.html>

And the /Microsoft Windows user experience/ guidelines:
|
| Users like to explore an interface and often learn by trial and error.
| An effective interface allows for interactive discovery. It provides
| only appropriate sets of choices and warns users about potential
| situations where they could damage the system or data, or better,
| makes actions reversible or recoverable.
|
| Even in the best-designed interface, users can make mistakes. These
| mistakes can be both physical (accidentally pointing to the wrong
| command or data) and mental (making a wrong decision about which
| command or data to select). An effective design avoids situations that
| are likely to result in errors. It also accommodates potential user
| errors and makes it easy for the user to recover.
|
<http://msdn.microsoft.com/library/books/winguide/ch02b.htm>

> With regard to the form example and data loss, you want the page to
> display correctly in the first place before inputting any data. No
> user will attempt to enter form data on a page which is incorrectly
> displaying due to a wrong HTTP or document charset.

Did you not read the bug
<http://bugzilla.mozilla.org/show_bug.cgi?id=17889>, or are you
suggesting that I was making that up? It really happened, and it
was perfectly understandable. That woman was not an idiot. But she did
not know how to change the encoding to display her Webmail site in
Japanese, so she started composing her message even though the labels
saying `To:', `CC:', `Send', etc. in Japanese were showing as
ISO-8859-1 characters.

When the time came that she really did need to see which button was
the `Attachments' button, she explored some more, worked out how to
change the encoding, changed to `Japanese (Auto-detect)', and -- bam! --
lost her entire message.

> So, you do an
> override, now the page displays correctly, and then you begin entering
> the data for the form. Why would anyone want to super-reload the
> correctly displaying page, lose data and then to boot end up in the
> original incorrect display? That is not the user scenario we should be
> covering with this function.

Exactly. You shouldn't have to super-reload the page at all. An encoding
change should preserve DOM state and form data -- anything else is data
loss.
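
To see why no data loss is necessary: an encoding switch is a pure
re-interpretation of bytes the browser already holds. A minimal Python
sketch of the principle (not Mozilla's architecture):

```python
# Switching encodings is a change of interpretation, not of data:
# the cached source bytes never change, only the view of them.
cached = "こんにちは".encode("euc-jp")  # bytes as received from the server

wrong = cached.decode("iso-8859-1")     # first rendering: gibberish
right = cached.decode("euc-jp")         # after the user's override

# The wrong rendering still round-trips to the identical bytes, so
# re-decoding in place loses nothing; DOM state and typed form data
# could survive the switch untouched.
print(wrong.encode("iso-8859-1") == cached)   # True
print(right)
```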

> > 3. The `Character Coding' menu item has *three* levels of submenus
> > -- e.g. `View' > `Character Coding' > `More' > `SE & SW Asian' >
> > `Vietnamese (Windows-1258)'. Menus are the most difficult of all
> > common UI controls for users to manipulate, and as the level of
> > nesting increases linearly, the level of difficulty increases
> > geometrically. In any professionally-designed software app,
> > having more than *one* level of submenus is inexcusable.
>
> I am aware of these UI considerations. But shoving these items into
> the dialog does not solve the problem we are trying to address.

Well it solves the problem I am trying to address, which is the UI for
overriding the encoding being unnecessarily difficult to use. What
problem are you trying to address?

> Average users' needs are covered by the following default settings:

>...


> These 3 (or 4 with auto-detection) once they are set correctly serve
> almost all average users. The key is in improving the default setting
> mechanism. Currently, localizers will set them correctly for their own
> locale. Thus, people who have to learn to set these themselves are a
> small set of users like "Internet Cafe" users whose admin people have
> not yet learned how to set the defaults. Or since Internet cafe admins
> cannot predict the preference of each user ahead of time, an easier
> way to set the defaults. Setting the defaults more easily will solve
> the problem Matthew is trying to solve.

Again, all that is irrelevant to the current discussion. As I said, the
better you make the default setting mechanism (without making it
absolutely faultless), the *more* important it becomes that the override
mechanism is easy to use, because the less practice users will have in
using it.

> Sub menus thus are there mainly for advanced users who need more
> flexibility or access to many encodings. For these rarely used items,
> overhauling the sub menus seems too much of wasted efforts.

Nobody was asking you in particular to `waste' this effort. It may not
rank at all on your list of personal priorities. But if another Mozilla
contributor wants to implement improvements to the UI, describing it as
`wasted efforts' is only going to discourage them from contributing to
other areas in the future.

> It works
> Ok now for the segment of users we are targeting. Putting them in the
> dialog only makes the access harder for the advanced users who might
> want to see them in the menu.

No, there are several ways in which a dialog would be easier than
submenus, even for advanced users.

* The problem of the nested submenus zigzagging over the top of their
ancestor menus (which happens even on this Mac with its single menu
bar at 832 * 624 resolution), making it hard to get back to the parent
items, would be eliminated.

* The list of available encodings could be scanned much faster using a
list control (with a scrollbar) in a dialog, than by weaving in and
out of submenus (as in Mozilla), or by twiddling your thumbs while a
single long submenu auto-scrolls to the part you are interested in (as
in Internet Explorer).

* The target area for encodings in a dialog's list control could
comfortably be much wider than the same items in a submenu, making
them easier to click.

* More context could be provided in a dialog about how the
document-specified encoding, the auto-detection modules, and the
user-specified encoding are currently interacting.

>...


> > 4. The division of encodings in the `More' submenu into what Momoi
> > describes as `geo-linguistic classifications fairly
> > well-accepted in linguistic typology'
> > <http://bugzilla.mozilla.org/show_bug.cgi?id=10999> may be
> > elegant from a linguistic point of view, but from a user's point
> > of view it is just an unnecessary level of complication.

>...


> See my comments above. If enough people dislike sub-grouping of
> encoding families, it would be easy to switch to MS style single
> sub-menu. However, let me also say that Mozilla covers many more
> encodings than MS and our fear was that with that many more items,
> some sub-groupings will be helpful. Remember that we are likely to add
> some more items to this menu so that eventually the number might reach
> 80-90. Is a single sub-menu easier for advanced users? Or the
> sub-groups?

>...

Neither. Unfortunately you seem fixated on the idea of using pull-down
menus at all. But pull-down menus are fundamentally unsuited to choosing
between 80 or 90 mutually exclusive items -- no matter how you arrange them.

>...


> > When presented with Mozilla's nested submenu of encoding
> > categories, none of which are actual languages, and most of
> > which have similarly-shaped names, I suggest that users will not
> > pause (holding the submenu open) while they calculate which
> > category their desired language fits into; instead, in an
> > attempt to find the item they want quickly, they will begin a
> > depth-first traversal of the submenus looking for the desired
> > encoding, wasting time in the long run.
>
> I think there is some truth in this comment. But let me also suggest
> that that hesitation would occur only the first time.

Perhaps. But he who hesitates is lost, to the competition. We have to
*delight* the user the first time they try Mozilla. We can't afford to
put them through boot camp learning about geo-linguistic typology, and
My Sidebar Search categories, and the Forms Auto-Fill Single Sign-On
Gobbledygook Manager, and all that other crap that Mozilla is subjecting
users to at the moment.

> Given the
> sub-categorizations like East Asian, I think the user will quickly get
> used to the fact that Chinese, Japanese and Korean belong there. I
> doubt very much that they will keep on having this problem. Also, if
> you are a heavy user of certain encodings, you should put them in the
> permanent or static list on the 1st tier via the "Customize..." menu.
> Again, for the target users we have in mind, I don't see much
> difference. A long menu with 80 or more items will be much worse, I
> fear.

Straw man argument. I didn't propose that Mozilla should have a single
long submenu, you did. I'm proposing that Mozilla should use a dialog
with a list control. I said I want Mozilla's UI to be *better* than
Internet Explorer's, not worse.

> > 5. The use of linguistic region submenus to subcategorize encodings
> > requires newly-installed encodings to specify which of Mozilla's
> > UI-specific regions they belong to, something which they may do
> > incorrectly, or not do at all
> > <http://forbes.co.il/dev/bidi-mozilla.gif>.
>
> The example image has to do with Arabic and Hebrew, which hopefully
> get a full support in the near future. I think we will be classing
> them into Middle Eastern when the support for Bi-di becomes finalized.
> Thus, this will not be a user-level decision. The engineers involved
> will put them correctly. This is a non-issue and something we can
> resolve via a bug.

Is it true, then, that support for new encodings can only be gained by
downloading a whole new version of Mozilla?

If so, why?

And if not, then when I download support for a new encoding, how does
Mozilla know which geo-linguistic category to put it in?

> > 6. The use of third-level submenus to subcategorize the encodings
> > available implies that selections in each of the submenus are
> > independent from each other, when they are not.
>
> They are independent from each other.

No, they are not independent from each other. Try it and see.
* Select `View' > `Character Coding' > `Other' > `West European' >
`Western (ISO-8859-1)'. ISO-8859-1 becomes selected in the `View' >
`Character Coding' submenu.
* Select `View' > `Character Coding' > `Other' > `East European' >
`Baltic (ISO-8859-4)'. ISO-8859-4 becomes selected in the `View' >
`Character Coding' submenu. ISO-8859-1 is no longer selected. You
cannot select both at once. The choices in each submenu are *not*
independent from each other.

> You're simply misunderstanding
> how these menu items work.

I spent a considerable amount of time trying to understand how they
work, and I'd like to think I'm of at least average intelligence. So if
I'm misunderstanding how the menu items work, I respectfully suggest
there's something badly wrong with them.

>...


> Auto-detection choice is independent of the encoding menu items.

Why? If you are trying to use the menus to indicate which auto-detection
module is in effect *and* which encoding that auto-detection module has
decided on for a given page, at the same time, you are providing the
user with too much information.

> This
> is why they cannot be under the same sub-group. Once you choose an
> auto-detection, that detection module is ON at all times for every
> page until auto-detection is turned OFF. (Maybe we should mark this
> fact by changing a bullet to a checkmark for the Auto-detection
> modules.)

That won't make me understand it any better, since on the Mac it's all
going to appear as checkmarks anyway
<http://developer.apple.com/techpubs/mac/HIGuidelines/HIGuidelines-83.html>.

> > (So both `View' > `Character Coding' > `Auto-Detect' >
> > `Auto-Detect (Chinese)' and `View' > `Character Coding' >
> > `Western (ISO-8859-1)' are checked at the same time, but `View'
> > > `Character Coding' > `More' > `Western' > `Western
> > (ISO-8859-1)' is not also checked.)
>
> Yes. That is true phenomenally. Ideally we would like the items in the
> sub-menu to disappear when it moves to the top tier because it now is
> in the cache. We haven't had time to implement it yet. That should
> solve the confusion.

>...

No, it would make things worse. Not only would it ruin the muscle memory
of people remembering the position of a particular encoding in the
submenus, it would also make it impossible to look up where the cached
encoding had come from so that the user could select it themselves in
the future.

>...


> B. The bullet mark you see under "`View' > `Character Coding'" does
> NOT indicate what you choose as the default encoding. (The default
> encoding is set in the Preferences.).

Then that's a bug. It makes no sense for almost all of the encoding
controls to be accessed from a submenu of the `View' menu, but for one
of them (the default) to be accessed a million miles away in the prefs
dialog.

>...


> > 7. The submenus make a distinction between individual encodings and
> > auto-detection modules. From the user's point of view such a
> > distinction is completely irrelevant -- they don't care how
> > clever the programmers were in writing code to distinguish
> > between multiple encodings, they just want their pages to look
> > right. If there is a choice available in the `Text Encoding'
> > menu which is able to work out which of the variety of Japanese
> > encodings is being used on a particular Japanese Web page,
> > that's great, but the user shouldn't have to go hunting in the
> > `Auto-Detect' submenu for such an item, otherwise she won't use
> > it at all. Auto-detect modules should be available right next to
> > the items for the individual encodings which they cover.
>
> First, Auto-detection is not an encoding and should not be mixed with
> encodings.

I explained above why they should be mixed. Are you going to explain why
they should not be?

> This becomes even clearer with more discussion below.
> The user does not usually have to hunt for the auto-detection menu.
> Each localizer will have set that item appropriately for that locale.
> Thus the default option for auto-detection is ON if it is available
> for the user's locale/language. What you are not taking into account
> is that we cannot generalize from the behavior of Mozilla users who
> are mostly using English UI for their languages. These are a minority
> of people -- developers, testers, Mozilla enthusiasts. Average users
> will get the UI in their language and the default encoding and
> auto-detection set properly for that locale. It is the localizer's
> job to set them correctly in the first place. Mozilla is designed
> with these defaults so that average users will not have to worry
> about the issues you're talking about.

Yes, the defaults are carefully chosen. Great. Fine. But yet again, how
good the defaults are is irrelevant to this discussion. This discussion
is about how easy it is to change the settings, on those occasions when
you *do* need to change them.

> > 8. The five most recently-used encodings are selectable from the
> > first-level submenu. However, it is also possible (using the
> > misleadingly-named `Customize Character Coding' dialog) to
> > specify encodings which will always appear in the first-level
> > submenu, regardless of whether they have been recently used or
> > not. These encodings are mixed with the recently-used encodings
> > in the first-level submenu, without any visual distinction being
> > made between them.
>
> This is how the IE menu is designed too. IE has 1 or more static menu
> items which are there at all times, plus the cached items, without any
> separator. If enough people want it, we can probably put in a
> separator.

I have never seen such static items in Internet Explorer's submenu, and
nor have I seen any method of adding such items. Does Internet Explorer
really offer this ability, and if so how?

> > As I said in bug 10999, this seems about as redundant as
> > allowing the user to specify which files will appear in their
> > `Recent Files' submenu -- it is only likely to be of any use to
> > a tiny minority of users who regularly apply more than five
> > different encodings (which would mean they would probably need
> > to know at least three languages). For the vast majority of
> > users, their list of recently-used encodings will be so good as
> > an approximation of the list of encodings which they want to
> > switch to quickly, that having an extra dialog for this is just
> > not worth the confusion and bloat generated by its existence.

>...


> That is not so. The Character Coding Customize dialog is there because
> for advanced users, there is no way to predict which sets of encodings
> they would like to have there permanently.

Yes there is -- record the encodings which they have used in the past,
as you are already doing. If you don't trust this as a method of
providing the user with easy access to encodings they are likely to use
in the future, I can't help wondering why you are using it at all.

> We want that to be flexible
> and be left up to the users who need it. A vast majority of people
> will get their defaults set by localizers. Note also that we can have
> the static menu even when the cache is lost. It is also wrong to
> assume that some users will not need more than 5 items.

I didn't say that some users would not need more than 5 items, I said
that a tiny minority would. If I'm wrong, then make it 6, or 8.

> If you want to
> do Chinese and Japanese -- then you need a minimum of 5, plus
> Western, which is needed for just about everyone. That is 6. Some
> languages like Russian may have more than 5 available encodings in
> use. Yes, to some extent this is overkill but I believe it is
> better to provide this flexibility to advanced users.

It is not better to provide that flexibility to advanced users, when
every other user is having to stumble past the overkill of the
`Customize ...' item on their way to selecting an encoding. `Customize
what?', they'll be wondering. `Produce my own custom encoding, perhaps?
Hmmmmm ...'

>...


> > Therefore, I would suggest that the most reliable behavior would
> > be to revert to using the document-specified encoding when the
> > user first visits a page which has a different
> > document-specified encoding from that which was present when the
> > user last overrode the encoding.
>
> If there is document-specified encoding and Mozilla is not honoring
> it, it is simply a bug and not a spec. Please file a bug and provide a
> specific instance.

No, that's not the problem.

Let's take a simple example. I have auto-detection turned off. I'm at
page (X), which is telling Mozilla that it's in encoding (1), but it's
actually in encoding (2). So I manually change the encoding from (1) to
(2), to read the page.

Then I click on a link in page (X) and go to page (Y), which is also
telling Mozilla that it's in encoding (1). Mozilla continues using
encoding (2). And that's good, because (Y) is probably on the same Web
site as (X), so it's the same broken server with the same broken
configuration, so (Y) was supposed to be in (2) anyway. I read the page.

Then I click on a link in (Y) and go to page (Z). (Z) tells Mozilla that
it's in encoding (7). From this it's reasonably certain that we're on a
different Web site now, so I think Mozilla should go back to believing
what the server tells it. But what Mozilla appears to do is carry
straight on using encoding (2), which is almost certainly the wrong
encoding to use.
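The revert rule being argued for here can be sketched as a small state machine (a minimal sketch under my own naming; `EncodingState`, `encoding_for`, and `user_override` are hypothetical names, not Mozilla internals):

```python
# Sketch of the proposed rule: keep a manual encoding override only as
# long as the document-specified encoding stays the same; once a newly
# loaded page declares a DIFFERENT encoding, assume we are on a new site
# and go back to trusting the document. Hypothetical names throughout.

class EncodingState:
    def __init__(self):
        self.override = None              # encoding the user forced, if any
        self.declared_at_override = None  # what the page declared at that time

    def encoding_for(self, declared):
        """Pick the encoding to render a newly loaded page with."""
        if self.override is not None and declared == self.declared_at_override:
            # Same (mis)declared encoding as when the user overrode it:
            # probably the same broken server, so keep the override.
            return self.override
        # The declared encoding changed: drop the override and revert
        # to the document-specified encoding.
        self.override = None
        self.declared_at_override = None
        return declared

    def user_override(self, declared, chosen):
        """The user manually selects `chosen` on a page declaring `declared`."""
        self.override = chosen
        self.declared_at_override = declared


state = EncodingState()
assert state.encoding_for("1") == "1"   # page X declares (1)...
state.user_override("1", "2")           # ...but is really (2); user overrides
assert state.encoding_for("1") == "2"   # page Y declares (1): keep override
assert state.encoding_for("7") == "7"   # page Z declares (7): revert
```

Running through the X/Y/Z walk above, the override survives the hop to (Y) but is dropped at (Z), which is the behavior the paragraph argues for.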

Internet Explorer appears to have the same flaw, which means I
occasionally have to switch customers back to `Western (ISO-8859-1)'
when the German umlauted characters in their Hotmail appear as Japanese kana.

>...


> > The submenu would look like this:
> > _________________________________
> > Te_xt Encoding > |/ _Automatic |
> > |---------------------------------|
> > |* Western (ISO-8859-1) |
> > | Central European (ISO-8859-2) |
> > | Chinese (Simplified) |
> > | Japanese (Auto-Detect) |
> > | _Other ... |
> > |---------------------------------|
> > |* _Left to Right |
> > | _Right to Left |
> > `"""""""""""""""""""""""""""""""""'
> >
> > `Automatic' would specify whether or not the document-specified
> > encoding was being used.
>
> Remember that the bullet indicates what the current document encoding
> is, NOT what you have chosen as the default.

That's right.

> So this menu is
> incoherent.

Why?

> A bullet mark as a feedback should be placed only to a true encoding,
> not to an auto-detection module. Even IE does NOT do that. If you look
> closely at IE behavior, you will notice that even if you choose an
> auto-detection module like Japanese, when you go to a page, the bullet
> is placed next to one of the 3 Japanese encodings. This is very
> confusing! That is because they don't distinguish "auto-detect" module
> which is ALWAYS ON unless you choose something else from real
> encodings.

Yes, they should just have the bullet next to the auto-detect module all
the time.

> That is why even when you choose an auto-detection module,
> you will never see that module bulleted again in IE. This IE-style UI
> should not be imitated.

Sure.

> It also makes no sense given what Mozilla
> backend does.

The user doesn't care what the back end does.

>...

I would give up the perverse attempt to try and show simultaneously what
auto-detection module Mozilla is using to determine the encoding, and
the encoding which the auto-detection module had come up with. I would
just show the method Mozilla was using to determine the encoding.

> This
> mixing is incoherent

Which is why I'm not suggesting it.

> and will not make sense given what our backend
> does.

The user doesn't care what the back end does.

> What you have not taken into account at all is that Mozilla already
> has default settings via the Pref menu.

And nor should I, because there shouldn't be default settings hidden in
the prefs dialog.

> You're confusing the default
> setting from a list of encodings which can be potentially used to mark
> the current encoding.

No, I'm not confusing a default setting with anything, because this UI
doesn't have a default setting at all. If one is needed (and I am not
convinced that it is), it could be added to the dialog as a separate
control, which would make much more sense than putting it in the prefs dialog.

>...


> The solution for Internet Cafe customers, Mozilla developers, and other
> minority users of this type lies in a better way to set all the
> defaults correctly with one action when Mozilla knows the preferred
> language of the user. Currently, you have to change 2 to 4 settings
> (if you include Mail) to get this done. We know that the correct
> defaults make a huge difference.

Good. But for the final time, how good the defaults are is irrelevant to
this discussion.

>...


> It is possible to change the sub-menus into one long sub-menu with
> separators. We might do that, but because Mozilla covers many more
> languages/encodings than IE, the sub-menu idea should be evaluated by
> real target users, not by UI pros only.

Or you could just take it to any UI pro you like (and I don't regard
myself as a UI pro, I'm just a beginner). They'd look at you aghast and
say `You're not *seriously* proposing that, are you?', and you could
save yourself the cost of testing it with real target users.

>...


> If it helps people who are not necessarily in the i18n field, I would
> be happy to update the current spec document and also include
> introductory remarks to explain these issues in a less technical way.

>...

That would be a very good idea. A flow chart would help.

Katsuhiko Momoi

Dec 19, 2000, 1:20:29 AM
to m...@mailandnews.com, mozill...@mozilla.org, mozil...@mozilla.org, mozill...@mozilla.org, b...@netscape.com
I am going to respond tonight to only 1 paragraph in the quoted message
below. Other parts require more time and attention. So it could be a
few days before I can respond again.

The part quoted below is important because the behavior of the current
Mozilla build is incorrect due to a bug.

- Kat

Matthew Thomas wrote:

> Katsuhiko Momoi wrote:
>
>
>> ...
>
>> Matthew Thomas wrote:
>
>> ...
>

>
>>

>>> Therefore, I would suggest that the most reliable behavior would
>>> be to revert to using the document-specified encoding when the
>>> user first visits a page which has a different
>>> document-specified encoding from that which was present when the
>>> user last overrode the encoding.
>
>>
>> If there is document-specified encoding and Mozilla is not honoring
>> it, it is simply a bug and not a spec. Please file a bug and provide a
>> specific instance.
>
> No, that's not the problem.
>
>
>
> Let's take a simple example. I have auto-detection turned off. I'm at
> page (X), which is telling Mozilla that it's in encoding (1), but it's
> actually in encoding (2). So I manually change the encoding from (1) to
> (2), to read the page.
>
> Then I click on a link in page (X) and go to page (Y), which is also
> telling Mozilla that it's in encoding (1). Mozilla continues using
> encoding (2). And that's good, because (Y) is probably on the same Web
> site as (X), so it's the same broken server with the same broken
> configuration, so (Y) was supposed to be in (2) anyway. I read the page.
>
> Then I click on a link in (Y) and go to page (Z). (Z) tells Mozilla that
> it's in encoding (7). From this it's reasonably certain that we're on a
> different Web site now, so I think Mozilla should go back to believing
> what the server tells it. But what Mozilla appears to do is carry
> straight on using encoding (2), which is almost certainly the wrong
> encoding to use.
>

The behavior you described above is not what we know as the current spec for this set of events. I believe you are seeing an effect of a regression bug:

http://bugzilla.mozilla.org/show_bug.cgi?id=62754

Try using an equivalent of NS6 (M18) final build with the auto-detection turned off. It should be working as I described.
(You can also use a build prior to what we believe is the date when this bug appeared in the builds, e.g. 12/8 seems fine.)


--
Katsuhiko Momoi
Netscape International Client Products Group
mo...@netscape.com

What is expressed here is my personal opinion and does not reflect
official Netscape views.


Jonathan Rosenne

Dec 19, 2000, 2:36:05 AM
to Katsuhiko Momoi, mozil...@mozilla.org, mozill...@mozilla.org, Xslf, Matthew Thomas
I suggest that the user's selection be maintained while in the same site
according to the URL.

Jony

Jungshik Shin

Dec 21, 2000, 11:21:14 PM
to
In <Pine.LNX.4.10.100121...@Sina.sharif.ac.ir>, Roozbeh Pournader wrote:

I agree this should be made optional.

Even for those who usually browse web pages in a couple of languages,
the behavior that you think makes sense can be pretty inconvenient.
To take myself as an example, I frequently hop between Korean pages
(mostly in EUC-KR) and pages in ISO-8859-1 (even English pages have some
accented and other characters that go beyond the repertoire of US-ASCII,
which is a subset of all Korean encodings), and the behavior forces me to
set the encoding manually EVEN if the page I'm viewing has the correctly
specified encoding, either via HTTP header or META tag.
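The subset relationship mentioned here is exactly why a lingering override garbles pages: bytes that are perfectly valid ISO-8859-1 can be (mis)read under a Korean encoding. A small illustration (not anything from Mozilla, just Python's codecs):

```python
# ASCII bytes are identical in ISO-8859-1 and EUC-KR, but the accented
# byte 0xE9 ('é' in ISO-8859-1) is an EUC-KR lead byte, so the same
# bytes no longer round-trip once the wrong decoder is applied.
data = "café".encode("iso-8859-1")          # b'caf\xe9'
assert data.decode("iso-8859-1") == "café"  # correct under the real encoding
# Under EUC-KR, 0xE9 expects a trailing byte that isn't there:
assert data.decode("euc-kr", errors="replace") != "café"
```

So an EUC-KR override carried onto an ISO-8859-1 page silently corrupts every non-ASCII character, which is the inconvenience being described.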

Jungshik Shin

Jonathan Rosenne

Dec 23, 2000, 5:18:32 AM
to mozill...@mozilla.org, mozil...@mozilla.org
I suggest that the user selection remain in force only while he is on the same
site and the HTML has the same charset specification, or lack of it.

When the user moves to another site, or the charset specification changes, the
selection should return to the default.
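This two-condition rule can be sketched as a single predicate (a hypothetical helper under my own naming, judging "same site" by the URL's host; not Mozilla's actual implementation):

```python
# Sketch of the suggested rule: keep a manual override only while BOTH
# (a) the user stays on the same site (compared by URL hostname) AND
# (b) the charset declaration -- or its absence -- is unchanged.
from urllib.parse import urlparse

def keep_override(prev_url, prev_charset, url, charset):
    """Return True if the user's manual encoding choice should persist."""
    same_site = urlparse(prev_url).hostname == urlparse(url).hostname
    same_declaration = prev_charset == charset   # both may be None
    return same_site and same_declaration

# Same site, both pages lacking a charset: override persists.
assert keep_override("http://mail.example.com/a", None,
                     "http://mail.example.com/b", None)
# Different site: override is dropped even if the declaration matches.
assert not keep_override("http://mail.example.com/a", "iso-8859-1",
                         "http://other.example.org/", "iso-8859-1")
```

Compared with the charset-only heuristic discussed earlier in the thread, adding the hostname check also drops the override when two unrelated sites happen to declare the same (wrong) charset.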

Jony

> -----Original Message-----
> From: Jungshik Shin [mailto:jung...@www.ykga.org]On Behalf Of Jungshik
> Shin
> Sent: Friday, December 22, 2000 6:21 AM
> To: mozill...@mozilla.org; mozil...@mozilla.org
> Subject: Re: Making the encoding selection UI easier to use
