Google Groups acts as both a news server and a newsreader.
AFAIK there is no encoding override for GG[1,2] so there
is no easy way to tell if the decision to interpret Don's
message text as ISO-8859-1 (or equivalent) was made when
the message arrived at GG or when the message was called
up to be read. GG's general usage pattern plus common
software design sense would put it at the former.
[1] I am not a regular GG user, so there might be settings
that I am not aware of.
[2] There is of course the encoding override in one's browser.
> So I'm pretty sure it was your newsreader software that did the
> interpretation as UTF-8.
Of course -- have I implied otherwise?
> I instruct my newsreader to interpret
> postings in this group with undeclared character set as CP-1252,
> because that is in practice the best educated guess,
And I have set my default (called "Fallback" encoding in
Thunderbird) to "Unicode". I chose Unicode because a message
in a common "Western" encoding is by and large still
quite readable when interpreted as Unicode, but a message
in Unicode might contain many code points outside of the
0-255 range and thus be incomprehensible when interpreted in
"Western" encodings.
> and Google
> probably does the same - or maybe it tries several interpretations
> and chooses the one that works best, a system that I planned to
> have in the newsreader I wanted to write for years but never did.
> Doing this is anathema to purists, of which you seem to be one,
I am not sure what qualifies one as a purist in this context,
but my reason for choosing Unicode as the default is a purely
practical one.
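To make the practical reason concrete, here is a minimal Python sketch.
It assumes that "Unicode" in Thunderbird means UTF-8 and that "Western"
means CP-1252; the sample strings and the helper guess_decode() are
made up for illustration and are not what Thunderbird or GG actually
run. It shows what each fallback does to a message in the other
encoding, plus one simple form of the "try several interpretations"
idea quoted above.

```python
# Sample text; both words are only illustrative.
western_text = "bánh mì"   # every character fits in Latin-1 / CP-1252
unicode_text = "phở"       # U+1EDF has no Western single-byte form

# Western bytes read with a UTF-8 fallback: only the accented
# letters are lost, each becoming one U+FFFD replacement character.
western_as_utf8 = western_text.encode("cp1252").decode("utf-8", errors="replace")

# UTF-8 bytes read with a CP-1252 fallback: every non-ASCII
# character turns into two or three unrelated characters.
utf8_as_western = unicode_text.encode("utf-8").decode("cp1252")


def guess_decode(raw):
    """Hypothetical heuristic: strict UTF-8 first, CP-1252 otherwise.

    Text in a Western encoding that contains accented letters is
    almost never valid UTF-8, so a failed strict decode is a fair
    hint that the bytes are Western.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # CP-1252 leaves a few byte values undefined, hence "replace".
        return raw.decode("cp1252", errors="replace"), "cp1252"


print(western_as_utf8)
print(utf8_as_western)
print(guess_decode(western_text.encode("cp1252")))
print(guess_decode(unicode_text.encode("utf-8")))
```

With the first case the base letters survive, so the word is still
guessable; with the second the whole character is replaced by noise.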
> but I consider it in line with the principle of "being liberal in
> what you accept", and more generally, a convivial attitude.
I had no problem understanding what "a?a?" and "b?nh m?"
referred to without changing the encoding, so I did not
bother. I do not see this as being "illiberal ...".
I would laud anyone who would take the extra step to set
the read-encoding so that the quoted text in a reply would
appear as the o.p. intended. However, not taking the extra
step should not be construed as anti-social, right?
Note well that I have not made any complaint about Don's message.
OTOH, PTD blamed me for messing up the quoted text.
> In this case, I received the article from Eternal-September, and
> saw it in the intended form (except for "phở"). If the default
> doesn't work, I still have the choice to manually select a
> different character set interpretation for the specific post.
----- -----
>>>>> Many words are imported into English
>>>>> complete with the accents they sported in the source languages.
>>>>
>>>> And foreign words are often included in otherwise English text
>>>> without being nativized.
>>>
>>> And then they're italicized.
>>
>> Someone's neglect (or refusal) to abide by the convention
>> does not change a foreign word into an English one.
>
> But regular use by speakers of English does,
The mechanism of nativization is not in dispute; the disagreement
is about the extent of the nativization of these words
and indications thereof.
> and that's clearly
> the case with items English speakers regularly order in Vietnamese
> food places, including bánh mì.
Clearly the meanings of "clearly" and "regularly" in this
context depend on where one lives and one's eating habits,
do they not?
Isn't the need for a pronunciation guide an indication that
the words are not fully nativized? And isn't it usually the case
that the more nativized a word is, the more likely it is going
to drop the accent marks?
> Which I often see written "Banh mi" where advertised. It's
> actually a silly name, as it simply means "bread". Other shops
> sell it, more appropriately IMO, as "Vietnamese sandwich".
Can <bánh mì> also mean (that kind of) sandwich in Vietnamese?
Cf. AmE, where "hotdog" can mean either the frankfurter or the
frankfurter plus the bun.