
quell character set messages?


LinuxBear

Nov 23, 2001, 6:45:32 PM
is there any way to quell the "wrong character set" messages? They drive me
bonkers!

--

No trees were killed in the sending of this message. However a large number
of electrons were terribly inconvenienced.

LinuxBear
db05...@yahoo.ca


___
{~._.~}
_( Y )_
(:_~*~_:)
(_)-(_)


Eduardo Chappa

Nov 23, 2001, 6:54:49 PM
*** LinuxBear (vlpb...@news.videotron.ca) wrote today:

> is there any way to quell the "wrong character set" messages? They drive me
> bonkers!

No, you just have to use the biggest character set that you normally
receive messages in.

--
Eduardo
http://www.math.washington.edu/~chappa/pine/

LinuxBear

Nov 23, 2001, 8:00:14 PM
kristian ragndahl wrote:
> LinuxBear <vlpb...@news.videotron.ca> wrote:
>: is there any way to quell the "wrong character set" messages? They drive me
>: bonkers!
>
> You have asked this several times before. Please read the answers.
>

I asked this *ONCE* before, and I did read the answer. I asked again because I
thought someone might have a workaround, a hack, whatever.

Don't be so trite.

Nancy McGough

Nov 24, 2001, 1:37:28 AM
On 24 Nov 2001 LinuxBear (vlpb...@news.videotron.ca) wrote:
> > LinuxBear <vlpb...@news.videotron.ca> wrote:
> >: is there any way to quell the "wrong character set" messages? They drive me
> >: bonkers!
>
> I asked this *ONCE* before, and i did read the answer. I asked again because I
> thought someone might have a workaround, a hack, whatever


* What do you currently have character-set set to in your pinerc?

* What character sets are people using that are producing the
wrong-character-set messages?

* Are your messages delivered to a system that has Procmail
installed?

Depending on your answers, I might have a suggestion for a
workaround.

Thanks,
Nancy


REFERENCE:
The message I'm replying to -- and this entire thread & group --
may be available at

<http://groups.google.com/groups?as_umsgid=%3Cslrn9vtrsk...@localhost.localdomain%3E>

--
ii Main Pine Page: <http://www.ii.com/internet/messaging/pine/>
ii Procmail QStart: <http://www.ii.com/internet/robots/procmail/qs/>

Nancy McGough <http://www.ii.com/> Infinite Ink
--= Sent via Pine 4.42: IMAP, NNTP & ESMTP for Unix/Win/MacOS X =--

LinuxBear

Nov 24, 2001, 1:31:23 PM
Nancy McGough wrote:
> On 24 Nov 2001 LinuxBear (vlpb...@news.videotron.ca) wrote:
>> > LinuxBear <vlpb...@news.videotron.ca> wrote:
>> >: is there any way to quell the "wrong character set" messages? They drive me
>> >: bonkers!
>>
>> I asked this *ONCE* before, and i did read the answer. I asked again because I
>> thought someone might have a workaround, a hack, whatever
>
>
> * What do you currently have character-set set to in your pinerc?
>

character-set=iso-8859-1

> * What character sets are people using that are producing the
> wrong-character-set messages?

windows mostly

>
> * Are your messages delivered to a system that has Procmail
> installed?

it is a university email account, not sure about procmail

>
> Depending on your answers, I might have a suggestion for a
> workaround.
>
> Thanks,
> Nancy
>
>
> REFERENCE:
> The message I'm replying to -- and this entire thread & group --
> may be available at
>
> <http://groups.google.com/groups?as_umsgid=%3Cslrn9vtrsk...@localhost.localdomain%3E>
>

thanks

Matthew A. BACON

Nov 24, 2001, 3:56:41 PM
On Sat, 24 Nov 2001, LinuxBear wrote:

> > * What do you currently have character-set set to in your pinerc?
> >
>
> character-set=iso-8859-1
>
> > * What character sets are people using that are producing the
> > wrong-character-set messages?
>
> windows mostly
>

I would also love an answer to this. More than turning off the messages,
is it possible to allow multiple character sets? Some of the newsgroups I
read use other sets for specialized characters. So I mainly receive
messages in iso-8859-1, iso-8859-3, AND utf-8. I do occasionally get a
message in the windows charset, but rarely.

---------------------------
The only problem
with Haiku is that you just
get started and then

Nancy McGough

Nov 27, 2001, 1:52:28 AM
On 24 Nov 2001 LinuxBear (vlpb...@news.videotron.ca) wrote:
> Nancy McGough wrote:
> > On 24 Nov 2001 LinuxBear (vlpb...@news.videotron.ca) wrote:
> >> > LinuxBear <vlpb...@news.videotron.ca> wrote:
> >> >: is there any way to quell the "wrong character set" messages? They drive me
> >> >: bonkers!
> >
> > * What do you currently have character-set set to in your pinerc?
>
> character-set=iso-8859-1
>
> > * What character sets are people using that are producing the
> > wrong-character-set messages?
>
> windows mostly
>
> > * Are your messages delivered to a system that has Procmail
> > installed?
>
> it is a university email account, not sure about procmail

My Procmail Quick Start (URL in my sig below) has instructions
for finding out if procmail is on your system and, if it is,
setting it up.


> > Depending on your answers, I might have a suggestion for a
> > workaround.

OK, Here is a workaround that I just hacked together. I'm
crossposting this to comp.mail.pine, comp.mail.misc, and
comp.mail.mime hoping that pine, procmail, and mime experts will
give me some feedback.

First, note that if a message does not contain a MIME-version
header then Pine will not interpret any of the MIME headers such as
Content-type. So my idea was to remove the MIME-version header if
the Content-type header contains `charset=Windows-1252'. This way,
if you use a client other than Pine that does not insist on the
MIME-version header, it will still have the charset info that the
original sender intended. Here's my procmail recipe:

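# Filter (f) the header only (h), waiting (w) for formail to finish.
# Both conditions below must match before the header is renamed.
:0 fhw
* ^Content-type:.*Windows-1252
* ^MIME-Version:
| formail -R MIME-Version: X-Original-MIME-Version: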

This checks the header of a message and if it contains both a
MIME-Version header, and a Content-type header that includes the
string `Windows-1252', then it replaces the MIME-Version header
label with X-Original-MIME-Version.
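
For anyone who wants to experiment without procmail, the same header
rewrite can be sketched in Python with the standard `email` package.
This is only an illustration under my own assumptions: the function
name `hide_mime_version` is made up, and unlike `formail -R` the
renamed header ends up at the end of the header block rather than in
its original position.

```python
from email import message_from_string


def hide_mime_version(raw_message: str) -> str:
    """Rename MIME-Version to X-Original-MIME-Version for Windows-1252 mail."""
    msg = message_from_string(raw_message)
    content_type = msg.get("Content-Type", "")
    mime_version = msg.get("MIME-Version")
    # Same two conditions as the recipe: a Content-Type that mentions
    # Windows-1252 (compared case-insensitively) and a MIME-Version header.
    if mime_version is not None and "windows-1252" in content_type.lower():
        del msg["MIME-Version"]
        # The renamed header is appended after the remaining headers.
        msg["X-Original-MIME-Version"] = mime_version
    return msg.as_string()
```

Messages declared in any other charset pass through with their
MIME-Version header untouched.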

I tested this with Pine and Mulberry, which are the only two mail
clients on my system and they both rendered my Windows-1252 test
messages fine. My test messages included the pound, euro, and yen
symbols and the really weird thing is that Mulberry did *not*
render the euro symbol when the MIME-Version header was left
intact but did render it when the MIME-Version header was
replaced with X-Original-MIME-Version -- very weird! Pine did not
render the euro symbol in any case so it seems that information
wasn't lost. The pound and yen symbols were rendered in all
cases.

My questions to the experts are:

* What do you recommend that I replace MIME-Version with? I made
up X-Original-MIME-Version and I'm wondering if there is a
standard.

* Any suggestions for improving my procmail recipe?

* What are the problems that people will see with this munging,
both with Pine and with other mail clients?

* Does anyone know why Mulberry displays the euro symbol only
when the MIME-Version header is missing?

* Does anyone know how to get Pine to render the euro symbol?


And here is a general charset question: Since Windows-1252 is a
charset that was designed for MS Windows, it seems to me that we
should discourage its use and instead get people to use ISO,
Unicode, or other platform-independent charsets. Do people agree
with that? If so, would it be a good idea to send an auto-reply
to people using charset=Windows-1252 asking them to use a more
standard charset?

Thanks,
Nancy
^x


REFERENCE:
The message I'm replying to -- and this entire thread & group --
may be available at

<http://groups.google.com/groups?as_umsgid=%3Cslrn9vui94...@localhost.localdomain%3E>

Earl Hood

Nov 27, 2001, 2:24:45 PM
In article <Pine.WNT.4.42.0111270618180.-3852833-100000@no>,
Nancy McGough <nm-this-addr...@no.sp.am> wrote:

>And here is a general charset question: Since Windows-1252 is a
>charset that was designed for MS Windows, it seems to me that we
>should discourage its use and instead get people to use ISO,
>Unicode, or other platform-independent charsets. Do people agree
>with that? If so, would it be a good idea to send an auto-reply
>to people using charset=Windows-1252 asking them to use a more
>standard charset?

Windows-1252 is basically a superset of ISO-8859-1. The difference
is that Windows-1252 defines characters within the range of 0x80 to
0x9F; a range that has no defined character values in the ISO-8859
family of character sets.

In most cases, a message declared with a Windows-1252 charset
contains no characters in the 0x80 - 0x9F range, making the message
completely compatible with ISO-8859-1.
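
A quick way to test that claim on a given message body is to look for
any byte in the 0x80 - 0x9F range. The Python sketch below is only an
illustration; the function names `uses_c1_range` and
`effective_charset` are made up for this example and do not come from
any mail client.

```python
def uses_c1_range(body: bytes) -> bool:
    """Return True if any byte of the body falls in 0x80-0x9F."""
    return any(0x80 <= b <= 0x9F for b in body)


def effective_charset(declared: str, body: bytes) -> str:
    """Treat a windows-1252 label as iso-8859-1 when that loses nothing."""
    if declared.lower() in ("windows-1252", "cp1252") and not uses_c1_range(body):
        return "iso-8859-1"
    # Either the label is something else, or the body really does use
    # the characters that only Windows-1252 defines.
    return declared
```

A body containing only "café" would be relabelled as iso-8859-1, while
one containing a euro sign (byte 0x80) keeps its Windows-1252 label.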

It will probably be futile to get people to avoid the use of
Windows-1252, since the MUAs many people use set the charset
parameter without the user's explicit knowledge. Sending an auto-reply
would probably lead to confusion.

It may be acceptable for an MUA that does not directly support
Windows-1252 to assume ISO-8859-1 and just display the message.
Note, ISO-8859-1 does not define the Euro symbol, while Windows-1252
does. I believe ISO-8859-15 is supposed to be an update to Latin 1
to include the Euro symbol and to drop some less used symbols in
ISO-8859-1. I do not know how widely ISO-8859-15 is used, but
it does not appear to be common.

You can go to <http://czyborra.com/charsets/codepages.html> for
useful information on various character sets, including MS Windows
character sets.

--ewh
--
Earl Hood | University of California
eh...@hydra.acs.uci.edu | Irvine
http://www.nacs.uci.edu/indiv/ehood/ | Electronic Loiterer

Eric A. Hall

Nov 27, 2001, 3:33:38 PM

Earl Hood wrote:

> In most cases, messages declared with a Windows-1252 charset
> contains no characters in the 0x80 - 0x9F range, making the message
> completely compatible with ISO-8859-1.

The characters in this range include things like curly quotes, em- and
en-dash, the euro symbol, and some characters which were not included in
ISO-8859-1 but which are included in ISO-8859-15. The presentation
characters like quotes and dashes are often used in HTML, but can also be
used in plain text if both systems understand the charset.

> Since it will probably be futile to get people to avoid the use of
> Windows-1252 since the MUAs many people use are setting the charset
> parameter without the user's explicit knowledge. Sending an auto-reply
> would probably lead to confusion.

Also, the way that the MIME registry works, any charset that is registered
is considered to be a "standard" charset. There is no deference to ISO* or
any other standards body for any charset other than ANSI X3.4-1968 (which
is obsolete).

> It may be acceptable for an MUA that does not directly support
> Windows-1252 to assume to ISO-8859-1 and just display the message.

This only works if you mask the 0x80-9F range, since those are defined as
control characters in the ISO specs. This is particularly nasty when you
use a terminal or emulator to read your mail.
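
As a rough sketch of what such masking could look like, the Python
function below decodes a body as ISO-8859-1 but never lets a byte in
the 0x80-9F range reach the terminal. The name `mask_c1` and the two
placeholder styles (a hex tag or a plain question mark) are my own
choices for illustration, not anything an existing MUA is known to do.

```python
def mask_c1(body: bytes, style: str = "hex") -> str:
    """Decode as ISO-8859-1, replacing bytes 0x80-0x9F with a placeholder."""
    out = []
    for b in body:
        if 0x80 <= b <= 0x9F:
            # Never pass C1 control codes through to the display.
            out.append("<%02X>" % b if style == "hex" else "?")
        else:
            out.append(bytes([b]).decode("iso-8859-1"))
    return "".join(out)
```

With the default style, a Windows-1252 euro sign shows up as `<80>`
instead of being sent to the terminal as a control code.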

--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/

Villy Kruse

Nov 28, 2001, 3:12:12 AM
On Tue, 27 Nov 2001 20:33:38 GMT,
Eric A. Hall <eh...@ehsco.com> wrote:


>
>Earl Hood wrote:
>
>> In most cases, messages declared with a Windows-1252 charset
>> contains no characters in the 0x80 - 0x9F range, making the message
>> completely compatible with ISO-8859-1.
>
>The characters in this range include things like curly quotes, em- and
>en-dash, the euro symbol, and some characters which were not included in
>ISO-8859-1 but which are included in ISO-8859-15. The presentation
>characters like quotes and dashes are often used in HTML, but can also be
>used in plain text if both systems understand the charset.
>


It would be nice if the MS mail programs would analyze the text and
mark it as iso-8859-1 when no windows-1252 characters are found in
the message, and likewise mark the text as US-ASCII if it only contains
US-ASCII characters. This, by the way, is recommended by RFC 2046
in this quote:


4.1.2. Charset Parameter

...

In general, composition software should always use the "lowest common
denominator" character set possible. For example, if a body contains
only US-ASCII characters, it SHOULD be marked as being in the US-
ASCII character set, not ISO-8859-1, which, like all the ISO-8859
family of character sets, is a superset of US-ASCII. More generally,
if a widely-used character set is a subset of another character set,
and a body contains only characters in the widely-used subset, it
should be labelled as being in that subset. This will increase the
chances that the recipient will be able to view the resulting entity
correctly.
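
To make that recommendation concrete, here is a small Python sketch of
how composition software might pick the label. It is an illustration
only: the candidate list, its order, and the name `smallest_charset`
are assumptions of mine, not part of the RFC.

```python
# Candidates ordered from the smallest repertoire to the largest.
CANDIDATES = ("us-ascii", "iso-8859-1", "windows-1252", "utf-8")


def smallest_charset(text: str) -> str:
    """Return the first candidate charset that can encode the whole text."""
    for name in CANDIDATES:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset lacks a character; try the next one
        return name
    raise ValueError("text cannot be encoded in any candidate charset")
```

Plain English text comes out as us-ascii, "café" as iso-8859-1, and a
message is only labelled windows-1252 when it really needs something
like a curly quote or the euro sign.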


>> Since it will probably be futile to get people to avoid the use of
>> Windows-1252 since the MUAs many people use are setting the charset
>> parameter without the user's explicit knowledge. Sending an auto-reply
>> would probably lead to confusion.
>
>Also, the way that the MIME registry works, any charset that is registered
>is considered to be a "standard" charset. There is no deference to ISO* or
>any other standards body for any charset other than ANSI X3.4-1968 (which
>is obsolete).
>

Which is not good for compatibility. Registering a character set doesn't
make it understood by every other mail reader. The larger the number
of different character sets, the greater the risk of not being able to
display the message properly. The Unicode character set in the form of
UTF-8 should bring an end to this chaos.

Villy

Alan J. Flavell

Nov 28, 2001, 8:02:14 AM
On Nov 28, Villy Kruse inscribed on the eternal scroll:

> Registering a character set doesn't
> make it understood by every other mail reader. The larger the number
> of different character sets, the greater the risk of not being able to
> display the message properly.

This is true enough, but there has been an additional problem with the
architecture of PINE in this area (as compared with, say, the Lynx web
browser): PINE throws these three different issues -

- the character coding of an incoming message
- the character coding of an outgoing message
- the character coding used by the local display

all into a single pot, with sub-optimal results.

There's a lot else which PINE does "right", but this aspect of
character code handling has been a bone of contention since, well, as
long as I've been aware of the existence of PINE, which is a
considerable number of years now[1]. I gather that they're now
working on it, and I look forward to seeing the results.

> The unicode character set in the form of
> UTF8 should bring an end to this chaos.

It's adding something to the mix, for sure, but I don't think you're
going to see the end of 8-bit character codings in actual use for
quite a while yet.


[1] Sure, the authors have their own priorities, I'm not trying to say
anything against that nor to seem ungrateful for free access to the
product of their considerable efforts.

[I'm suggesting f'ups narrowed to what seems to be the relevant group]

Villy Kruse

Nov 29, 2001, 3:19:33 AM
On Wed, 28 Nov 2001 14:02:14 +0100,
Alan J. Flavell <fla...@mail.cern.ch> wrote:


>On Nov 28, Villy Kruse inscribed on the eternal scroll:
>
>> Registering a character set doesn't
>> make it understood by every other mail reader. The larger the number
>> of different character sets, the greater the risk of not being able to
>> display the message properly.
>
>This is true enough, but there has been an additional problem with the
>architecture of PINE in this area (as compared with, say, the Lynx web
>browser): PINE throws these three different issues -
>
>- the character coding of an incoming message
>- the character coding of an outgoing message
>- the character coding used by the local display
>
>all into a single pot, with sub-optimal results.
>

This is the only realistic way to handle it. Otherwise the number
of different character set combinations becomes overwhelming. If you
consider the roughly 9 different Latin character sets in the iso-8859 series,
you have 9 possible incoming sets and 9 different settings
for the display unit, giving 81 combinations. Then add to that the non
iso-8859 character sets.


And besides, what do you do with characters in the incoming character
set which can't be rendered by the local display? For example, if you
receive a message in iso-8859-2 it won't display properly on an iso-8859-1
display regardless of which character translation you do.


Villy

Alan J. Flavell

Nov 29, 2001, 8:19:00 AM
On Nov 29, Villy Kruse inscribed on the eternal scroll:

> This is the only realistic way to handle it.

Then on your scale the Lynx developer who implemented these welcome
features was "unrealistic". It works, nevertheless.

> And besides, what do you do with characters in the incoming character
> set which can't be rendered by the local display?

You could put up an alert, just like today? Or do what Lynx does, or
both.

> For example if you
> receive a message in iso-8859-2 it won't display properly on a iso-8859-1
> display

If you receive a message marked as iso-8859-2 but in English, then as
often as not it'll turn out to display without problems. Then the
alert, although understandable, is just an irritating nuisance.
German too would display perfectly well when advertised as iso-8859-2,
or as Turkish etc.

In any case, it's quite likely (in statistical terms, in an
English-language locale) that the terminal emulation can support the
windows-1252 repertoire, and would have no problem displaying the
repertoire used in a typical iso-8859-2 document.

The PINE developer who understood these issues didn't try to argue out
of any of this. It was explained to me, when I asked some years ago,
on the basis of CJK disambiguation, and developer priorities. While
there's no reason that I should argue against CJK disambiguation as a
principle, I have to admit that it isn't an issue that features in my
criteria for my personal choice of email client.

Andreas Prilop

Nov 29, 2001, 8:45:01 AM
In article <news:slrna0brs...@pharmnl.ohout.pharmapartners.nl>,
v...@pharmnl.ohout.pharmapartners.nl (Villy Kruse) wrote:

> This is the only realistic way to handle it. Otherwise the number
> of different character set combinations becomes overwhelming. If you
> consider the roughly 9 different Latin character sets in the iso-8859 series,
> you have 9 possible incoming sets and 9 different settings
> for the display unit, giving 81 combinations. Then add to that the non
> iso-8859 character sets.
> And besides, what do you do with characters in the incoming character
> set which can't be rendered by the local display? For example, if you
> receive a message in iso-8859-2 it won't display properly on an iso-8859-1
> display regardless of which character translation you do.

There is no need to transcode from Greek ISO-8859-7 to Cyrillic 8859-5,
for example. Let's assume a Unix environment with 8-bit-per-character
encodings. All I want is that an incoming message labeled ISO-8859-7
is displayed in a Greek font (specified by me), that an incoming message
labeled ISO-8859-5 is displayed in a Cyrillic font, etc.
Web Browsers can do this - no transcoding involved.

The situation is even easier for PC-Pine. Fonts for MS Windows (such as
Monotype's Courier New) contain many more characters than just those in cp1252.
It should be no problem to display _any_ message as long as the
"charset" parameter is known.

It would be a bit more complicated if a [PC-Pine] user wants to send
messages in, say, Cyrillic Windows-1251. I see no problem in principle
in transcoding the Cyrillic letters to ISO-8859-5 and displaying
these letters in Unix Pine accordingly. Other characters from cp1251
(such as curly quotes, dashes, bullet) could easily be shown as
ASCII quotes, ASCII hyphen, asterisk.
Lynx does this very nicely. Why can't Pine?
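
To show how little is involved, here is a hedged Python sketch of that
kind of transcoding. It is not how Lynx or Pine actually does it; the
table `FALLBACKS` and the function `cp1251_to_iso8859_5` are names
invented for this example, and the fallback list is deliberately short.

```python
# ASCII stand-ins for cp1251 punctuation that ISO-8859-5 does not contain.
FALLBACKS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash and em dash
    "\u2022": "*",                  # bullet
}


def cp1251_to_iso8859_5(data: bytes) -> bytes:
    """Transcode Windows-1251 bytes to ISO-8859-5 with ASCII fallbacks."""
    text = data.decode("cp1251")
    text = "".join(FALLBACKS.get(ch, ch) for ch in text)
    # Anything still not representable becomes '?' instead of failing.
    return text.encode("iso-8859-5", errors="replace")
```

The Cyrillic letters move to their ISO-8859-5 positions, and the curly
quotes and dashes arrive as plain ASCII.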

You might want to read
<http://www.hf.uib.no/smi/files/eudtab.html>
to learn how such transcoding is done with Eudora for Macintosh.
It is a practical and working solution that has enabled happy
Macintosh users to exchange e-mail in Greek, Cyrillic, Arabic, etc.
with the "outside world" for many years.

--
http://www.unics.uni-hannover.de/nhtcapri/plonk.txt

Earl Hood

Nov 30, 2001, 6:01:05 PM
In article <3C03F89E...@ehsco.com>, Eric A. Hall <eh...@ehsco.com> wrote:

>> In most cases, messages declared with a Windows-1252 charset
>> contains no characters in the 0x80 - 0x9F range, making the message
>> completely compatible with ISO-8859-1.
>
>The characters in this range include things like curly quotes, em- and
>en-dash, the euro symbol, and some characters which were not included in
>ISO-8859-1 but which are included in ISO-8859-15. The presentation
>characters like quotes and dashes are often used in HTML, but can also be
>used in plain text if both systems understand the charset.

My statement indicates that MS software does not check what characters
are being used and give the most appropriate charset setting. For
example, if only us-ascii characters are used, us-ascii should be
the specified type. If only iso-8859-1 characters are used, then
iso-8859-1 should be the specified type.

I do know that windows-1252 defines some useful characters not in
iso-8859-1, but many times that windows-1252 is specified, there
are no characters within the 0x80 - 0x9F range.

Note, I do not see much usage of iso-8859-15.

>> Since it will probably be futile to get people to avoid the use of
>> Windows-1252 since the MUAs many people use are setting the charset
>> parameter without the user's explicit knowledge. Sending an auto-reply
>> would probably lead to confusion.
>
>Also, the way that the MIME registry works, any charset that is registered
>is considered to be a "standard" charset. There is no deference to ISO* or
>any other standards body for any charset other than ANSI X3.4-1968 (which
>is obsolete).

The registry just says it is registered. It has nothing to say about
it having to then be adopted by all MUAs. It is too much of a burden
to require MUA developers to update their software every time
a new assignment is made.

>> It may be acceptable for an MUA that does not directly support
>> Windows-1252 to assume to ISO-8859-1 and just display the message.
>
>This only works if you mask the 0x80-9F range, since those are defined as
>control characters in the ISO specs. This is particularly nasty when you
>use a terminal or emulator to read your mail.

An MUA could replace the characters with a '?', or something similar,
or do something like the program less can do and display them like
"<9F>".

Andreas Prilop

Dec 1, 2001, 10:51:32 AM
In article <news:Pine.LNX.4.30.011129...@lxplus023.cern.ch>,

"Alan J. Flavell" <fla...@mail.cern.ch> wrote:

> The PINE developer who understood these issues didn't try to argue out
> of any of this.

We will hardly get an answer from these Americans, who can barely
grasp that there are languages other than English.

--
Can your newsreader display Latin-1 characters?
ÿ small letter y with diaeresis

Nancy McGough

Dec 3, 2001, 4:00:26 PM
On 29 Nov 2001 Andreas Prilop (andreas...@altavista.net) wrote:
> The situation is even easier for PC-Pine. Fonts for MS Windows (such as
> Monotype's Courier New) contain much more characters than only cp1252.
> It should be no problem to display _any_ message as long as the
> "charset" parameter is known.

Thanks everyone for an interesting discussion about charsets. In
case anyone is wondering, I think I've figured out why the euro
sign wasn't showing up in PC-Pine but was showing up in Mulberry.
I use the font vt100.fon (that ships with SecureCRT) with PC-Pine
and I don't think that it contains the euro sign -- does anyone
know if that's true? I've updated my Power Pine page to contain
some info about character sets in this section

http://www.ii.com/internet/messaging/pine/pc/#character-set

I'd appreciate any feedback.

Thanks again,
Nancy
hoping someday Pine will let us *easily* switch between
different sending and receiving character sets.


REFERENCE:
The message I'm replying to -- and this entire thread & group --
may be available at

<http://groups.google.com/groups?as_umsgid=%3C<291120011445018101%andreas...@altavista.net>%3E>

Nancy McGough <http://www.ii.com/> Infinite Ink
--= Sent via Pine 4.43: IMAP, NNTP & ESMTP for Unix/Win/MacOS X =--

Alan J. Flavell

Dec 3, 2001, 5:23:15 PM
On Dec 3, Nancy McGough inscribed on the eternal scroll:

Hi. This is a confusing area: my speciality (as Andreas will no doubt
confirm) is character usage in HTML, where it becomes important to
distinguish clearly between the concepts of character set and of
character coding. Since HTML is pretty widespread these days (don't
get me wrong, I am _no_ friend of emails in HTML), I feel it's useful
to maintain the distinction.

iso-8859-1 is a character _coding_, in terms of current terminology.
It's unfortunate that when the MIME protocol was defined, the
attribute for this concept was called "charset", but we have to live
with that somehow.

When you describe your selection of a display font vt100.fon, you
are describing a font which contains a specific _repertoire_ of
characters, and those characters are arranged in a particular way,
forming a "character set". This character set is suitable for
representing iso-8859-1 character coding (and for other codings of the
same character repertoire, e.g cp-850 or DOS Latin1, cp-1047 EBCDIC
Latin1, since these contain the same repertoire of characters, even
though they are arranged in a different ordering: all that's needed is
a simple lookup table).
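
For what it's worth, such a lookup table can be sketched in a few
lines of Python. This is an illustration under my own assumptions: the
helper name `build_table` is made up, only the upper half (0xA0-0xFF)
is remapped, and anything the target coding lacks would fall back to
a question mark.

```python
def build_table(src: str, dst: str) -> bytes:
    """Build a 256-entry byte translation table from coding src to dst."""
    table = bytearray(range(256))  # start with the identity mapping
    for code in range(0xA0, 0x100):
        ch = bytes([code]).decode(src)
        # Look up where the same character lives in the target coding.
        table[code] = ch.encode(dst, errors="replace")[0]
    return bytes(table)


LATIN1_TO_CP850 = build_table("iso-8859-1", "cp850")
```

With that table, `b"caf\xe9".translate(LATIN1_TO_CP850)` moves the
e-acute from its iso-8859-1 position (0xE9) to its cp-850 position
(0x82), while the ASCII letters stay where they are.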

If you get given a document in iso-8859-15 or some other incompatible
coding, then obviously you can't expect to get the right results by
simply flinging the octets (bytes) at a font which doesn't even
contain the characters that are needed! So where you say "does not
seem to be able to display the euro sign" I'm thinking "whatever led
you to suppose that it might?" ;-)

Having blethered on at some length about that, I'm afraid I don't have
a very concrete suggestion for how you could give a better description
within the limited space you have given yourself. The problem, as I
see it, is that the average reader _believes_ that they already
understand "character sets", when in fact what they understand is a
confused muddle of character coding, repertoire, fonts etc. It's very
hard to persuade them to step back and take a look at what they think
they understand, and maybe get the pieces of the puzzle to fall into
place in less of a jumble.

I think I'd have that note saying something like this:
___
/

For correct results when viewing the character coding that you have
selected, you must choose a corresponding font. E.g vt100.fon for
iso-8859-1. If you change to a different coding, let's say
iso-8859-15, then you would need to use a font which is appropriate
for /that/ coding, otherwise you cannot rely on what you will see.

When you receive a mail in a different coding, then, as
PINE warns you, some characters may display incorrectly.
\___ ^^^

(yeah, how does one say nicely "if the message contains any of the
affected characters, they most certainly WILL display incorrectly"?)

> hoping someday Pine will let us *easily* switch between
> different sending and receiving character sets.

Indeed.

all the best

Villy Kruse

Dec 4, 2001, 3:00:31 AM
On Mon, 3 Dec 2001 23:23:15 +0100,

Alan J. Flavell <fla...@mail.cern.ch> wrote:


>
>For correct results when viewing the character coding that you have
>selected, you must choose a corresponding font. E.g vt100.fon for
>iso-8859-1. If you change to a different coding, let's say
>iso-8859-15, then you would need to use a font which is appropriate
>for /that/ coding, otherwise you cannot rely on what you will see.
>

In the case of the Euro symbol it is encoded as 0xA4 in iso-8859-15
and 0x80 in Windows-1252. I would be surprised if the 0x80 encoding
did not become the de facto standard, unless the Unicode encoding is used.
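
Those two values are easy to check with Python's built-in codecs. The
helper name `euro_byte` below is just something made up for this
illustration.

```python
from typing import Optional

EURO = "\u20ac"  # the euro sign as a Unicode character


def euro_byte(charset: str) -> Optional[bytes]:
    """Return the euro sign's encoding in charset, or None if it has none."""
    try:
        return EURO.encode(charset)
    except UnicodeEncodeError:
        return None


for name in ("iso-8859-1", "iso-8859-15", "windows-1252", "utf-8"):
    print(name, euro_byte(name))
```

Going the other way, the byte 0xA4 read as Windows-1252 is the generic
currency sign, not the euro, so the octet alone says nothing without
the declared coding.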


Villy

Alan J. Flavell

Dec 4, 2001, 6:16:36 AM
On Dec 4, Villy Kruse inscribed on the eternal scroll:

> On Mon, 3 Dec 2001 23:23:15 +0100,
> Alan J. Flavell <fla...@mail.cern.ch> wrote:

[a suggestion for Nancy's web page]

> In the case of the Euro symbol it is encoded as 0xA4 in iso-8859-15
> and 0x80 in Windows-1252.

What kind of suggestion was _that_ for Nancy's web page?

If her page is going to say anything of this kind, then giving a URL
to some reliable online resources (e.g the character cross mapping
tables at Unicode) would seem more constructive than citing the actual
values of the odd individual character.
http://www.unicode.org/Public/MAPPINGS/

> I would be surprised if the 0x80 encoding
> won't be the de-facto standard, unless the unicode encoding is used.

You can't just take a single coded octet in isolation without
reference to the character coding that is in use.

Including 0x80 as a printable character in an iso-8859-15 -coded
document would clearly be wrong, as would including a 0xA4 in a
Windows-1252 -coded document and believing it to be a euro character.

So maybe what you're trying to say is that you believe Windows-1252
will become some kind of de facto "standard". Well, if you ask me,
it already has become - but that doesn't mean I'm going to recommend
others to use it, in general, in an open-standards context.

best regards

Villy Kruse

Dec 4, 2001, 10:04:18 AM
On Tue, 4 Dec 2001 12:16:36 +0100,

Alan J. Flavell <fla...@mail.cern.ch> wrote:


>On Dec 4, Villy Kruse inscribed on the eternal scroll:
>
>> On Mon, 3 Dec 2001 23:23:15 +0100,
>> Alan J. Flavell <fla...@mail.cern.ch> wrote:
>
>[a suggestion for Nancy's web page]
>
>> In the case of the Euro symbol it is encoded as 0xA4 in iso-8859-15
>> and 0x80 in Windows-1252.
>
>What kind of suggestion was _that_ for Nancy's web page?

Don't know Nancy's web page.

>
>If her page is going to say anything of this kind, then giving a URL
>to some reliable online resources (e.g the character cross mapping
>tables at Unicode) would seem more constructive than citing the actual
>values of the odd individual character.
>http://www.unicode.org/Public/MAPPINGS/
>

Absolutely.

I think someone already mentioned www.czyborra.com containing a lot
of reference material with respect to various character sets and character
encodings.

>> I would be surprised if the 0x80 encoding
>> won't be the de-facto standard, unless the unicode encoding is used.
>
>You can't just take a single coded octet in isolation without
>reference to the character coding that is in use.
>

Of course not.

>Including 0x80 as a printable character in an iso-8859-15 -coded
>document would clearly be wrong, as would including a 0xA4 in a
>Windows-1252 -coded document and believing it to be a euro character.
>

Of course it would be wrong. You just need to know which character
set and encoding you will be using before inserting that symbol (or any
other symbol) into a message.

>So maybe what you're trying to say is that you believe Windows-1252
>will become some kind of de facto "standard". Well, if you ask me,
>it already has become - but that doesn't mean I'm going to recommend
>others to use it, in general, in an open-standards context.
>

If you ask me, I would not recommend trying to insert a euro character
in any e-mail message regardless of encoding, but would use the three-letter
international abbreviation EUR instead, and for the rest stick with
iso-8859-1.

I wonder what Outlook Express would do with an iso-8859-15 text.


Villy

Andreas Prilop

Dec 4, 2001, 2:14:21 PM
In an article with invalid Message-ID,
Nancy McGough <nm-this-addr...@no.sp.am> wrote:

> I've updated my Power Pine page to contain
> some info about character sets in this section
> http://www.ii.com/internet/messaging/pine/pc/#character-set
> I'd appreciate any feedback.

In PC-Pine, you _can_ set your window and printer font to
"Courier New Bold Italic" (why anybody would want to do this
is completely beyond me), and this information is carefully stored
in your pinerc file. But when you choose "Courier New Greek" or
"Courier New Cyr" or "Courier New CE", etc., this information is
*not* stored and is lost. You have to choose the script (Greek,
Cyrillic, Central European, etc.) again and again whenever you start
PC-Pine. How silly!
<http://groups.google.com/groups?th=db527d001ff1c392>

We will not get an answer from the American developers of
Pine, who can hardly grasp that there are also other
Earl Hood

Dec 4, 2001, 6:12:27 PM
In article <Pine.WNT.4.43.0112032035580.-3848883-100000@no>,

Nancy McGough <nm-this-addr...@no.sp.am> wrote:
>Thanks everyone for an interesting discussion about charsets. In
>case anyone is wondering, I think I've figured out why the euro
>sign wasn't showing up in PC-Pine but was showing up in Mulberry.
>I use the font vt100.fon (that ships with SecureCRT) with PC-Pine
>and I don't think that it contains the euro sign -- does anyone
>know if that's true? I've updated my Power Pine page to contain
>some info about character sets in this section
>
> http://www.ii.com/internet/messaging/pine/pc/#character-set

The answer to the Note you have about vt100.fon is that the
Euro symbol has a different code point in Windows-1252 from
ISO-8859-15. Most font sets map directly to a given coded
character set. I.e. The raw integer value given to a character
is used as the offset into the glyph table for the font.
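
A toy model of the "raw value as glyph-table offset" idea, in Python. The
256-entry table and the helper names build_glyph_table and render are
hypothetical, and characters stand in for glyphs; this is not how
vt100.fon or PC-Pine is actually implemented.

```python
def build_glyph_table(layout):
    """Return 256 'glyphs' (here just characters) laid out per one charset."""
    return [bytes([code]).decode(layout, errors="replace") for code in range(256)]


def render(raw, glyph_table):
    """Draw each octet by using its raw value as the index into the table."""
    return "".join(glyph_table[octet] for octet in raw)


font_1252 = build_glyph_table("cp1252")
font_8859_15 = build_glyph_table("iso-8859-15")

# A message coded in iso-8859-15: octets 35 20 A4.
message = "5 \u20ac".encode("iso-8859-15")

right = render(message, font_8859_15)  # font layout matches the coding
wrong = render(message, font_1252)     # font layout does not match
print(right, "|", wrong)
```

With the mismatched table, the octet 0xA4 lands on the currency sign rather
than the euro, even though nothing is wrong with the message itself.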

Someone mentioned about the problem of the use of "charset". For
traditional coded character sets (CCS), like the iso-8859 sets, the
CCS and the character encoding form (CEF) are basically the same.
Actually, the iso-8859 family really defines CEFs and the CCSs are
directly implied.
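
A rough illustration in Python of why the distinction hardly shows for the
iso-8859 sets but does for Unicode. The codec names are Python's, and the
sketch glosses over the finer split between encoding forms and encoding
schemes.

```python
# iso-8859-1: the character's number and the octet on the wire coincide
# for every one of the 256 positions.
same = all(bytes([n]).decode("iso-8859-1") == chr(n) for n in range(256))

# Unicode: one position in the coded character set, several serializations.
euro = "\u20ac"
code_point = ord(euro)            # 0x20AC
utf8 = euro.encode("utf-8")       # three octets
utf16 = euro.encode("utf-16-be")  # two octets
print(same, hex(code_point), utf8.hex(), utf16.hex())
```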

The terminology about characters, character sets, et al. can be quite
confusing (and I have probably screwed up myself in this message). The
following Unicode Technical Report is pretty good about explaining
character encoding and the various terminology used:
<http://www.unicode.org/unicode/reports/tr17/>.

Alan J. Flavell

Dec 5, 2001, 7:37:26 AM
On Dec 4, Earl Hood inscribed on the eternal scroll:

> Most font sets map directly to a given coded
> character set. I.e. The raw integer value given to a character
> is used as the offset into the glyph table for the font.

This is, in a sense, the source of a conceptual problem. Many folks
are so ingrained with the idea that the font arrangement and the
character coding are one and the same thing that they seem to find it
impossible to even form the concepts - let alone discuss them - in
situations where this is not so.

> Someone mentioned about the problem of the use of "charset". For
> traditional coded character sets (CCS), like the iso-8859 sets, the
> CCS and the character encoding form (CEF) are basically the same.
> Actually, the iso-8859 family really defines CEFs and the CCSs are
> directly implied.

Fair comment.

> The terminology about characters, character sets, et al. can be quite
> confusing

Especially as the terminology has been notoriously unstable over time!
(The most obvious symptom of which, in the present context, is the
MIME usage of the attribute "charset" to specify, say, utf-8).
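
As a small illustration of that MIME usage, here is a sketch with Python's
standard email package (an assumption about tooling; nothing in this
thread uses it). The parameter is called "charset", yet the value it
carries here names an encoding.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("Price: 5 \u20ac", charset="utf-8")

content_type = msg["Content-Type"]  # header carrying the charset parameter
label = msg.get_content_charset()   # the value of that parameter
body = msg.get_content().strip()    # decoded back according to the label
print(content_type)
print(label)
```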

> <http://www.unicode.org/unicode/reports/tr17/>.

Useful reference, thanks. However, while it does set out one definite
model of the "naming of parts", people are IMHO still going to have
difficulties fitting it into their preconceived notions, which tend to
be based on past usage, and on deductions from what they saw happening
in, say, word processors, or in wrongly-architected web browsers such
as NN4.

Only yesterday I was being bawled-out, elsewhere, for failing to
answer a "simple" question with a "simple" answer - but the answer
necessitated a clear understanding of these various parts, over which
the questioner was only too evidently in an extensive muddle.
