help on writing Greek characters

elagabalus

Aug 1, 2006, 8:58:12 PM

As I can't manage to paste valid Greek characters into the posts I send to
this newsgroup, I decided to use the Polytonic Greek keyboard. But when I
write to alt.just.testing as a trial, all I see is a series of squares. I
tried with the Arial Unicode MS and Times New Roman fonts.

Geoff

Aug 1, 2006, 11:33:27 PM

I went to that NG and can read your posts fine. You may need to adjust
the encoding in your newsreader to display them properly.

QUOTE

~ ῁ ς ε ρ τ υ θ ι ο π α σ δ φ γ η ξ κ λ ζ χ ψ ω β ν μ Ε Ρ Τ Υ Θ Ι Ο Π Α
Σ Δ Φ Γ Η Ξ Κ Λ Ζ Χ Ψ Ω Β Ν Μ

Dasia
ἁ ἑ ἡ ἱ ὁ ῥ ὑ ὡ Ἁ Ἑ Ἡ Ἱ Ὁ Ῥ Ὑ Ὡ

Dasia Oxia
ἅ ἕ ἥ ἵ ὅ ὕ ὥ Ἅ Ἕ Ἥ Ἵ Ὅ Ὕ Ὥ

Dasia Oxia Ypogegrammeni/Prosgegrammeni
ᾅ ᾕ ᾥ ᾍ ᾝ ᾭ

Dasia Perispomeni
ἇ ἧ ἷ ὗ ὧ Ἇ Ἧ Ἷ Ὗ Ὧ

Dasia Perispomeni Ypogegrammeni/Prosgegrammeni
ᾇ ᾗ ᾧ ᾏ ᾟ ᾯ

Dasia Varia
ἃ ἓ ἣ ἳ ὃ ὓ ὣ Ἃ Ἓ Ἣ Ἳ Ὃ Ὓ Ὣ

Dylan Sung

Aug 2, 2006, 2:12:48 AM

"elagabalus" <PER...@AMICI.it> wrote in message
news:EMSzg.60113$_J1.6...@twister2.libero.it...

It may be that you need to set an encoding for Greek characters when you send
messages. Select Unicode as the text format, fiddle around with the
options first, and make more test posts until you get it right.

I take it your Greek fonts look all right before you send the messages, but
you only get back a post with rectangles...

Dyl.

Jukka K. Korpela

Aug 2, 2006, 3:19:09 AM

elagabalus <PER...@AMICI.it> scripsit:

> As I don't succeed in pasting valid Greek characters in the posts I
> send to this
> newsgroup, I decided to utilize the Polytonic Greek keyboard.

As a rule of thumb, international Usenet groups use US-ASCII characters
only, since other characters don't work reliably enough. Polytonic Greek
characters are a particularly difficult case. Even if you sent them
correctly, not all newsreaders get and show them properly, and it easily
happens that when someone quotes you, things get all wrong.

There's a particular feature in the encoding of polytonic Greek that might
explain some of the problems that have occurred, and I mention it because it
will always remain a problem (though hopefully a manageable one in the
future). A Greek letter with a diacritic mark can be represented either as
a single Unicode character or as two (or, with several diacritics, more)
Unicode characters, namely the base letter followed by combining diacritic
marks. Though they are by definition "canonically equivalent", they are not
identical, and it may well happen that one method works and the other does
not in a particular situation. Most probably it's the single-character
representation that works more often (and has much better odds of creating
a typographically acceptable visual result).
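The two representations described above can be inspected with Python's standard `unicodedata` module. This is a minimal sketch and does not come from the thread. It compares the precomposed ἁ (U+1F01) with alpha (U+03B1) followed by the combining dasia (U+0314). It shows that canonical normalization (NFC/NFD) maps one form onto the other, even though the raw strings differ. The helper name `describe` is purely illustrative.

```python
import unicodedata

# Two spellings of alpha with dasia (rough breathing).
precomposed = "\u1f01"        # one code point: GREEK SMALL LETTER ALPHA WITH DASIA
decomposed = "\u03b1\u0314"   # base letter alpha + COMBINING REVERSED COMMA ABOVE

def describe(s: str) -> str:
    """List the code points of s as U+XXXX values."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

print(describe(precomposed), "|", describe(decomposed))
print("identical strings:", precomposed == decomposed)
print("NFD(precomposed) == decomposed:",
      unicodedata.normalize("NFD", precomposed) == decomposed)
print("NFC(decomposed) == precomposed:",
      unicodedata.normalize("NFC", decomposed) == precomposed)

# With several diacritics the decomposed form grows:
# alpha + dasia + oxia + ypogegrammeni (the letter ᾅ from the samples above).
print(describe(unicodedata.normalize("NFD", "\u1f85")))
```

Which form a keyboard or input method produces is up to that software. Normalizing to NFC before posting is one possible way to favour the single-character form.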

This group is admittedly one of the few international groups where a richer
character repertoire would really be needed. But I'm afraid it's still far
from safe to use Unicode here (which would be the only sensible way to
include polytonic Greek using the real Greek characters).

Thus, try to live with the US-ASCII limitation for the time being. Use some
transliteration or, if you have a long piece of text that you wish to discuss,
put it on a web page and give its URL.

> I tried with Arial Unicode MS and Times New Roman fonts.

It is irrelevant what fonts you used. Font information is not included in
a Usenet message (unless you use HTML format on Usenet - you shouldn't),
only the characters as elements of a character code. On a web page, it's
different, but you need not (and perhaps should not) set the font there
either.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Wiktor S.

Aug 2, 2006, 4:13:10 AM

>> write to alt.just.testing for a trial, I can just see a series of
>> squares. I tried with Arial Unicode MS and Times New Roman fonts.

Maybe you tried, but failed :-)
I changed the font to Arial Unicode and all of the following characters
showed just fine:

> ~ ῁ ς ε ρ τ υ θ ι ο π α σ δ φ γ η ξ κ λ ζ χ ψ ω β ν μ Ε Ρ Τ Υ Θ Ι Ο Π
> Α Σ Δ Φ Γ Η Ξ Κ Λ Ζ Χ Ψ Ω Β Ν Μ

> ἁ ἑ ἡ ἱ ὁ ῥ ὑ ὡ Ἁ Ἑ Ἡ Ἱ Ὁ Ῥ Ὑ Ὡ

> ἅ ἕ ἥ ἵ ὅ ὕ ὥ Ἅ Ἕ Ἥ Ἵ Ὅ Ὕ Ὥ

> ᾅ ᾕ ᾥ ᾍ ᾝ ᾭ

> ἇ ἧ ἷ ὗ ὧ Ἇ Ἧ Ἷ Ὗ Ὧ

> ᾇ ᾗ ᾧ ᾏ ᾟ ᾯ

> ἃ ἓ ἣ ἳ ὃ ὓ ὣ Ἃ Ἓ Ἣ Ἳ Ὃ Ὓ Ὣ

(be sure that your newsreader actually displays Arial Unicode when you
choose it)

--
Azarien

mb

Aug 2, 2006, 5:38:45 AM

Seeing that you easily use Unicode, why not make life simpler?
No polytonic keyboard works well with all. I tried everything and went
bananas. Some classical NGs waste a large part of their time discussing
code pages and charsets but continue sending messages that look full of
stupid little squares to most people.

Now, there is no real need for a polytonic keyboard, and the usual
monotonic (modern) Greek keyboard doesn't have those kinds of problems.

People who can read Greek won't have any problem reading without the
accessory marks: if they can read, they know the rules of prosody and
the placement of signs. To a minority who have learned Modern Greek
only, whatever is understandable in polytonic is just as understandable
in monotonic. So, almost no drawbacks and the big advantage of easy
reading.

Also, transliterating to Latin characters according to the classic
rules works just fine anyway. In native Greek NGs, most correspondence
is in Latin transliteration (not classical, but by keyboard position)
to make sure everyone without exception can read, no matter the
computer and the system.

Below are monotonic, Latin and modern-keyboard samples to test
understandability:

1. Ανδρα μοι έννεπε, Μούσα, πολύτροπον,
ος μάλα πολλά
πλάγχθη, επεί Τροίης ιερόν πτολίεθρον
έπερσεν '

2. Ándra moi énnepe, Moûsa, polýtropon, hos mála pollá
plánchthê, epeí Troíês hierón ptolíethron épersen
pollôn d'anthrôpôn....

No problemo.

3. Andra moi énnepe, Moûsa, polútropon, 'os mála pollá
plágx0h, epeí Troíhs 'ierón ptolíe0ron épersen '
pollwn d'an0rwpwn íden ástea kai nóon égnw,

Not too hard either.

Wiktor S.

Aug 2, 2006, 6:39:00 AM

> Also, transliterating to Latin characters according to the classic
> rules works just fine anyway. In native Greek NGs, most correspondence
> is in Latin transliteration (not classical, but by keyboard position)
> to make sure everyone without exception can read, no matter the
> computer and the system.

So why use the Greek alphabet at all? Interesting, how easily you adapt
yourself to the software, instead of making it work.

> 1. Ανδρα μοι έννεπε, Μούσα, πολύτροπον,

> 2. Ándra moi énnepe, Moûsa, polýtropon,

so ύ is û or ý?

--
Azarien

mb

Aug 2, 2006, 7:42:49 AM

Wiktor S. wrote:
> > Also, transliterating to Latin characters according to the classic
> > rules works just fine anyway. In native Greek NGs, most correspondence
> > is in Latin transliteration (not classical, but by keyboard position)
> > to make sure everyone without exception can read, no matter the
> > computer and the system.
>
> So why use the Greek alphabet at all? Interesting, how easily you adapt
> yourself to the software, instead of making it work.

Rather, why use it when there is any chance of not getting across?
Ease of adaptation is the same for any conventions of alphabetical
writing, anyway, not a special thing with software. Getting the rules
internalized for fewer than 30 signs and their combinations isn't rocket
science.

> > 1. Ανδρα μοι έννεπε, Μούσα, πολύτροπον,
> > 2. Ándra moi énnepe, Moûsa, polýtropon,
>
> so ύ is û or ý?

2 traditions: either all u, or u in diphthongs and y otherwise.

Tzortzakakis Dimitrios

Aug 2, 2006, 10:44:46 AM

"Wiktor S." <wswik...@Mpoczta.fm> wrote in message
news:eappmh$8sr$1...@news.onet.pl...

> >> write to alt.just.testing for a trial, I can just see a series of
> >> squares. I tried with Arial Unicode MS and Times New Roman fonts.
>
>
> Maybe you tried, but failed :-)
> I changed the font to Arial Unicode and all of the following characters
> showed just fine:
>

> > ~ ? ò å ñ ô õ è é ï ð á ó ä ö ã ç î ê ë æ ÷ ø ù â í ì Å Ñ Ô Õ È É Ï Ð
> > Á Ó Ä Ö Ã Ç Î Ê Ë Æ × Ø Ù Â Í Ì
> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
> > ? ? ? ? ? ? ? ? ? ? ? ? ? ?
> > ? ? ? ? ? ?
> > ? ? ? ? ? ? ? ? ? ?
> > ? ? ? ? ? ?
> > ? ? ? ? ? ? ? ? ? ? ? ? ? ?
>
That's all I can see. I see in my newsreader that it's Greek (Geoff's post)
and Unicode (Wiktor S.'s post) all right, but I still can't read it. In Greece,
the monotonic system (monotoniko) has been used since maybe '82, and to really
write polytonic texts you have to use Magenta's "polytonisti", a special
program. Even a native Greek Windows PC, like mine, cannot display polytonic
texts without Magenta's plug-in, and to make your text readable you have to
convert it to a .pdf document. Magenta's program does the accenting by itself,
as even older Greek language professors really don't remember all the rules.

Anyway, for Usenet usage, make sure you do encode it in Greek or Unicode. That
was, and will be, a problem with all the non-ASCII fonts, except if you use
"Greeklish" (which is used by many Greek people abroad who have no Greek
support on their PCs).

Also, if you want to quickly switch between Greek and English, go to
Start > Run, type internat, and a little blue square will appear at the bottom
right of your screen. Add Greek, and use ALT+SHIFT to switch (you can add
other languages, too, but things get complicated with more than two).

Andreas Prilop

Aug 2, 2006, 11:04:53 AM

On Wed, 2 Aug 2006, Tzortzakakis Dimitrios wrote:

> X-Newsreader: Microsoft Outlook Express 6.00.2800.1106

You have already been told in
<news:Pine.GSO.4.44.060717...@s5b004.rrzn.uni-hannover.de>
<http://groups.google.com/group/sci.lang/msg/0684b8d0c34a5706>
how to set up your newsreader surrogate in order to send
special, non-ASCII characters.

Apparently, you don't want to learn.

Bye.

Andreas Prilop

Aug 2, 2006, 11:49:08 AM

On Tue, 1 Aug 2006, Geoff wrote:

> I went to that NG and can read your posts fine. You may need to adjust
> the encoding in your newsreader to display them properly.

No, this should only be necessary for *writing* but not for *reading*.

Andreas Prilop

Aug 2, 2006, 11:55:25 AM

On Wed, 2 Aug 2006, elagabalus wrote:

> Content-Type: text/plain;
> charset="iso-8859-1";

>
> As I don't succeed in pasting valid Greek characters in the posts I send to
> this
> newsgroup, I decided to utilize the Polytonic Greek keyboard.

You cannot write Greek characters with "charset=iso-8859-1".
(Well, we might take the micro sign as Greek mu.)
You need to choose either Greek ISO-8859-7 or Unicode UTF-8 as
encoding.

Greek capital letters:
Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω

Greek small letters:
α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σς τ υ φ χ ψ ω
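Andreas's point can be checked with Python's built-in codecs. The sketch below tries to encode a few strings in each charset. The helper name `can_encode` is purely illustrative. Basic Greek fails in ISO-8859-1 but works in ISO-8859-7 and UTF-8. A polytonic letter such as ἁ fits only in UTF-8. The sketch also shows why the micro sign is only a stand-in: it is a different code point from Greek small mu.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

samples = {
    "monotonic": "αβγ ΑΒΓ",
    "polytonic": "ἁ ᾅ",
    "micro sign": "\u00b5",   # MICRO SIGN, present in Latin-1
    "Greek mu": "\u03bc",     # GREEK SMALL LETTER MU, absent from Latin-1
}

for charset in ("iso-8859-1", "iso-8859-7", "utf-8"):
    results = {name: can_encode(text, charset) for name, text in samples.items()}
    print(charset, results)
```

ISO-8859-7 covers only monotonic Greek. For polytonic text, UTF-8 is therefore the only workable choice of the two encodings Andreas names.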

mb

Aug 2, 2006, 12:10:10 PM

Andreas Prilop wrote:
...

> You cannot write Greek characters with "charset=iso-8859-1".
> (Well, we might take the micro sign as Greek mu.)
> You need to choose either Greek ISO-8859-7 or Unicode UTF-8 as
> encoding.
>
> Greek capital letters:

> Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ó Ô Õ Ö × Ø Ù
>
> Greek small letters:
> á â ã ä å æ ç è é ê ë ì í î ï ð ñ óò ô õ ö ÷ ø ù

Produces only gobbledygook.
Bring something that everyone can read or stop tinkering.

Andreas Prilop

Aug 2, 2006, 12:19:58 PM

On 2 Aug 2006, mb wrote:

> Organization: http://groups.google.com
> Content-Type: text/plain; charset="iso-8859-1"
> User-Agent: G2/0.2

>
> Produces only gobbledygook.
> Bring something that everyone can read or stop tinkering.

*You* need a newsreader instead of Google's broken Usenet interface.
Note that "User-Agent: G2/0.2" is a very preliminary beta
version 0.2.

My original message had "charset=ISO-8859-7".
Google's broken Usenet interface ignores this.
If *you* rely on Google Groups, you have to live with its
imperfections.
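The gobbledygook mb quoted is what ISO-8859-7 bytes look like when they are read as ISO-8859-1. A small sketch with Python's standard codecs reproduces it. Reading the same bytes as Windows-1256 gives the run of Arabic letters that shows up in Geoff's later quote of the same list. The variable names are only illustrative.

```python
greek = "ΑΒΓΔ ΣΤΥΦΧΨΩ"                  # what Andreas typed
raw = greek.encode("iso-8859-7")         # bytes sent with charset=ISO-8859-7

as_declared = raw.decode("iso-8859-7")   # a reader that honours the charset header
as_latin1 = raw.decode("iso-8859-1")     # a reader that ignores it and assumes Latin-1
as_cp1256 = raw.decode("cp1256")         # the same bytes under the Windows Arabic code page

print(as_declared)
print(as_latin1)
print(as_cp1256)
```

The bytes themselves are intact; only the charset used to interpret them differs. A reader that lets you override the encoding can therefore recover the Greek.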

Peter T. Daniels

Aug 2, 2006, 1:06:55 PM

Yet mb's Greek sample a few posts up came through perfectly.

I could even read a line of Arabic a few days ago!

Nigel Greenwood

Aug 2, 2006, 1:50:04 PM

Peter T. Daniels wrote:

> Andreas Prilop wrote:
> > My original message had "charset=ISO-8859-7".
> > Google's broken Usenet interface ignores this.
> > If *you* rely on Google Groups, you have to live with its
> > imperfections.

>
> Yet mb's Greek sample a few posts up came through perfectly.
>
> I could even read a line of Arabic a few days ago!

I use Google Groups, FWIW. My impression is that if you generate some
text in any script, GG will convert it to Unicode & transmit it
correctly, as long as you select "English/Automatic" in the language
box. The problems seem to start when you select some other language.

I, too, got empty rectangles in the polytonic Greek passages, but was
able to read them by pasting them into Word & selecting a suitable font
(e.g. one of the following):

Arial Unicode MS
Athena
Cardo
Code2000
Palatino Linotype
TITUS Cyberbit Basic
Vusillus Old Face

Obviously you won't be able to see the polytonic characters unless you
have a suitable font on your system.

Re "Greeklish": there are various informal versions, some more
phonetic than others, used in newsgroups, mplogk (blogs), etc. But
there is also a strict version used by HR-Net, which takes a bit of
getting used to. Eg:

" Na egkataleicei tis tritokosmikoy xarakthra apeiles pros tis
dioikhseis twn tameiwn", symboyleyse thn Ajiwmatikh Antipoliteysh o
ypoyrgos Apasxolhshs kai Koinwnikhs Prostasias k. S. Tsitoyridhs.

The letters C and J, in particular, for Psi and Xi, aren't very
intuitive.

Nigel

--
ScriptMaster language resources (Chinese/Modern & Classical
Greek/IPA/Persian/Russian/Turkish):
http://www.elgin.free-online.co.uk

mb

Aug 2, 2006, 2:53:51 PM

When there is an obvious, effective and uncomplicated alternative,
anyone who complicates life by using something that requires special
programs or tinkering _is_ the imperfection itself. It's way simpler to
just dismiss it.

mb

Aug 2, 2006, 3:15:11 PM

Nigel Greenwood wrote:
> Peter T. Daniels wrote:
...

> > Yet mb's Greek sample a few posts up came through perfectly.
> >
> > I could even read a line of Arabic a few days ago!
>
> I use Google Groups, FWIW. My impression is that if you generate some
> text in any script, GG will convert it to Unicode & transmit it
> correctly, as long as you select "English/Automatic" in the language
> box. The problems seem to start when you select some other language.
>
> I, too, got empty rectangles in the polytonic Greek passages, but was
> able to read them by pasting them into Word & selecting a suitable font
> (eg one of the following:

...

Even then something often goes wrong. Anyway, I don't see why I should
go to even a fraction of that trouble: I am not editing a page for
philological printing or some such thing.
Subtitle: Why make it simple when one can complicate it?

> Re "Greeklish": there are various informal versions, some more
> phonetic than others, used in newsgroups, mplogk (blogs), etc. But
> there is also a strict version used by HR-Net, which takes a bit of
> getting used to. Eg:

....

I wouldn't recommend Greeklish to anyone who is not familiar with the
modern language. The HR-Net version is very convenient for people who
touch-type on the standard Greek keyboard on a daily basis.

For anyone who isn't dealing with the modern language, there is a more than
2,000-year-old tradition of transliterating into Latin characters: the
Romans reasonably chose to ignore the codepage geeks of their times.

elagabalus

Aug 2, 2006, 4:58:51 PM

"mb" <azyt...@hotmail.com> wrote in message
news:1154511525.8...@p79g2000cwp.googlegroups.com...

elagabalus wrote:
>> As I don't succeed in pasting valid Greek characters in the posts I send
>> to this newsgroup, I decided to utilize the Polytonic Greek keyboard. But
>> when I
>> get to write to alt.just.testing for a trial, I can just see a series of
>> squares.
>> I tried with Arial Unicode MS and Times New Roman fonts.

>Seeing that you easily use Unicode, why not make life simpler?
>No polytonic keyboard works well with all. I tried everything and went
>bananas. Some classical NGs waste a large part of their time discussing
>code pages and charsets but continue sending messages that look full of
>stupid little squares to most people.

>Now, there is no real need for a polytonic keyboard, and the usual
>monotonic (modern) Greek keyboard doesn't have that kind of problems.

>People who can read Greek won't have any problem reading without the
>accessory marks: if they can read, they know the rules of prosody and
>the placement of signs. To a minority who have learned Modern Greek
>only, whatever is understandable in polytonic is just as understandable
>in monotonic. So, almost no drawbacks and the big advantage of easy
>reading.

I tested the Monotonic Greek Keyboard and this time it works. I agree that
giving up the diacritics is not a big problem; on the contrary, it is an
advantage to me, as I studied them so long ago that I have forgotten
their function. But look at what I write below.

>Also, transliterating to Latin characters according to the classic
>rules works just fine anyway. In native Greek NGs, most correspondence
>is in Latin transliteration (not classical, but by keyboard position)
>to make sure everyone without exception can read, no matter the
>computer and the system.

I think I am going to adopt this solution, which I consider the best:

polla d' ho g' en pontôi pathen algea hon kata thumon,
arnumenos hên te psuchên kai noston hetairôn

>Below monotonic, Latin and Modern keyboard samples to test
>understandability:

>1. Ανδρα μοι έννεπε, Μούσα, πολύτροπον,
>ος μάλα πολλά
>πλάγχθη, επεί Τροίης ιερόν πτολίεθρον
>έπερσεν '

>2. Ándra moi énnepe, Moûsa, polýtropon, hos mála pollá
>plánchthê, epeí Troíês hierón ptolíethron épersen
>pollôn d'anthrôpôn....

>No problemo.

>3. Andra moi énnepe, Moûsa, polútropon, 'os mála pollá
>plágx0h, epeí Troíhs 'ierón ptolíe0ron épersen '
>pollwn d'an0rwpwn íden ástea kai nóon égnw,

>Not too hard either.

I saved your post so as to have a transliteration model within quick reach,
because the one I presented above was copied from the Perseus Digital Library.

elagabalus

Aug 2, 2006, 5:01:23 PM

"Jukka K. Korpela" <jkor...@cs.tut.fi> wrote in message
news:DlYzg.718$XC5...@reader1.news.jippii.net...

First, I was pleasantly surprised by the number of answers I got: you were all
kind, and some of you even spent time watching my tests on alt.just.testing, so
I want to thank you and the others on this occasion.

Your suggestion to use a transliteration seems very good, and I am going
to follow this approach from now on.

Also your advice to use a web page looks valid and I will try it if
necessary.

Thanks again

mb

Aug 2, 2006, 7:44:51 PM

elagabalus wrote:
...

> I guess I am going to adopt this solution which I hold the best:
>
> polla d' ho g' en pontôi pathen algea hon kata thumon,
> arnumenos hên te psuchên kai noston hetairôn

...

> I saved your post to quickly have at reach a transliterational model because
> the one I presented before was copied from Perseus Digital Library.

The system used by Perseus is certainly understood by all. Some prefer
to add a stress marking with an acute or an apostrophe (without which
other prosodic inferences also become hard to guess), but this is not
essential.
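A rough approximation of the Perseus-style scheme in elagabalus's sample can be written as a letter table plus one rule. The table renders η as ê, ω as ô, χ as ch, ψ as ps and υ as u. The rule renders gamma before a velar as n, as in plánchthê in mb's sample. This is a hypothetical sketch, not Perseus's own converter. It lowercases everything and drops all accents and breathings. As a result, the initial h of words like hên cannot be recovered, and no stress marking is added.

```python
import unicodedata

# Hypothetical letter table modelled on the sample above.
LETTERS = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "ê", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "ph", "χ": "ch", "ψ": "ps",
    "ω": "ô",
}
VELARS = set("γκξχ")  # gamma before these is written n (e.g. planchthê)

def perseus_like(text: str) -> str:
    """Transliterate Greek text, dropping accents and breathings."""
    base = [c for c in unicodedata.normalize("NFD", text.lower())
            if not unicodedata.combining(c)]
    out = []
    for i, c in enumerate(base):
        nxt = base[i + 1] if i + 1 < len(base) else ""
        if c == "γ" and nxt in VELARS:
            out.append("n")
        else:
            out.append(LETTERS.get(c, c))  # non-Greek characters pass through
    return "".join(out)

print(perseus_like("πλάγχθη ψυχήν θυμόν"))
```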

Geoff

Aug 2, 2006, 9:26:44 PM

mb wrote:
> Andreas Prilop wrote:
> ...
>> You cannot write Greek characters with "charset=iso-8859-1".
>> (Well, we might take the micro sign as Greek mu.)
>> You need to choose either Greek ISO-8859-7 or Unicode UTF-8 as
>> encoding.
>>
>> Greek capital letters:

>> ء آ أ ؤ إ ئ ا ب ة ت ث ج ح خ د ذ ر س ش ص ض × ط ظ
>>
>> Greek small letters:
>> ل â م ن ه و ç è é ê ë ى ي î ï ً ٌ ٍَ ô ُ ِ ÷ ّ ù

>
> Produces only gobbledygook.
> Bring something that everyone can read or stop tinkering.
>

Perfectly clear to me. Problem is at your end.

mb

Aug 2, 2006, 9:59:25 PM

Geoff wrote:
> mb wrote:
> > Andreas Prilop wrote:
> > ...
> >> You cannot write Greek characters with "charset=iso-8859-1".
> >> (Well, we might take the micro sign as Greek mu.)
> >> You need to choose either Greek ISO-8859-7 or Unicode UTF-8 as
> >> encoding.
> >>
> >> Greek capital letters:

> >> Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ó Ô Õ Ö × Ø Ù
> >>
> >> Greek small letters:

> >> á â ã ä å æ ç è é ê ë ì í î ï ð ñ óò ô õ ö ÷ ø ù

> >
> > Produces only gobbledygook.
> > Bring something that everyone can read or stop tinkering.
> >
> Perfectly clear to me. Problem is at your end.

Sure. That's why the word "everyone" was used. Please use BTK-W-130046
crr471.57 or equivalents to translate it into plain English.

Geoff

Aug 2, 2006, 10:22:24 PM

mb wrote:

<<snip>>

>> Perfectly clear to me. Problem is at your end.
>
> Sure. That's why the word "everyone" was used. Please use BTK-W-130046
> crr471.57 or equivalents to translate it in plain English.
>

I don't know what those letters and numbers mean, but basically -- why
should I bother? My reader (Thunderbird) displays and posts everything
fine without touching a setting: Chinese, Korean, Tibetan, Arabic,
Greek, you name it. Seamless and transparent.

Paul J Kriha

Aug 3, 2006, 3:08:11 AM

Geoff <grw...@yahoo.com> wrote in message news:ohcAg.85$aO2...@fe07.lga...

Even when it is not displayed correctly, as it is not in my
Outlook Express window, if you know (or guess) what charset
was used by the writer you can select View, More, Greek
and Bob's your uncle.

pjk

Jim Heckman

Aug 3, 2006, 4:06:03 AM

On 2-Aug-2006, "mb" <azyt...@hotmail.com>
wrote in message <1154511525.8...@p79g2000cwp.googlegroups.com>:

[...]

> Below monotonic, Latin and Modern keyboard samples to test
> understandability:

[...] [can't easily do utf-8 Greek with my setup]

> 2. Ándra moi énnepe, Moûsa, polýtropon, hos mála pollá
> plánchthê, epeí Troíês hierón ptolíethron épersen
> pollôn d'anthrôpôn....
>
> No problemo.
>
> 3. Andra moi énnepe, Moûsa, polútropon, 'os mála pollá
> plágx0h, epeí Troíhs 'ierón ptolíe0ron épersen '
> pollwn d'an0rwpwn íden ástea kai nóon égnw,
>
> Not too hard either.

How are xi (14th letter) and psi (23rd) transliterated in this
latter, Modern scheme?

--
Jim Heckman

Jim Heckman

Aug 3, 2006, 4:16:07 AM

On 3-Aug-2006, "Jim Heckman" <rot13_r...@none.invalid>
wrote in message <12d3bjo...@corp.supernews.com>:

Never mind. I just saw Nigel's post that they're <j> and <c>,
respectively.

--
Jim Heckman

mb

Aug 3, 2006, 4:45:53 AM

Jim Heckman wrote:
...

> > How are xi (14th letter) and psi (23rd) transliterated in this
> > latter, Modern scheme?
>
> Never mind. I just saw Nigel's post that they're <j> and <c>,
> respectively.

That's not the most popular convention; it's preferred by people who
type a lot on Greek keyboards because it corresponds exactly to the
keys. There is no real set convention for psi and ksi. Some Greeklish
lists have a thousand flowers blooming re idiosyncrasy. There is also a
school that goes for maximum graphical adaptation, with p for rho, c
for middle sigma, = for ksi, v for nu and n for pi. But they all read
each other.

Ruud Harmsen

Aug 3, 2006, 4:48:20 AM

3 Aug 2006 01:45:53 -0700: "mb" <azyt...@hotmail.com>: in sci.lang:

>
>Jim Heckman wrote:
>...
>> > How are xi (14th letter) and psi (23rd) transliterated in this
>> > latter, Modern scheme?
>>
>> Never mind. I just saw Nigel's post that they're <j> and <c>,
>> respectively.
>
>That's not the most popular convention; it's preferred by people who
>type a lot on Greek keyboards because it corresponds exactly to the
>keys.

On the Windows Greek keyboard c is indeed psi, but u = theta and y =
upsilon. And w is final sigma, q = ; (question mark). v = omega.
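The key positions listed above can be turned into a lookup table for keyboard-position Greeklish. This is a minimal sketch that assumes the 25 letter keys of the Windows Greek layout as described in this thread. They appear in the same order as the layout row in Geoff's first post; the Q key is left out because it types ";". Accents and breathings are simply dropped. The scheme differs from the HR-Net sample Nigel quoted, which writes omega as w rather than v. The function name is hypothetical.

```python
import unicodedata

# Latin key -> Greek letter on the Windows Greek layout, row by row (Q omitted).
LATIN_KEYS = "wertyuiop" "asdfghjkl" "zxcvbnm"
GREEK_KEYS = "ςερτυθιοπ" "ασδφγηξκλ" "ζχψωβνμ"
assert len(LATIN_KEYS) == len(GREEK_KEYS)
TABLE = dict(zip(GREEK_KEYS, LATIN_KEYS))

def to_key_greeklish(text: str) -> str:
    """Replace each Greek letter with the Latin key that types it."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            continue  # accents/breathings are dead keys; drop them
        low = ch.lower()
        if low in TABLE:
            key = TABLE[low]
            out.append(key.upper() if ch != low else key)  # keep capitals
        else:
            out.append(ch)
    return "".join(out)

print(to_key_greeklish("ψυχή θάλασσα"))
```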

--
Ruud Harmsen - http://rudhar.com

mb

Aug 3, 2006, 5:08:09 AM

Right, but the other letters are more or less well established (0eta,
upsilon or ypsilon, wmega) while psi and ksi are orphans.

Nigel Greenwood

Aug 3, 2006, 7:05:02 AM

mb wrote:

> There is no real set convention for psi and ksi. Some Greeklish
> lists have a thousand flowers blooming re idiosyncrasy. There is also a
> school that goes for maximum graphical adaptation, with p for rho, c
> for middle sigma, = for ksi, v for nu and n for pi.

navenisthmiov for university? Well, why not? Andra moi evvene ...

There's an online Greeklish<>Greek converter at:

http://tinyurl.com/mttaw

Unfortunately it doesn't like graphical equivalents: it seems to prefer
phonetic Greeklish.

Peter T. Daniels

Aug 3, 2006, 8:36:55 AM

Tibetan? You get emails in Tibetan?

Geoff

Aug 3, 2006, 10:11:03 AM

Not whole e-mails, but Unicode Tibetan shows up OK on the rare occasions
I need to paste in an example of something. Tibetan "30 letters" follow:

ཀ ཁ ག ང
ཅ ཆ ཇ ཉ
ཏ ཐ ད ན
པ ཕ བ མ
ཙ ཚ ཛ ཝ
ཞ ཟ འ ཡ
ར ལ ཤ ས
ཧ ཨ

Andreas Prilop

Aug 3, 2006, 10:59:56 AM

On 2 Aug 2006, Peter T. Daniels wrote:

> Yet mb's Greek sample a few posts up came through perfectly.
> I could even read a line of Arabic a few days ago!

Google Groups *guesses* the encoding instead of reading and
recognizing the "charset" header of a messages. When the bytes
in a message could be interpreted as legal UTF-8, then
Google Groups treats it as Unicode UTF-8.
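The behaviour described above can be modelled as a "try UTF-8 first" heuristic. This is a hypothetical sketch inferred from what is reported in this thread, not Google's actual code. The fallback to ISO-8859-1 when UTF-8 decoding fails is an assumption, based on the Latin-1 garbage mb saw. The sketch shows both effects. Latin-1 text whose bytes happen to form valid UTF-8 turns into Arabic. ISO-8859-7 Greek, which is not valid UTF-8, falls through to the Latin-1 fallback, and the declared charset is never consulted.

```python
def guess_decode(raw: bytes, declared_charset: str, fallback: str = "iso-8859-1") -> str:
    """Hypothetical 'guess, don't read the header' decoder.

    declared_charset is accepted but deliberately ignored, mimicking the
    behaviour described in the thread.
    """
    try:
        return raw.decode("utf-8")   # any byte sequence that is legal UTF-8 wins
    except UnicodeDecodeError:
        return raw.decode(fallback)  # otherwise assume a fixed default

# A Latin-1 test like Andreas's, sent as ISO-8859-1.
latin1_text = "Ø§ Ø¨ Øª"
print(guess_decode(latin1_text.encode("iso-8859-1"), "iso-8859-1"))

# Modern Greek, sent as ISO-8859-7.
greek_text = "ΑΒΓ"
print(guess_decode(greek_text.encode("iso-8859-7"), "iso-8859-7"))
```

Plain US-ASCII bytes are also valid UTF-8 and decode to themselves. That fits the expectation, voiced in the thread, that ASCII text survives Google Groups unchanged.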

Let's see:
This message has "charset=ISO-8859-1" and contains some
extended Latin-1 characters:

ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ

I wonder what Google Groups makes out of it.

mb

Aug 3, 2006, 11:07:56 AM

Andreas Prilop wrote:

> Let's see:
> This message has "charset=ISO-8859-1" and contains some
> extended Latin-1 characters:
>
> ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ
>
> I wonder what Google Groups makes out of it.

Comes through, clear and loud.
Never happened with polytonic Greek, not even when sent by the
polytonic encoding gurus themselves.

Nigel Greenwood

Aug 3, 2006, 1:58:46 PM

As I mentioned, the polytonic Gk was correctly encoded by GG, but
couldn't be displayed because the (default?) font didn't include the
characters. There's nothing mystical about all this! This is
presumably why people use SAMPA etc rather than IPA on this NG.

BTW I regularly see little rectangles on Wikipedia, which must also be
due to the default font they use.

mb

Aug 3, 2006, 6:16:08 PM

Nigel Greenwood wrote:

> As I mentioned, the polytonic Gk was correctly encoded by GG, but
> couldn't be displayed because the (default?) font didn't include the
> characters. There's nothing mystical about all this! This is
> presumably why people use SAMPA etc rather than IPA on this NG.

Right, that's exactly the point. It just means that if one is writing
with the purpose of being read by all, considering the different
systems in use, the lowest common denominator is ANSI text; a very
widely shared standard includes Unicode postings limited to certain
keyboards; and everything beyond that limits the readership to those who
have the right system and don't mind the aggravation of installing extra
software and the right fonts.

Peter T. Daniels

Aug 3, 2006, 11:00:55 PM

It's the citation forms of the first 17 Arabic letters, in order!

I hope that's what you intended it to be.

Peter T. Daniels

Aug 3, 2006, 11:03:13 PM

Nope, 30 boxes. For some reason there's no Arial Unicode on this
machine, but there is Lucida Sans Unicode -- but apparently not a
complete set.

Jukka K. Korpela

Aug 4, 2006, 3:38:22 AM

Peter T. Daniels <gram...@verizon.net> scripsit:

> For some reason there's no Arial Unicode on this
> machine,

You probably mean "Arial Unicode MS" - that's the font name (a trademark).
It's no wonder that you haven't got it, since it's distributed only as part
of Microsoft Office software. (It used to be available separately for free
on the Web, but that was years ago.)

> but there is Lucida Sans Unicode

That's normal, since that font is part of several flavors of Windows
(practically speaking all Windows versions that are still in use, I guess).

> -- but apparently not a
> complete set.

No single font contains all Unicode characters. "Unicode fonts" are fonts
that use Unicode coding (Unicode code numbers for characters), and they
should not be expected to cover all of the about 100,000 characters in
Unicode. Besides, there are fonts under the same name but with different
character repertoires - different versions of "the same font".

Thus, even if we assume that all software is Unicode-capable (which surely
isn't true), we cannot just use characters and expect everyone to see them.

As demonstrated in this thread, there are rather nasty problems in Google
Groups, independently of any font issues. Andreas Prilop posted a simple
test, containing simple Greek letters as used in modern Greek, using the
ISO-8859-7 encoding. Yet if you view it via Google Groups, you will see a
mixture of ISO Latin 1 characters. When people send followups using Google
Groups, further confusion arises. I'm not particularly worried about people
who use Google Groups as their regular newsreader (that's a poor choice and
has its negative implications anyway), but I'm worried about Google Groups
as an archive that people use - for good reasons. When our messages have
expired from most news servers, typically in a few weeks, they will continue
their life indefinitely on Google Groups. This means that non-ASCII
characters will often appear as transmogrified into something else.

Thus, there are strong reasons to stick to US-ASCII on Usenet. If you intend
to use anything else, consider the implications. Sometimes it might be
feasible to use both a transliteration and the original spelling (e.g., with
the latter in parentheses after the transliteration, for the benefit of
those who can see it correctly). We can probably be optimistic about getting
the US-ASCII characters kept as they are even if Google Groups messes up all
the rest.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Andreas Prilop

Aug 4, 2006, 8:52:55 AM

On 3 Aug 2006, Peter T. Daniels wrote:

>> This message has "charset=ISO-8859-1" and contains some
>> extended Latin-1 characters:

>> I wonder what Google Groups makes out of it.
>
> It's the citation forms of the first 17 Arabic letters, in order!
> I hope that's what you intended it to be.

No! As I wrote, the message was declared as ISO-8859-1,
which contains only *Latin* characters.
Here's a description: http://www.cs.tut.fi/~jkorpela/latin1/
There are no Arabic letters in it.

Here are my special Latin-1 characters again:

Harlan Messinger

Aug 4, 2006, 9:05:12 AM

Nope. When I read Andreas' message, which Thunderbird reports to be
ISO-8859-1, the characters appeared as a mess of Danish capital slash-O,
asterisk, umlaut, guillemet, paragraph, plus-minus, etc., characters.
Then, when I read your response, they turned into the Arabic
presentation forms. Google Groups is second-guessing.

I'm sending this as UTF-8. Let's see what the following looks like--to
me, it looks like the same characters I saw in Andreas' message. Your
copy is above, and in this response still looks Arabic to me.

Ø§ Ø¨ Øª Ø« Ø¬ Ø Ø® Ø¯ Ø° Ø± Ø² Ø³ Ø´ Øµ Ø¶ Ø· Ø¸

Harlan Messinger

Aug 4, 2006, 9:10:23 AM

Harlan Messinger wrote:
>
> Nope. When I read Andreas' message, which Thunderbird reports to be
> ISO-8859-1, the characters appeared as a mess of Danish capital slash-O,
> asterisk, umlaut, guillement, paragraph, plus-minus, etc., characters.
> Then, when I read your response, they turned into the Arabic
> presentation forms. Google Groups is second-guessing.
>
> I'm sending this as UTF-8. Let's see what the following looks like--to
> me, it looks like the same characters I saw in Andreas' message. Your
> copy is above, and in this response still looks Arabic to me.
>
> ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ

Interesting: Go to Andreas' original message with the characters in
Google Groups and click on the "show options" link. Then click the "Show
original" link. On the original, the characters show as I originally saw
them. And the messages' headers appear, confirming that Google received
the ISO-8859-1 header.

Peter T. Daniels

Aug 4, 2006, 11:06:09 AM

That's too much work, but in Andreas's reply to me and in your reply to
Andreas, I see garbage (not boxes), but in your message I'm replying
to, I see the Arabic letters again.

Peter T. Daniels

Aug 4, 2006, 11:10:59 AM

Jukka K. Korpela wrote:
> Peter T. Daniels <gram...@verizon.net> scripsit:
>
> > For some reason there's no Arial Unicode on this
> > machine,
>
> You probably mean "Arial Unicode MS" - that's the font name (a trademark).
> It's no wonder that you haven't got it, since it's distributed only as part
> of Microsoft Office software. (It used to be available separately for free
> on the Web, but that was years ago.)

How arrogant can you be? How can you assume that the computer placed in
my office by my employer for the purpose of editing and typesetting
doesn't contain a complete set of Office2003? (Most of which, of
course, I have no use for whatsoever.)

> > but there is Lucida Sans Unicode
>
> That's normal, since that font is part of several flavors of Windows
> (practically speaking all Windows versions that are still in use, I guess).
>
> > -- but apparently not a
> > complete set.
>
> No single font contains all Unicode characters. "Unicode fonts" are fonts
> that use Unicode coding (Unicode code numbers for characters), and they
> should not be expected to cover all of the about 100,000 characters in
> Unicode. Besides, there are fonts under the same name but with different
> character repertoires - different versions of "the same font".

"Chuck" Bigelow, of Bigelow and Holmes, is a good friend of Bill
Bright's, and they've kept me informed over the years on the progress
of Lucida. It does supposedly include all Unicode characters.

> Thus, even if we assume that all software is Unicode-capable (which surely
> isn't true), we cannot just use characters and expect everyone to see them.
>
> As demonstrated in this thread, there are rather nasty problems in Google
> Groups, independently of any font issues. Andreas Prilop posted a simple
> test, containing simple Greek letters as used in modern Greek, using the
> ISO-8859-7 encoding. Yet if you view it via Google Groups, you will see a
> mixture of ISO Latin 1 characters. When people send followups using Google

Yet when mb did the same, the Greek was just fine.

Andreas Prilop

Aug 4, 2006, 11:18:22 AM

On 4 Aug 2006, Peter T. Daniels wrote:

>>> but there is Lucida Sans Unicode
>>
>> No single font contains all Unicode characters. "Unicode fonts" are fonts
>> that use Unicode coding (Unicode code numbers for characters), and they
>> should not be expected to cover all of the about 100,000 characters in
>> Unicode.
>
> "Chuck" Bigelow, of Bigelow and Holmes, is a good friend of Bill
> Bright's, and they've kept me informed over the years on the progress
> of Lucida. It does supposedly include all Unicode characters.

Certainly not! According to Alan Wood
http://www.alanwood.net/unicode/fonts.html#lucidasansunicode
Lucida Sans Unicode version 2.00 contains a mere 1776 glyphs.

Lee Sau Dan

Aug 4, 2006, 12:12:20 PM

>>>>> "Harlan" == Harlan Messinger <hmessinger...@comcast.net> writes:

>>> Let's see: This message has "charset=ISO-8859-1" and contains
>>> some extended Latin-1 characters:
>>>
>>> ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ
>>>
>>> I wonder what Google Groups makes out of it.
>> It's the citation forms of the first 17 Arabic letters, in
>> order! I hope that's what you intended it to be.

Harlan> Nope. When I read Andreas' message, which Thunderbird
Harlan> reports to be ISO-8859-1, the characters appeared as a
Harlan> mess of Danish capital slash-O, asterisk, umlaut,
Harlan> guillemet, paragraph, plus-minus, etc., characters.

Me too.

Harlan> Then, when I read your response, they turned into the
Harlan> Arabic presentation forms. Google Groups is
Harlan> second-guessing.

No. PTD's message says "charset=utf-8" in the headers. Andreas's
message says "charset=ISO-8859-1". That's the difference.
Newsreaders simply honour what a message claims its charset to be.
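A newsreader decodes the body using whatever charset the headers declare, and this can be reproduced with Python's standard email package. Here is a minimal sketch. It assumes a made-up article (RAW is hypothetical, not one of the posts in this thread). The body bytes D8 A7 20 D8 A8 are two Arabic letters in UTF-8, but the header says ISO-8859-1, much as in Andreas's test.

```python
from email import message_from_bytes

# Hypothetical article: the header declares ISO-8859-1, but the body bytes
# D8 A7 20 D8 A8 happen to be two Arabic letters (alef, beh) in UTF-8.
RAW = b"Content-Type: text/plain; charset=ISO-8859-1\n\n\xd8\xa7 \xd8\xa8"

def display_text(raw: bytes) -> str:
    """Decode the body the way a header-honouring newsreader would."""
    msg = message_from_bytes(raw)
    charset = msg.get_content_charset() or "us-ascii"  # MIME default when absent
    body = msg.get_payload(decode=True)                 # the raw body bytes
    return body.decode(charset, errors="replace")

print(display_text(RAW))  # Ø§ Ø¨
# Same bytes, different label: now the Arabic letters appear.
print(display_text(RAW.replace(b"ISO-8859-1", b"utf-8")))  # ا ب
```

The bytes are identical in both calls and only the label differs. That is the difference between Andreas's message and Peter's reply.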

Harlan> I'm sending this as UTF-8. Let's see what the following
Harlan> looks like--to me, it looks like the same characters I saw
Harlan> in Andreas' message. Your copy is above, and in this
Harlan> response still looks Arabic to me.

Many slash-O's interspersed in other garbage-looking things.

--
Lee Sau Dan 李守敦 ~{@nJX6X~}

E-mail: dan...@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

Lee Sau Dan

Aug 4, 2006, 12:14:12 PM

>>>>> "Peter" == Peter T Daniels <gram...@verizon.net> writes:

>> Not whole e-mails, but Unicode Tibetan shows up OK on the rare
>> occasions I need to paste in an example of something. Tibetan
>> "30 letters" follow:
>>
>> ཀ ཁ ག ང ཅ ཆ ཇ ཉ ཏ ཐ ད ན པ ཕ བ མ ཙ ཚ ཛ ཝ ཞ ཟ འ ཡ ར ལ ཤ ས ཧ ཨ

Peter> Nope, 30 boxes. For some reason there's no Arial Unicode on
Peter> this machine, but there is Lucida Sans Unicode -- but
Peter> apparently not a complete set.

My newsreader has Tibetan fonts, and shows Tibetan letters to me.
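Whether a reader sees letters or boxes depends on whether the font in use has glyphs for those code points. Here is a minimal sketch. It assumes a hypothetical font whose coverage is just Basic Latin plus the Greek and Coptic block. The box drawn for a missing character is really the font's .notdef glyph, imitated here with U+25A1 WHITE SQUARE.

```python
import unicodedata

# Hypothetical coverage of an imaginary font: printable Basic Latin plus the
# Greek and Coptic block. A real font records its coverage in a cmap table.
COVERAGE = set(range(0x0020, 0x007F)) | set(range(0x0370, 0x0400))

BOX = "\u25a1"  # WHITE SQUARE, imitating the font's .notdef glyph

def render(text: str, coverage: set) -> str:
    """Replace every character the font has no glyph for with a box."""
    return "".join(ch if ord(ch) in coverage else BOX for ch in text)

def missing(text: str, coverage: set) -> list:
    """List the distinct uncovered characters by their Unicode names."""
    return sorted({unicodedata.name(ch, f"U+{ord(ch):04X}")
                   for ch in text if ord(ch) not in coverage})

tibetan = "\u0f40 \u0f41 \u0f42"  # ka, kha, ga from the "30 letters"
print(render(tibetan, COVERAGE))   # □ □ □
print(missing(tibetan, COVERAGE))  # ['TIBETAN LETTER GA', 'TIBETAN LETTER KA', 'TIBETAN LETTER KHA']
```

In this model, installing a Tibetan font simply adds the range U+0F00 to U+0FFF to the coverage set.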

Lee Sau Dan

Aug 4, 2006, 12:28:20 PM

>>>>> "Jukka" == Jukka K Korpela <jkor...@cs.tut.fi> writes:

Jukka> No single font contains all Unicode characters.

s/character/glyph/

Jukka> "Unicode fonts" are fonts that use Unicode coding (Unicode
Jukka> code numbers for characters), and they should not be
Jukka> expected to cover all of the about 100,000 characters in
Jukka> Unicode. Besides, there are fonts under the same name but
Jukka> with different character repertoires - different versions
Jukka> of "the same font".

If memory serves, a TrueType font may contain more than one encoding.
Internally, the glyphs are named, and there are tables mapping from
character code to glyph names. One table per encoding. Type1 fonts
have only 1 such table, but it is relatively easy to derive new fonts
with different encodings by sharing the glyphs (plus other
information), substituting just another such table. This is called
"recoding" a Type1 font.
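The arrangement described above can be modelled in a few lines: named glyphs, plus tables mapping character codes to glyph names. This is a toy sketch, not a real font parser. The glyph store, glyph names and placeholder outlines are hypothetical. The per-encoding tables are derived from Python's own latin-1 and mac_roman codecs.

```python
from dataclasses import dataclass, field

# Toy glyph store: glyph name -> outline. Real outlines are curves; these
# placeholder strings are hypothetical.
GLYPHS = {".notdef": "<empty box>", "Aring": "<A-ring outline>",
          "Oslash": "<O-slash outline>"}

# Characters we want to reach (U+00C5 A-ring, U+00D8 O-slash) and their glyphs.
NAMES = {"\u00c5": "Aring", "\u00d8": "Oslash"}

def table_for(codec: str) -> dict:
    """Build a code -> glyph-name table for an 8-bit codec known to Python."""
    return {ch.encode(codec)[0]: name for ch, name in NAMES.items()}

@dataclass
class Font:
    glyphs: dict                               # glyph store, possibly shared
    cmaps: dict = field(default_factory=dict)  # encoding -> {code: glyph name}

    def glyph_name(self, encoding: str, code: int) -> str:
        """Look a code up in one table; unknown codes fall back to .notdef."""
        return self.cmaps.get(encoding, {}).get(code, ".notdef")

# TrueType-like: one font carrying several mapping tables side by side.
truetype = Font(GLYPHS, {
    "unicode": {ord(ch): name for ch, name in NAMES.items()},
    "latin-1": table_for("latin-1"),
})

def recode(font: Font, encoding: str, table: dict) -> Font:
    """Type 1-like recoding: same glyphs, a different single table."""
    return Font(font.glyphs, {encoding: table})

type1_latin1 = Font(GLYPHS, {"latin-1": table_for("latin-1")})
type1_mac = recode(type1_latin1, "mac_roman", table_for("mac_roman"))

print(type1_latin1.glyph_name("latin-1", 0xC5))  # Aring
print(type1_mac.glyph_name("mac_roman", 0x81))   # Aring
print(type1_mac.glyphs is type1_latin1.glyphs)   # True
```

In the recoded font only the table differs. Both fonts point at the same glyph dictionary, which is what makes recoding cheap.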

Jukka> Thus, even if we assume that all software is
Jukka> Unicode-capable (which surely isn't true), we cannot just
Jukka> use characters and expect everyone to see them.

You would need a font that contains the complete set of glyphs for all
defined characters. Another approach, which is more common, is to
introduce the notion of "fontsets". A fontset is a set of fonts,
usually covering different (but can be overlapping) regions of the
Unicode coding space. In this way, it is possible to construct a more
complete coverage out of a set of smaller but readily available fonts.
But it is not easy to find a set of fonts which are visually
consistent in design and style.
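A fontset in this sense can be sketched as an ordered list of fonts, each with the range of code points it covers. Every character goes to the first font that has it. The font names and coverage sets below are hypothetical. Real systems are considerably messier, and this model ignores the visual-consistency problem entirely.

```python
# Hypothetical fontset in priority order: (font name, covered code points).
# Names and ranges are illustrative, not real font data.
FONTSET = [
    ("LatinSans", set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))),
    ("GreekSerif", set(range(0x370, 0x400)) | set(range(0x1F00, 0x2000))),
    ("TibetanFont", set(range(0xF00, 0x1000))),
]

def pick_fonts(text, fontset):
    """Pair each character with the first font covering it (None = box)."""
    return [(ch, next((name for name, cov in fontset if ord(ch) in cov), None))
            for ch in text]

def runs(text, fontset):
    """Group consecutive characters that are drawn with the same font."""
    out = []
    for ch, font in pick_fonts(text, fontset):
        if out and out[-1][1] == font:
            out[-1] = (out[-1][0] + ch, font)
        else:
            out.append((ch, font))
    return out

print(runs("Aα ཀ", FONTSET))
# [('A', 'LatinSans'), ('α', 'GreekSerif'), (' ', 'LatinSans'), ('ཀ', 'TibetanFont')]
```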

Jukka> I'm worried about Google Groups as an archive that people
Jukka> use - for good reasons. When our messages have expired from
Jukka> most news servers, typically in a few weeks, they will
Jukka> continue their life indefinitely on Google Groups. This
Jukka> means that non-ASCII characters will often appear as
Jukka> transmogrified into something else.

But Google's interface (harnessing a web browser) is not the only way
to read Google Groups messages. At least, Gnus/Emacs has an 'nnweb'
backend, which allows one to read Google Groups messages inside Gnus.
I haven't used it, though.

Jukka> Thus, there are strong reasons to stick to US-ASCII on
Jukka> Usenet.

It depends on what groups you're reading/posting to. e.g. in de.*
newsgroups, it would be considered preferable to use ISO-8859-1 or
-15, if not UTF-8. For hk.* newsgroups, BIG5 (or even BIG5-HKSCS) is
the norm.
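One way to act on both the US-ASCII advice and this qualification is to pick the most conservative charset that can represent a given post. Try US-ASCII first, then whatever the hierarchy customarily uses, and fall back to UTF-8 only as a last resort. Here is a sketch under that assumption. The PREFERRED table is a hypothetical summary of the examples above, not an authoritative list of group conventions. The group names in the usage lines are only illustrative.

```python
# Hypothetical summary of the conventions mentioned above; real groups
# should be checked individually.
PREFERRED = {
    "de": ["iso-8859-1", "iso-8859-15"],
    "hk": ["big5", "big5hkscs"],
}

def choose_charset(text: str, group: str) -> str:
    """Return the first charset, most conservative first, that can encode
    the whole text."""
    hierarchy = group.split(".", 1)[0]
    for charset in ["us-ascii", *PREFERRED.get(hierarchy, [])]:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    return "utf-8"  # encodes any Unicode text (barring lone surrogates)

print(choose_charset("plain text", "sci.lang"))      # us-ascii
print(choose_charset("Gr\u00fc\u00dfe", "de.test"))  # iso-8859-1
print(choose_charset("5 \u20ac", "de.test"))         # iso-8859-15
print(choose_charset("\u03bb\u03cc\u03b3\u03bf\u03c2", "sci.lang"))  # utf-8
```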

Lee Sau Dan

Aug 4, 2006, 12:30:35 PM

>>>>> "Peter" == Peter T Daniels <gram...@verizon.net> writes:

Peter> How arrogant can you be? How can you assume that the
Peter> computer placed in my office by my employer for the purpose
Peter> of editing and typesetting doesn't contain a complete set
Peter> of Office2003? (Most of which, of course, I have no use for
Peter> whatsoever.)

How can you assume that such a computer doesn't contain SuSE +
OpenOffice instead of MS Windows + MS Office?

Harlan Messinger

Aug 4, 2006, 12:44:33 PM

Clicking two links is too much effort for the purpose of better
understanding something that's fairly important to communicating on-line
in your field of expertise? Seems rather unfortunate.

Artur Jachacy

Aug 4, 2006, 1:28:56 PM

On Fri, 04 Aug 2006 08:10:59 -0700, Peter T. Daniels wrote:
> Jukka K. Korpela wrote:
>> Peter T. Daniels <gram...@verizon.net> scripsit:
>>
>> > For some reason there's no Arial Unicode on this
>> > machine,
>>
>> You probably mean "Arial Unicode MS" - that's the font name (a trademark).
>> It's no wonder that you haven't got it, since it's distributed only as part
>> of Microsoft Office software. (It used to be available separately for free
>> on the Web, but that was years ago.)
>
> How arrogant can you be? How can you assume that the computer placed in
> my office by my employer for the purpose of editing and typesetting
> doesn't contain a complete set of Office2003? (Most of which, of
> course, I have no use for whatsoever.)

The font doesn't come with the standard installation of MSO. You have to
choose custom installation and select appropriate settings ('International
fonts' or something like that).

Artur

--
Washington, Washington
Six-foot-twenty, fuckin' killing for fun

Jukka K. Korpela

Aug 4, 2006, 4:24:48 PM

Peter T. Daniels <gram...@verizon.net> scripsit:

> Jukka K. Korpela wrote:
>> Peter T. Daniels <gram...@verizon.net> scripsit:
>>
>>> For some reason there's no Arial Unicode on this
>>> machine,
>>
>> You probably mean "Arial Unicode MS" - that's the font name (a
>> trademark). It's no wonder that you haven't got it, since it's
>> distributed only as part of Microsoft Office software. (It used to
>> be available separately for free on the Web, but that was years ago.)
>
> How arrogant can you be?

Apparently I cannot beat you in that area.

> How can you assume that the computer placed
> in my office by my employer for the purpose of editing and typesetting
> doesn't contain a complete set of Office2003? (Most of which, of
> course, I have no use for whatsoever.)

I was not assuming anything. In particular, I did not assume that you were
using your office computer when posting to Usenet or that you like or do not
like Microsoft Office.

I made just a simple and correct note about Arial Unicode MS not being
universal in people's computers (even if they use Windows). That's what
"It's no wonder - -" says. Surely there are _other_ reasons for lack of
Arial Unicode MS, but they just _support_ the observation that "It's no
wonder - -". (You should probably consult your local IT support, since the
probable explanation of lack of Arial Unicode MS - in cases where the user
hasn't removed it consciously and has Office installed - is some error in
the fonts directory. Perhaps there are other fonts missing, too, and lack of
Arial Unicode MS sounds like a significant loss to anyone who works with a
large number of languages.)

> "Chuck" Bigelow, of Bigelow and Holmes, is a good friend of Bill
> Bright's, and they've kept me informed over the years on the progress
> of Lucida. It does supposedly include all Unicode characters.

"Lucida" is a group name for a collection of rather different fonts rather
than a specific font, though some of such fonts _might_ be distributed under
the simple name "Lucida". I very much doubt the Unicode coverage, which is
probably quite different in different Lucidas, and probably at most at the
level of Unicode 2.0 (issued in 1996, i.e. ten years ago, with 38,950
characters); we are now at Unicode 4.1 (issued in 2005, with 97,720
characters).

> Yet when mb did the same, the Greek was just fine.

Maybe; I cannot recall. The point is that when tests show mixed behavior,
depending on minor variation in data, the technology is what we call
"unreliable".

>> Groups, further confusion arises. I'm not particularly worried about
>> people who use Google Groups as their regular newsreader (that's a
>> poor choice and has its negative implications anyway), but I'm
>> worried about Google Groups as an archive that people use - for good
>> reasons. When our messages have expired from most news servers,
>> typically in a few weeks, they will continue their life indefinitely
>> on Google Groups. This means that non-ASCII characters will often
>> appear as transmogrified into something else.
>>
>> Thus, there are strong reasons to stick to US-ASCII on Usenet. If
>> you intend to use anything else, consider the implications.
>> Sometimes it might be feasible to use both a transliteration and the
>> original spelling (e.g., with the latter in parentheses after the
>> transliteration, for the benefit of those who can see it correctly).
>> We can probably be optimistic about getting the US-ASCII characters
>> kept as they are even if Google Groups messes up all the rest.

You still haven't learned to quote on Usenet, have you? You pointlessly
quoted even the part of my text about which you hadn't got anything to say,
even nominally. Maybe you should read something like
http://www.netmeister.org/news/learn2quote.html

Jukka K. Korpela

Aug 4, 2006, 4:30:26 PM

Artur Jachacy <arturj...@gmail.com> scripsit:

> The font [Arial Unicode MS] doesn't come with the standard installation of
> MSO. You have to choose custom installation and select appropriate settings
> ('International fonts' or something like that).

Really? I thought things changed years ago. In the old times, it was common
that default installations of Windows and Office didn't install much of the
available "international support", causing quite some frustration. I thought
that it was mainly to save disk space and that the policy changed.

Jukka K. Korpela

Aug 4, 2006, 4:40:37 PM

Lee Sau Dan <dan...@informatik.uni-freiburg.de> scripsit:

>>>>>> "Jukka" == Jukka K Korpela <jkor...@cs.tut.fi> writes:
>
> Jukka> No single font contains all Unicode characters.
>
> s/character/glyph/

No, if you wish to make the expression clearer, you should say "No single
font contains glyphs for all Unicode characters". There is no such thing as
a Unicode glyph. And it is quite common and normal to say that font X contains
character Y, instead of the longer expression that font X contains a glyph
for character Y.

> Jukka> Thus, even if we assume that all software is
> Jukka> Unicode-capable (which surely isn't true), we cannot just
> Jukka> use characters and expect everyone to see them.
>
> You would need a font that contains the complete set of glyphs for all
> defined characters. Another approach, which is more common, is to
> introduce the notion of "fontsets".

Of course. That's what many programs do all the time, creating typographical
oddities and even monstrosities, and often (of course) helping us, too.

But even if you take the union of characters covered by the fonts installed
in Joe Q. Public's computer, you will be rather far from complete Unicode
coverage.

> But Google's interface (harnessing a web browser) is not the only way
> to read Google Groups messages.

That's irrelevant to the topic, since the problem is that so many people use
Google's interface.

> Jukka> Thus, there are strong reasons to stick to US-ASCII on
> Jukka> Usenet.
>
> It depends on what groups you're reading/posting to.

Of course. I thought that was previously mentioned in the discussion, so that
we can concentrate on the international groups like the sci.* hierarchy.

Peter T. Daniels

Aug 4, 2006, 6:08:52 PM

If I ever have occasion to email someone text in Arabic script, I'll
find out how to do it.

Sherlock Holmes was surprised to be told the earth rotates around the
sun.

Peter T. Daniels

Aug 4, 2006, 6:11:06 PM

I don't know who Alan Wood may be, but why would he know more about it
than the designer of Lucida?

Peter T. Daniels

Aug 4, 2006, 6:12:25 PM

Lee Sau Dan wrote:
> >>>>> "Peter" == Peter T Daniels <gram...@verizon.net> writes:
>
> Peter> How arrogant can you be? How can you assume that the
> Peter> computer placed in my office by my employer for the purpose
> Peter> of editing and typesetting doesn't contain a complete set
> Peter> of Office2003? (Most of which, of course, I have no use for
> Peter> whatsoever.)
>
> How can you assume that such a computer doesn't contain SuSE +
> OpenOffice instead of MS Windows + MS Office?

Um, because all the components bear all of MS's trademarked graphics
and copyright notices and Agreements and everything else?

Peter T. Daniels

Aug 4, 2006, 6:14:35 PM

I have previously reported here that when I tried to install CJK
capability, it asked me to Insert Distribution Disk 2, and they
couldn't find where they'd put the Distribution Disks for safekeeping.

Peter T. Daniels

Aug 4, 2006, 6:25:30 PM

Jukka K. Korpela wrote:
> Peter T. Daniels <gram...@verizon.net> scripsit:
>
> > Jukka K. Korpela wrote:
> >> Peter T. Daniels <gram...@verizon.net> scripsit:
> >>
> >>> For some reason there's no Arial Unicode on this
> >>> machine,
> >>
> >> You probably mean "Arial Unicode MS" - that's the font name (a
> >> trademark). It's no wonder that you haven't got it, since it's
> >> distributed only as part of Microsoft Office software. (It used to
> >> be available separately for free on the Web, but that was years ago.)
> >
> > How arrogant can you be?
>
> Apparently I cannot beat you in that area.
>
> > How can you assume that the computer placed
> > in my office by my employer for the purpose of editing and typesetting
> > doesn't contain a complete set of Office2003? (Most of which, of
> > course, I have no use for whatsoever.)
>
> I was not assuming anything. In particular, I did not assume that you were
> using your office computer when posting to Usenet or that you like or do not
> like Microsoft Office.

Then your command of English must be far, far poorer than you, or we,
have thought.

You wrote, "It's no wonder that you haven't got it, since it's
distributed only as part of Microsoft Office software." That is
precisely an assertion that the reason I don't have it is that I do not
have Microsoft Office software.

> I made just a simple and correct note about Arial Unicode MS not being
> universal in people's computers (even if they use Windows).

You made no such "note." It may be what you intended to say, but it's
not what you said.

> That's what
> "It's no wonder - -" says. Surely there are _other_ reasons for lack of
> Arial Unicode MS, but they just _support_ the observation that "It's no
> wonder - -". (You should probably consult your local IT support, since the
> probable explanation of lack of Arial Unicode MS - in cases where the user
> hasn't removed it consciously and has Office installed - is some error in
> the fonts directory. Perhaps there are other fonts missing, too, and lack of
> Arial Unicode MS sounds like a significant loss to anyone who works with a
> large number of languages.)

We work with Syriac, Hebrew, and occasionally Arabic. I've recommended
starting a series in Chinese philology and linguistics, but I don't
know that it's been greeted with much enthusiasm.

Curiously, we have fonts for 8 scripts of South Asia (all the standards
except Oriya and Sinhala), plus quite a few for Thai, so one of these
days I'm going to try to activate the systems for them. Maybe it'll
want Distribution Disks (it didn't for Arabic), maybe it won't.

> > "Chuck" Bigelow, of Bigelow and Holmes, is a good friend of Bill
> > Bright's, and they've kept me informed over the years on the progress
> > of Lucida. It does supposedly include all Unicode characters.
>
> "Lucida" is a group name for a collection of rather different fonts rather
> than a specific font, though some of such fonts _might_ be distributed under
> the simple name "Lucida". I very much doubt the Unicode coverage, which is
> probably quite different in different Lucidas, and probably at most at the
> level of Unicode 2.0 (issued in 1996, i.e. ten years ago, with 38,950
> characters); we are now at Unicode 4.1 (issued in 2005, with 97,720
> characters).
>
> > Yet when mb did the same, the Greek was just fine.
>
> Maybe; I cannot recall. The point is that when tests show mixed behavior,
> depending on minor variation in data, the technology is what we call
> "unreliable".

If you didn't snip so much, it wouldn't be a question of "recall."

> You still haven't learned to quote on Usenet, have you? You pointlessly
> quoted even the part of my text about which you hadn't got anything to say,
> even nominally. Maybe you should read something like
> http://www.netmeister.org/news/learn2quote.html

In Google Groups, which I have been using for only a few weeks now, so
your "still" is highly inappropriate, one is not aware of how much one
is quoting, since it is displayed only upon request. I therefore am not
prompted to delete all your excess verbiage. If I were seeing it in
front of me, I would of course have removed most of it -- but nothing
that was potentially relevant to my reply.

Why do you enjoy having continually to be reminded of how much
misconstrual you do because of your excessive snippage of context?

me

Aug 4, 2006, 10:54:29 PM

Jukka K. Korpela wrote:
> Lee Sau Dan <dan...@informatik.uni-freiburg.de> scripsit:
>
>>>>>>> "Jukka" == Jukka K Korpela <jkor...@cs.tut.fi> writes:
>>
>> Jukka> No single font contains all Unicode characters.
>>
>> s/character/glyph/
>
> No, if you wish to make the expression clearer, you should say "No single
> font contains glyphs for all Unicode characters".

"No single typeface contains glyphs for all Unicode characters." Your
sentence conveys the impression that some glyphs can be displayed only in
some point sizes.

Paul J Kriha

Aug 5, 2006, 2:24:32 AM

Lee Sau Dan <dan...@informatik.uni-freiburg.de> wrote in message
news:87lkq4c...@informatik.uni-freiburg.de...

Quite obviously each of these 17 pairs (slashed-O plus something)
was _originally_ a single character, encoded as two bytes in UTF-8.
They are interleaved with spaces. Since all 17 letters lie in the same
narrow range of code points, the first byte (shown as slashed-O) never
changes.

The 17 Arabic letters farther back are the result of interpreting
the 17 pairs of 8-bit bytes as 17 two-byte UTF-8 sequences.

I am posting this as UTF-8 so it shouldn't get further corrupted
and all people with UTF-8 capability should see it the same way
as I do now.
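The byte-level account of the garbling can be checked directly with a minimal Python sketch. It rebuilds the 17 letters of the test from their code points: U+0627 ALEF through U+0638 ZAH, skipping U+0629 TEH MARBUTA, which the list did not include. It then confirms three things. Each letter is two bytes in UTF-8. The first byte is always 0xD8, the slashed-O of ISO-8859-1. Reading the garbage back as UTF-8 restores the Arabic.

```python
# The 17 letters of the test: U+0627..U+0638 without U+0629 (teh marbuta).
LETTERS = [chr(cp) for cp in range(0x0627, 0x0639) if cp != 0x0629]
text = " ".join(LETTERS)

utf8 = text.encode("utf-8")        # each letter becomes two bytes
garbled = utf8.decode("latin-1")   # what an ISO-8859-1 reader shows

for ch in LETTERS[:3]:
    pair = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {pair.hex(' ').upper()} -> {pair.decode('latin-1')}")
# U+0627 -> D8 A7 -> Ø§
# U+0628 -> D8 A8 -> Ø¨
# U+062A -> D8 AA -> Øª

# Every first byte is 0xD8, i.e. the slashed-O in ISO-8859-1.
assert all(ch.encode("utf-8")[0] == 0xD8 for ch in LETTERS)
# Reinterpreting the Latin-1 "garbage" as UTF-8 gives the Arabic back.
assert garbled.encode("latin-1").decode("utf-8") == text
```

This also explains the apparent gap in the garbled line quoted earlier. U+062D HAH encodes as D8 AD, and 0xAD in ISO-8859-1 is the soft hyphen, which is usually rendered invisibly. As a result, that pair looks like a lone slashed-O.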

pjk

Jukka K. Korpela

Aug 5, 2006, 3:47:07 AM

me <nor...@noreply.net> scripsit:

>> No, if you wish to make the expression clearer, you should say "No
>> single font contains glyphs for all Unicode characters".
>
> "No single typeface contains glyphs for all Unicode characters."

The distinction between "typeface" and "font" is made by some typographers
and other people, but most people don't make such a distinction and use the
word "font" to cover both a more concrete concept (a font in a particular
size) and a more abstract concept (a collection of essentially similar fonts
in different sizes). I have no strong feelings about this, and I use words
the way I expect readers to understand best.

Even if the font vs. typeface distinction is made, words still have varying
meanings. For example, "font" would still refer both to visible
presentations of characters (something you are probably looking right now on
your screen) and to computer software that produces such presentations.

> Your
> sentence conveys the impression that some glyphs can be displayed
> only in some point sizes.

Only if you expect me (and people in general) to make the typeface vs. font
distinction when writing on a forum like this. Even if you think the
distinction _should_ be made, it would be unrealistic to think that people
generally _make_ it.

(Of course, the statement "some glyphs can be displayed only in some point
sizes" as such _is_ true. There is some lower limit for useable glyph size,
even if we assume high-quality display or printing and excellent eyesight.
And the limit is different for different characters, like the letter "o"
versus some complex Chinese character.)

Jukka K. Korpela

Aug 5, 2006, 3:56:59 AM

Peter T. Daniels <gram...@verizon.net> scripsit:

> I don't know who Alan Wood may be,

That's an informative statement, in this context.

> but why would he know more about it
> than the designer of Lucida?

Perhaps because he checked the facts?

Actually, I don't think "the designer of Lucida" is misinformed in this
matter; rather, if you have got some information from such a direction, the
information apparently got somehow misunderstood in an essential way.

You haven't said what you mean by "Lucida" (which is a collective name, not
a specific font [or typeface]), but assuming you mean "Lucida Sans Unicode",
you can (if you're using Windows) open the Fonts folder and right-click the
icon for the font and select "Properties", and you'll see (under "Features")
a statement that says it contains 1776 glyphs. That's assuming it's version
2.0, and it most probably is. If it's a newer, extended version, then let's
hope it'll be on the market soon.

Jukka K. Korpela

Aug 5, 2006, 4:12:13 AM

Peter T. Daniels <gram...@verizon.net> scripsit:

> You wrote, "It's no wonder that you haven't got it, since it's
> distributed only as part of Microsoft Office software." That is
> precisely an assertion that the reason I don't have it is that I do
> not have Microsoft Office software.

I cannot be held responsible for your jumping to conclusions and getting
pissed off by something that you think I meant (and that should not piss off
anybody even if it were what I meant). Does your parse engine really
associate "since - -" with "you haven't got it" and not the entire "It's no
wonder - -"? The funniest thing is that you called me arrogant apparently
because you thought I had written that you haven't got Office, yet you seem
to have no particular appreciation of Office.

> If you didn't snip so much, it wouldn't be a question of "recall."

I was referring to your reference to some previous experiment by a third
person, and I just wrote that it's irrelevant. (I would have had to scan
through previous messages - even your excessive quoting does not quote
_everything_ in the thread, you know - and I had no reason to do so.) Now
you are making a noise thereof, quoting a lot of lines that have nothing to
do with the issue.

> In Google Groups, which I have been using for only a few weeks now,

You have quoted far too much long before that, and apparently you will now
make things much worse by using Google Groups for posting. Using inferior
methods is no excuse for the harm you cause.

You seem to have no understanding of the purposes of quoting on Usenet, or
in general - or you simply ignore your understanding because you are too
lazy to trim quotations. Either way, it's getting more and more difficult to
make any sense of what you write - to pick up the eventual pearls from the
piles of garbage (massive quoting, personal attacks).

Harlan Messinger

Aug 5, 2006, 7:04:55 AM

Peter T. Daniels wrote:
> Sherlock Holmes was surprised to be told the earth rotates around the
> sun.

It *revolves* around the sun. It *rotates* around its own axis.

Harlan Messinger

Aug 5, 2006, 8:02:21 AM

mb wrote:
> Andreas Prilop wrote:

>> On 2 Aug 2006, mb wrote:
>>
>>> Organization: http://groups.google.com
>>> Content-Type: text/plain; charset="iso-8859-1"
>>> User-Agent: G2/0.2
>>>
>>> Produces only gobbledygook.
>>> Bring something that everyone can read or stop tinkering.
>> *You* need a newsreader instead of Google's broken Usenet interface.
>> Note that "User-Agent: G2/0.2" is a very preliminary beta
>> version 0.2.
>>
>> My original message had "charset=ISO-8859-7".
>> Google's broken Usenet interface ignores this.
>> If *you* rely on Google Groups, you have to live with its
>> imperfections.
>
> When there is an obvious, effective and uncomplicated alternative,
> anyone who complicates life by using something that requires special
> programs

That's like calling a Web browser a "special" program. They aren't
"special" programs, they are programs designed to access Usenet the way
it's supposed to be accessed, as was defined long before Google Groups
came along and broke things.

> or tinkering _is_ the imperfection itself. It's way simpler to
> just dismiss it.

You sound like the suit salesman in the joke related at
http://www.hamiltonlabs.com/suits.htm.

Harlan Messinger

Aug 5, 2006, 8:07:26 AM

Peter T. Daniels wrote:
> Jukka K. Korpela wrote:
>>
>> As demonstrated in this thread, there are rather nasty problems in Google
>> Groups, independently of any font issues. Andreas Prilop posted a simple
>> test, containing simple Greek letters as used in modern Greek, using the
>> ISO-8859-7 encoding. Yet if you view it via Google Groups, you will see a
>> mixture of ISO Latin 1 characters. When people send followups using Google
>
> Yet when mb did the same, the Greek was just fine.

Sometimes Google guesses correctly, sometimes it doesn't. Just as
Microsoft Word is sometimes correct when you don't manage to
let go of the shift key before typing the second letter in a sentence
and it helps you by lower-casing that character automatically, and
sometimes it's wrong.
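Google's actual logic is not documented anywhere in this thread. Still, a simple guesser that ignores the declared charset would reproduce both observations reported here. Here is a minimal sketch. The sniff function and its Latin-1 fallback are hypothetical, chosen only because they match the reported behaviour. honour_header shows what a conventional newsreader does instead.

```python
def honour_header(body: bytes, declared: str) -> str:
    """Conventional newsreader behaviour: trust the declared charset."""
    return body.decode(declared, errors="replace")

def sniff(body: bytes, fallback: str = "latin-1") -> str:
    """Hypothetical second-guessing heuristic that ignores the header:
    use UTF-8 if the bytes happen to be valid UTF-8, else a fixed fallback."""
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return body.decode(fallback)

# A two-letter excerpt in the style of Andreas's test: "Ø§ Ø¨" declared as
# ISO-8859-1. These bytes are also valid UTF-8, so the guesser finds Arabic.
latin1_test = "\u00d8\u00a7 \u00d8\u00a8".encode("latin-1")
print(honour_header(latin1_test, "iso-8859-1"))  # Ø§ Ø¨
print(sniff(latin1_test))                        # ا ب

# Modern Greek declared as ISO-8859-7: not valid UTF-8, so the guesser
# falls back to Latin-1 and shows accented Latin letters instead.
greek_test = "\u03b1\u03b2\u03b3".encode("iso-8859-7")  # alpha, beta, gamma
print(honour_header(greek_test, "iso-8859-7"))   # αβγ
print(sniff(greek_test))                         # áâã
```

A guesser like this helps whenever UTF-8 text is mislabelled. It hurts whenever correctly labelled text happens to look like something else, which fits "sometimes correctly, sometimes not".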

Peter T. Daniels

Aug 5, 2006, 2:05:17 PM

I went to the wrong syllable for my mnemonic! revOlve Orbit, rotAte
Axis (vs. rOtate Orbit)

Peter T. Daniels

Aug 5, 2006, 2:12:47 PM

Jukka K. Korpela wrote:
> Peter T. Daniels <gram...@verizon.net> scripsit:
>
> > I don't know who Alan Wood may be,
>
> That's an informative statement, in this context.
>
> > but why would he know more about it
> > than the designer of Lucida?
>
> Perhaps because he checked the facts?
>
> Actually, I don't think "the designer of Lucida" is misinformed in this
> matter; rather, if you have got some information from such a direction, the
> information apparently got somehow misunderstood in an essential way.
>
> You haven't said what you mean by "Lucida" (which is a collective name, not

Lucida is a typeface (or font) designed by Chuck Bigelow for Microsoft
that is intended to include all the Unicode characters.

> a specific font [or typeface]), but assuming you mean "Lucida Sans Unicode",

I think there are 17 Lucidas in my Fonts folder.

> you can (if you're using Windows) open the Fonts folder and right-click the
> icon for the font and select "Properties", and you'll see (under "Features")

Nope. There's no "Features" label anywhere under "Properties."

> a statement that says it contains 1776 glyphs. That's assuming it's version

Nope, the only figures found are "Size" and "Size on Disk."

> 2.0, and it most probably is. If it's a newer, extended version, then let's
> hope it'll be on the market soon.

It is 2.0.

Wiktor S.

Aug 5, 2006, 2:45:22 PM

>>> Sherlock Holmes was surprised to be told the earth rotates around
>>> the sun.
>>
>> It *revolves* around the sun. It *rotates* around its own axis.
>
> I went to the wrong syllable for my mnemonic! revOlve Orbit, rotAte
> Axis (vs. rOtate Orbit)

A mnemonic that may go wrong is a bad mnemonic :-)

--
Azarien

Artur Jachacy

Aug 5, 2006, 2:58:54 PM

On Sat, 05 Aug 2006 11:12:47 -0700, Peter T. Daniels wrote:
>> a statement that says it contains 1776 glyphs. That's assuming it's version
>
> Nope, the only figures found are "Size" and "Size on Disk."

In the Programs menu, go to Accessories > System Tools > Character Map,
choose Lucida Sans Unicode from the drop-down list and see what's there.

mb

Aug 5, 2006, 3:03:24 PM

Harlan Messinger wrote:
> mb wrote:

> > When there is an obvious, effective and uncomplicated alternative,
> > anyone who complicates life by using something that requires special
> > programs
>
> That's like calling a Web browser a "special" program.

Not Web browsers. Newsreaders.

> They aren't
> "special" programs, they are programs designed to access Usenet the way
> it's supposed to be accessed, as was defined long before Google Groups
> came along and broke things.

As if all text was sent in the correct way. As if everyone had the same
Unicode reading capacity. Also, if it all worked as well as they say
why are the geeks discussing it no end, and never getting to a simple
and clear way of working it for everyone? In fact, why are we still
having all this discussion here?

> > or tinkering _is_ the imperfection itself. It's way simpler to
> > just dismiss it.
>
> You sound like the suit salesman in the joke related at
> http://www.hamiltonlabs.com/suits.htm.

Your analogies. I'd say that here I'm the customer who sees the
half-baked offer and just walks out. I'm not missing much of the deep
wisdom of some inconsiderate writer on Usenet.

Most importantly, there is nothing more ridiculously unnecessary in the
world than polytonic Greek. We're talking some signs introduced late
for the profit of second language learning, now totally abandoned, that
are only rarely useful in documenting some features of etymology based
on old text, which can be easily replaced by ANSI signs in those very
rare cases where they belong in the discussion.

So keep your suit, I'm not buying.

Brian M. Scott

Aug 5, 2006, 3:10:31 PM

On Sat, 05 Aug 2006 08:07:26 -0400, Harlan Messinger
<hmessinger...@comcast.net> wrote in
<news:4jjg0gF...@individual.net> in sci.lang:

[...]

> Just as sometimes Microsoft Word is sometimes correct
> when you don't manage to let go of the shift key before
> typing the second letter in a sentence and it helps you
> by lower-casing that character automatically, and
> sometimes it's wrong.

Assuming that you haven't turned off auto-correction.
That's one of the first things that I do.

Brian

mb

Aug 5, 2006, 5:08:24 PM

Harlan Messinger wrote:
> Peter T. Daniels wrote:

...

> > Yet when mb did the same, the Greek was just fine.
>
> Sometimes Google guesses correctly, sometimes it doesn't. Just as
> sometimes Microsoft Word is sometimes correct when you don't manage to
> let go of the shift key before typing the second letter in a sentence
> and it helps you by lower-casing that character automatically, and
> sometimes it's wrong.

The "sometimes" in this case is definable:
When sending is not set to some particular "language", it is in
Unicode.
When reading, as far as Greek is concerned, there is no problem with
standard monotonic. Anything outside it will be shown as unknown.
Also, even this is way beyond the lowest common denominator of NG
users, which is still ANSI.

Peter T. Daniels

Aug 5, 2006, 9:47:21 PM

Artur Jachacy wrote:
> On Sat, 05 Aug 2006 11:12:47 -0700, Peter T. Daniels wrote:
> >> a statement that says it contains 1776 glyphs. That's assuming it's version
> >
> > Nope, the only figures found are "Size" and "Size on Disk."
>
> In the Programs menu, go to Accessories > System Tools > Character Map,
> choose Lucida Sans Unicode from the drop-down list and see what's there.

What a useful little window! Unfortunately it doesn't give a character
count. It only has roman, phonetic, cyrillic, greek, and hebrew -- not
even arabic -- and lots and lots and lots and lots of math symbols.

What's the "WST" group? Each one includes what looks like a complete
set of Braille characters (2 x 6 rectangles of dots).

Peter T. Daniels

Aug 5, 2006, 9:49:17 PM

mb wrote:

> Most importantly, there is nothing more ridiculously unnecessary in the
> world than polytonic Greek. We're talking some signs introduced late
> for the profit of second language learning, now totally abandoned, that
> are only rarely useful in documenting some features of etymology based
> on old text, which can be easily replaced by ANSI signs in those very
> rare cases where they belong in the discussion.

Except that if you're publishing a Classical text, you need it.

Peter T. Daniels

Aug 5, 2006, 9:52:58 PM

You can turn off individual features of "Auto-Correct" and retain the
ones that are useful -- I often type <langauge>, and it's nice to have
the machine fix it before I even notice. And the _italics_ feature is
_very_ handy, since I don't have to reach the extra stretch to the Ctrl
key.

The most annoying one is auto-capping of i.

Brian M. Scott

Aug 6, 2006, 3:55:29 AM

On 5 Aug 2006 18:52:58 -0700, "Peter T. Daniels"
<gram...@verizon.net> wrote in
<news:1154829178.5...@75g2000cwc.googlegroups.com>
in sci.lang:

> Brian M. Scott wrote:

[...]

>> Assuming that you haven't turned off auto-correction.
>> That's one of the first things that I do.

> You can turn off individual features of "Auto-Correct" and retain the
> ones that are useful -- I often type <langauge>, and it's nice to have
> the machine fix it before I even notice.

I don't consider *any* of them useful: I want the program to
produce what I've typed. (And my fingers almost always
notice the 'langauge' type of error as I'm making it
anyway.)

> And the _italics_ feature is _very_ handy, since I don't
> have to reach the extra stretch to the Ctrl key.

Nine and sixty ways: I find the ctrl-i toggle much handier
than the underscore.

Brian

mb

Aug 6, 2006, 4:36:34 AM

Exactly: Not within the purpose of an NG or listserv. I've needed a
polytonic keyboard only twice since I first installed one in the
Eighties.

Peter T. Daniels

Aug 6, 2006, 8:16:49 AM

You didn't say "nothing more ridiculously unnecessary in newsgroups
than polytonic Greek."

mb

Aug 6, 2006, 2:23:44 PM

No. Because the general statement is correct: There is absolutely no
need for it in book printing either because it is unnecessary for
reading.

Besides, and independently, the world needs re-editions of the popular
texts just as much as re-recordings of the Four Seasons: Like a hole in
the head.

Harlan Messinger

Aug 6, 2006, 3:55:46 PM

Just think of the line "Sit on it and rotate" and you won't go wrong.

Harlan Messinger

Aug 6, 2006, 4:22:11 PM

mb wrote:
> Harlan Messinger wrote:
>> mb wrote:
>
>>> When there is an obvious, effective and uncomplicated alternative,
>>> anyone who complicates life by using something that requires special
>>> programs
>> That's like calling a Web browser a "special" program.
>
> Not Web browsers. Newsreaders.
>
>> They aren't
>> "special" programs, they are programs designed to access Usenet the way
>> it's supposed to be accessed, as was defined long before Google Groups
>> came along and broke things.
>
> As if all text was sent in the correct way. As if everyone had the same
> Unicode reading capacity. Also, if it all worked as well as they say
> why are the geeks discussing it no end,

Because it is an inherently complicated discipline. Brain surgery is
complicated too. Should a doctor just simplify things by sending his
patients who need brain treatment to the guy who drills holes in
people's skulls to cure them?

Paul J Kriha

Aug 6, 2006, 11:09:44 AM

Wiktor S. <wswik...@Mpoczta.fm> wrote in message news:eb2p04$2bc$1...@news.onet.pl...

> Azarien

It needs a second level mnemonic. :-)
The original mnemonic involves two words, therefore
always compare the second syllable.

pjk

mb

Aug 7, 2006, 6:16:04 PM

Harlan Messinger wrote:

> > As if all text was sent in the correct way. As if everyone had the same
> > Unicode reading capacity. Also, if it all worked as well as they say
> > why are the geeks discussing it no end,
>
> Because it is an inherently complicated discipline. Brain surgery is
> complicated too. Should a doctor just simplify things by sending his
> patients who need brain treatment to the guy who drills holes in
> people's skulls to cure them?

Correct. One difference is that in medical research, patients are not
forced to submit to investigational methods: Those who are not
volunteering as investigational subjects continue to receive
established therapy until the investigational one proves its
superiority. The other difference is that we need polytonic Gk in NG
correspondence like a hole in the head.