
[Fwd: Re: Experiments with classical Greek keyboard input]


Simos Xenitellis

Feb 3, 2006, 10:50:20 AM

Dear All,
This e-mail appears not to have made it to the list (Kostas is probably
not subscribed to the list), so I am forwarding it, as there is some
interesting information here.

Simos

-------- Forwarded Message --------
From: Πιστιόλης Κωνσταντίνος <pistiolis at ts dot sch dot gr>
To: Simos Xenitellis <simos74 at gmx dot net>,
linux...@nl.linux.org
Subject: Re: Experiments with classical Greek keyboard input
Date: Tue, 31 Jan 2006 22:11:05 +0200

On Mon, 30 Jan 2006 19:05:26 +0000, Simos Xenitellis
<sim...@gmx.net> wrote:

> Jan Willem Stumpel wrote:
>> Simos Xenitellis wrote:
>>
>>
>>> You can have a look at this document,
>>> http://planet.hellug.gr/misc/polytonic/ Although it is in Greek, it
>>> should be feasible to discern the combinations proposed. For example,
>>> "Νεκρό πλήκτρο" is "Dead key" in the list. If there are queries, feel
>>> free to refer to me.
>>>
>>
>> Very interesting. Is this a proposal, or has it been implemented?
>> According to Babelfish, you say "Your distribution of Linux that
>> has been published after October 2005 should include the renewed system
>> that we describe here." Mine does not, but I don't trust the Babelfish
>> translation..
>>
> The referenced document is indeed a proposal.
> You are correct about October 2005. Several distributions were released
> in October (Ubuntu, OpenSUSE) so the plan was to have the changes
> upstream by the end of the summer so that they move to the new
> distributions as they appear.
> However, this plan did not work out and we still have not submitted these
> changes.
> Konstantinos Pistiolis is working on this subject.
>> As far as I can see, it would not be difficult to implement it. Nothing
>> would have to be changed in the binaries, only in the xkb and Compose
>> files.
>>
>> I noticed you only want to use 'two level' keys (normal and shift), not
>> using AltGr. Is this some kind of standard? (e.g. Greek national
>> standard, or some other kind of standard)? The present pc/gr file in xkb
>> uses 'three level' keys.
>>
> As far as I know there is no national standard for Greek polytonic.
> Windows XP supports Greek polytonic; however, it has an inherent
> disadvantage: you cannot stack more than one dead key. Because of this,
> quite a lot of keys have to be used as dead keys. In addition, if a
> character accepts more than one diacritic, then you need three dead keys
> to cover all the cases (diacritic A, diacritic B, diacritic A+B).
If there is one, it is the old typewriter standard (computers were not
used for text processing at the time polytonic was removed from modern
Greek), but it didn't cover full polytonic because it lacked vareia
(grave), makron, and vrahy. It was used for modern Greek rather than
ancient Greek. That keymap defines a dead key for every combination, and
is more or less followed by Windows XP, using 16 or more dead keys!

However, the proposed keymap uses the same principles and needs only 9
dead keys.
>
> Regarding the usage of AltGr. There have been quite a few discussions on
> whether to use it or not. I do not have the full details at my disposal.
> Kostas, would you like to chip in on this?
The accents, the dead iota and the breathing marks shouldn't use it:
1. Most of the dead keys are used too often to be put on the third
level (except for makron and vrahy). Each symbol is used approximately
once every 3-5 words!
2. The altGr chooser was not used in the old typewriter standard.
In fact, all symbols (except vareia = grave) have a position in
the old typewriter standard, which is preserved in the proposed keymap.

As for makron and vrahy, I have proposed putting them on ] and } rather
than on an altGr combination, as the opening [ and { are already occupied
as dead keys (~ and iota subscript, in accordance with the typewriter
"standard").
The idea is that it wouldn't be bad to lose the closing brace if
the opening brace is lost too, and it would save the altGr+dead_key
combinations for future use (see below).


The other symbols (ancient Greek numerals) are also needed in modern
(monotonic) Greek, and could be added either as altGr combinations,
or composed with the dead acute, or even both ways, e.g.:
altGr + sigma : numeric stigma
or
dead tonos + sigma : numeric stigma
I don't know if the latter odd combination would produce conflicts in
an international Compose file, but this idea has been used in the past in
the Greek keyboard, in the following combinations:
dead_tonos + . : above (middle) dot
dead_tonos + < : «
dead_tonos + > : »
I believe that Compose should actually be part of the keymap,
not the locale. Dead keys are very good sticky third-level choosers for
languages that use them.
The present pc/gr file uses altGr for the euro symbol, the middle dot
and the «» symbols, along with the Compose combinations, and I suggest
the same (duality) for all new symbols.

Another idea is to use the same kind of rules to increase the usability
of the polytonic keyboard for writing technical texts:
a double press of a dead key, and altGr + dead key, would
produce the "lost" symbol, so that the user wouldn't have to
switch keyboards to write the []{}'" and / symbols. For example:
key [ is proposed to be perispomeni (~), so
dead_[ + dead_[ : [
and/or
altGr + dead_[ : [
Again, the first combination could result in conflicts in an
international Compose file, so it is probably only applicable in
the personal .Compose files of the users who need it.

>> BTW I suppose when you say that tonos/oxia is on the ; key, you mean the
>> key which is ; on US keyboards, not the key which is ; on Greek
>> keyboards?
>>
> Indeed, ; is the physical key according to the US keyboard.
> The proposal document does not include a specific dead key to produce
> oxia. In the Windows XP layout there is such a dead key,
> in an uncomfortable location however, for those end-users who would like
> to use it.

Another proposed use of altGr is for the dead acute.
ELOT, the Hellenic Organization for Standardization, has proposed and
defined different symbols for acute and tonos (which are actually the same
symbol), and these are equivalent in Unicode. However, the use of the
duplicate accented letters is strongly discouraged, so the keymap would
normally produce the letters with tonos.
The combination altGr-dead_tonos + vowel is proposed to produce the
letter with accent, in case someone needs it.
The default symbol produced by the keymap will be dead_acute,
and for the polytonic extension I have used dead_doubleacute
(which has no reasonable meaning, but neither does the distinction
between acute and tonos).

>>
>>> The "Compose" file should be broken in smaller files per script
>>> rather than having a big monolithic file.
>>>
>>
>> What advantage would this bring? If we have many small pieces of the
>> Compose file, how is the user (or the system) supposed to decide when to
>> use which piece? Wouldn't this create another configuration problem?
>>
It could become a part of the keymap. As I said, a dead key makes an
excellent third-level chooser which is sticky. Yet it seems that xkb
is not meant to work like that.
> The configuration mechanism of Xorg would shield the end-user from this
> complexity. I am referring to the needs of the developers.
> For example, suppose a lesser known language wants to make an
> installable package that adds writing support. The way this could be
> done is by dropping (adding) the appropriate files in the appropriate
> directory. Otherwise, there would be need to patch the monolithic file.
> In addition, the Polytonic section in the Compose file is suitable to be
> auto-generated from a script, as the multiple diacritics on vowels bring
> up many combinations.
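
The auto-generation suggested above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: each dead key is represented by the Unicode combining mark it stands for, NFC normalization is used to find the precomposed character, and the entries are printed in Compose-file syntax. The names dead_psili and dead_dasia are assumed here (the thread only proposes keysyms of this kind), dead_tilde stands in for perispomeni, and the function name is made up for the example.

```python
import itertools
import unicodedata

# Dead-key names mapped to the combining marks they stand for.
# dead_psili / dead_dasia are assumptions; the thread only proposes them.
BREATHINGS = {"dead_psili": "\u0313", "dead_dasia": "\u0314"}
ACCENTS = {"dead_acute": "\u0301", "dead_grave": "\u0300", "dead_tilde": "\u0342"}
VOWELS = {
    "Greek_alpha": "\u03b1", "Greek_epsilon": "\u03b5", "Greek_eta": "\u03b7",
    "Greek_iota": "\u03b9", "Greek_omicron": "\u03bf", "Greek_upsilon": "\u03c5",
    "Greek_omega": "\u03c9",
}


def compose_lines():
    """Yield Compose entries for breathing + accent + vowel sequences."""
    pairs = itertools.product(BREATHINGS.items(), ACCENTS.items())
    for (b_key, b_mark), (a_key, a_mark) in pairs:
        for v_key, vowel in VOWELS.items():
            composed = unicodedata.normalize("NFC", vowel + b_mark + a_mark)
            # Keep only combinations that exist as one precomposed character
            # (epsilon and omicron, for instance, take no perispomeni).
            if len(composed) != 1:
                continue
            yield f'<{b_key}> <{a_key}> <{v_key}> : "{composed}" U{ord(composed):04X}'


if __name__ == "__main__":
    for line in compose_lines():
        print(line)
```

A real generator would also have to cover capitals, iota subscript, single-mark sequences and the other order of the dead keys; the sketch only shows why a script is a better fit than hand-editing.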
>> UTF-8 allows using one system for all languages and scripts, without
>> changing locales. There is only one, IMHO unavoidable, but small,
>> disadvantage: some files (like fonts, and the Compose file) tend to
>> become rather big. But memory and disk space are not as expensive as
>> they used to be. And the user does not notice anything of this. She just
>> thinks: wow! I can input any language anywhere, at any time!
>>
> As I mention above, the splitting of the files would be an advantage for
> the developers.
> The end-user would only see a GUI configuration tool. No setxkbmap or
> editing of xorg.conf.
>>> There is increasing interest in updating this area of Xorg
>>> (http://community.livejournal.com/xkbconfig/) and I hope it gets done
>>> soon.
>>>
>>
>> Hmm.. "xkb" and "Compose" are two completely different mechanisms. One
>> is input to the other. People often complain about xkb being
>> 'mysterious' or 'arcane'. Since xfree86 4.3 and x.org came around, it
>> isn't anymore. It just lacks user-level documentation. Recently, thanks
>> to this list, I have come close enough to enlightenment to attempt a
>> user-level description on my utf-8 page, sections 6.1 and 6.2
>> (http://www.jw-stumpel.nl/stestu).
I have a question. It is mentioned that it's a bug to use dead_horn and
dead_ogonek, and that "combining comma above" 0x0313 and "combining
reversed comma above" 0x0314 should be used instead. Wouldn't it be best
to ask for a dead_commaabove (or dead_psili) and a dead_reversedcommaabove
(dead_daseia) to be added to the xkb binaries?
When the polytonic variant was first created, it was thought that
it doesn't matter which dead_XXX symbol is used. Is this true?
>>
> Thanks for this.
> We need to put effort so that gswitchit (Keyboard Indicator applet in
> GNOME) gets more and more advanced and ubiquitous.
> The plan is for gswitchit to be used for KDE as well.
> This is the proper direction so end-users are happy that their settings
> just work.
>
> Simos
>

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/


Alexandros Diamantidis

Feb 4, 2006, 6:09:26 PM
I'm sorry for not contributing more to this thread or to the work needed
to be done to solve the issues discussed here... But here are some
thoughts in response to Kostas' message:

From: Πιστιόλης Κωνσταντίνος <pistiolis at ts dot sch dot gr>
> This keymap defines a dead key for every combination, and is more or less
> followed by the windows XP, using up to 16 or more dead keys!

You know, there really should be a way to create a keyboard layout on
X11 compatible with the Windows XP / typewriter one. Is this currently
possible? To do this, either many more "generic" dead keys are needed,
or a way to have a single keypress produce many keysyms, for use in a
compose sequence.

For reference, here's the Windows XP way to produce polytonic Greek
characters:

http://support.microsoft.com/default.aspx?scid=kb;el;GR750052

According to the table there, the dead keys used are [ ] - = | \ / ; '
combined with Shift, Alt, and AltGr. In total, 27 different "virtual"
dead keys... Not an easy system to learn, but I think anyone
who's learned it should be able to keep using it under X11.

Is it possible to implement this with the current xkb plus simple
Compose-file infrastructure? Or is it only possible with complex
input method software?

> 1. Most of the dead keys are used too often to be put on the third
> level (except for makron and vrahy). Each symbol is used approximately
> once every 3-5 words!

Right! By the way, I've typed a bit more polytonic Greek recently, and
the layout currently included in XFree86/X.org worked nicely for me (who
isn't used to any other polytonic Greek layout).

> I don't know if the latter odd combination would produce conflicts in
> an international Compose file, but this idea was used in the past in
> greek keyboard, in the following combinations:
> dead_tonos + . : above (middle) dot
> dead_tonos + < : «
> dead_tonos + > : »

I don't think there are any conflicts, and these combinations are very
nice from a usability point of view: you don't have to memorize obscure
AltGr combinations, just to remember that putting an accent on a
character that doesn't take one produces a "special" (less common)
character that looks similar. The three combinations listed above were
also used in some old MS-DOS keyboard drivers.

> The present pc/gr file uses altgr for the euro symbol, the middle dot
> and the «» symbols, along with the Compose combinations and I suggest
> the same (duality) for all new symbols
>
> Another idea is to use the same kind of rules to increase the usability
> of the polytonic keyboard for writing technical texts:
> To have a double press of a dead_key and the altGr + dead_key
> to produce the "lost" symbol so that the user wouldn't have to

...

I agree with this.

> Another proposed use of altGr is for the dead acute.
> ELOT, the Hellenic Organization for Standardization, has proposed and
> defined different symbols for acute and tonos (which are actually the
> same symbol), and these are equivalent in Unicode.

That was a mistake... My opinion is that having different glyphs for
OXIA and TONOS in fonts is a bug. Upright and slanted oxia don't have
any meaningful distinction in Greek, they're just graphic variants. Some
fonts are designed with a modern look, where oxia looks like a bullet or
an equilateral triangle. These fonts can only be used for modern Greek.
Other fonts are designed more traditionally, with a slanted oxia. Putting
glyphs with upright oxia in these fonts looks, IMHO, ugly, and I think
was only motivated by font creators looking at Unicode, seeing
characters both with "OXIA" and with "TONOS" in their names, and naïvely
deciding to differentiate their appearance, without noticing that they
are equivalent according to Unicode, and without a justification in
representing actual Greek text.

By the way, there is a case where font designers have almost universally
drawn two canonically equivalent Unicode characters differently, and
that's U+00B7 MIDDLE DOT (·) and U+0387 GREEK ANO TELEIA (·). Here
they're next to each other: ··

Most fonts have different glyphs for them, because the usual appearance
of middle dot looks wrong as an _ano teleia_. So... in this case there
is some justification. But the correct way to solve this according to
the Unicode model is with higher-level protocols and smart fonts. For
example, with modern smart fonts (OpenType etc.), it's possible to have
both U+00B7 and U+0387 assume their correct shape and position depending
on their surrounding characters.

> The combination altGr-dead_tonos + vowel is proposed to produce the
> letter with accent, in case someone needs it.

Well... it probably won't hurt much, except in perpetuating the idea
that tonos/accent and oxia/acute are different. And also systems
which do their own keysym processing (e.g. GTK+) will have to add
some more illogical combinations...

> I have a question. It is mentioned that it's a bug to use dead_horn
> and dead_ogonek and that "combining comma above" 0x0313 and
> "combining reversed comma above" 0x0314 should be used instead.
> Wouldn't it be best to ask for a dead_commaabove (or dead_psili) and a
> dead_reversedcommaabove (dead_daseia) to be added to the xkb binaries?

Well... I think it *is* a bug that we're using dead_horn and dead_ogonek
for this, but using U0313 and U0314 will also be a bug. According to
appendix A of the X11 protocol spec, pointed out by Markus Kuhn a few
messages upthread...

ftp://ftp.x.org/pub/X11R7.0/doc/PDF/proto.pdf

# Dead keys, which place an accent on the next character entered,
# shall be encoded as Function KEYSYMs, and not as the Unicode KEYSYM
# corresponding to an equivalent combining character.

So... dead_commaabove and dead_reversedcommaabove are the way to go.

> When the polytonic variant was first created, it was thought that
> it doesn't matter which dead_XXX symbol would be used. Is this true?

You can blame me for this... That's what I thought, because I was
creating a simple proof-of-concept, not a correct solution. But as the
saying goes, there's nothing more permanent than the temporary. Well, it
works for me, but there are some problems, and it's not the Right Thing™.

I apologise for the long message, and for not offering anything
concrete... This discussion should probably take place on an X11 mailing
list or other forum if it's to bear any worthwhile fruit. Or maybe
someone should post a proposal at http://bugs.freedesktop.org/, as
Markus said. Only I'm not sure what the proposal should be.

--
Alexandros Diamantidis * ad...@hellug.gr

Πιστιόλης Κωνσταντίνος

Feb 4, 2006, 7:41:11 PM
>
> You know, there really should be a way to create a keyboard layout on
> X11 compatible with the Windows XP / typewriter one. Is this currently
> possible? To do this, either many more "generic" dead keys are needed,
> or a way to have a single keypress produce many keysyms, for use in a
> compose sequence.
>
> For reference, here's the Windows XP way to produce polytonic Greek
> characters:
>
> http://support.microsoft.com/default.aspx?scid=kb;el;GR750052
>
> According to the table there, the dead keys used are [ ] - = | \ / ; '
> combined with Shift, Alt, and AltGr. In total, 27 different "virtual"
> dead keys... Not an easy system to learn, but I think anyone
> who's learned it, should be able to keep using it under X11.
>
> Is it possible to implement this with the current xkb plus simple
> Compose-file infrastructure? Or is it only possible with complex
> input method software?
>
I thought of this too, but I don't see an easy way to do this with xkb.
Anyway, the idea of using combinations of dead keys instead of a dead
key for every mark combination has been used before on the Macintosh, and
as long as the single-symbol dead keys keep the same positions as in the
old keymap... perhaps it is enough for now.
It is probably better to implement this legacy keyboard map with
some complex input method at a later time, instead of messing up xkb now.

> ...


>> I don't know if the latter odd combination would produce conflicts in
>> an international Compose file, but this idea was used in the past in
>> greek keyboard, in the following combinations:
>> dead_tonos + . : above (middle) dot
>> dead_tonos + < : «
>> dead_tonos + > : »
>
> I don't think there are any conflicts, and these combinations are very
> nice from a usability point of view: you don't have to memorize obscure
> AltGr combinations, just to remember that putting an accent on a
> character that doesn't take one produces a "special" (less common)
> character that looks similar. The three combinations listed above were
> also used in some old MS-DOS keyboard drivers.
>

Yes, it is a very good idea, but in an international Compose file
there would be a conflict if the Greek keymap wanted to use:
dead_acute + . : above (middle) dot
and some other language's keymap used:
dead_acute + . : <degree symbol>

The dead_XXX definitions are accessible to all languages
(and this is correct). The correct way to do this would be to have xkb
define a different Compose file for every keymap.
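
The kind of conflict described here can be checked mechanically. Below is a minimal Python sketch that takes the compose rules each keymap would like to contribute to a shared file and reports sequences that map to different results. The data structure and the function name are invented for the illustration, and the "other" keymap is hypothetical, mirroring the degree-symbol example.

```python
from collections import defaultdict


def find_conflicts(keymap_rules):
    """Return sequences that different keymaps map to different results.

    keymap_rules: {keymap name: {tuple of keysym names: resulting string}}
    """
    seen = defaultdict(dict)  # sequence -> {result: [keymap names]}
    for keymap, rules in keymap_rules.items():
        for sequence, result in rules.items():
            seen[sequence].setdefault(result, []).append(keymap)
    # A sequence conflicts when it has more than one distinct result.
    return {seq: res for seq, res in seen.items() if len(res) > 1}


# Illustrative data only: the "other" keymap is invented for this example.
rules = {
    "gr": {
        ("dead_acute", "period"): "\u00b7",  # middle dot
        ("dead_acute", "less"): "\u00ab",    # left guillemet
    },
    "other": {
        ("dead_acute", "period"): "\u00b0",  # degree sign
    },
}

if __name__ == "__main__":
    for seq, results in find_conflicts(rules).items():
        print(" + ".join(seq), "->", results)
```

With one Compose file per keymap, as suggested, each rule set would be checked only against itself and the clash would not arise.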

>> ...


>> Another idea is to use the same kind of rules to increase the usability
>> of the polytonic keyboard for writing technical texts:
>> To have a double press of a dead_key and the altGr + dead_key
>> to produce the "lost" symbol so that the user wouldn't have to
> ...
>
> I agree with this.

But:
1. It could cause the same kind of conflicts as mentioned above.
2. In the proposed keymap dead_horn is placed on ' so we want the rule
dead_horn dead_horn : '\''
But if someone creates a new keymap with dead_horn placed on ],
we won't be able to add a new rule.
This will work for only one keymap, messing up all the (future) others
(if we ever need any).


>
>> Another proposed use of altGr is for the dead acute.
>> ELOT, the Hellenic Organization for Standardization, has proposed and
>> defined different symbols for acute and tonos (which are actually the
>> same symbol), and these are equivalent in Unicode.
>
> That was a mistake... My opinion is that having different glyphs for
> OXIA and TONOS in fonts is a bug. Upright and slanted oxia don't have

> ...


> are equivalent according to Unicode, and without a justification in
> representing actual Greek text.

> ...


> is some justification. But the correct way to solve this according to
> the Unicode model is with higher-level protocols and smart fonts. For
> example, with modern smart fonts (OpenType etc.), it's possible to have
> both U+00B7 and U+0387 assume their correct shape and position depending
> on their surrounding characters.
>

I agree


>> The combination altGr-dead_tonos + vowel is proposed to produce the
>> letter with accent, in case someone needs it.
>
> Well... it probably won't hurt much, except in perpetuating the idea
> that tonos/accent and oxia/acute are different. And also systems
> which do their own keysym processing (e.g. GTK+) will have to add
> some more illogical combinations...

It could hurt, because many people will prefer to use it in order to
avoid this bug of the fonts (and this will cause a lot of trouble
when mixed with the monotonic Greek of a Linux system with a Greek locale).
This is why I propose altGr-dead_acute, so that the combination
will be hard, forcing people not to use it.

Unfortunately this is necessary, because a lot of polytonic Greek
texts are encoded like that. If you want to search for text with
Google you will have to use this accent.
Look at the Google search results. Searching for:
ἀνθρώπου (with tonos) yields 584 results and
ἀνθρώπου (with polytonic set's acute) yields 21.400 results!
(I think that this happens because most texts are converted
from older 8-bit encodings.)
This is a Google bug (?) too, because text searching should be
insensitive to Unicode-equivalent characters, but this is
the reality, so we must produce these characters too.
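
A search that is insensitive to canonically equivalent characters can be sketched in a few lines of Python using the standard unicodedata module. This is only a sketch of the idea, not a description of how any search engine works; the function name is made up, and the two spellings below differ only in omega with tonos (U+03CE) versus omega with oxia (U+1F7D).

```python
import unicodedata


def contains_equivalent(text, query, form="NFC"):
    """Substring test that ignores canonical-equivalence differences."""
    normalized_text = unicodedata.normalize(form, text)
    normalized_query = unicodedata.normalize(form, query)
    return normalized_query in normalized_text


# The same word spelled with omega + tonos and with omega + oxia.
with_tonos = "\u1f00\u03bd\u03b8\u03c1\u03ce\u03c0\u03bf\u03c5"
with_oxia = "\u1f00\u03bd\u03b8\u03c1\u1f7d\u03c0\u03bf\u03c5"

if __name__ == "__main__":
    print(with_oxia in with_tonos)                     # raw comparison
    print(contains_equivalent(with_tonos, with_oxia))  # normalized comparison
```

Normalizing both the indexed text and the query to the same form would make the two result counts above collapse into one.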

Kostas

Jan Willem Stumpel

Feb 6, 2006, 3:58:13 PM
Imitating the difficult-to-learn Windows system for 'multiple
diacriticals' should IMHO be offered as an option, but not as the only
option. The ease with which diacriticals can be combined by means of
xkb/Compose could be a 'Linux selling point' in the academic world.

BTW I am now terribly confused about the tonos/oxia issue.

-- "Tonos and oxia are considered equivalent in Unicode" - but why,
then, are there different code points for them (U+1FFD, and all
the letters "with oxia", vs. U+0384 and all the letters "with
tonos")? Where does it actually say that they are equivalent?

-- Many (maybe most) font creators made different glyphs for oxia
and tonos (although others did not, see the Gentium font), because
they were "looking at unicode". But, surely, that was the correct
place to look?

-- Kostas calls it "a bug of the fonts". If there is a bug, isn't it
in the Unicode standard?

I hope there is a way to put the genie back into the bottle. Just making
the keyboard entry for oxia "hard, forcing people not to use it" does
not seem to be the right way.

Regards, Jan

Simos Xenitellis

Feb 6, 2006, 4:47:41 PM
On Mon, 2006-02-06 at 21:58 +0100, Jan Willem Stumpel wrote:
> Imitating the difficult-to-learn Windows system for 'multiple
> diacriticals' should IMHO be offered as an option, but not as the only

I am not sure what complexities the Windows keyboard layout has that
would make it difficult to re-implement as an extra layout in Xorg. My
understanding is that it defines too many dead keys because of its
limitation on "stacking" dead keys together.

> option. The ease with which diacriticals can be combined by means of
> xkb/Compose could be a 'Linux selling point' in the academic world.
>
> BTW I am now terribly confused about the tonos/oxia issue.
>
> -- "Tonos and oxia are considered equivalent in Unicode" - but why,
> then, are there different code points for them (U+1FFD, and all
> the letters "with oxia", vs. U+0384 and all the letters "with
> tonos")? Where does it actually say that they are equivalent?

It says so at
http://www.unicode.org/charts/PDF/U1F00.pdf

For example, see 1F71, Greek Small Letter Alpha with Oxia.
The three horizontal bars (≡) show canonical equivalence between
characters. It shows that 1F71 == 03AC.

It is common to have these equivalences; compatible software should take
care of them for the end-users and fold characters to their
canonical equivalents.
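
The equivalences marked in the chart can also be read from the Unicode character database. The Python sketch below scans the Greek Extended block for characters whose canonical decomposition is a single other code point, which is how the "with oxia" letters point back to their "with tonos" counterparts. The function name is invented for the example, and the result depends on the Unicode version bundled with the Python in use.

```python
import unicodedata


def canonical_singletons(first=0x1F00, last=0x1FFF):
    """Map code points whose canonical decomposition is one other code point."""
    mapping = {}
    for cp in range(first, last + 1):
        decomp = unicodedata.decomposition(chr(cp))
        # Skip characters with no decomposition or a compatibility (<...>) one.
        if not decomp or decomp.startswith("<"):
            continue
        parts = decomp.split()
        if len(parts) == 1:
            mapping[cp] = int(parts[0], 16)
    return mapping


if __name__ == "__main__":
    for src, dst in canonical_singletons().items():
        print("U+%04X %s == U+%04X %s" % (
            src, unicodedata.name(chr(src)),
            dst, unicodedata.name(chr(dst))))
```

Folding to the canonical equivalent is then just unicodedata.normalize("NFC", text), which turns U+1F71 into U+03AC.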

> -- Many (maybe most) font creators made different glyphs for oxia
> and tonos (although others did not, see the Gentium font), because
> they were "looking at unicode". But, surely, that was the correct
> place to look?

Unicode does not dictate how fonts should look. See the Fonts section at
http://www.unicode.org/charts/PDF/U1F00.pdf
The selected font was merely a font donated for this purpose.

> -- Kostas calls it "a bug of the fonts". If there is a bug, isn't it
> in the Unicode standard ?

I am not sure about the background of this; I think it has to do with
different "schools of thought" on what original documents looked like.

> I hope there is a way to put the genie back into the bottle. Just making
> the keyboard entry for oxia "hard, forcing people not to use it" does
> not seem to be the right way.

The choice is between
1. do not provide an option for people to type 1F71 and other vowels
with oxia. (current situation)
2. provide such a choice to type vowels with oxia.

The preference is to move to Choice 2, so that if a user wants this
option, he has the freedom of choice to do so.
Giving equivalent exposure to both oxia and tonos can create a mess with
documents. That's why oxia should be somewhere far away, not on a nearby
dead key.

Google does not yet normalise text so that these equivalent characters
are treated the same.

Simos

Πιστιόλης Κωνσταντίνος

Feb 6, 2006, 6:35:40 PM
On Mon, 06 Feb 2006 21:58:13 +0100, Jan Willem Stumpel
<jstu...@planet.nl> wrote:

> Imitating the difficult-to-learn Windows system for 'multiple
> diacriticals' should IMHO be offered as an option, but not as the only

> option. The ease with which diacriticals can be combined by means of
> xkb/Compose could be a 'Linux selling point' in the academic world.
>
> BTW I am now terribly confused about the tonos/oxia issue.
>
> -- "Tonos and oxia are considered equivalent in Unicode" - but why,
> then, are there different code points for them (U+1FFD, and all
> the letters "with oxia", vs. U+0384 and all the letters "with
> tonos")? Where does it actually say that they are equivalent?
>

In ancient Greek and modern "katharevousa" (a formal archaic Greek)
there were three accents (I don't know the English names):
perispomeni (~), oxia (acute) and grave (`), which were
collectively called 'tonos' (accents).

Yet in modern Greek practically no one was actually distinguishing between
acute and grave, so the accents used were oxia and perispomeni.

The next step was to deprecate all these accent marks and use only one
simple accent, for words that have multiple syllables.
This was called 'monotonic Greek'.
That simple accent was simply called "tonos" (accent) and actually
was the acute. Still, typographically there was no preference about
the slope of tonos (/ \ or |), and modern "monotonic" Greek fonts
may use a | glyph, or a dot above.
Such a glyph may be good for monotonic Greek, but it is completely
unsuitable for ancient or polytonic Greek, so in the meantime
font designers were making different glyphs and using
different character codes for each case.

This is a very stupid distinction, because there is no such
difference between tonos and oxia (acute), and no such symbol as
a "vertical line above" or a "dot above" in Greek.

The issue was finally resolved by the Greek government, which declared
that tonos is actually the acute (oxia).
But this came TOO LATE, because ELOT (the Hellenic Organization for
Standardization) had already proposed different characters to the
Unicode Consortium.
After that, many people who were using polytonic Greek (outside Greece)
had already converted their texts from the original 8-bit encodings
to Unicode using the new characters with 'oxia'.
This FAQ describes the story:
http://www.unicode.org/faq/greek.html
and for more info see http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode.html

The difference between 'oxia' and 'tonos' and the problems related
to it are described in more detail here:
http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode_gkbkgd.html#oxia

> -- Many (maybe most) font creators made different glyphs for oxia
> and tonos (although others did not, see the Gentium font), because
> they were "looking at unicode". But, surely, that was the correct
> place to look?

Well, there is no other way for modern Greek. There can be no distinction
between tonos and oxia, nor may we have two different keycodes for
the same character. Imagine what will happen if a Greek user uses
a polytonic keyboard to enter a filename.
It's just a matter of fonts. Someone who wants to write monotonic Greek is
free to use any font he/she likes. But for polytonic Greek he/she has
to use a polytonic font (which must define the polytonic glyphs correctly).
Font designers claim the opposite, that the user should keep oxia and
tonos combinations distinct, but this is incorrect according to Unicode
and, as I said, is extremely dangerous when mixed with modern Greek.

Then again, the actual reason is that Unicode canonical equivalence is
correctly implemented neither by applications nor by fonts.
According to Unicode, a process must not treat equivalent characters
differently, nor assume that some other process does.
Moreover, a text may be automatically normalized at any time (without
the user or any other program knowing) by the system or an intermediate
process, with some characters decomposed or replaced by their
canonical equivalents.


>
> -- Kostas calls it "a bug of the fonts". If there is a bug, isn't it

> in the Unicode standard ?

As Simos said, this is a way of thinking rather than a bug. Unicode has
not altered existing encodings. It has included them all and defined the
relationships and the equivalences for future use.
The problem is that most applications do not yet implement these rules.
And since people are still treating equivalent characters as not equal,
some font designers decide to do so too.

When it comes to Greek there is another reason. Usually a font implements
the basic symbols first (with tonos) in the monotonic way, and the
polytonic accents are just added later.


>
> I hope there is a way to put the genie back into the bottle. Just making
> the keyboard entry for oxia "hard, forcing people not to use it" does
> not seem to be the right way.

The correct way is the maturity of Unicode:
when all texts are being normalized, all programs will become aware
of character equivalence, and smart fonts will be used to decide which
glyph suits each case best.

In the meantime, some font designers use this workaround to improve
the display of their fonts, thus making the problem persistent.

I hope this helped,

Rich Felker

Feb 6, 2006, 9:27:14 PM
On Tue, Feb 07, 2006 at 01:35:40AM +0200, Πιστιόλης Κωνσταντίνος wrote:
> >-- Many (maybe most) font creators made different glyphs for oxia
> > and tonos (although others did not, see the Gentium font), because
> > they were "looking at unicode". But, surely, that was the correct
> > place to look?
> Well, there is no other way for modern Greek. There can be no distinction
> between tonos and oxia, nor may we have two different keycodes for
> the same character. Imagine what will happen if a Greek user uses
> a polytonic keyboard to enter a filename.
> It's just a matter of fonts. Someone who wants to write monotonic Greek is
> free to use any font he/she likes. But for polytonic Greek he/she has
> to use a polytonic font (which must define the polytonic glyphs correctly).
> Font designers claim the opposite, that the user should keep oxia and
> tonos combinations distinct, but this is incorrect according to Unicode
> and, as I said, is extremely dangerous when mixed with modern Greek.
>
> Then again, the actual reason is that Unicode canonical equivalence is
> correctly implemented neither by applications nor by fonts.

Another way of looking at it is that the Unicode people are stuck in
the world of Windows and word processors and can't see past it.
Clearly something like the filesystem that deals with (essentially,
aside from \0 and /) arbitrary binary byte sequences cannot be
expected to, and should not, make Unicode canonical equivalence
substitutions.
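
A minimal sketch of this point in Python. The "directory" here is just a dictionary from byte strings to contents, which is an assumption made for illustration and not how any real filesystem is implemented. The two names are canonically equivalent in Unicode, yet as byte sequences they are different keys:

```python
import unicodedata

# Toy model of a directory: raw byte-string names mapped to file contents.
# This is an illustrative assumption, not a real filesystem API.
directory = {}

name_tonos = "\u03ac.txt"   # alpha with tonos (U+03AC)
name_oxia = "\u1f71.txt"    # alpha with oxia (U+1F71)

directory[name_tonos.encode("utf-8")] = b"typed on a monotonic keyboard"
directory[name_oxia.encode("utf-8")] = b"typed on a polytonic keyboard"

# Both entries survive, because the byte sequences differ.
distinct_entries = len(directory)

# A program that wants to warn about look-alike names can compare
# normalized copies without ever rewriting the stored bytes.
same_after_nfc = (unicodedata.normalize("NFC", name_tonos)
                  == unicodedata.normalize("NFC", name_oxia))
```

The comparison on normalized copies is something an application could choose to do on top; the byte-level store itself stays untouched.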

If you're actually worried about people using the 'bad' character
choices in filenames, a better solution would probably be to advise
people making fonts to have these characters represented by the
replacement character glyph, so that only the applications which
understand canonical equivalences would display anything reasonable at
all. That would be a good discouragement against their use. :)

[Here I'm talking about people making terminal fonts, gui interface
element fonts, etc., not fonts for wordprocessing/print use which we
don't really have much influence over.]

However, this issue does get much more hairy with other canonical
equivalence issues like combining/precombined forms, canonical
ordering of combining characters, etc. I don't know any way to address
it except asking users not to be stupid. Somehow I expect the ones who
will be _typing_ filenames will be savvy enough to stick to sane
filename choices, and the rest will just select files from a
Qt/GTK/whatever dialog box.

It's important to remember that this is really nothing new with
Unicode. It's always been possible to make nasty filenames that look
equivalent but which are not, for instance embedding terminal escape
sequences in filenames...

> According to Unicode, a process must not treat equivalent characters
> differently, nor assume that some other process does.

This requirement is vague and inherently impossible to satisfy if you
use broad enough concepts of 'a process'. For example, is it illegal
for strlen to return different numbers on strings that have the same
canonical representation, but which are a different number of bytes?
:)
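
To make the strlen question concrete, here is a small Python sketch. The helper name `utf8_len` is hypothetical; it counts UTF-8 bytes, which is what C's strlen would report for these NUL-free strings. The three spellings share one canonical form but have three different lengths:

```python
import unicodedata

def utf8_len(s):
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))

forms = {
    "alpha with tonos (U+03AC)": "\u03ac",
    "alpha with oxia (U+1F71)": "\u1f71",
    "alpha + combining acute (U+03B1 U+0301)": "\u03b1\u0301",
}

# Byte length of each spelling.
lengths = {label: utf8_len(s) for label, s in forms.items()}

# All three spellings collapse to a single canonical decomposition (NFD).
canonical = {unicodedata.normalize("NFD", s) for s in forms.values()}
```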

> Even more, a text may be automatically normalized at any time (without
> the user or any other program knowing that) by the system or an intermediate
> process, having some characters decomposed or replaced by their
> canonical equivalents.

Yes, lovely. A binary-clean text editor or hex editor that processes
the text as UTF-8 (or any other unicode encoding) can trash the binary
file at any time. Just lovely. Moreover, guidelines like this are
encouraging implementors of UTF-8 text editors to make broken
non-binary-clean implementations, and discouraging anyone who wants a
binary-clean system from considering UTF-8.

Gross design mistakes like this, and the Windows/16bit-centricness of
the Unicode spec, have me largely convinced that UCS (ISO-10646) is
the standard we should follow for basic character handling under *nix,
rather than Unicode, and that Unicode should just be used as a guide
for supplemental functionality (such as case folding, collation, etc.)
in applications that need such features.

> >I hope there is a way to put the genie back into the bottle. Just making
> >the keyboard entry for oxia "hard, forcing people not to use it" does
> >not seem to be the right way.
> The correct way is for Unicode to mature:
> When all texts are being normalized, all programs will become aware
> of character equivalence, and smart fonts will be used to decide which
> glyph suits each case best.

Normalization at display time to select a glyph image is a very good
idea. Normalization of the actual stored data is a horrible mistake.
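
A sketch of what normalizing only at display time could look like, in Python. The glyph table `GLYPHS` and the function `glyphs_for` are hypothetical stand-ins for a font and a renderer, not any real rendering API. The lookup works on a normalized copy, and the stored bytes are never rewritten:

```python
import unicodedata

# Hypothetical glyph table of a font that only draws the tonos code point.
GLYPHS = {"\u03ac": "glyph:alpha-acute", "\u03b1": "glyph:alpha"}

def glyphs_for(stored):
    """Pick glyphs from an NFC copy; the stored text itself is not modified."""
    shown = unicodedata.normalize("NFC", stored)
    return [GLYPHS.get(ch, "glyph:replacement") for ch in shown]

stored = "\u1f71"                        # alpha with oxia, as saved on disk
rendered = glyphs_for(stored)            # glyph found via the normalized copy
still_on_disk = stored.encode("utf-8")   # the original bytes are unchanged
```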

> In the meantime, some font designers use this workaround to improve
> the display of their fonts, thus making the problem persistent.

:(

Rich

Jan Willem Stumpel

Feb 10, 2006, 6:06:07 AM
Πιστιόλης Κωνσταντίνος wrote:
> On Mon, 06 Feb 2006 21:58:13 +0100, Jan Willem Stumpel
> <jstu...@planet.nl> wrote:

> In ancient greek and modern "katharevousa" (a formal archaic greek)
> there were three accents. [..]

Thanks very much for this explanation. I put a digest of it on my
‘user-level’ utf-8 page.

Regards, Jan

Πιστιόλης Κωνσταντίνος

Feb 10, 2006, 1:28:21 PM
On Fri, 10 Feb 2006 12:06:07 +0100, Jan Willem Stumpel
<jstu...@planet.nl> wrote:

> Πιστιόλης Κωνσταντίνος wrote:
>> On Mon, 06 Feb 2006 21:58:13 +0100, Jan Willem Stumpel
>> <jstu...@planet.nl> wrote:
>
>> In ancient greek and modern "katharevousa" (a formal archaic greek)
>> there were three accents. [..]
>
> Thanks very much for this explanation. I put a digest of it on my
> ‘user-level’ utf-8 page.

In that page you propose:
...A font which includes all accent combinations for Classical Greek is,
for instance, FreeSerif. The efont bitmap fonts (for xterm) also have
them...

Which may or may not be valid, depending on which symbol your keymap produces
for acute (oxia or tonos). FreeSerif has different symbols for 'tonos'
and 'oxia', so ancient greek is probably not displayed correctly if someone
types using the gr(polytonic) keymap with the el_GR.UTF-8 locale.

Check http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode_gkbkgd.html#oxia
to see which fonts define different symbols
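
One way to find out which of the two a keymap actually produced is to look at the Unicode names of the characters in a typed sample. The following Python sketch does that; the function name `acute_kinds` and the sample text are assumptions made for illustration:

```python
import unicodedata
from collections import Counter

def acute_kinds(text):
    """Count characters whose Unicode name marks them as OXIA or TONOS."""
    counts = Counter()
    for ch in text:
        name = unicodedata.name(ch, "")
        if "OXIA" in name:
            counts["oxia"] += 1
        elif "TONOS" in name:
            counts["tonos"] += 1
    return counts

# The same word twice: first with omicron-tonos (U+03CC),
# then with omicron-oxia (U+1F79).
sample = "\u03bb\u03cc\u03b3\u03bf\u03c2 \u03bb\u1f79\u03b3\u03bf\u03c2"
kinds = acute_kinds(sample)
```

Note that this only works on text that has not been normalized on the way, since NFC replaces the oxia code points by their tonos equivalents.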

Jan Willem Stumpel

Feb 10, 2006, 2:14:16 PM
Πιστιόλης Κωνσταντίνος wrote:
>
> In that page you propose:
> ...A font which includes all accent combinations for Classical Greek is,
> for instance, FreeSerif. The efont bitmap fonts (for xterm) also have
> them...
>
> Which may or may not be valid depending which symbol your keymap produces
> for acute (oxia or tonos). FreeSerif has a different symbol for 'tonos'
> and 'oxia' and ancient greek is probably not viewed correctly if someone
> types using the gr(polytonic) keymap with el_GR.UTF-8 locale
>
You are right of course. But this (I am sorry) is in the 'keyboard
input' section of my page, which I have not updated yet, and I am still
not quite sure what it should say. Should there, or should there not, be
input methods for both 'oxia' and 'tonos', given that they are
'officially' the same? I mean, what should be the advice to the classicists?

My request for comment was, so far, only on the new 'font' section of
the document, section 4.5.

Regards, Jan

Πιστιόλης Κωνσταντίνος

Feb 10, 2006, 6:32:27 PM
On Fri, 10 Feb 2006 20:14:16 +0100, Jan Willem Stumpel
<jstu...@planet.nl> wrote:

> Πιστιόλης Κωνσταντίνος wrote:
>>
>> In that page you propose:
>> ...A font which includes all accent combinations for Classical Greek is,
>> for instance, FreeSerif. The efont bitmap fonts (for xterm) also have
>> them...
>>
>> Which may or may not be valid depending which symbol your keymap produces
>> for acute (oxia or tonos). FreeSerif has a different symbol for 'tonos'
>> and 'oxia' and ancient greek is probably not viewed correctly if someone
>> types using the gr(polytonic) keymap with el_GR.UTF-8 locale
>>
> You are right of course. But this (I am sorry) is in the 'keyboard
> input' section of my page, which I have not updated yet, and I am still
> not quite sure what it should say. Should there, or should there not, be
> input methods for both 'oxia' and 'tonos', given that they are
> 'officially' the same? I mean, what should be the advice to the
> classicists?
>
> My request for comment was, so far, only on the new 'font' section of
> the document, section 4.5.
>

Ok, quite explanatory!
Just one comment:
... Typographical fashions in Greece have now changed, so this solution
is right for modern Greek also...
It's not like a typographic fashion change; modern greek may still use any
glyph for 'tonos'. You may see a dot, an acute, a line, a triangle, even a
comma if it is on a capital letter (like capital A-acute Ά; the accent usually
goes to the left of capital letters). Let me explain more.

There is only one accent mark for modern greek, and it doesn't really matter
how it is drawn. It is just that the greek government admitted that
'tonos', which has replaced the former three accents (oxia, varia,
perispomeni), is actually nothing more than 'oxia'.
In other words, formally speaking, oxia replaced both varia and perispomeni.

Why is it valid for monotonic tonos (oxia) to have any glyph?
Because, at least for as long as my parents remember (since 1940), no one
cared about the difference between varia (`) and oxia (΄). Books printed
them correctly, but no one bothered when handwriting the formal
'katharevousa' or 'dimotiki' greek. People used to make a distinction only
between perispomeni and tonos (meaning oxia or varia), and they usually
preferred the glyph of oxia or a vertical line above for this tonos.
Modern polytonic greek scripts usually don't use varia (grave); oxia
is mostly used in its place.


Technically speaking, a 'correct' font may be:
1. monotonic, (with no polytonic characters at all) where it doesn't
matter which glyph it uses for tonos
2. polytonic, which shall define the same glyph in 0x1f71 as in 0x3ac
and it should be oxia. (if it is not oxia, the font is still usable for
monotonic greek, even for polytonic if one does not use varia, but
not for ancient greek or modern polytonic greek with varia)
The 'correct' way to render different glyphs for every case, is probably
a 'smart' font implementation (unfortunately too far from today's reality).
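
The oxia/tonos pairs that point 2 asks a polytonic font to draw identically can be listed from the Unicode character database. A minimal Python sketch, assuming the rule "a Greek Extended character named OXIA whose canonical decomposition is a single code point named TONOS"; the function name `oxia_tonos_pairs` is hypothetical:

```python
import unicodedata

def oxia_tonos_pairs():
    """Return (oxia, tonos) code point pairs that are canonically equivalent."""
    pairs = []
    for cp in range(0x1F00, 0x2000):          # the Greek Extended block
        ch = chr(cp)
        if "OXIA" not in unicodedata.name(ch, ""):
            continue
        decomp = unicodedata.decomposition(ch).split()
        # A singleton canonical decomposition has one field and no <tag>.
        if len(decomp) == 1 and not decomp[0].startswith("<"):
            target = int(decomp[0], 16)
            if "TONOS" in unicodedata.name(chr(target), ""):
                pairs.append((cp, target))
    return pairs

pairs = oxia_tonos_pairs()
# Each pair can be printed as e.g. "1F71 -> 03AC" to check a font by eye.
listing = ["%04X -> %04X" % pair for pair in pairs]
```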

Some greek terminology which may be useful
------------------------------------------
'Tonos' (τόνος) in greek means 'accent (mark)' in general, so this word was
used to indicate an accent without specifying which one.
There are three of them: oxia (οξεία), varia (βαρεία) and
perispomeni (περισπωμένη).

'pnevma' (πνεῦμα) is the breathing mark. There are two of them:
-'psili' (ψιλή), the smooth breathing mark (comma above), and
-'dasia' (δασεία), the rough breathing mark (reversed comma above).
Neither exists in modern monotonic greek.

'ypogegrammeni' (ὑπογεγραμμένη) is the iota subscript (like ῃ, ᾳ)
and it also does not exist in monotonic greek.

'monotonic' and 'polytonic' greek stand for using only one 'tonos'
or all the symbols. Modern greek is officially monotonic, but some
people (old men, the church, men of literature) still use polytonic (me too).

There were two branches of evolution of the greek language: the
informal language of the people, called 'dimotiki' (δημοτική, which means
'public'), and the formal language of educated people, 'katharevousa'
(καθαρεύουσα, which means 'pure'). Katharevousa comes in many versions,
depending on how close it is to ancient greek.
Today dimotiki is the official language and practically only the
church sometimes uses 'simple' katharevousa (the most modern version).
The church always uses polytonic greek, but it doesn't distinguish between
oxia and varia (it uses oxia only).


I hope it helped.
Feel free to ask any question about greek

regards,
Konstantinos

Jan Willem Stumpel

Feb 20, 2006, 4:48:19 PM
Πιστιόλης Κωνσταντίνος wrote:
> On Fri, 10 Feb 2006 20:14:16 +0100, Jan Willem Stumpel
> <jstu...@planet.nl> wrote:

>> My request for comment was, so far, only on the new 'font' section
>> of the document, section 4.5.
>>
>
> Ok, quite explanatory! Just one comment: ... Typographical fashions
> in Greece have now changed, so this solution is right for modern
> Greek also... It's not like a typographic fashion change; modern
> greek may still use any glyph for 'tonos'. [..]

I got the impression that typographical fashion in Greece has changed from
http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode_gkbkgd.html#oxia :

It would be an exaggeration to say that the erstwhile dots and
wedges have completely died out — especially as they have been
given a new lease of life by font developers' sluggishness.
However, the non-acute tonos seems to have become restricted
to display type or otherwise marked circumstances; quality
typography uses the acute.

This suggested some (recent?) change in typographical fashion to me.

Anyway, what I wanted to say is that the FreeSerif font now, in its very
latest version (Debian package ttf-freefont_20060126-0.1_all.deb),
displays alpha-tonos (0x3ac) the same as alpha-oxia (0x1f71). So there is
some progress on the font side. The direct result is that on the
keyboard side, the need for a separate acute and tonos has become less
"acute"!

Regards, Jan

Thomas Wolff

Apr 13, 2006, 11:56:44 AM
Πιστιόλης Κωνσταντίνος wrote:

> There is only one accent mark for modern greek, and it doesn't really matter
> how to draw it. It is just that the greek government admitted that
> 'tonos' which has replaced the former three accents (oxia, varia,
> perispomeni) is actually nothing more than 'oxia'.
> In other words, formally speaking, oxia replaced both varia and perispomeni.

> ...

You have given a nice overview of Greek accent marks but it does not
seem complete, looking at Unicode, so what about the others?
From UnicodeData.txt, I see that the following combining marks
occur in single or multiple combinations with Greek letters:
DASIA
DIALYTIKA
MACRON
OXIA
PERISPOMENI
PROSGEGRAMMENI
PSILI
TONOS
VARIA
VRACHY
YPOGEGRAMMENI

the following of which have not been mentioned in your overview:
DIALYTIKA
MACRON
PROSGEGRAMMENI
VRACHY


The question that I am interested in most is which assignment of
accent prefixes to function keys you would suggest. Is there a common
assignment in widely used input methods?
I would like to enhance my editor mined with Greek input. Easily,
two or three function keys are available, in shift-mode variations:

Fn, Shift-Fn, Alt-Fn, Alt-Shift-Fn, Control-Fn, Control-Shift-Fn
(where Fn is F2...F12, preferably F5, F6, F7)

With some straight-forward X keyboard configuration, e.g. shifted
digit keys can be added to the choice:

Alt-0, Alt-Shift-0, Control-0, Control-Shift-0,
Alt-Control-0, Alt-Control-Shift-0
(with digits 0...9)

For discussion, I have the following proposal:

(most important?)
TONOS
΄ 0384;GREEK TONOS
F6 (combined with acute for Latin letters)
OXIA
´ 1FFD;GREEK OXIA
Control-F6 (combined with circumflex)
or Alt-F6 because it's an alternative?
VARIA
` 1FEF;GREEK VARIA
Shift-F6 (combined with grave)
PERISPOMENI
῀ 1FC0;GREEK PERISPOMENI
Shift-F5 (combined with tilde)
or F5 because it's one of the more frequent accents?

(less important?)
PSILI
᾿ 1FBF;GREEK PSILI
Control-F5 (because it looks similar to oxia on Control-F6)
DASIA
῾ 1FFE;GREEK DASIA
Shift-F5 (because it looks similar to varia on Shift-F6)
YPOGEGRAMMENI
ͺ 037A;GREEK YPOGEGRAMMENI
Control-F5 (combined with cedilla)
or Control-5 (combined with dot below)

(even less important?)
DIALYTIKA
ϊ 03CA;GREEK SMALL LETTER IOTA WITH DIALYTIKA
F5 (combined with diaeresis)
or (if F5 preferred for perispomeni) ...?
MACRON
ᾱ 1FB1;GREEK SMALL LETTER ALPHA WITH MACRON
Control-9 (combined with stroke)
PROSGEGRAMMENI
ι 1FBE;GREEK PROSGEGRAMMENI
Alt-Control-5 (looking like an alternate to ypogegrammeni)
VRACHY
ᾰ 1FB0;GREEK SMALL LETTER ALPHA WITH VRACHY
Control-7 (combined with breve)
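
As a rough sketch of how such accent prefixes could be applied to a base letter, the following Python maps the accent names used above to Unicode combining marks and composes the result with NFC. This is an assumption for discussion, not how mined or any existing input method works. Key chords are left out because the proposal still lists alternatives, and PROSGEGRAMMENI is omitted because it has no combining mark of its own. The table `COMBINING` and the function `apply_accents` are hypothetical names:

```python
import unicodedata

# Combining marks keyed by the accent names used in the proposal.
COMBINING = {
    "TONOS": "\u0301",          # combining acute accent
    "OXIA": "\u0301",           # same combining mark as tonos
    "VARIA": "\u0300",          # combining grave accent
    "PERISPOMENI": "\u0342",    # combining Greek perispomeni
    "PSILI": "\u0313",          # combining comma above
    "DASIA": "\u0314",          # combining reversed comma above
    "YPOGEGRAMMENI": "\u0345",  # combining Greek ypogegrammeni
    "DIALYTIKA": "\u0308",      # combining diaeresis
    "MACRON": "\u0304",         # combining macron
    "VRACHY": "\u0306",         # combining breve
}

def apply_accents(base, *accents):
    """Append the combining marks for the named accents and compose with NFC."""
    marks = "".join(COMBINING[name] for name in accents)
    return unicodedata.normalize("NFC", base + marks)

alpha_tonos = apply_accents("\u03b1", "TONOS")
alpha_oxia = apply_accents("\u03b1", "OXIA")
alpha_psili_oxia = apply_accents("\u03b1", "PSILI", "OXIA")
iota_dialytika = apply_accents("\u03b9", "DIALYTIKA")
```

With this approach a separate OXIA key produces exactly the same character as the TONOS key (U+03AC for alpha), because NFC never yields the oxia code points. An editor that wants to emit U+1F71 would have to bypass normalization for that case.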


I appreciate your comments and suggestions.

Kind regards,
Thomas Wolff
