Including Khmer in ICU.


Javier SOLA

Oct 16, 2003, 4:19:10 PM
to icu-i...@www-126.southbury.usf.ibm.com
Dear colleagues,

I am looking at the different technical possibilities for including
Unicode-based Khmer in a UNIX/Linux interface such as Gnome. The final goal
of the project would be to translate the interface into Khmer, as well as
some basic applications.

The webpage for Pango (the rendering and layout framework used by Gnome, I
believe) says that it now includes support for OpenType Indic fonts using
ICU library code.

Khmer writing includes most of the usual problems of Indic languages:
vowels pronounced after a consonant may be written before, above, under,
before and above, below and above, or before and after the consonant. Each
consonant has two written forms, a normal one and a small one placed under
another consonant.
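
As a small illustration of how this looks on the encoding side, here is a
minimal sketch. It assumes Python and its standard unicodedata module,
neither of which is mentioned in this thread, and the helper name describe
is made up for the example. It lists the code points of the word "Khmer"
in Unicode storage order: the vowel sign is stored after the consonants it
is written in front of, and the small form of a consonant is encoded as
the COENG sign followed by the ordinary letter.

```python
import unicodedata

# The word "Khmer" in Khmer script, in Unicode logical (storage) order:
# KHA, COENG, MO, vowel sign AE, RO.
WORD = "\u1781\u17D2\u1798\u17C2\u179A"


def describe(text):
    """Return (code point, Unicode character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for code_point, name in describe(WORD):
        print(code_point, name)
```

The vowel sign AE (U+17C2) is drawn to the left of the consonant cluster
even though it comes fourth in the stored text, and COENG + MO is drawn as
a small MO under KHA.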

There already seem to be Unicode OpenType fonts for Khmer.

From what I have seen (and I know very little about all this), including
support for Khmer in ICU would be the best technical option for bringing
Khmer to the different user interfaces for Linux/Unix.

I have seen in the ICU pages that for the Thai language not all the
information is coded into the font; instead, some special-purpose ICU rules
are required. As Khmer probably has similar problems, I would like to know
whether this is because the existing fonts are not complete enough or
because the OpenType and Apple font formats do not allow all the necessary
information to be encoded.

Sincerely,

Javier SOLA
Battambang
Cambodia

Markus Scherer

Oct 17, 2003, 12:48:53 PM
to Javier SOLA, icu-i...@www-126.southbury.usf.ibm.com, Eric Mader, Mark Davis, Doug Felt
Dear Javier Sola,

Our layout engine maintainer is currently out of the office and will be
back next week.

While I don't know the details on Khmer myself, I can give you a brief
answer about font technologies (and hope I get it right :-). Various font
technologies leave different amounts of processing to be done in the
layout code vs. in the font tables. OpenType is designed to perform all
the processing that always needs to be done for a certain script in the
layout code, and to use the font tables only to express the typographical
choices within the script's structure. For each script, a part of the
OpenType specification gives implementation guidelines for the layout
engine and defines the separation between layout engine functionality and
font tables.
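
To make that split a little more concrete, here is a rough sketch of the
kind of step that would sit on the layout-engine side: moving a vowel sign
that is written before the consonant to the front of its cluster, so that
the font tables only have to deal with glyph choice and positioning. This
is a deliberately simplified illustration in Python, not the ICU code and
not the algorithm from the OpenType Khmer specification. The function name
reorder_cluster is hypothetical, the input is assumed to be a single
cluster that has already been segmented, and only the three simple
pre-base vowel signs E, AE and AI are handled.

```python
# Khmer vowel signs drawn entirely to the left of the base consonant:
# U+17C1 (E), U+17C2 (AE), U+17C3 (AI). Two-part vowels and other
# reordering cases are intentionally left out of this sketch.
PREBASE_VOWELS = {"\u17C1", "\u17C2", "\u17C3"}


def reorder_cluster(cluster):
    """Move pre-base vowel signs to the front of one already-segmented cluster.

    Input is in Unicode logical order; output is in a left-to-right
    visual order that a font could then shape.
    """
    pre = [ch for ch in cluster if ch in PREBASE_VOWELS]
    rest = [ch for ch in cluster if ch not in PREBASE_VOWELS]
    return "".join(pre + rest)


if __name__ == "__main__":
    # KHA + COENG + MO + vowel sign AE (the first cluster of the word "Khmer").
    logical = "\u1781\u17D2\u1798\u17C2"
    visual = reorder_cluster(logical)
    print([f"U+{ord(ch):04X}" for ch in visual])
```

A real engine also has to find cluster boundaries and handle the vowels
that are written on two sides of the consonant, which is the script
structure the per-script part of the specification describes.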

See
http://www.microsoft.com/typography/otfntdev/khmerot/default.htm
http://www.microsoft.com/typography/specs/default.htm
http://www.microsoft.com/typography/users.htm

Best regards,
markus

Markus Scherer マルクス IBM GCoC-Unicode/ICU San José, CA
markus....@us.ibm.com






Eric Mader

Oct 22, 2003, 4:17:34 PM
to Javier SOLA, Doug Felt, icu-i...@www-126.southbury.usf.ibm.com, Mark Davis, Markus Scherer



Hello Javier,

I'm the ICU LayoutEngine maintainer, and I'm back in the office :-)

The Indic support in Pango doesn't use the ICU code directly. Rather, it
uses code which was *derived* from the ICU code. This has turned out to be
a bit of a maintenance problem, and we're looking at rewriting the ICU
code so that it can be more directly incorporated into Pango. At this time
we don't know when, or even if, we will do this.

I've had a quick look at the Microsoft spec for OpenType Khmer, and it
does seem that it has many features in common with the Indic scripts, as
well as some which are unique to Khmer. Off the top of my head I can't say
for sure if it makes sense to extend the existing Indic code to handle
Khmer, but it does seem worth investigating, especially if we decide to
rewrite it to make it more portable and easier to maintain.

Depending on what we decide to do about the Indic code, we may be able to
add Khmer support to Pango "for free" once it's been implemented in ICU. In
the worst case, I'd expect that ICU's Khmer code could be ported to Pango
without too much trouble.

The Thai code in the ICU LayoutEngine was written to handle Thai fonts
which were produced before Microsoft published the spec for building
OpenType Thai fonts. It has built-in knowledge of three different
pre-OpenType Thai font layouts. At some point, we'll extend this to support
the OpenType Thai fonts as well. It makes sense to me to look at this at
the same time we look at adding Khmer support. Again, I can't say when, or
even if, we'll do any of this work.

Regards,
Eric Mader
IBM GCoC - San José
5600 Cottle Rd. M/S 50-2/B11
San Jose, CA 95193



