Re: Proposed fix for Malayalam (& other Indic?) chars and wcwidth

rajeev joseph sebastian

Nov 11, 2006, 4:05:40 AM
Thanks for the info. I will try something out ...

Regards,
Rajeev

----- Original Message ----
From: Rich Felker <dal...@aerifal.cx>
To: linux...@nl.linux.org
Sent: Thursday, November 9, 2006 11:28:21 PM
Subject: Re: Proposed fix for Malayalam (& other Indic?) chars and wcwidth

On Tue, Nov 07, 2006 at 01:13:24AM -0800, rajeev joseph sebastian wrote:
> Well, I think I misunderstood ...

No problem.

> -----------
> In the first para, I asked whether it was possible to use TrueType
> in the terminal. If we cannot, then we need to use some hybrid of
> bitmap fonts and OT fonts, such that, the OT features can be used
> (at least the GSUB if nothing else) and the Bitmap features can be
> used (i.e., using a bitmap instead of outlines).

Yes, UCF also solves the problem of character->glyph mapping in a way
that's more cell-oriented, but an application (e.g. mlterm) using
OpenType fonts could use the OT tables instead and get the same
effect.

> -----------
>
> In the last para, I said that I would try (or rather the Typographer
> and I could try) the following:
>
> 1) Since you are assigning widths to characters, and since each
> logical cluster would get a width = sum of the widths of the
> characters in that cluster, ...
>
> 2) ... all we need to do is design the font in such a way that, the
> glyph corresponding to a logical cluster would use as much space as
> available to it.
>
> E.g.,
>
> kra cluster consists of ka + chandrakkala + ra
> so, when a software (say ls or cat) outputs a sequence ka +
> chandrakkala + ra, the kra logical cluster will get widthC =
> width(ka) + width(chandrakkala) + width(ra) allocated to it. In the
> font, we make sure that the kra *glyph* which corresponds to the kra
> *logical cluster* uses as much as possible of widthC.
>
> With this, characters have a width specification, and glyphs can be
> moulded to use as much of the space as possible/necessary as per the
> widths assigned to each *character*.
>
> ----------
>
> I hope I have set things right ?

Yep, this is right! Maybe you or your typographer friend could try
sketching out a few glyphs and see if it seems to work out well or not
(and what character width assignments would be required). The
character cell size I'm working with for my font with widespread
coverage of lots of scripts is 8x16, but larger or smaller font sizes
could of course be made too. In assigning widths, my inclination is
never to assume that more than 3 (or 4?) vertical strokes can fit in a
single cell, since 3 is the number in the latin characters "m" and "w"
and since a cell size too small to represent latin characters is
probably not useful anywhere.
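
As a rough illustration of the width rule discussed above (a cluster is
allocated the sum of the widths of its characters), here is a minimal
Python sketch. The width values in CHAR_WIDTH are placeholders invented
for the example, not proposed assignments, and the names cluster_width
and cluster_pixel_box are hypothetical. Only the 8x16 cell size comes
from the message itself.

```python
# Hypothetical per-character column widths. These numbers are made up for
# illustration; choosing the real values is the open question in this thread.
CHAR_WIDTH = {
    "\u0D15": 2,  # MALAYALAM LETTER KA
    "\u0D4D": 0,  # MALAYALAM SIGN VIRAMA (chandrakkala)
    "\u0D30": 1,  # MALAYALAM LETTER RA
}


def cluster_width(cluster: str) -> int:
    """Cells allocated to a logical cluster: the sum of its characters' widths."""
    return sum(CHAR_WIDTH[ch] for ch in cluster)


def cluster_pixel_box(cluster: str, cell_w: int = 8, cell_h: int = 16):
    """Pixel box (width, height) the cluster's glyph may fill, for an 8x16 cell."""
    return cluster_width(cluster) * cell_w, cell_h


# kra = ka + chandrakkala + ra
KRA = "\u0D15\u0D4D\u0D30"
print(cluster_width(KRA), cluster_pixel_box(KRA))
```

With these placeholder widths the kra glyph would have three cells to
fill; a different assignment changes only the table, not the rule.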

In terms of simplifying font design, it helps if conjunct forms can be
reduced as much as possible to 'glueing together pieces'. UCF allows
the shape of the pieces to vary depending on the adjacent pieces. For
example a latin "fi" ligature is made not by creating a single wide
"fi" glyph but instead a special glyph for "f when it is followed by
i" and a special glyph for "i when it follows f". In conjunct
formation for many scripts (including diacritic placement for western
scripts, stacking for Tibetan, and various others) this model works
out nicer and greatly reduces the number of glyphs needed (and the
amount of maintenance/font design work). However, if needed, it's
possible to convert whole predrawn "conjunct glyphs" to the UCF rules
format -- it just might require a lot of glyphs. For Malayalam, a mix
of the two approaches is probably appropriate, depending on whether
the particular conjunct is formed by putting together 'reusable' parts
or whether it's highly unique to the character sequence it represents.
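
The 'pieces that vary with their neighbours' idea above can be sketched
in a few lines of Python. This is not UCF syntax and not how any
particular terminal implements it; the rule tables, the glyph names
(f.before_i, i.after_f) and the function select_glyphs are all invented
for the example. The point is only that every character keeps its own
cell and just the glyph chosen for that cell depends on context.

```python
# Hypothetical context rules, keyed on a character and one neighbour.
BEFORE_RULES = {("f", "i"): "f.before_i"}  # (char, next char) -> glyph name
AFTER_RULES = {("i", "f"): "i.after_f"}    # (char, previous char) -> glyph name


def select_glyphs(text: str) -> list:
    """Pick one glyph name per character, so the cell count never changes."""
    glyphs = []
    for pos, ch in enumerate(text):
        prev = text[pos - 1] if pos > 0 else None
        nxt = text[pos + 1] if pos + 1 < len(text) else None
        # Prefer a contextual variant; fall back to the plain glyph.
        glyph = BEFORE_RULES.get((ch, nxt)) or AFTER_RULES.get((ch, prev)) or ch
        glyphs.append(glyph)
    return glyphs


print(select_glyphs("fig"))
print(select_glyphs("if"))
```

A whole predrawn conjunct glyph could be expressed in the same style by
slicing it into cell-wide pieces and adding one rule per piece, which is
where the glyph count grows.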

Hopefully this information is helpful to you or anyone else thinking
about designing fonts.

Rich


--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/


