Announcing uuterm and ucf (universal charcell font)

Rich Felker

Oct 5, 2006, 6:03:30 PM

After much work, I finally have a working (but still experimental)
version of uuterm and the "ucf" bitmap font format I proposed in
August. Source for uuterm is browsable at
http://svn.mplayerhq.hu/uuterm/ and a sample ucf font is linked from
the included README.

Since ucf is probably more interesting to members of this list than
particular software, I'll skip the stuff about uuterm and just get to
the point of ucf. I based the design loosely on Markus Kuhn's old
proposal for a bitmap font format that recognizes the difference
between glyphs and characters. "Source code" for a ucf font looks
like:

# sa+la
:000000007B1129650300000000000000 0F66+0FB3

# sa+*
:000000007B2945030000000000000000 0F66+[0F90-0FAC] 0F66+[0FAE-0FB0] 0F66+[0FB4-0FBC]

...

# ra la sha ssa sa
:000000003E08081C2201010000000000 0F62 0F6A
:00000000394545491D03010000000000 0F63
:000000000709096F3911090101000000 0F64
:000000007048487B4E44484040000000 0F65
:000000007B1129456313010000000000 0F66

The long hex number is a glyph bitmap, which can be edited easily with
a program like Roman Czyborra's "hexdraw" (from the GNU unifont
project), or imported/exported from other formats. Unlike unifont,
however, there is no limitation on character cell size.
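
To make the bitmap encoding concrete, here is a minimal Python sketch that
turns one of the hex strings above into rows of pixels. It is not taken
from uuterm. It assumes an 8-pixel-wide cell, one byte per row, with the
most significant bit as the leftmost pixel (the convention of unifont's
.hex files); the name decode_glyph and its width parameter are made up for
this illustration.

```python
def decode_glyph(hex_bitmap, width=8):
    """Turn a hex glyph string into rows of '#' and '.' characters.

    Assumption: each row occupies ceil(width / 8) bytes and the most
    significant bit is the leftmost pixel, as in unifont's .hex files.
    """
    bytes_per_row = (width + 7) // 8
    data = bytes.fromhex(hex_bitmap)
    if len(data) % bytes_per_row:
        raise ValueError("bitmap length is not a whole number of rows")
    total_bits = bytes_per_row * 8
    rows = []
    for i in range(0, len(data), bytes_per_row):
        bits = int.from_bytes(data[i:i + bytes_per_row], "big")
        # Walk the row from the leftmost pixel to the rightmost one.
        row = "".join(
            "#" if (bits >> (total_bits - 1 - x)) & 1 else "."
            for x in range(width)
        )
        rows.append(row)
    return rows


# The "ra" glyph (U+0F62) from the sample above: 32 hex digits = 16 rows.
ra = decode_glyph("000000003E08081C2201010000000000")
print("\n".join(ra))
```

A wider cell would only change the width argument, e.g. width=16 for a
double-width glyph stored as two bytes per row (again an assumption about
the layout, not something the sample confirms).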

The numbers that follow are the characters that the glyph can
represent, and in which contexts. In the above example, the first
glyph is used for the Tibetan consonant "sa" (U+0F66) when a combining
"la" (U+0FB3) is attached to it. The second glyph is used for "sa"
when a combining character from any of the listed ranges is attached,
and the last glyph (the plain 0F66 entry) is used in any case not
matching the previous ones.
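
The lookup described above can be sketched in a few lines of Python. This
is only an illustration inferred from the sample, not the uuterm code: it
handles just the "+" (WITH_ATTACHED) rule, assumes rules are tried in file
order with the first match winning, and assumes a rule matches if any
attached mark falls in its range. The names parse_ucf_source, parse_spec
and select_glyph are hypothetical.

```python
SAMPLE = """\
# sa+la
:000000007B1129650300000000000000 0F66+0FB3
# sa+*
:000000007B2945030000000000000000 0F66+[0F90-0FAC] 0F66+[0FAE-0FB0] 0F66+[0FB4-0FBC]
# ra la sha ssa sa
:000000003E08081C2201010000000000 0F62 0F6A
:000000007B1129456313010000000000 0F66
"""


def parse_spec(spec):
    """Parse 'BASE', 'BASE+MARK' or 'BASE+[LO-HI]' into (base, range or None)."""
    base, plus, attached = spec.partition("+")
    if not plus:
        return int(base, 16), None
    if attached.startswith("[") and attached.endswith("]"):
        lo, hi = attached[1:-1].split("-")
        return int(base, 16), (int(lo, 16), int(hi, 16))
    mark = int(attached, 16)
    return int(base, 16), (mark, mark)


def parse_ucf_source(text):
    """Return (base, attached_range, bitmap) rules in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue  # skip comments, blank lines and anything else
        bitmap, *specs = line[1:].split()
        for spec in specs:
            base, attached = parse_spec(spec)
            rules.append((base, attached, bitmap))
    return rules


def select_glyph(rules, base, marks=()):
    """Pick the first rule for `base` whose context matches `marks`."""
    for rule_base, attached, bitmap in rules:
        if rule_base != base:
            continue
        if attached is None:
            return bitmap  # unconditional rule: matches in any context
        lo, hi = attached
        if any(lo <= mark <= hi for mark in marks):
            return bitmap
    return None  # this font has no glyph for the character


rules = parse_ucf_source(SAMPLE)
print(select_glyph(rules, 0x0F66, [0x0FB3]))  # the "sa+la" bitmap
print(select_glyph(rules, 0x0F66))            # the plain "sa" bitmap
```

Because the plain 0F66 entry comes last, it acts as the fallback for every
context the earlier, more specific rules do not cover.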

Aside from the WITH_ATTACHED rule (represented by "+"), the format
also has ATTACHED_TO (for shaping combining marks depending on the
base character or previous combining mark) as well as rules for
examining the character(s) in the previous/next cell (in visual
order). Combined with visual reordering rules applied by the
application, I believe this is sufficient for nice (not perfect, but
on a comparable level to rendering English text monospaced)
presentation of Indic text.

I will be converting GNU unifont and/or other free 8x16-cell fonts to
make a fairly complete UCF font with all the necessary contextual
glyph replacements, but it will be a slow process and I'm in no hurry.
I'd welcome others who get interested in it to work on such a thing.
I'd also be interested in studying the feasibility of getting support
for UCF in various *NIX consoles.

A few comments on "Why not just use OpenType??":

- The GSUB model does not adapt well to a character cell device where
characters are organized into cells and where arbitrary string
replacements don't make sense.

- The glyph metric data is as large as the actual glyphs, doubling
font size. Charcell fonts don't need any glyph metrics.

- I don't think you can implement OpenType in less than 100 lines of
C. The UCF char-to-glyph mapping algorithm is easy to implement and
tiny.

- Personally I like solutions that are adapted to the nature of the
particular problem (character cell device) rather than trying to
apply an overly general solution that will be awkward at best.

- Something like UCF has a chance of getting into *NIX console drivers
someday. I doubt anything OpenType-based would ever pass the
necessary bloat tests to get integrated at such a low level.

Rich

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/

Rich Felker

Oct 7, 2006, 1:09:53 AM

[cc'ing the list since i think it's relevant]

On Fri, Oct 06, 2006 at 04:55:51PM -0400, Daniel Glassey wrote:
> btw there is discussion about trying to integrate as much as possible on
> http://live.gnome.org/UnifiedTextLayoutEngine that you might like to
> contribute to.

well sadly i think the only thing i could contribute to this is
detracting from it. i'm strongly against pushing common apis. what
needs to happen in this area is not for everyone to agree on a single
codebase to use (which will invariably be ill-suited to many people's
needs), but instead to move the topic of layout _out_ of the code and
into data or standards -- either new tables in fonts or generic tables
that apply to all fonts, much like the unicode tables are generic.
then, everyone can use whatever implementation (choice of language,
etc.) suits them while still agreeing on a common expected behavior.
but whatever is standardized _must_ always be behavior. not code. a
single codebase, free/libre or not, is not a standard but an
implementation!

graphite might be the solution we're looking for, or it might be
ridiculously overcomplex and bloated. i'd need to research it more to
have an opinion but i'm quite interested in it. basically it's like a
much more powerful version of what i did with ucf (whereas ucf is
extremely simple because the task it needs to accomplish is simple).

one thing i'm sure of though, from working on uuterm and ucf: there
are two _very_ different issues people are trying to solve, and i
think many of the people working on them don't understand the
difference. "complex" stacking of diacritic marks is absolutely not a
layout issue. the solution can be fully specified in terms of simple
substitution tables, or substitution+positioning. ligatures can also
be entirely handled in this way -- even the notoriously-"complex"
indic scripts. i find it appalling that most apps don't support these
correctly and then claim it's because of complex layout issues. part
of my intent in the experiment of uuterm is demonstrating that
combining stacks, shaping, and ligatures are not a complex layout
issue.

rendering bidi text, diagonal urdu, mixed horizontal and vertical text
flows, etc. is complex (and except for bidi these things probably only
belong in word processing, desktop publishing, web browsers, etc. --
not your average plaintext textbox). on the other hand getting
combining stacks and ligatures right is _not_ complex. having done it
in less than 100 lines of c, i can now say this with confidence...
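
a rough python sketch of the first half of that job (not the uuterm code,
which is c): grouping each base character with the combining marks stacked
on it, so that each group can then be looked up in a substitution table.
the function name split_into_cells is made up, and treating only general
categories Mn and Me as "attached" is an assumption; spacing marks (Mc)
are left in their own cells here.

```python
import unicodedata


def split_into_cells(text):
    """Group each base character with the combining marks that follow it.

    Returns a list of cells, each a list of code points: the base
    character first, then any nonspacing/enclosing marks attached to it.
    """
    cells = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Me")
        if is_mark and cells:
            cells[-1].append(ord(ch))  # stack the mark on the previous base
        else:
            cells.append([ord(ch)])    # start a new cell
    return cells


# tibetan sa + subjoined la, then ra, then latin e + combining acute
cells = split_into_cells("\u0f66\u0fb3\u0f62e\u0301")
print([[f"{cp:04X}" for cp in cell] for cell in cells])
```

each cell is then just a (base, marks) pair, which is the shape of input a
table of "this glyph for this base with these marks attached" rules needs.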

rich

Rich Felker

Oct 9, 2006, 6:57:16 PM

On Mon, Oct 09, 2006 at 12:37:24PM -0600, Wesley J. Landaker wrote:

> On Thursday 05 October 2006 16:03, Rich Felker wrote:
> > A few comments on "Why not just use OpenType??":
> >
> > - The GSUB model does not adapt well to a character cell device where
> > characters are organized into cells and where arbitrary string
> > replacements don't make sense.
> >
> > - The glyph metric data is as large as the actual glyphs, doubling
> > font size. Charcell fonts don't need any glyph metrics.
> >
> > - I don't think you can implement OpenType in less than 100 lines of
> > C. The UCF char-to-glyph mapping algorithm is easy to implement and
> > tiny.
> >
> > - Personally I like solutions that are adapted to the nature of the
> > particular problem (character cell device) rather than trying to
> > apply an overly general solution that will be awkward at best.
> >
> > - Something like UCF has a chance of getting into *NIX console drivers
> > someday. I doubt anything OpenType-based would ever pass the
> > necessary bloat tests to get integrated at such a low level.
>

> The main point here, which I don't argue against, is that OpenType is complex
> and bloated when applied to minimally simple charcell devices. So, say I want
> to go implement this right away... I code it up and... ah, no fonts!

Actually we have all of "GNU Unifont" plus plenty of other bitmap
fonts, all of which are easy to convert. Unfortunately the
European/Western glyphs in GNU Unifont are extremely ugly; if it
weren't for that I would just have converted them already.

I'll be working on the scripts that are interesting to me, but my
viewpoint here is as follows: there are VERY MANY scripts with
absolutely no terminal emulator that can display them, or with only
one locale-specific terminal emulator with very poor features. If a
terminal emulator implements UCF support, it _automatically_ supports
these scripts as soon as someone makes a font. No coding is required
by users wanting to get their script supported; just font drawing.
While this doesn't help so much with the goal of getting a complete
font for all scripts, it does make it very easy to achieve the local
goal of supporting just one or two scripts you need, as the need
arises.

> To help create UCF fonts, it seems like having an OpenType to UCF converter
> would be a *really* big help.

Well, I mostly disagree. TrueType/OpenType fonts simply do not make
legible character cell fonts, between not being designed for fixed
width and the classic problem of poor rendering at small sizes.

In any case, if you can make bitmaps from your OpenType fonts, it's
trivial to use the glyphs in a UCF font, and programs to make bitmaps
(e.g. BDF) out of OpenType fonts already exist. However, the OpenType
tables for substitutions and positioning are built on an entirely
different framework of layout that's about character sequences,
baselines, and anchor points as opposed to character cells, so IMO
there's very little hope of converting such tables in a meaningful
way. If you have an idea for how this could be done, I'd be very
interested in hearing it!

Keep in mind that most glyphs don't even need any such tables. The
vast majority of glyphs are CJK. Also, the fact that UCF doesn't need
precomposed glyphs for accented characters cuts down vastly on the
number of glyphs needed. As an example, my current Tibetan UCF font
has only 113 glyphs because it makes powerful use of combining. Fonts
with precomposed glyphs can have well over 1000. The situation for
Latin is similar.

> Even if you still had to tweak it manually
> afterward for >75% of the glyphs, it would still be a big win, reduce a lot
> of manual labor, and would help tide the "gee, UCF sounds like a good idea;
> too bad there will never be any fonts" arguments.

Well, we'll see. :)

> Not to distract from your work here,

No problem, comments are welcome.

> but you implied that you are going to
> work on converting fonts manually. Even just for your own use, wouldn't it
> save you time in the long run to get a minimal OpenType to UCF converter
> working?

Well, my plan right now for fonts is split into several parts:

For Latin, Cyrillic, Greek, etc. I plan to compile fonts in several
styles: one that's classic VGA-style glyphs, one with a more standard
modern non-bold terminal look, and one based on the font I personally
use, which I designed for Latin-only a long time ago (extending it to
non-Latin alphabets). The first two are matters of importing; the
latter is a matter of drawing.

For other scripts, I'm converting glyphs if there are nice existing
ones (for instance the Thai font in GNU Unifont seems decent) and
drawing new glyphs for scripts that don't yet have any good bitmap
fonts. For scripts with heavy use of combining marks, shaping, or
ligatures, a good bit of drawing would be necessary even if I found
existing bitmaps, in order to make the shaping work right. This is
especially true for Indic languages I think, which aren't well-suited
to character cells but which can be forced to align to charcell
boundaries without being much more offensive than fixed-width Latin
is.. :)

Rich

P.S. It's my intent that a good UCF-using program would allow multiple
simultaneous font files in use and search them all for glyphs. A more
advanced one might even use separate fonts for Chinese/Japanese style
ideographs, etc. and have terminal escapes or whatever to switch
styles.
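
A minimal Python sketch of the multi-font search idea, under the
assumption that each loaded font is just a mapping from code point to
bitmap and that fonts are tried in the order the user listed them. The
names find_glyph and fonts, and the font names used, are hypothetical; the
second font's bitmaps are placeholders, not real glyph data.

```python
def find_glyph(fonts, codepoint):
    """Return (font_name, bitmap) from the first font that has the character.

    `fonts` is an ordered list of (name, {codepoint: bitmap}) pairs;
    earlier fonts take priority over later ones.
    """
    for name, glyphs in fonts:
        bitmap = glyphs.get(codepoint)
        if bitmap is not None:
            return name, bitmap
    return None  # no loaded font covers this character


fonts = [
    ("tibetan", {0x0F62: "000000003E08081C2201010000000000"}),
    # Placeholder entries standing in for a broader fallback font.
    ("fallback", {0x0F62: "placeholder-ra", 0x0041: "placeholder-A"}),
]

print(find_glyph(fonts, 0x0F62))  # found in "tibetan", which is listed first
print(find_glyph(fonts, 0x0041))  # only "fallback" has it
```

Switching styles via a terminal escape would then amount to reordering or
swapping entries in that list.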
