On Tue, 22 Feb 2022 07:36:22 -0800 (PST)
luser droog <luser...@gmail.com> wrote:
> [...] And packing
> the glyph selection into a composite font would be a ton of work if
> it's even possible.
It is possible to create a tree of composite fonts, where each byte in
a UTF-8 sequence dispatches to the next font, and the last one picks
the glyph. The problems with this approach are 1. the complexity of
creating and populating the font tree, and 2. the fact that the base
fonts at the leaves can only encode 64 glyphs each (the last byte of a
multibyte UTF-8 sequence is a continuation byte 10xxxxxx, so it can
only take the 64 values 128 to 191), and those glyphs sit at positions
128 to 191 of the /Encoding array rather than at its beginning, which
is a waste.
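A quick way to convince yourself of the 64-glyph limit is to collect the final byte of every encodable code point above U+007F. This is a Python check of the arithmetic only, not part of the PostScript; last_byte_values is a made-up helper name.

```python
def last_byte_values():
    """Collect every value the final byte of a multibyte UTF-8 sequence can take."""
    values = set()
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded in UTF-8
        values.add(chr(cp).encode("utf-8")[-1])
    return values


vals = last_byte_values()
print(len(vals), min(vals), max(vals))  # 64 128 191
```

So a leaf font reached by that last byte only ever uses Encoding slots 128 to 191.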
A simpler approach is to reencode the UTF-8 string into a made-up
UTF-24 encoding (3 bytes per codepoint) and then use two levels of
8/8-mapped (FMapType 2) composite fonts. Here the first byte selects
the Unicode plane (a section of 65536 codepoints; only 4 or 5 are
assigned), the second byte selects the 256-codepoint segment within
that plane, and the third byte selects the glyph within that segment.
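The byte split itself is just shifts and masks. A minimal Python sketch, where to_utf24 is a hypothetical name (the PostScript "u" procedure does the same split, starting from UTF-8 bytes):

```python
def to_utf24(text: str) -> bytes:
    """Encode each code point as 3 bytes: plane, 256-codepoint segment, index."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        plane = cp >> 16            # first byte: selects the plane font
        segment = (cp >> 8) & 0xFF  # second byte: selects the base font in that plane
        index = cp & 0xFF           # third byte: selects the glyph in that base font
        out += bytes((plane, segment, index))
    return bytes(out)


print(to_utf24("là").hex(" "))  # 00 00 6c 00 00 e0
```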
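While in theory this needs 1 composite font to choose the plane + 256
composite fonts (1 for each plane) + 256x256 base fonts = 65793 fonts,
most of them are the same empty font: every unused slot points to one
shared empty plane font or one shared empty base font, so only the
planes and segments that actually contain glyphs get fonts of their own.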
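To make the sharing concrete, here is a small Python model of the font tree. All names are hypothetical; lists stand in for the /Encoding and /FDepVector arrays, with index 0 reserved for the shared empty font as in the PostScript. It counts only the fonts that actually get created, versus the 65793 worst case.

```python
from dataclasses import dataclass, field

EMPTY = None  # stands in for the shared empty (plane or base) font at FDepVector index 0


@dataclass
class CompositeFont:
    """Model of an FMapType 2 font: encoding maps a byte to an index into fdepvector."""
    encoding: list = field(default_factory=lambda: [0] * 256)
    fdepvector: list = field(default_factory=lambda: [EMPTY])

    def add_subfont(self, code, sub):
        self.fdepvector.append(sub)
        self.encoding[code] = len(self.fdepvector) - 1


def build(glyphs):
    """glyphs maps code point -> glyph name; returns (main font, fonts created)."""
    main = CompositeFont()
    count = 3  # main font + shared empty plane font + shared empty base font
    for cp, name in glyphs.items():
        plane, segment, code = cp >> 16, (cp >> 8) & 0xFF, cp & 0xFF
        if main.encoding[plane] == 0:
            main.add_subfont(plane, CompositeFont())
            count += 1
        planefont = main.fdepvector[main.encoding[plane]]
        if planefont.encoding[segment] == 0:
            # a base font is modelled as a 256-entry list of glyph names
            planefont.add_subfont(segment, [".notdef"] * 256)
            count += 1
        planefont.fdepvector[planefont.encoding[segment]][code] = name
    return main, count


def lookup(main, b1, b2, b3):
    """Follow the three UTF-24 bytes down the tree to a glyph name."""
    planefont = main.fdepvector[main.encoding[b1]]
    if planefont is EMPTY:
        return ".notdef"
    base = planefont.fdepvector[planefont.encoding[b2]]
    return ".notdef" if base is EMPTY else base[b3]


WORST_CASE = 1 + 256 + 256 * 256  # 65793
main, created = build({0x6C: "l", 0xE0: "agrave", 0x4F60: "uni4F60"})
print(created, lookup(main, 0x00, 0x00, 0xE0))  # 6 agrave
```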
Below is an example of this approach. You get a Unicode font by calling
"unicodize" on a font with a CharStrings dictionary, and you reencode
UTF-8 strings with the "u" procedure (so the PostScript file itself
must be saved as UTF-8):
/Courier-Unicode /Courier findfont unicodize 12 scalefont setfont
(oh là là)u show
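It uses the Adobe Glyph List for now: a dictionary named AdobeGlyphList
(glyph name -> code point) must already be defined, since the code below
doesn't include it; glyph names of the form uniXXXX and uXXXXXX are
decoded directly. Maybe David will come up with something better.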
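The name-to-code-point logic in getcode can be modelled in Python like this. AGL_SUBSET is a hypothetical three-entry stand-in for the real AdobeGlyphList dictionary, and glyph_to_codepoint is a made-up name. Like getcode, it only recognizes 7-character uniXXXX and uXXXXXX names, so shorter forms such as u1F600 fall through.

```python
import re

# Hypothetical stand-in for the AdobeGlyphList dictionary the PostScript expects;
# the real list maps thousands of glyph names to code points.
AGL_SUBSET = {"A": 0x41, "agrave": 0xE0, "eacute": 0xE9}

# uniXXXX (4 hex digits) or uXXXXXX (6 hex digits): both are 7 characters long
_NAME_RE = re.compile(r"uni([0-9A-Fa-f]{4})|u([0-9A-Fa-f]{6})")


def glyph_to_codepoint(name, agl=AGL_SUBSET):
    """Return the code point for a glyph name, or None if it cannot be mapped."""
    if name in agl:
        return agl[name]
    m = _NAME_RE.fullmatch(name)
    if m:
        return int(m.group(1) or m.group(2), 16)
    return None
```

Glyphs that map to None (.notdef, ligatures, unknown names) are simply skipped, just as getcode returns false for them.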
The code probably has some bugs. I only tested it with Emacs' "Hello"
demo:
%!PS
/f /Arial findfont def
/uf /UFont f unicodize def
uf 14 scalefont setfont
700
[
( Europe: ¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu)
( Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა)
( Africa: ሠላም)
( Middle/Near East: שָׁלוֹם, السّلام عليكم)
( South Asia: નમસ્તે, नमस्ते, ನಮಸ್ಕಾರ, നമസ്കാരം, ଶୁଣିବେ,)
( ආයුබෝවන්, வணக்கம், నమస్కారం, བཀྲ་ཤིས་བདེ་ལེགས༎)
( South East Asia: ជំរាបសួរ, ສະບາຍດີ, မင်္ဂလာပါ, สวัสดีครับ, Chào bạn)
( East Asia: 你好, 早晨, こんにちは, 안녕하세요)
( Misc: Eĥoŝanĝo ĉiuĵaŭde, ⠓⠑⠇⠇⠕, ∀ p ∈ world • hello p □)
( CJK variety: GB(元气,开发), BIG5(元氣,開發), JIS(元気,開発), KSC(元氣,開發))
( Unicode charset: Eĥoŝanĝo ĉiuĵaŭde, Γειά σας, שלום, Здравствуйте!)
] {
  1 index 20 exch moveto   % stack: y string; move to (20, y)
  u show
  30 sub                   % next baseline 30 points lower
} forall
pop
showpage
Here's the code. Our old friend the iterator makes an appearance :)
%!PS
%% create a composite font suitable for strings with UTF-24 encoding
%: key originalfont -- newfont
/unicodize {
  40 dict begin
  /ofont exch def
  /key exch def
  /fname key dup length string cvs def
  /basefonts 10 dict def    % leaf fonts, by name
  /planefonts 10 dict def   % one composite font per plane, by name
  %: string string -- name
  % join with a hyphen, e.g. (Arial) (4E) -> /Arial-4E
  /newname {
    /s2 exch def /s1 exch def
    /s s1 length s2 length add 1 add string def
    s 0 s1 putinterval
    s s1 length (-) putinterval
    s s1 length 1 add s2 putinterval
    s cvn
  } def
  %: int -- string
  /tohex { 16 10 string cvrs } def
  %: array element -- newarray
  /append { /e exch def [ exch aload pop e ] } def
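  %: suffix -- font
  % copy of the original font with an all-.notdef Encoding
  /newbasefont {
    /suffix exch def
    /name fname suffix newname def
    % copy every entry except FID (the usual idiom; definefont adds a new one)
    ofont dup length dict begin
      { 1 index /FID ne { def } { pop pop } ifelse } forall
    currentdict end
    dup /Encoding [ 256 { /.notdef } repeat ] put
    dup /FontName name put
    dup basefonts exch name exch put
  } def
  /emptybasefont (Base-E) newbasefont def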
  %: suffix -- font
  % 8/8 composite font whose slots all point at the shared empty base font
  /newplanefont {
    /suffix exch def
    /name fname suffix newname def
    << /FontType 0
       /FontMatrix [ 1 0 0 1 0 0 ]
       /FontName name
       /FMapType 2
       /Encoding [ 256 { 0 } repeat ]
       /FDepVector [ emptybasefont ]
    >>
    dup planefonts exch name exch put
  } def
  /emptyplanefont (Plane-E) newplanefont def
  /mainfont << /FontType 0
               /FontMatrix [ 1 0 0 1 0 0 ]
               /FontName key
               /FMapType 2
               /Encoding [ 256 { 0 } repeat ]
               /FDepVector [ emptyplanefont ]
            >> def
  %: font subfont code --
  % append subfont to font's FDepVector and point Encoding[code] at it
  /addsubfont {
    /c exch def /sf exch def /f exch def
    f /FDepVector 2 copy get sf append put
    f /Encoding get c f /FDepVector get length 1 sub put
  } def
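  %: glyphname code --
  /putglyph {
    dup /plane exch 65536 idiv def
    dup /range exch 65536 mod 256 idiv def
    /code exch 256 mod def
    /glyph exch def
    % find or create the plane font (Encoding entry 0 = empty plane font)
    /idx mainfont /Encoding get plane get def
    idx 0 eq {
      % the P- and B- prefixes keep plane-font and base-font names distinct
      % (otherwise plane 0 and the U+0000..U+00FF base font are both "-0")
      (P) plane tohex newname dup length string cvs newplanefont
      dup mainfont exch plane addsubfont
    } {
      mainfont /FDepVector get idx get
    } ifelse
    /planefont exch def
    % find or create the base font for this 256-codepoint range
    /idx planefont /Encoding get range get def
    idx 0 eq {
      (B) plane 256 mul range add tohex newname dup length string cvs newbasefont
      dup planefont exch range addsubfont
    } {
      planefont /FDepVector get idx get
    } ifelse
    /basefont exch def
    basefont /Encoding get code glyph put
  } def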
  %: glyphname -- code true | false
  % AdobeGlyphList first, then uniXXXX and uXXXXXX names
  /getcode {
    /g exch def
    AdobeGlyphList g known {
      AdobeGlyphList g get true
    } {
      /s g g length string cvs def
      s length 7 eq {
        s 0 3 getinterval (uni) eq {
          % uniXXXX -> 16#XXXX
          s 7 string copy dup 0 (16#) putinterval
          { cvi } stopped { pop false } { true } ifelse
        } {
          s 0 1 getinterval (u) eq {
            % uXXXXXX -> 16#XXXXXX
            9 string dup 3 s 1 6 getinterval putinterval
            dup 0 (16#) putinterval
            { cvi } stopped { pop false } { true } ifelse
          } { false } ifelse
        } ifelse
      } { false } ifelse
    } ifelse
  } def
  % fill the fonts...
  ofont /CharStrings get { pop dup getcode { putglyph } { pop } ifelse } forall
  % register them (leaf fonts first, since the composite fonts'
  % FDepVectors refer to them)...
  basefonts { definefont pop } forall
  planefonts { definefont pop } forall
  % register & return main font
  key mainfont definefont
  end
} bind def
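%: string|array -- iterator ( -- nextchar true | false )
% builds { counter 0 get LEN lt { s counter 0 get get true
%          counter 0 2 copy get 1 add put } { false } ifelse }
% with the actual counter array and string s embedded in it
/sequenceiterator {
  2 dict begin
  /s exch def
  /counter [ 0 ] def
  [ counter 0 /get cvx s length /lt cvx [
      s counter 0 /get cvx /get cvx true
      counter 0 2 /copy cvx /get cvx 1 /add cvx /put cvx
    ] cvx [
      false
    ] cvx /ifelse cvx
  ] cvx
  end
} bind def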
%% reencode UTF-8 to UTF-24
%: string -- string
/u {
  3 dict begin
  /src exch def
  /nextch src sequenceiterator def
  % worst case: every byte becomes a codepoint; trimmed to size at the end
  % (counting only lead bytes undercounts when a stray continuation
  % byte is present, and the puts below then overflow dest)
  src length 3 mul string /dest exch def
  0 {
    % decode sequence
    nextch not { exit } if
    dup 128 lt {
      0                          % 0xxxxxxx - 0 following bytes
    } {
      dup dup 2#11000000 ge exch 2#11011111 le and {
        2#00011111 and 1         % 110xxxxx - 1 following byte
      } {
        dup dup 2#11100000 ge exch 2#11101111 le and {
          2#00001111 and 2       % 1110xxxx - 2 following bytes
        } {
          dup dup 2#11110000 ge exch 2#11110111 le and {
            2#00000111 and 3     % 11110xxx - 3 following bytes
          } {
            pop 0 0              % invalid sequence -> U+0000
          } ifelse
        } ifelse
      } ifelse
    } ifelse
    % add 6 bits per following byte; stop early if the string ends mid-sequence
    { nextch { 2#00111111 and exch 6 bitshift add } { exit } ifelse } repeat
    % stack: index-to-dest, codepoint
    2 copy 65536 idiv dest 3 1 roll put
    exch 1 add exch 2 copy 65536 mod 256 idiv dest 3 1 roll put
    exch 1 add exch 2 copy 256 mod dest 3 1 roll put pop
    1 add
  } loop
  % stack: number of bytes written; return only that part of dest
  dest exch 0 exch getinterval
  end
} bind def
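For anyone who wants to poke at the decoding logic outside a PostScript interpreter, here is a rough Python model of sequenceiterator and u. The names sequence_iterator and utf8_to_utf24 are made up for this sketch. It uses the same lead-byte masks, maps an invalid lead byte to U+0000, and stops early on a truncated final sequence. Like the PostScript, it does not check that the following bytes really have the form 10xxxxxx.

```python
def sequence_iterator(seq):
    """Return a function giving (item, True) for each element, then (None, False)."""
    counter = [0]  # mutable cell captured by the closure, like the PostScript [ 0 ]

    def next_item():
        if counter[0] < len(seq):
            item = seq[counter[0]]
            counter[0] += 1
            return item, True
        return None, False

    return next_item


def utf8_to_utf24(src: bytes) -> bytes:
    """Decode UTF-8 bytes and re-emit each code point as 3 bytes (plane, segment, index)."""
    nextch = sequence_iterator(src)
    dest = bytearray()
    while True:
        b, ok = nextch()
        if not ok:
            break
        if b < 0x80:
            cp, follow = b, 0  # 0xxxxxxx: no following bytes
        elif 0xC0 <= b <= 0xDF:
            cp, follow = b & 0x1F, 1  # 110xxxxx: 1 following byte
        elif 0xE0 <= b <= 0xEF:
            cp, follow = b & 0x0F, 2  # 1110xxxx: 2 following bytes
        elif 0xF0 <= b <= 0xF7:
            cp, follow = b & 0x07, 3  # 11110xxx: 3 following bytes
        else:
            cp, follow = 0, 0  # stray continuation or invalid byte -> U+0000
        for _ in range(follow):
            c, ok = nextch()
            if not ok:
                break  # string ends mid-sequence: keep the bits decoded so far
            cp = (cp << 6) | (c & 0x3F)
        dest += bytes((cp >> 16, (cp >> 8) & 0xFF, cp & 0xFF))
    return bytes(dest)


print(utf8_to_utf24("là".encode("utf-8")).hex(" "))  # 00 00 6c 00 00 e0
```

The closure over a one-element list plays the role of the PostScript iterator's [ 0 ] counter array, which is embedded in the generated procedure so that each call advances it.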
--