Using glyphs with higher code points / typesetting with Cyrillic

Stanislav Paskalev

May 20, 2023, 7:56:28 AM
to PMW_Music
Hi all,

I want to use PMW to typeset music that includes inscriptions in Cyrillic letters. So far I've found the following approach, which works.

1. Get some Type 1 fonts whose Cyrillic glyphs are mapped below code point 256.
2. Convert them to PFA format using the 't1ascii' command from the t1utils package (see https://github.com/kohler/t1utils).
3. Write my pmw input files using UTF-8 encoding.
4. Process my input files with a Python script that encodes every non-ASCII character using the 'cp1251' encoding and wraps the resulting integer in the \int\ format that pmw expects.
5. Pass the result to pmw to render a PostScript file.
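
As a rough illustration of step 4, here is a minimal sketch of the conversion for a single string. The function name `to_pmw_escapes` and the `codec` parameter are made up for this example; it assumes a single-byte encoding such as cp1251, where every character encodes to exactly one byte.

```python
def to_pmw_escapes(text, codec="cp1251"):
    """Replace each non-ASCII character with its decimal code between backslashes."""
    out = []
    for ch in text:
        if ch.isascii():
            # ASCII characters pass through unchanged.
            out.append(ch)
        else:
            # With a single-byte codec the first (only) byte is the character code.
            code = ch.encode(codec)[0]
            out.append("\\" + str(code) + "\\")
    return "".join(out)


# Cyrillic small letter a (U+0430) sits at code 224 in cp1251.
print(to_pmw_escapes("a\u0430"))
```

A character that the chosen codec cannot represent raises `UnicodeEncodeError`, which is a convenient early warning that the glyph will not be reachable in a 256-slot font.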

While this works and I'm quite happy with it, it restricts my choice of fonts: I have to use a Type 1 font that includes Cyrillic glyphs AND has them mapped below code point 256.

I tried using the Tempora Type 1 font from https://ctan.org/tex-archive/fonts/tempora but I can't seem to make use of the UTF-8 mapping feature of pmw. A Cyrillic 'а' (a) character is always recognized as invalid input and substituted, regardless of any mapping that I try. Even then, the mapping is only valid up to code point 511.

Is there a way to use UTF-8 input directly with pmw and map it to common OTF fonts?

Regards,
Stanislav

Philip Hazel

May 20, 2023, 1:13:55 PM
to PMW_Music
From release 5.20 PMW should be able to handle OpenType (.otf) fonts as well as the PostScript PFA format. However, owing to PMW's ancient history, by default, it can only support characters 0-255 in any non-standardly-encoded font. There is the possibility of providing a Unicode Translation file (.utr) which translates from arbitrary Unicode code points, but this still restricts you to the characters that the font has as its default encoding (see some .utr examples in the fontmetrics directory). However, the .utr file can also contain a re-encoding for the font, and it allows up to 512 characters, so you can select (by name) which 512 characters you want. This is all documented in the 5.20 manual. Look for the section "Unicode translation files".

So I think that it should be possible to make your life easier.

Philip

Stanislav Paskalev

May 20, 2023, 3:58:23 PM
to PMW_Music
Mapping U+0430 to a code point works: for example, the line "U+0430 165 CYRILLIC SMALL LETTER A" gives me a YEN sign with the Tempora-Bold font from http://mirrors.ctan.org/fonts/tempora/type1/Tempora-Bold.pfb
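
Lines of this shape can be generated rather than typed by hand. The sketch below is only an illustration: the helper name `utr_lines` is made up, the three-field layout is copied from the example line above rather than from the manual, and the target codes assume a font whose 0-255 slots follow CP1251, which is not true of Tempora-Bold as shipped. The "Unicode translation files" section of the manual is the authority on the exact syntax.

```python
import unicodedata


def utr_lines(first=0x0410, last=0x044F, codec="cp1251"):
    """Build 'U+XXXX code NAME' lines for a range of Unicode code points."""
    lines = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        # Assumption: the target font places each glyph at its cp1251 code.
        slot = ch.encode(codec)[0]
        lines.append(f"U+{cp:04X} {slot} {unicodedata.name(ch)}")
    return lines


for line in utr_lines():
    print(line)
```

The default range covers the basic Russian alphabet (U+0410 to U+044F); other letters would need their own ranges, provided the chosen codec can encode them.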

Now I want to map the font's Cyrillic characters into the usable range. Examining the font with FontForge, I see that "CYRILLIC SMALL LETTER A" has the glyph name "afii10065", an internal index in the font of 710, and the Unicode code point U+0430.

I tried adding a line like '/afii10065 165 CYRILLIC SMALL LETTER A' to bring afii10065 into the 0-511 range, but that doesn't seem to work: pmw still uses the font's original YEN glyph at 165.

Philip Hazel

May 21, 2023, 3:21:51 AM
to pmw_...@googlegroups.com
Please send me (Philip...@gmail.com, not this list) the PostScript output that is generated, so I can see whether it's behaving correctly. Please also send the input and the .utr file that you are using.

Regards,
Philip

Stanislav Paskalev

May 21, 2023, 10:08:11 AM
to PMW_Music
Meanwhile I've found a way to use basically any font with my existing setup by re-encoding the fonts using FontForge.

1. Open the .ttf/.otf using FontForge
2. Select Encoding -> Add Encoding Name... and input CP1251
3. Select Encoding -> Reencode -> CP1251
4. Select everything above the 256th code point and use Encoding -> Detach & remove glyphs
5. Follow-up with Encoding -> Remove unused slots
6. Select Element -> Font Info -> General and set the Em Size to 1000 units
7. Export to .pfa and .afm using File -> Generate Fonts... (skip fixing errors)

I've tried this with GentiumPlus and GentiumBookPlus, and both converted successfully with this approach in all variants (regular/bold/etc).

Here's the Python script that I'm using, in case anyone wants to follow this approach. It should work for other single-byte encodings as well.

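#!/usr/bin/env python3
import sys

# Read the UTF-8 input file named on the command line.
with open(sys.argv[1], encoding='utf-8') as f:
    for c in f.read():
        if c.isascii():
            # ASCII passes through unchanged.
            print(c, end='')
        else:
            # cp1251 is a single-byte encoding, so the first byte is the
            # character code; wrap it in backslashes for pmw.
            print('\\' + str(c.encode('cp1251')[0]) + '\\', end='')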

Regards,
Stanislav

Philip Hazel

May 22, 2023, 11:35:43 AM
to PMW_Music
To the list, for information:

Stanislav sent me his example, and I determined that the problem was caused by his font specifying Standard Encoding. In this case PMW thinks it knows all about the font and ignores a .utr file. This is clearly sub-optimal, so I am (when time permits) going to make some changes that will make it a bit more straightforward to use this kind of font.

Philip
