Attempting to use non-embedded Type0 CID fonts to render Chinese, Japanese, and Korean characters

Nathan Broadbent

Apr 10, 2020, 9:15:17 AM

to Prawn

Hello,

I'm working on support for Chinese, Japanese, and Korean characters in my application. I found this PDFClownCJKFont repo, and it adds "non-embedded Type0 CID fonts": https://github.com/shannah/PDFClownCJKFont

This example PDF renders perfectly on my machine and uses my system's built-in fonts (without any special fonts installed): https://github.com/shannah/PDFClownCJKFont/blob/master/UnicodeTest.pdf?raw=true

It's also a very small PDF (only 48KB), because it doesn't need to include any embedded fonts (which can be 1-2 MB). So it would be really nice to get this working. Would it be possible to do something like this with Prawn?

I've been experimenting with this today. I took the font object from the example PDF and created a Prawn::Font subclass that adds this to the Font resources, but I got stuck on the "compute_width_of" / "character_width_by_code" methods. I have no idea how to do that. Is it possible to compute these from the /W entries?

Thanks,

Nathan

Alexander Mankuta

Apr 10, 2020, 9:36:01 AM

to Prawn

Hi,

You're on the right track (subclassing Font), but I don't think it will be easy. For one, widths alone (the /W entry) might not be enough. But since you have the array, you pretty much just look values up in it according to your chosen encoding to implement character_width_by_code. compute_width_of is essentially a sum of the individual character width lookups.
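
To make the lookup concrete, here is a minimal sketch in plain Ruby. It assumes the /W array has already been read into a Ruby array and that the text has already been mapped to CIDs under the chosen encoding. The helper names expand_cid_widths and width_of_cids are hypothetical and not part of Prawn. Per the PDF spec, /W entries come in two forms (c [w1 ... wn] and c_first c_last w), widths are in thousandths of a text-space unit, and CIDs not listed fall back to /DW, which defaults to 1000.

```ruby
# Hypothetical helper, not part of Prawn: expands a PDF /W array into a
# CID => width Hash. Widths are in 1/1000 of a text-space unit.
def expand_cid_widths(w_array)
  widths = {}
  i = 0
  while i < w_array.length
    first = w_array[i]
    if w_array[i + 1].is_a?(Array)
      # Form 1: c [w1 w2 ... wn] assigns widths to consecutive CIDs from c.
      w_array[i + 1].each_with_index { |w, offset| widths[first + offset] = w }
      i += 2
    else
      # Form 2: c_first c_last w assigns one width to the whole range.
      last = w_array[i + 1]
      width = w_array[i + 2]
      (first..last).each { |cid| widths[cid] = width }
      i += 3
    end
  end
  widths
end

# Hypothetical helper: sums the per-CID widths and scales to the font size.
# CIDs missing from /W use the default width (/DW, 1000 when absent).
def width_of_cids(cids, widths, font_size, default_width = 1000)
  cids.sum { |cid| widths.fetch(cid, default_width) } * font_size / 1000.0
end

# Made-up sample data for illustration only.
sample_w = [1, [500, 600], 10, 12, 250]
sample_widths = expand_cid_widths(sample_w)
sample_total = width_of_cids([1, 2, 11, 99], sample_widths, 12)
```

In a Font subclass, something like expand_cid_widths would presumably run once when the font is loaded, and character_width_by_code would then be a single Hash lookup.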

In theory you can drop /W from your document as it's optional, saving a few more bytes.

I think an easier approach would be to subclass the Font::TTF class, give it your font file for metrics, but change the objects it generates for the PDF.

However, this approach is counterproductive. PDF has Portable in its name, and your approach is not portable. It requires external resources. If your user for whatever reason doesn't have the resource, the document will be blank; no fallback font will be used. This is not as unlikely a situation as you might think: the user might not have an internet connection, the user might use a renderer that doesn't download missing fonts (i.e. any other than Adobe Acrobat), etc. Then there are issues with the base font lookup. It's not very robust. The font is identified by a string, not by any globally unique identifier or concrete address. This can fail in a number of ways too: the user has a font installed with the same name but it doesn't provide the required glyphs, the user has the font installed but the system identifies it by a different string, etc.

--

Regards,

Alex

Nathan Broadbent

Apr 10, 2020, 10:43:33 AM

to Prawn

Hi Alex,

Thanks for your reply! I also created a test PDF in Adobe Acrobat to see what they do, and they embed their Adobe Myungjo Std M font for Korean text.

I also found the free UnBatang font for Korean, and the resulting PDF is only 115 KB, so I think this is fine! I will skip the non-embedded fonts idea for now. Thanks again for your help!