
pdf4tcl and Chinese characters


Harald Oehlmann

Jan 24, 2024, 1:15:48 PM
Thanks for the great pdf4tcl!

I have a string with Chinese characters.
I output them with pdf4tcl:

pdf setFont {9 p} Helvetica
pdf setFillColor black
pdf text "实地"

I only get question marks.
The interesting ::pdf4tcl::createFont command should be used to select
256 glyphs. Well, the Chinese language has orders of magnitude more than that.

Has anybody solved this issue?

Thanks for any hint,
Harald


pdf4tcl 0.9.4 on Tcl 8.6.13...

Rich

Jan 24, 2024, 1:41:33 PM
You are bumping into a PDF limitation.

Each "font" within a PDF can address at most 256 characters. This is a
limit from very early in PDF's lifetime, and creates a real PIA for
using non-ASCII characters.

Basically you have to create a "custom" font in the pdf using
::pdf4tcl::createFontSpecEnc with a custom encoding of codepoints (the
byte values) to actual character glyphs. Then you have to "change
font" to your custom font in order to draw these characters, and use
your custom assigned code point value for the glyph you want output.

I.e., ASCII assigns 65 decimal to capital A. Using
::pdf4tcl::createFontSpecEnc you can assign 65 decimal to output the glyph 实
and then when you want to output that glyph, you 'change font' to your
custom font and output 65 decimal as the "character".
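A minimal sketch of that approach for the two characters from the original post. The proc name codepointSubset, the font names BaseCJK and SubsetCJK, the output file chinese.pdf and the use of simhei.ttf are all assumptions for illustration. It also assumes that pdf4tcl translates the characters passed to "text" into the subset positions itself, which is how the createFontSpecEnc example in the pdf4tcl documentation appears to work; if it does not, the byte values have to be mapped by hand as described above.

```tcl
# Sketch only: collect the distinct Unicode code points used in a string,
# so they can be handed to pdf4tcl as a font subset (at most 256 entries).
proc codepointSubset {text} {
    set seen [dict create]
    foreach ch [split $text ""] {
        scan $ch %c cp          ;# %c yields the code point of one character
        dict set seen $cp 1
    }
    set subset [dict keys $seen]
    if {[llength $subset] > 256} {
        error "subset has [llength $subset] code points, limit is 256"
    }
    return $subset
}

# The two characters from the original post (U+5B9E, U+5730), written as
# escapes so the script does not depend on the source file encoding.
set text "\u5b9e\u5730"
set subset [codepointSubset $text]

# Assumption: simhei.ttf (or any TrueType font with CJK glyphs) is in the
# current directory. The PDF part is skipped if package or font is missing.
set fontFile simhei.ttf
if {![catch {package require pdf4tcl}] && [file exists $fontFile]} {
    pdf4tcl::loadBaseTrueTypeFont BaseCJK $fontFile
    pdf4tcl::createFontSpecEnc BaseCJK SubsetCJK $subset
    pdf4tcl::new mypdf -paper a4
    mypdf startPage
    mypdf setFont 12 SubsetCJK
    mypdf text $text -x 50 -y 100
    mypdf write -file chinese.pdf
    mypdf destroy
}
```

The subset is built from the text itself, so the 256-entry limit only bites once a single font has to cover more than 256 distinct characters.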

Harald Oehlmann

Jan 24, 2024, 2:18:30 PM
Thank you, Rich. That is what I feared.
Is there nobody out there who has automated this?
I suppose, this is not easy...
You also want to have one text field with one font, otherwise, the text
is interrupted, I suppose.

So, I will try to create a function, which assembles the glyphs of one
text, then creates a font and then outputs it.
In a 2nd step, an optimization may be done to find one font with 256
characters, which assembles as many text snippets as possible.

I have to sleep on this...

Thanks,
Harald

Rich

Jan 24, 2024, 4:18:00 PM
Harald Oehlmann <wort...@yahoo.com> wrote:
> Am 24.01.2024 um 19:41 schrieb Rich:
>> Harald Oehlmann <wort...@yahoo.com> wrote:
>>> Thanks for great pdf4tcl !
>>>
>>> I have a string with Chinese characters.
>>> I output them with pdf4tcl:
>>> ...
>>>
>>> I only get question marks.
>>> The interesting ::pdf4tcl::createFont command should be used to select
>>> 256 glyphs. Well, the Chinese language has orders of magnitude more than that.
>>
>> You are bumping into a PDF limitation.
>>
>> Each "font" within a PDF can address at most 256 characters. This is a
>> limit from very early in PDF's lifetime, and creates a real PIA for
>> using non-ASCII characters.
>>
>> Basically you have to create a "custom" font in the pdf using
>> ::pdf4tcl::createFontSpecEnc with a custom encoding of codepoints (the
>> ...
>>
> Thank you, Rich. That is what I feared.
> Is there nobody out there who has automated this?

Not that I'm aware of for pdf4tcl. Possibly for some other library for
some other language.

> I suppose, this is not easy...

Not trivial, not rocket science either.

> You also want to have one text field with one font, otherwise, the text
> is interrupted, I suppose.

Depending upon what you mean by text field, you can switch fonts before
drawing each glyph if you like and it will have no impact on the final
viewing of the pdf. If by field you mean a data entry field, then I
have no idea there.

When you delve down into the PDF internals, you find that PDF is
nothing more than instructions to place glyphs at x,y positions on a
sheet of virtual paper. I.e., internally it is very much like the
Tcl canvas widget. Which is why 'font switches' don't cause problems
with the render (unless you, the creator, create vastly different
actual fonts for 'effect'). But if the plural "fonts" are all of the
same size and all from the same base, font switches are invisible in
the final render.

> So, I will try to create a function, which assembles the glyphs of one
> text, then creates a font and then outputs it.

Yes, you either have to decide what glyphs you want ahead of time, and
'pre-create' fonts to draw those glyphs, or you have to analyze the
characters you want to "print" for the pdf (or for the current page)
and create a custom font for those characters.
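The "analyze the characters" variant could be sketched roughly as follows, in plain Tcl without any pdf4tcl calls. The proc names assignGroups and splitRuns are hypothetical. The idea is to give each distinct code point a group number (one group per future custom font, at most 256 code points each) and then to cut the text into runs that can each be drawn with a single font.

```tcl
# Assign every distinct code point of $text to a group of at most $limit
# entries, in order of first appearance.
# Returns a dict: code point -> group index.
proc assignGroups {text {limit 256}} {
    set group [dict create]
    set n 0
    foreach ch [split $text ""] {
        scan $ch %c cp
        if {![dict exists $group $cp]} {
            dict set group $cp [expr {$n / $limit}]
            incr n
        }
    }
    return $group
}

# Cut the text into runs of consecutive characters that share a group.
# Returns a flat list: groupIndex run groupIndex run ...
proc splitRuns {text group} {
    set runs {}
    set cur -1
    set buf ""
    foreach ch [split $text ""] {
        scan $ch %c cp
        set g [dict get $group $cp]
        if {$g != $cur && $buf ne ""} {
            lappend runs $cur $buf
            set buf ""
        }
        set cur $g
        append buf $ch
    }
    if {$buf ne ""} { lappend runs $cur $buf }
    return $runs
}

# A tiny limit of 2 makes the effect visible with a short ASCII string.
set text  "abcabd"
set group [assignGroups $text 2]
set runs  [splitRuns $text $group]
puts $runs
```

Each group would then become one font created with ::pdf4tcl::createFontSpecEnc from the same base font, and each run would be drawn after a setFont to the matching font. Because all of these fonts share one base and one size, the switches should not be visible in the rendered page.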

The one advantage of the second method: most Unicode TTF font files are
huge, but if you create a custom internal font for only the characters
you actually use, pdf4tcl embeds only the glyphs for those characters.
So if you only use 1% of the glyphs, you only store about 1% of the
font file in the pdf, making the pdf smaller.

> In a 2nd step, an optimization may be done to find one font with 256
> characters, which assembles as many text snippets as possible.

Yes, it will be possible to do so, sometimes. For Chinese, given the
huge number of total characters, this may be difficult to do in a
general sense for all possibilities, but you might come close.

lamuzz...@gmail.com

Jan 24, 2024, 6:27:07 PM
Harald,
take a look at tclfpdf (https://github.com/lamuzzachiodi/tclfpdf).
There is an example (utf8.tcl, pasted below) with Chinese characters using the font simhei.ttf.
Maybe this helps you.
Regards,

Alejandro

#--- utf8.tcl -----------
package require tclfpdf
namespace import ::tclfpdf::*

Init;
AddPage;
# Add a Unicode font (uses UTF-8)
AddFont "DejaVu" "" "DejaVuSansCondensed.ttf" 1;
SetFont "DejaVu" "" 14;
Write 8 " -----
English: Hello World
Greek: Γειά σου κόσμος
Polish: Witaj świecie
Portuguese: Olá mundo
Spanish: Hola mundo
Russian: Здравствуйте мир
Vietnamese: Xin chào thế giới
------";
Ln 10;
AddFont "simhei" "" "simhei.ttf" 1;
SetFont "simhei" "" 20;
Write 10 "Chinese: 你好世界";
#Select a standard font (uses windows-1252)
SetFont "Arial" "" 14;
Ln 10;
Write 5 "The file size of this PDF is only 16 KB.";
Output "utf8.pdf";

Harald Oehlmann

Jan 25, 2024, 2:44:08 AM
Many thanks, Alejandro,
looks promising,
Harald