[reportlab-users] PDF without embedding fonts

Yuriy Asyutin

May 12, 2010, 4:19:06 PM

to Support list for users of Reportlab software

Hello

I have a task to create a PDF that contains UTF-8 encoded data in the Cyrillic range without embedding any font resource. I've reviewed the reportlab library and found these options for not embedding fonts:
1. Use one of the 14 standard fonts (these fonts do not contain Cyrillic glyphs).
2. Use CID-keyed fonts; the current cidfontdata contains references only to CJK fonts.

Cyrillic glyphs are supported by the Extended Language pack, which can be downloaded from adobe.com and installed on your desktop (MinionPro and MyriadPro OTF fonts).

It looks like the Extended Language pack fonts are not CID-keyed (there are no CMap files); we get only *.otf files.

So I tried an approach similar to the CID font reference: I created another key, Myriad, in the CIDFontInfo dict to test how such a font can be accessed from a PDF file:

CIDFontInfo['Myriad'] = {
    'Type': '/Font',
    'Subtype': '/Type1',
    'Name': '/MyriadPro-Regular',
    'BaseFont': '/MyriadPro-Regular',
    'Encoding': '/WinAnsiEncoding',
    'Widths': [500] * 256,  # placeholder: the same width for all 256 codes
}

This way I could render glyphs of the Latin group.
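
A small check that may explain why only Latin glyphs are reachable with this setup. WinAnsiEncoding is a single-byte encoding that closely matches Windows code page 1252, so the sketch below uses Python's `cp1252` codec as an approximation of it. The helper name `fits_winansi` is made up for illustration and is not part of ReportLab.

```python
def fits_winansi(text):
    """Return True if every character has a code in cp1252.

    cp1252 is used here only as an approximation of WinAnsiEncoding.
    """
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


latin_ok = fits_winansi("Hello")      # Latin text has single-byte codes
cyrillic_ok = fits_winansi("Привіт")  # Cyrillic has no codes in cp1252
print(latin_ok, cyrillic_ok)
```

With a single-byte Latin encoding there is simply no code that can select a Cyrillic glyph, whatever the font contains.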

My questions are:

1. Is `Subtype/Type1` a valid value for such OTF fonts, and can I access Cyrillic glyphs by using one of the Unicode encodings and creating a special ToUnicode list?

Actually, I'm not sure whether I'm on the right track. Or maybe I can include a CMap stream in the PDF source and follow the CID-keyed approach for the Extended Language pack.

I'd really appreciate any help pointing me in the right direction.

Thanks

Yuriy

Tim Roberts

May 12, 2010, 4:21:57 PM

to reportlab-users

Yuriy Asyutin wrote:
>
> I have a task to create PDF that contains utf-8 encoded data in
> Cyrillic range without embedding any font resource.

May I ask why? In the VAST majority of cases, it is far better to embed
the fonts. Otherwise, your document can only be viewed on computers
that already contain the exact version of the exact font you used.

--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Yuriy Asyutin

May 12, 2010, 4:31:03 PM

to reportlab-users

This requirement is meant to avoid any font licensing (even BSD-like) and to reduce file size as much as possible.

If users view the PDF in Adobe Reader, all they need to do is download the language pack once, when the PDF viewer suggests it.

Thanks

2010/5/12 Tim Roberts <ti...@probo.com>

Andy Robinson

May 13, 2010, 4:45:56 AM

to reportlab-users, Support list for users of Reportlab software

On 12 May 2010 21:19, Yuriy Asyutin <yasy...@gmail.com> wrote:
> My questions are:
>
> 1. Is `Subtype/Type1` valid value for such OTF fonts and can I access
> Cyrillic glyphs by using one of the unicode encodings and creating special
> ToUnicode list?
>
> Actually, I'm not sure - am I on the right way? Or maybe I can include a
> CMap stream into PDF source and follow CID keyed approach for Extended
> Language pack.

I am sorry to say that I have no idea at all.

The full PDF specification is available online, but that may not help
much. To get the CID font support working, many years ago, I
basically ran Japanese 'Hello World' postscript files through
Distiller and studied the PDFs which came out in a text editor to work
out what structures needed to be in the PDF. The basic concepts of
CMaps and getting the glyph widths were clear enough, but I basically
copied properties which were known to work. Adobe's documentation
was some help, but not much.

Do you have any examples of real-world PDFs which display Cyrillic but
don't contain any fonts? If so, please show us one and maybe we can
see how it is done. I wonder if they work for you, but for a Western
user Acrobat will prompt them to download an extra font pack? The
only other approach I can see is to convert all glyphs to bezier
curves, which would be very slow.

Best Regards,

--
Andy Robinson
CEO/Chief Architect
ReportLab Europe Ltd.
Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK
Tel +44-20-8545-1570

Robin Becker

May 13, 2010, 6:38:50 AM

to reportlab-users, Andy Robinson, Support list for users of Reportlab software

On 13/05/2010 09:45, Andy Robinson wrote:
........

>
> Do you have any examples of real-world PDFs which display Cyrillic but
> don't contain any fonts? If so, please show us one and maybe we can
> see how it is done. I wonder if they work for you, but for a Western
> user Acrobat will prompt them to download an extra font pack? The
> only other approach I can see is to convert all glyphs to bezier
> curves, which would be very slow.

........
I feel this might work if instead of using an 8 bit encoding for text you use
the 16bit unicode encoding which seems to be allowed everywhere text is allowed.
I know this works with CJK in the document attributes eg Author Title etc etc
and I think it works in outlines etc etc so it would probably work for Cyrillic
in the same way.
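
As a rough illustration of the 16-bit form mentioned above: PDF text strings (the kind used for Author, Title and outline entries) may be written as UTF-16BE with a leading byte-order mark FE FF. The sketch below builds such a string in hex notation. The function name `pdf_text_string` is made up for this example, and this only covers text strings; text drawn on a page is still interpreted through the font's own encoding.

```python
import codecs


def pdf_text_string(text):
    """Encode text as a PDF hex string in UTF-16BE with a leading BOM."""
    raw = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + raw.hex().upper() + ">"


# A Cyrillic title as it could appear in a document info entry.
title = pdf_text_string("Привіт")
print(title)
```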
--
Robin Becker

Yuriy Asyutin

May 14, 2010, 5:37:43 AM

to reportlab-users

The PDF reference documentation didn't help me much either, so I tried to work out the needed information the same way you did: I created a PDF with LiveCycle using the `not embed fonts` option and inspected the PDF source. I've attached the file created by LC, which contains Cyrillic, but I now see that it also contains an additional embedded font resource.
To answer your question (`I wonder if they work for you, but for a Western user Acrobat will prompt them to download an extra font pack?`): it looks like the encoding I used, WinAnsiEncoding, does not require downloading any font pack; the viewer substitutes any suitable font. I believe setting the encoding to one of the Unicode encodings will do that.

Thanks a lot, I'll try some examples to investigate it more.

2010/5/13 Andy Robinson <an...@reportlab.com>

ukrainian.pdf

Yuriy Asyutin

May 14, 2010, 5:47:39 AM

to reportlab-users

I'll try some examples to find out; I think you are right. Anyway, CJK fonts work in ReportLab even without parsing CMap files. Both the Asian and Extended language packs consist of *.otf resources, but the Asian font pack also has CMap files. Does that mean that only the Asian (CJK) pack consists of CID-keyed fonts? Can I also treat the Extended language pack fonts as CID-keyed? I'm asking because I couldn't find this information on adobe.com.

Thanks for the help and useful hints.

2010/5/13 Robin Becker <ro...@reportlab.com>

Andy Robinson

May 14, 2010, 5:56:03 AM

to reportlab-users

On 14 May 2010 10:47, Yuriy Asyutin <yasy...@gmail.com> wrote:
> I'll try some examples to find out this - think you are right. Anyway CJK
> fonts work in ReportLab even without parsing CMap files

We used to include and parse the CMap files for CJK fonts. Then we
found a compact way to include this in arrays in the Python source
code, so the info in .cmap went into *.py. Then, when we switched to
Unicode input, we didn't need all CMaps because we decreed that the
input would always be Unicode code points.

reportlab/pdfbase/_cidfontdata.py contains the remaining 'hard-coded'
data needed to create CJK documents: basically structures the same as
we observed in distilled PDFs.
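
To make the "input is always Unicode" point concrete: with a CMap keyed by 2-byte Unicode values, the character codes written to the page can be taken directly from the UTF-16BE form of the text, without a BOM. This is only a sketch under that assumption, not ReportLab's actual code; the function name `ucs2_codes` is made up, and characters outside the Basic Multilingual Plane are rejected because a 2-byte code cannot represent them.

```python
def ucs2_codes(text):
    """Return the 2-byte character codes for text as a PDF hex string (no BOM)."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(
                "character outside the Basic Multilingual Plane: %r" % ch
            )
    return "<" + text.encode("utf-16-be").hex().upper() + ">"


# Two CJK characters become two 2-byte codes.
codes = ucs2_codes("日本")
print(codes)
```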

I guess something similar and much shorter could be done for standard
Cyrillic fonts.

Today is very busy with visitors but I hope to look at your attached
file at the weekend or next week.

Good luck,

Andy
