For the T1 fonts we have such a mechanism via a list of substitution fonts.
That's used in pdfbase.pdfmetrics.unicode2T1 to fix up any encoding issues from
the available fonts. That's reasonable for T1 because the maximum number of
glyphs is 256.
In the TTF fonts the assumption is that they cover all of utf8/unicode and we
make lazy subset fonts so we don't get errors at the right time; in fact we only
detect that the font lacks a glyph when we are building the subset. That means
we might end up trying to build a subset for a different font in the middle of
building subsets. I'm not sure how feasible it would be to do that.
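A toy model of that timing problem may make it clearer. This is only a sketch, not ReportLab's actual code: the class `LazySubsetFont` and its methods are hypothetical, and the character-to-glyph table is made up for the example.

```python
# Toy model only: NOT ReportLab's implementation. All names are hypothetical.

class LazySubsetFont:
    """Records characters at draw time; looks up glyphs only at build time."""

    def __init__(self, name, char_to_glyph):
        self.name = name
        self.char_to_glyph = char_to_glyph  # code point -> glyph id
        self.used = set()

    def draw(self, text):
        # Drawing never fails: we only remember which code points were used.
        self.used.update(ord(ch) for ch in text)

    def build_subset(self):
        # Missing glyphs only surface here, long after the text was laid out.
        present = sorted(cp for cp in self.used if cp in self.char_to_glyph)
        missing = sorted(cp for cp in self.used if cp not in self.char_to_glyph)
        return present, missing


# A made-up font that only knows "a", "b", "c" and the space character.
latin_only = LazySubsetFont(
    "LatinOnly", {ord(c): i for i, c in enumerate("abc ", start=1)}
)
latin_only.draw("abc 中")  # no error at draw time
present, missing = latin_only.build_subset()  # the gap is found only now
```

By the time `build_subset` reports the missing code point, the page content that refers to this font has already been written, which is why switching to another font at that stage is awkward.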
--
Robin Becker
_______________________________________________
reportlab-users mailing list
reportl...@lists2.reportlab.com
http://two.pairlist.net/mailman/listinfo/reportlab-users
On 18/06/2011 06:58, Glenn Linderman wrote:
So in generating some documents from UTF-8 text files, most of which are in
European languages, but one is in Chinese (Simplified script), I discovered that
the Times-New-Roman font that I'd been using for the European languages doesn't
contain the CJK characters. So the Chinese text I have also has some European
characters.
Does the ReportLab API provide a way of selecting multiple fonts at the same
time, so that characters not in one font will be found in another?
> So I think I'm using TTF fonts, I have no clue that T1 fonts would even work for
> Chinese because of the glyph limit... but in my text editor and browser, they or
> Windows do appropriate font substitutions, and the Chinese, since it is UTF-8,
> "just works".
A font will not be used in reportlab unless you explicitly declare and register
it, or it is one of the standard 14. So for example this is from a
Japanese-based test and uses msmincho.ttc:
> from reportlab.pdfbase import pdfmetrics
> from reportlab.pdfbase.ttfonts import TTFont
> # c is assumed to be a reportlab canvas created elsewhere
> try:
>     msmincho = TTFont('MS Mincho','msmincho.ttc',subfontIndex=0,asciiReadable=0)
>     fn = ' file=msmincho.ttc subfont 0'
> except Exception:
>     try:
>         msmincho = TTFont('MS Mincho','msmincho.ttf',asciiReadable=0)
>         fn = 'file=msmincho.ttf'
>     except Exception:
>         #Ubuntu - works on Lucid Lynx if xpdf-japanese installed
>         try:
>             msmincho = TTFont('MS Mincho','ttf-japanese-mincho.ttf')
>             fn = 'file=ttf-japanese-mincho.ttf'
>         except Exception:
>             msmincho = None
> if msmincho is None:
>     c.setFont('Helvetica', 12)
>     c.drawString(100,600, 'Cannot find msmincho.ttf or msmincho.ttc')
> else:
>     pdfmetrics.registerFont(msmincho)
>     c.setFont('MS Mincho', 30)
The same could be done for some of the MS fonts, e.g. pmingliu.ttf or simsun.ttf,
but I am certainly no expert here.
...........
> Being rather ignorant of Windows font APIs (I attempted to research Windows font
> APIs some time back, discovered there were at least 4 different font APIs
> available, couldn't figure out which were the new ones, or the recommended ones,
> and never did figure out any of them, since I didn't know which one to study), I
> wouldn't know either why it "just works" in the text editor and browser, and why
> it couldn't "just work" in reportlab... even if the embedded subset font would
> happen to contain characters from a substituted font, because that is what is
> available on the machine that is creating the PDF.
>
I don't believe we are using any specific API to obtain/find fonts.
> Is there any good reference material for Windows font APIs? I'm not even sure
> what Chinese font is in use on my computer to be substituted in for the
> characters that are not in Times-New-Roman, nor how to determine that, as a
> first step to specifying it for use in reportlab. Whatever it is, it must come
> with Windows.
>
I know that there is an Asian font pack for Acrobat Reader (but that allows the
meta information to be rendered properly). In addition you can always add Asian
language support from the Windows OS, but which fonts that provides I don't
actually know.
> Is there any good reference material for how to solve my problem above using
> reportlab? I could probably figure out and code a solution, if I knew where to
> start...
>
Ask here; others have certainly faced and overcome these problems.
iText supports specifying a list of fonts that can be selected that are
checked in order for rendering text using the FontSelector class
(http://api.itextpdf.com/itext/com/itextpdf/text/pdf/FontSelector.html).
I use FontSelector in iText quite extensively and it would be good if I
could use ReportLab instead of calling Java from Python to generate PDFs.
Does ReportLab plan to include an equivalent to this in a future version
or is there a way to emulate it?
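One way this might be emulated outside the library is to split the text into runs before drawing it, picking for each character the first font in an ordered list that covers it. The sketch below is an illustration under stated assumptions, not a ReportLab feature: `select_fonts` is a hypothetical helper, the font names are placeholders for whatever names were registered, and the coverage sets are rough hand-written ranges rather than data read from real fonts.

```python
from itertools import groupby


def select_fonts(text, fonts):
    """Split text into (font_name, run) pairs.

    fonts is a non-empty, ordered list of (name, coverage) pairs, where
    coverage is any container of Unicode code points. Characters that no
    font covers fall back to the first font in the list.
    """
    def pick(ch):
        cp = ord(ch)
        for name, coverage in fonts:
            if cp in coverage:
                return name
        return fonts[0][0]

    # groupby merges consecutive characters that chose the same font.
    return [(name, "".join(chars)) for name, chars in groupby(text, key=pick)]


# Hand-written coverage ranges, for illustration only.
latin = set(range(0x20, 0x250))
cjk = set(range(0x4E00, 0xA000))

runs = select_fonts("Hello 世界!", [("Times-New-Roman", latin), ("SimSun", cjk)])
# runs == [("Times-New-Roman", "Hello "), ("SimSun", "世界"), ("Times-New-Roman", "!")]
```

Each run could then be drawn with its own `setFont` call on the canvas, or wrapped in `<font name="...">` markup inside a Paragraph, provided every font named in the list has been registered first.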
On 4 January 2015 at 05:18, Glenn Linderman <v+py...@g.nevcal.com> wrote:
> But this user doesn't always know. Now I could make an assumption about certain character ranges being in certain fonts, but is there a way I can ask reportlab "Does this character exist in this font?"
Give us some time to remind ourselves how it works - maybe another 3.5 years ;-)
Remember that we don't draw the text, Adobe Reader (or whatever you use instead) does, and it's quite happy being fed IDs of glyphs which don't exist and/or they may introduce font substitution mechanisms of their own at any time. So while it's useful to know your font might lack a glyph, it doesn't mean it won't get displayed somehow.
There is no convenient high-level API but we ought to add one. Robin, how feasible would that be?
On this front, I note that Just van Rossum's fonttools package is
being updated; Behdad Esfahbod, who's very well known for his work on
rendering Persian text and now works for Google, has been bringing it
up to date:
https://github.com/behdad/fonttools/
This is not what we use for speed reasons but it's probably the right
tool if you want to query and understand what's in a font.
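As a rough sketch of that kind of query, assuming a recent fontTools release is installed: `TTFont(...)` with its `fontNumber` argument and `getBestCmap()` are fontTools calls, while `load_cmap` and `missing_characters` are hypothetical helper names introduced here. The font path is whatever file you want to inspect; none is assumed to exist.

```python
def load_cmap(path, font_number=-1):
    """Return {code point: glyph name} for a font file, using fontTools.

    For a .ttc collection, pass the index of the wanted face as font_number.
    """
    # Imported here so the pure helper below works without fontTools installed.
    from fontTools.ttLib import TTFont

    font = TTFont(path, fontNumber=font_number)
    # getBestCmap() returns None when the font has no Unicode cmap subtable.
    return font.getBestCmap() or {}


def missing_characters(text, cmap):
    """List the characters of text that have no entry in cmap, in first-seen order."""
    missing = []
    for ch in text:
        if ord(ch) not in cmap and ch not in missing:
            missing.append(ch)
    return missing


# Usage with a real font file (path is a placeholder):
#     cmap = load_cmap("times.ttf")
#     print(missing_characters("Hello 世界", cmap))
```

Bear in mind that this only says what the font file contains; as noted earlier in the thread, the viewer may still display something for a missing glyph.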
The slowest part in most real-world documents is paragraph-wrapping, which requires us looking up the width of every character to size the words.
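To show why that lookup dominates, here is a minimal greedy word-wrap sketch. It is not ReportLab's paragraph code: `string_width` and `wrap` are hypothetical helpers, and the width table is invented (every character 500 units on a 1000-unit em). In ReportLab itself the width of a string is available from `pdfmetrics.stringWidth(text, fontName, fontSize)`.

```python
def string_width(text, widths, font_size, default=500):
    """Width of text in points; widths maps characters to 1/1000 em units."""
    return sum(widths.get(ch, default) for ch in text) * font_size / 1000.0


def wrap(text, widths, font_size, max_width):
    """Greedy wrapping: every word costs one width lookup per character."""
    space = string_width(" ", widths, font_size)
    lines, current, current_width = [], [], 0.0
    for word in text.split():
        w = string_width(word, widths, font_size)
        extra = w if not current else space + w
        if current and current_width + extra > max_width:
            # The word does not fit: close the line and start a new one.
            lines.append(" ".join(current))
            current, current_width = [word], w
        else:
            current.append(word)
            current_width += extra
    if current:
        lines.append(" ".join(current))
    return lines


# Invented metrics: every character is 500/1000 em wide, so 5pt at size 10.
widths = {ch: 500 for ch in "abcdefghijklmnopqrstuvwxyz "}
lines = wrap("aaa bbb ccc", widths, 10, 40)
# lines == ["aaa bbb", "ccc"]
```

Any per-character check for glyph coverage or font fallback would sit inside the same inner loop, so its cost is multiplied by the number of characters in the document.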