How do I enumerate all fonts in PDF?

Support

Jun 18, 2010, 4:37:49 PM
to PDFTron PDFNet SDK
Q: Is there a way to enumerate all of the embedded fonts in a
document, or do I need to just collect a list of font names from the
text blocks?

Once I know the name of an embedded font, and we are calling
GetGlyphPath(), how do we know what size buffers to allocate for
operators and data?

----------------------
A: There are several ways to enumerate fonts in a PDF.

You can follow the same pattern used to extract all embedded images
(as shown in the ImageExtract sample - http://www.pdftron.com/pdfnet/samplecode.html#ImageExtract).
If you are traversing the low-level object list, you can recognize fonts
because they are dictionaries with a Type -> Font entry. If you are
traversing the display list of a page, you can access fonts via the element's
GState (element.GetGState().GetFont()).

A third approach is to traverse all font resources listed under the page's
resource dictionary. For example:

Obj res = page.GetResourceDict();
if (res != null) {
    Obj fonts = res.FindObj("Font");
    if (fonts != null) {
        // Enumerate the entries in the Font resource dictionary.
        for (DictIterator itr = fonts.GetDictIterator(); itr.HasNext(); itr.Next()) {
            // Value() returns the font dictionary for the current entry.
            Font font = new Font(itr.Value());
            ....
        }
    }
}



> are calling GetGlyphPath(), how do we know what size buffers to
> allocate for operators and data?

You can use element.GetPointCount() and element.GetPathTypesCount() to
obtain the number of entries in each array.

Ryan

Feb 9, 2016, 7:43:28 PM
to PDFTron PDFNet SDK
Here is the full code to print them out, in VB:

Dim itr As PageIterator = doc.GetPageIterator()
While itr.HasNext()  '  Read every page
    Console.WriteLine("Page {0:d} ----------------------------------------", itr.GetPageNumber())
    Dim res As Obj = itr.Current().GetResourceDict()
    If Not res Is Nothing Then
        Dim fonts As Obj = res.FindObj("Font")
        If Not fonts Is Nothing Then
            Dim ditr As DictIterator = fonts.GetDictIterator()
            While ditr.HasNext()
                Dim font As Font = New Font(ditr.Value())
                Console.WriteLine(font.GetFamilyName())
                ditr.Next()
            End While
        End If
    End If
    itr.Next()
End While


Lee Gillie

Mar 21, 2016, 4:05:55 PM
to PDFTron PDFNet SDK on behalf of Ryan
Here is my code....

    Private Sub FontCatalogToolStripMenuItem_Click(sender As System.Object, e As System.EventArgs) Handles FontCatalogToolStripMenuItem.Click

        Dim itr As PageIterator = PDFDocument.GetPageIterator()

        While itr.HasNext()  '  Read every page
            Debug.WriteLine("Page {0:d} ----------------------------------------", itr.GetPageNumber())

            Dim res As Obj = itr.Current().GetResourceDict()
            If Not res Is Nothing Then
                Dim fonts As Obj = res.FindObj("Font")
                If Not fonts Is Nothing Then
                    Dim ditr As DictIterator = fonts.GetDictIterator()
                    While ditr.HasNext()
                        Dim font As Font = New Font(ditr.Value())
                        Debug.WriteLine(font.GetFamilyName() & " | " & font.GetEmbeddedFontName)

                        ditr.Next()
                    End While
                End If
            End If
            itr.Next()
        End While

    End Sub

Here is sample output from the above....

Page 1 ----------------------------------------
Page 2 ----------------------------------------
Page 3 ----------------------------------------
Page 4 ----------------------------------------
Page 5 ----------------------------------------
Page 6 ----------------------------------------
Page 7 ----------------------------------------
Page 8 ----------------------------------------
            ...
Page 2043 ----------------------------------------
Page 2044 ----------------------------------------
Page 2045 ----------------------------------------
Page 2046 ----------------------------------------
Page 2047 ----------------------------------------

I am running against PDFNet 6.6.0.38591. It is not finding any fonts. Can you suggest what may need to change, please?

Thanks for your help - Best regards, Lee Gillie CCP

Ryan

Mar 21, 2016, 4:46:34 PM
to PDFTron PDFNet SDK
There might not be any fonts. If you open the PDF in a PDF reader, can you select and copy/paste the text? If not, then there is no actual text.

If there is text, then the issue must be that all the text is in form XObjects. The following code is a more exhaustive search that also looks for fonts in form XObjects (which in turn can contain nested XObjects):

Dim itr As PageIterator = doc.GetPageIterator()
While itr.HasNext()  '  Read every page
    Console.WriteLine("Page {0:d} ----------------------------------------", itr.GetPageNumber())
    Dim res As Obj = itr.Current().GetResourceDict()
    If Not res Is Nothing Then
        IterateFonts(res.FindObj("Font"))
        IterateFormXObject(res.FindObj("XObject"))
    End If
    itr.Next()
End While


Sub IterateFonts(fonts As Obj)
    If Not fonts Is Nothing Then
        Dim ditr As DictIterator = fonts.GetDictIterator()
        While ditr.HasNext()
            Dim font As Font = New Font(ditr.Value())
            Console.WriteLine(font.GetFamilyName())
            ditr.Next()
        End While
    End If
End Sub

Sub IterateFormXObject(xobjects As Obj)
    If Not xobjects Is Nothing Then
        Dim ditr As DictIterator = xobjects.GetDictIterator()
        While ditr.HasNext()
            Dim xobject As Obj = ditr.Value()
            ' Form XObjects can carry their own resources, including nested XObjects.
            Dim resources As Obj = xobject.FindObj("Resources")
            If Not resources Is Nothing Then
                IterateFonts(resources.FindObj("Font"))
                IterateFormXObject(resources.FindObj("XObject"))
            End If
            ' Advance to the next XObject entry.
            ditr.Next()
        End While
    End If
End Sub
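
The same recursive idea can be sketched without the SDK. The Python below is only an illustration: plain dictionaries stand in for PDF resource dictionaries (this is not the PDFNet API), the helper name collect_font_names is hypothetical, and the visited set is an added assumption to guard against resource dictionaries that refer back to themselves.

```python
def collect_font_names(resources, found=None, visited=None):
    """Collect BaseFont names from a resource dictionary and nested XObjects.

    `resources` is a plain dict standing in for a PDF resource dictionary
    (an assumption for illustration; this is not a PDFNet object).
    """
    if found is None:
        found = []
    if visited is None:
        visited = set()
    # Skip missing dictionaries and ones that were already walked.
    if resources is None or id(resources) in visited:
        return found
    visited.add(id(resources))

    # Fonts listed directly under this resource dictionary.
    for font in (resources.get("Font") or {}).values():
        found.append(font["BaseFont"])

    # XObjects may carry their own Resources; image XObjects simply have none.
    for xobject in (resources.get("XObject") or {}).values():
        collect_font_names(xobject.get("Resources"), found, visited)
    return found


# A page whose only fonts live inside a form XObject and a nested form XObject.
nested_resources = {"Font": {"F2": {"BaseFont": "ABCDEE+Calibri,Bold"}}}
form_resources = {
    "Font": {"F1": {"BaseFont": "ABCDEE+Arial"}},
    "XObject": {"Fm1": {"Resources": nested_resources}},
}
page_resources = {
    "XObject": {
        "Fm0": {"Resources": form_resources},
        "Im0": {"Subtype": "Image"},
    }
}

for name in collect_font_names(page_resources):
    print(name)
```

Collecting names into a list rather than printing them makes it easier to merge the results from all pages afterwards.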

Lee Gillie

Mar 22, 2016, 12:10:19 PM
to PDFTron PDFNet SDK on behalf of Ryan
XObject suggestion picked up the missing fonts - thanks.

But I still have a question in the same vein. I find the fonts on each page (see below; the glyph counts come from iterating the fonts for each page and using GetCharCodeIterator).

How do these two from page 1 differ?
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial,Bold 39 characters

Or these from page 4?
Name:ABCEEE+Arial,Italic, FamilyName:Arial, EmbeddedFontName: 33 characters
Name:ABCEEE+Arial,Italic, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial,Italic 33 characters

I think to build a composite list of fonts for the entire document I need to key a dictionary on Name+FamilyName+EmbeddedFontName, and only keep unique occurrences.  Each keyed occurrence seen on separate pages seems to refer to the same glyph count. I suspect they return a reference to the same font object? Also, except for embedded font name property differing, some of these almost look like they might be duplicates.

Our ultimate goal is to provide a pick-list for exporting TrueType fonts for the entire document. Storing these temporarily aids us in ancillary processes we will run to enhance the output in the printing process.

Page 1 ----------------------------
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName: 91 characters
Name:ABCDEE+Times New Roman, FamilyName:Times New Roman, EmbeddedFontName:ABCDEE+Times New Roman 8 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName: 49 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName:ABCDEE+Calibri,Bold 49 characters
Name:ABCEEE+Calibri, FamilyName:Calibri, EmbeddedFontName: 45 characters
Name:ABCEEE+Calibri, FamilyName:Calibri, EmbeddedFontName:ABCEEE+Calibri 45 characters
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial 91 characters
Name:ABCDEE+BakerSignet BT, FamilyName:BakerSignet BT, EmbeddedFontName: 43 characters
Name:ABCDEE+BakerSignet BT, FamilyName:BakerSignet BT, EmbeddedFontName:ABCDEE+BakerSignet BT 43 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial,Bold 39 characters
Name:ABCDEE+Arial,BoldItalic, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,BoldItalic, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial,BoldItalic 39 characters
Name:ABCDEE+Arial Black, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial Black 16 characters
Page 2 ----------------------------
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName: 91 characters
Name:ABCDEE+Times New Roman, FamilyName:Times New Roman, EmbeddedFontName:ABCDEE+Times New Roman 8 characters
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial 91 characters
Name:ABCDEE+BakerSignet BT, FamilyName:BakerSignet BT, EmbeddedFontName:ABCDEE+BakerSignet BT 43 characters
Page 3 ----------------------------
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName: 91 characters
Name:ABCDEE+Times New Roman, FamilyName:Times New Roman, EmbeddedFontName:ABCDEE+Times New Roman 8 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName:ABCDEE+Calibri,Bold 49 characters
Name:ABCEEE+Poor Richard, FamilyName:Poor Richard, EmbeddedFontName:ABCEEE+Poor Richard 21 characters
Name:ABCEEE+Yorktown, FamilyName:Yorktown, EmbeddedFontName:ABCEEE+Yorktown 14 characters
Name:ABCEEE+Arial Narrow, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial Narrow 40 characters
Name:ABCEEE+Arial Narrow,Bold, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial Narrow,Bold 35 characters
Name:ABCEEE+Anderson Thunderbirds Are GO!, FamilyName:Anderson Thunderbirds Are GO!, EmbeddedFontName:ABCEEE+Anderson Thunderbirds Are GO! 23 characters
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial 91 characters
Name:ABCEEE+Times New Roman, FamilyName:Times New Roman, EmbeddedFontName: 8 characters
Name:ABCEEE+DFKai-SB, FamilyName:DFKai-SB, EmbeddedFontName:ABCEEE+DFKai-SB 23 characters
Name:ABCEEE+Blue Highway,Bold, FamilyName:Blue Highway, EmbeddedFontName:ABCEEE+Blue Highway,Bold 36 characters
Name:ABCDEE+BakerSignet BT, FamilyName:BakerSignet BT, EmbeddedFontName:ABCDEE+BakerSignet BT 43 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial,Bold 39 characters
Name:ABCDEE+Arial,BoldItalic, FamilyName:Arial, EmbeddedFontName: 39 characters
Page 4 ----------------------------
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName: 91 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName: 49 characters
Name:ABCDEE+Calibri,Bold, FamilyName:Calibri, EmbeddedFontName:ABCDEE+Calibri,Bold 49 characters
Name:ABCEEE+Calibri, FamilyName:Calibri, EmbeddedFontName: 45 characters
Name:ABCEEE+Calibri, FamilyName:Calibri, EmbeddedFontName:ABCEEE+Calibri 45 characters
Name:ABCEEE+Arial Narrow,Bold, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial Narrow,Bold 35 characters
Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial 91 characters
Name:ABCEEE+Calibri,Italic, FamilyName:Calibri, EmbeddedFontName: 56 characters
Name:ABCEEE+Calibri,Italic, FamilyName:Calibri, EmbeddedFontName:ABCEEE+Calibri,Italic 56 characters
Name:ABCEEE+Engravers MT, FamilyName:Engravers MT, EmbeddedFontName:ABCEEE+Engravers MT 13 characters
Name:ABCEEE+Arial,Italic, FamilyName:Arial, EmbeddedFontName: 33 characters
Name:ABCEEE+Arial,Italic, FamilyName:Arial, EmbeddedFontName:ABCEEE+Arial,Italic 33 characters
Name:ABCEEE+Castellar, FamilyName:Castellar, EmbeddedFontName:ABCEEE+Castellar 30 characters
Name:ABCFEE+Gisha,Bold, FamilyName:Gisha, EmbeddedFontName:ABCFEE+Gisha,Bold 49 characters
Name:ABCFEE+Watson, FamilyName:Watson, EmbeddedFontName:ABCFEE+Watson 19 characters
Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters
Name:ABCDEE+Arial,BoldItalic, FamilyName:Arial, EmbeddedFontName: 39 characters

As always, thanks for helping us to understand.

Ryan

Mar 22, 2016, 2:48:50 PM
to PDFTron PDFNet SDK
When you download our desktop SDK, it includes a tool called COSEdit, which lets you navigate the PDF's object structure graphically. The code we provided earlier contains the structure and keys that you would look for, though to get to a page you go through /Root/Pages.

There might be duplicates, or they might be different fonts, but with the same FontFile object, so they actually share the same binary glyph data.
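
The composite list proposed earlier in the thread (keyed on Name + FamilyName + EmbeddedFontName, keeping only unique occurrences) could be sketched roughly as follows. This is a minimal Python sketch that works on the printed report lines rather than on PDFNet objects; the helper name unique_fonts and the regular expression are assumptions for illustration, and the sample lines are an excerpt of the output posted above. It does not tell you whether two entries share the same FontFile data.

```python
import re

# Matches report lines of the form shown in this thread, for example:
# "Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters"
LINE_RE = re.compile(
    r"^Name:(?P<name>.*), FamilyName:(?P<family>.*), "
    r"EmbeddedFontName:(?P<embedded>.*?) ?(?P<count>\d+) characters$"
)


def unique_fonts(lines):
    """Return {(name, family, embedded): glyph_count}; first occurrence wins."""
    seen = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # page separators and anything else that is not a font line
        key = (match["name"], match["family"], match["embedded"])
        seen.setdefault(key, int(match["count"]))
    return seen


# Excerpt of the per-page output posted earlier in this thread.
report = [
    "Page 1 ----------------------------",
    "Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters",
    "Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName:ABCDEE+Arial,Bold 39 characters",
    "Page 2 ----------------------------",
    "Name:ABCDEE+Arial, FamilyName:Arial, EmbeddedFontName: 91 characters",
    "Page 3 ----------------------------",
    "Name:ABCDEE+Arial,Bold, FamilyName:Arial, EmbeddedFontName: 39 characters",
]

for (name, family, embedded), count in unique_fonts(report).items():
    print(name, "|", family, "|", embedded or "(none)", "|", count)
```

Entries that differ only in having an empty EmbeddedFontName stay separate under this key, so whether they are true duplicates still has to be checked against the underlying font objects.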

You could also send the PDF to support at pdftron.com for review.

Finally, it would help if you explained what your overall objective is.