How do I extract text stored in an existing PDF annotation's appearance stream?


Support

May 22, 2013, 2:47:42 PM
to pdfne...@googlegroups.com
Q:
 

I'd like to know whether I can read back the text that was written to an annotation with ElementBuilder.CreateTextRun after the annotation has been created in PDFNet. The text is written like this:

            using (ElementBuilder builder = new ElementBuilder())
            {
                using (ElementWriter writer = new ElementWriter())
                {
                    writer.Begin(doc, true);
                    Element el;
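                    // ...Other elements are created and written as well
                    // Open the text object that CreateTextEnd() closes below
                    writer.WriteElement(builder.CreateTextBegin());
                    el = builder.CreateTextRun(text, font, fontSize);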

                    el.GetGState().SetFillColorSpace(ColorSpace.CreateDeviceRGB());
                    el.GetGState().SetFillColor(textColorPt);
                    // ticket #972
                    el.GetGState().SetFillOpacity(textOpacity);

                    Rect txtBox = new Rect();
                    el.GetBBox(txtBox);

                    Matrix2D textMatrix;
                    switch (rotation)
                    {
                        case Page.Rotate.e_90: textMatrix = new Matrix2D(0, 1, -1, 0, (bound.x2 + bound.x1 + .8 * txtBox.Height()) / 2 + offset, (bound.y1 + bound.y2 - txtBox.Width()) / 2);
                            break;
                        case Page.Rotate.e_180: textMatrix = new Matrix2D(-1, 0, 0, -1, (bound.x2 + bound.x1 + txtBox.Width()) / 2, (bound.y1 + bound.y2 + 0.8 * txtBox.Height()) / 2 + offset);
                            break;
                        case Page.Rotate.e_270: textMatrix = new Matrix2D(0, -1, 1, 0, (bound.x2 + bound.x1 - .8 * txtBox.Height()) / 2 - offset, (bound.y1 + bound.y2 + txtBox.Width()) / 2);
                            break;
                        default: textMatrix = new Matrix2D(1, 0, 0, 1, (bound.x2 + bound.x1 - txtBox.Width()) / 2, (bound.y1 + bound.y2 - 0.8 * txtBox.Height()) / 2 - offset);
                            break;
                    }

                    el.SetTextMatrix(textMatrix);

                    writer.WriteElement(el);
                    writer.WriteElement(builder.CreateTextEnd());

                    created = writer.End();
                }
            }
----------------
A:
 

To get the Unicode string for a text element, you can call Element.GetTextString(). To get the raw text data instead, you can call Element.GetTextData(). Please check out the API reference for pdftron.PDF.Element for more details.
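A minimal sketch of reading the text back, assuming the stream built in the question was attached as the annotation's appearance (e.g. via Annot.SetAppearance) and that Annot.GetAppearance() returns null when there is no normal appearance. It walks the stream with ElementReader, descends into nested form XObjects with FormBegin(), and appends Element.GetTextString() for each text element. The class AppearanceText and its methods are hypothetical helpers, not PDFNet API.

```csharp
using System.Text;
using pdftron.PDF;
using pdftron.SDF;

// Hypothetical helper (not part of PDFNet): collects the Unicode text drawn in
// an annotation's appearance stream. Assumes PDFNet.Initialize() was called.
static class AppearanceText
{
    // Reads the annotation's default (normal) appearance, if it has one.
    public static string FromAnnot(Annot annot)
    {
        Obj appearance = annot.GetAppearance();
        return appearance == null ? string.Empty : FromStream(appearance);
    }

    // Reads any appearance / form XObject stream, e.g. the Obj returned by
    // ElementWriter.End() in the question's code.
    public static string FromStream(Obj formXObject)
    {
        var sb = new StringBuilder();
        using (ElementReader reader = new ElementReader())
        {
            // Assumption: a form XObject carries its own /Resources, so no
            // separate resource dictionary is passed here.
            reader.Begin(formXObject);
            Collect(reader, sb);
            reader.End();
        }
        return sb.ToString();
    }

    // Walks the current content stream and recurses into nested forms.
    static void Collect(ElementReader reader, StringBuilder sb)
    {
        Element element;
        while ((element = reader.Next()) != null)
        {
            switch (element.GetType())
            {
                case Element.Type.e_text:
                    // Unicode text; GetTextData() would give the raw bytes.
                    sb.Append(element.GetTextString());
                    break;
                case Element.Type.e_form:
                    reader.FormBegin();
                    Collect(reader, sb);
                    reader.End();
                    break;
            }
        }
    }
}
```

Typical use is `string text = AppearanceText.FromAnnot(annot);`. Recursing into e_form elements matters because appearance streams often wrap their content in nested XObjects.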

 

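Another, simpler option that may sometimes work is annot.GetContents(). It returns the annotation's /Contents entry, which is not guaranteed to match the text drawn in the appearance stream.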

If the text lies underneath the annotation (e.g. for a highlight), you can use TextExtractor.GetTextUnderAnnot(annot).
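Both alternatives can be combined in a short sketch, assuming the .NET API where Annot.GetContents() and TextExtractor.GetTextUnderAnnot(Annot) return strings. GetTextUnderAnnot() only sees text that TextExtractor.Begin(page) has gathered, so the extractor is started on the annotation's page first. AnnotTextDump and Describe are hypothetical names for illustration, and the input path is taken from the command line.

```csharp
using System;
using System.Collections.Generic;
using pdftron;
using pdftron.PDF;

// Hypothetical helper: for every annotation on a page, pairs the /Contents
// string with the page text lying underneath the annotation.
static class AnnotTextDump
{
    public static List<string> Describe(Page page)
    {
        var lines = new List<string>();
        using (TextExtractor extractor = new TextExtractor())
        {
            // GetTextUnderAnnot() searches the text gathered by Begin().
            extractor.Begin(page);
            for (int i = 0; i < page.GetNumAnnots(); ++i)
            {
                Annot annot = page.GetAnnot(i);
                if (!annot.IsValid())
                    continue;
                // /Contents may differ from what the appearance stream draws.
                string contents = annot.GetContents();
                // Text under the annotation, e.g. the words a highlight covers.
                string under = extractor.GetTextUnderAnnot(annot);
                lines.Add($"annot {i}: contents=\"{contents}\" under=\"{under}\"");
            }
        }
        return lines;
    }

    static void Main(string[] args)
    {
        PDFNet.Initialize(); // newer PDFNet versions expect a license key here
        using (PDFDoc doc = new PDFDoc(args[0]))
        {
            doc.InitSecurityHandler();
            for (int p = 1; p <= doc.GetPageCount(); ++p)
                foreach (string line in Describe(doc.GetPage(p)))
                    Console.WriteLine($"page {p}, {line}");
        }
    }
}
```

Printing both values side by side makes it easy to see, per annotation type, which source actually carries the text you need.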

 
