How to search and highlight text using PDFNet?


Support

Jun 6, 2008, 3:03:57 PM
to PDFTron PDFNet SDK
Q: I have a requirement similar to "How to search and highlight text
using PDFNet"
(http://groups.google.com/group/pdfnet-sdk/browse_thread/thread/18fa3d43e99f647d/).

Unfortunately, I am getting errors when trying to compile the code on
that page. I was able to compile, execute, and adapt quite a few of your
samples, but my particular requirement is to highlight certain
strings in the PDF and add bookmarks to the pages where these strings
occur. Highlighting the search string is the main requirement, and I am
not able to get past it.

The following is the line where I am getting the error:

Obj quads = Obj.CreateArray();

-------
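A: This code snippet uses some APIs that were deprecated starting
with PDFNet v4. For instructions on how to move old code to the latest
API, please see http://www.pdftron.com/net/pdfnet4_upgrade.txt.
In this case, instead of calling Obj.CreateArray(), the updated sample
creates the array directly in its parent dictionary with
PutArray("QuadPoints").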

The following is the updated 'PDF highlight' sample:


//---------------------------------------------------
// The following sample illustrates how to programmatically highlight text.
// The sample uses TextExtractor to extract words and the PDFDraw class to
// rasterize pages with highlight annotations. The sample also saves the
// modified PDF document, which includes the highlighted text.
//
// If you are looking for interactive text selection and highlighting, the
// PDFView class already includes built-in tool modes for text search and
// highlighting. For a concrete example of how to use these functions, please
// take a look at the latest version of the PDFView sample project.
//---------------------------------------------------

using System;
using pdftron;
using pdftron.Common;
using pdftron.Filters;
using pdftron.SDF;
using pdftron.PDF;

namespace TextHighlightTestCS
{
    class PDFTextHighlight
    {
        // Use PDFNet to generate the appearance stream for a highlight annotation.
        static Obj CreateHighlightAppearance(PDFDoc doc, Rect bbox, ColorPt highlight_color)
        {
            // Create the highlight appearance stream ----------------------------
            ElementBuilder build = new ElementBuilder();
            ElementWriter writer = new ElementWriter();
            writer.Begin(doc);

            // Draw the background rectangle, filled with the highlight color.
            Element element = build.CreateRect(bbox.x1 - 2, bbox.y1, bbox.x2 + 2, bbox.y2);
            element.SetPathFill(true);
            element.SetPathStroke(false);
            GState gs = element.GetGState();
            gs.SetFillColorSpace(ColorSpace.CreateDeviceRGB());
            gs.SetFillColor(highlight_color);
            // Multiply blending keeps the underlying text visible.
            gs.SetBlendMode(GState.BlendMode.e_bl_multiply);
            writer.WriteElement(element);
            Obj stm = writer.End();

            build.Dispose();
            writer.Dispose();

            // Set the bounding box and mark the stream as a form XObject.
            stm.PutRect("BBox", bbox.x1, bbox.y1, bbox.x2, bbox.y2);
            stm.PutName("Subtype", "Form");
            return stm;
        }

        // Create a highlight annotation.
        static Annot CreateHighlightAnnot(PDFDoc doc, Rect bbox, ColorPt highlight_color)
        {
            Annot a = Annot.Create(doc, Annot.Type.e_Highlight, bbox);
            a.SetColor(highlight_color);
            a.SetAppearance(CreateHighlightAppearance(doc, bbox, highlight_color));

            // QuadPoints: top-left, top-right, bottom-left, bottom-right.
            Obj quads = a.GetSDFObj().PutArray("QuadPoints");
            quads.PushBackNumber(bbox.x1);
            quads.PushBackNumber(bbox.y2);
            quads.PushBackNumber(bbox.x2);
            quads.PushBackNumber(bbox.y2);
            quads.PushBackNumber(bbox.x1);
            quads.PushBackNumber(bbox.y1);
            quads.PushBackNumber(bbox.x2);
            quads.PushBackNumber(bbox.y1);
            return a;
        }

        static void Main(string[] args)
        {
            PDFNet.Initialize();
            PDFNet.SetResourcesPath("../../../../../resources");

            // Relative path to the folder containing test files.
            const string input_path = "../../../../TestFiles/";
            const string output_path = "../../../../TestFiles/Output/";

            try
            {
                PDFDoc doc = new PDFDoc(input_path + "newsletter.pdf");
                doc.InitSecurityHandler();

                // Highlight all "Robin" instances in the input document.
                ColorPt highlight_color = new ColorPt(1, 1, 0); // Yellow

                TextExtractor txt = new TextExtractor(); // Used to extract words.
                Rect word_bbox = new Rect();

                PDFDraw pdfdraw = new PDFDraw(96); // Used to export PDF pages to bitmap.

                PageIterator itr = doc.GetPageIterator();
                for (; itr.HasNext(); itr.Next())
                {
                    Page page = itr.Current();
                    txt.Begin(page); // Read the page.

                    // Extract words one by one.
                    TextExtractor.Word word;
                    string word_str;
                    for (TextExtractor.Line line = txt.GetFirstLine(); line.IsValid(); line = line.GetNextLine())
                    {
                        for (word = line.GetFirstWord(); word.IsValid(); word = word.GetNextWord())
                        {
                            word_str = word.GetString().ToUpper(); // For case-insensitive search.
                            if (word_str.StartsWith("ROBIN") || word_str.EndsWith("ROBIN"))
                            {
                                word_bbox = word.GetBBox();
                                // Console.WriteLine("{0} \t bbox: {1}, {2}, {3}, {4}\n", word_str, word_bbox.x1, word_bbox.y1, word_bbox.x2, word_bbox.y2);
                                page.AnnotPushBack(CreateHighlightAnnot(doc, word_bbox, highlight_color));
                            }
                        }
                    }

                    // Rasterize the page, including the new highlight annotations.
                    string outname = string.Format("{0}out{1:d}.jpg", output_path, itr.GetPageNumber());
                    Console.WriteLine(outname);
                    pdfdraw.Export(page, outname, "jpg");
                }

                pdfdraw.Dispose();
                txt.Dispose();

                // Save the modified document with the highlight annotations.
                doc.Save(output_path + "output.pdf", SDFDoc.SaveOptions.e_linearized);
                doc.Close();
                Console.WriteLine("Done.");
            }
            catch (PDFNetException e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }
}
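
The corner order that CreateHighlightAnnot writes into QuadPoints
(top-left, top-right, bottom-left, bottom-right) can be isolated in a
small helper. The following is only a minimal sketch in plain C# with no
PDFNet dependency. The names QuadPointsSketch, FromBBox and
FromAnyCorners are hypothetical and are not part of the PDFNet API. It
assumes the usual PDF coordinate system with y increasing upward, so
that y2 is the top edge of the box.

```csharp
using System;

// Hypothetical helper, not part of PDFNet. It only reproduces the corner
// order that the sample pushes into the "QuadPoints" array.
public static class QuadPointsSketch
{
    // Returns { x1,y2, x2,y2, x1,y1, x2,y1 }, that is: top-left, top-right,
    // bottom-left, bottom-right, for a box with x1 <= x2 and y1 <= y2.
    public static double[] FromBBox(double x1, double y1, double x2, double y2)
    {
        return new double[] { x1, y2, x2, y2, x1, y1, x2, y1 };
    }

    // Same as FromBBox, but accepts the two corners in any order and
    // normalizes them first. Whether the Rect returned by the SDK is
    // already normalized is not stated in the sample, so this is a
    // defensive assumption.
    public static double[] FromAnyCorners(double xa, double ya, double xb, double yb)
    {
        return FromBBox(Math.Min(xa, xb), Math.Min(ya, yb),
                        Math.Max(xa, xb), Math.Max(ya, yb));
    }
}
```

Each value of the returned array could then be passed to
quads.PushBackNumber in turn, in place of the eight separate calls in
the sample.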

http://pdfnet-sdk.googlegroups.com/web/HighlightPDFText.cs
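
The word test in the sample upper-cases every word before comparing it.
As an alternative sketch, the same starts-with/ends-with test can be
written with StringComparison.OrdinalIgnoreCase, and the page numbers
that contain a match can be collected for the bookmark step mentioned in
the question. The names WordMatchSketch, Matches and PagesWithMatch are
hypothetical, and the input dictionary merely stands in for the words
that TextExtractor would return for each page. No PDFNet calls are made
here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of PDFNet.
public static class WordMatchSketch
{
    // True if the word starts or ends with the search term, ignoring case.
    // This mirrors the StartsWith/EndsWith test in the sample without
    // allocating an upper-cased copy of each word.
    public static bool Matches(string word, string term)
    {
        if (string.IsNullOrEmpty(word) || string.IsNullOrEmpty(term))
            return false;
        return word.StartsWith(term, StringComparison.OrdinalIgnoreCase)
            || word.EndsWith(term, StringComparison.OrdinalIgnoreCase);
    }

    // Returns the sorted page numbers that contain at least one match.
    // wordsByPage maps a page number to the words extracted from that page.
    public static List<int> PagesWithMatch(IDictionary<int, string[]> wordsByPage, string term)
    {
        List<int> pages = new List<int>();
        foreach (KeyValuePair<int, string[]> entry in wordsByPage)
        {
            foreach (string w in entry.Value)
            {
                if (Matches(w, term))
                {
                    pages.Add(entry.Key);
                    break; // One hit is enough to record the page.
                }
            }
        }
        pages.Sort();
        return pages;
    }
}
```

Ordinal comparison is culture-independent, so its results can differ
from ToUpper() under cultures with special casing rules. The returned
page list is the input that a bookmark-creation step would need; that
step is not shown here.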

Dillon, Gregory

Jun 26, 2008, 3:39:05 PM
to pdfne...@googlegroups.com
We are noticing that images compressed using JBIG2 are only readable by
Adobe Reader/Acrobat 8 (via the Web plugin). Is there a way to make the
compression compatible with Acrobat 7 and above?

Greg Dillon

Support

Jun 26, 2008, 5:50:50 PM
to PDFTron PDFNet SDK

JBIG2 compression is available starting with PDF 1.4 (i.e. Acrobat 5)
and it does not require a special plug-in. PDFNet SDK can also
compress & decompress embedded JBIG2 images.

Support

Nov 15, 2011, 3:08:27 PM
to pdfne...@googlegroups.com
 
 
The above link is no longer active. The updated link is: