Creating 'Searchable PDF Images' from OCR-ed 'hidden' text and scanned TIFFs.

Support

May 22, 2007, 3:25:13 PM
to PDFTron PDFNet SDK
Q:

We perform OCR on the TIFF file with third-party software. The output is
just a text file.

Can you please send me sample code to merge images into a searchable PDF
document? Is this possible?

---

A:

Besides the OCR text output, you also need the positioning
information for each word. This is important so that text highlighting
works as expected. Using this information, you can add hidden text to
existing PDF images using the ElementBuilder and ElementWriter APIs, along
the following lines:

PDFNet.Initialize();
try
{
    PDFDoc doc = new PDFDoc("my.pdf");
    doc.InitSecurityHandler();

    ElementBuilder eb = new ElementBuilder();
    ElementWriter writer = new ElementWriter();

    Page page = doc.PageBegin().Current(); // Get the first page
    writer.Begin(page); // Begin writing to the page

    // Begin writing a block of text
    Element element = eb.CreateTextBegin(
        Font.Create(doc, Font.StandardType1Font.e_times_roman), 12);
    writer.WriteElement(element);

    string txt = "Hello World!";
    element = eb.CreateTextRun(txt);

    // Set text positioning matrix...
    // Scale-up text 5 times and shift it by (0,600)
    element.SetTextMatrix(5, 0, 0, 5, 0, 600);

    // In order to make invisible text that can be highlighted or searched,
    // you need to set the TextRenderingMode flag in the graphics state of
    // the text element.
    GState gstate = element.GetGState();

    // During testing/debugging you may want to comment out the following
    // line so that the text stays visible:
    gstate.SetTextRenderMode(GState.TextRenderingMode.e_invisible_text);
    writer.WriteElement(element);

    // Finish the block of text
    writer.WriteElement(eb.CreateTextEnd());
    writer.End();

    doc.Save("out.pdf", 0);
    doc.Close();
}
catch (PDFNetException e)
{
    Console.WriteLine(e.Message);
}

The above sample code adds some hidden text to an existing PDF
document. Using PDFNet you can also dynamically create a PDF document
from existing image/TIFF files, as illustrated in the AddImage sample
project (http://www.pdftron.com/net/samplecode.html#AddImage).

For more examples of how to add new page content, please see the
ElementBuilder sample project
(http://www.pdftron.com/net/samplecode.html#ElementBuilder).
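
On the per-word positioning information mentioned at the start of this answer: OCR engines usually report word boxes in image pixels measured from the top-left corner, while the text matrix works in PDF points (72 per inch) measured from the bottom-left corner. The sketch below shows one possible way to convert a word box into the six values passed to SetTextMatrix. The OcrWord type, the TextMatrixFor helper, and the box layout are all hypothetical, and the sketch assumes the scanned image fills the whole page at the given DPI; adapt it to whatever your OCR software actually outputs.

```csharp
using System;

// Hypothetical OCR word record: text plus a bounding box in image pixels,
// measured from the top-left corner of the scan (an assumption; check
// what your OCR engine really reports).
public struct OcrWord
{
    public string Text;
    public double Left, Top, Width, Height; // pixels

    public OcrWord(string text, double left, double top, double width, double height)
    {
        Text = text; Left = left; Top = top; Width = width; Height = height;
    }
}

public static class OcrLayout
{
    // Returns the text-matrix values (a, b, c, d, h, v) for one word,
    // assuming the text block was opened with font size `fontSize` and
    // the image covers the full page at `dpi`.
    public static double[] TextMatrixFor(OcrWord w, double dpi, double imageHeightPx, double fontSize)
    {
        double ptsPerPx = 72.0 / dpi;  // PDF user space: 72 units per inch
        double x = w.Left * ptsPerPx;  // left edge of the word in points

        // The PDF origin is bottom-left, so flip the vertical axis and use
        // the bottom edge of the box as an approximate baseline.
        double y = (imageHeightPx - (w.Top + w.Height)) * ptsPerPx;

        // Rough uniform scale: treat the box height as the effective font size.
        double scale = (w.Height * ptsPerPx) / fontSize;

        return new[] { scale, 0.0, 0.0, scale, x, y };
    }
}

public static class Program
{
    public static void Main()
    {
        // A 300 dpi letter-size scan is 2550 x 3300 pixels.
        OcrWord word = new OcrWord("Hello", 300, 300, 250, 50);
        double[] m = OcrLayout.TextMatrixFor(word, 300, 3300, 12);
        Console.WriteLine(string.Join(", ", m));
    }
}
```

The resulting values would replace the hard-coded numbers in element.SetTextMatrix(...) in the sample above, with one text run per OCR word. Using the box height as the font size is only an approximation; if highlights look too wide or too narrow, the horizontal scale can be adjusted separately from the vertical one.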
