TextExtractor not extracting text correctly

Dennis Van Acker

Jul 7, 2016, 3:00:57 PM
to PDFTron PDFNet SDK
I'm using a TextExtractor to extract the text inside a rectangle.

Unfortunately, the extracted text isn't always correct: it sometimes
adds an extra character at the end or cuts off a character at the beginning.

I can see that the rectangle is correct because I highlight it.

What I'm doing:

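 // Extract the text inside the highlight annotation's rectangle, skipping hidden text.
 TextExtractor txtExtractor = new TextExtractor();
 txtExtractor.Begin(page, hightlightAnnot.GetRect(), TextExtractor.ProcessingFlags.e_remove_hidden_text);
 // Store the extracted text as the annotation's contents, then read it back.
 hightlightAnnot.SetContents(txtExtractor.GetAsText());
 string text = hightlightAnnot.GetContents();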

Ryan

Jul 15, 2016, 12:26:38 PM
to PDFTron PDFNet SDK
When you say "sometimes", do you mean that you get different results on subsequent tries with the same document, or that the results differ between documents?

To diagnose the issue further, we need the following information:

1. Input file(s)
2. Generated output
3. A clear description of what you expected to get (include a screenshot if appropriate, as we may not see what you do)

If you can't post the document here, send it to support at pdftron.com
