How to extract cropped text

Anatoly Kudrevatukh

Oct 22, 2013, 1:35:16 PM

to pdfne...@googlegroups.com

Q: Text can be cropped by the visible page dimensions, but we need to retrieve all of the text, even text that lies outside the page bounds. Is there a way to do this using TextExtractor?

A: You can do this by enlarging the crop box on the page (page.SetCropBox(big_rect)) prior to text extraction.

If you want to know the exact crop box that includes all elements, you can use ElementReader to compute the union of all element bounding boxes on the page.
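The union step itself is plain rectangle arithmetic. Below is a minimal sketch in Python, assuming the bounding boxes have already been collected from the page as (x1, y1, x2, y2) tuples. The tuple layout and the helper name union_of_boxes are illustrative assumptions, not part of the PDFNet API, and gathering the boxes with ElementReader is left out.

```python
from typing import Iterable, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2) in page units.
Box = Tuple[float, float, float, float]


def union_of_boxes(boxes: Iterable[Box]) -> Box:
    """Return the smallest rectangle that contains every box in `boxes`."""
    # Normalize each box so that (x1, y1) is the lower-left corner,
    # in case a box was reported with its corners swapped.
    normalized = [
        (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
        for x1, y1, x2, y2 in boxes
    ]
    if not normalized:
        raise ValueError("no bounding boxes to combine")
    return (
        min(b[0] for b in normalized),
        min(b[1] for b in normalized),
        max(b[2] for b in normalized),
        max(b[3] for b in normalized),
    )


# One box inside a letter-size page, two boxes partly outside it.
boxes = [(0, 0, 612, 792), (-50, 700, 100, 820), (600, -20, 700, 10)]
print(union_of_boxes(boxes))  # (-50, -20, 700, 820)
```

The resulting rectangle is what you would pass to page.SetCropBox before running the text extraction.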

Alternatively, according to the PDF specification, the maximum page size is 14,400 by 14,400 units, so you can use a rectangle of that size as the crop box.
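A rough sketch of the maximum-size approach in Python follows. Only SetCropBox and TextExtractor come from the answer above. The rest is assumed: the import path PDFNetPython3.PDFNetPython, the PDFNet.Initialize(license_key) call, and the helper names max_crop_box and extract_all_text are illustrative and may need adjusting for your SDK version. The optional lower-left corner parameter is also an addition of this sketch, for pages whose content sits at negative coordinates.

```python
from typing import Tuple

MAX_PAGE_UNITS = 14400.0  # maximum page dimension cited above


def max_crop_box(x0: float = 0.0, y0: float = 0.0) -> Tuple[float, float, float, float]:
    """Return a 14,400 x 14,400 rectangle whose lower-left corner is (x0, y0)."""
    return (x0, y0, x0 + MAX_PAGE_UNITS, y0 + MAX_PAGE_UNITS)


def extract_all_text(pdf_path: str, license_key: str, page_number: int = 1) -> str:
    """Enlarge the crop box of one page, then extract its text.

    Assumption: the PDFNet Python bindings are installed and expose these
    classes under this import path; adjust to match your installation.
    """
    # Imported lazily so the helper above can be used without the SDK.
    from PDFNetPython3.PDFNetPython import PDFNet, PDFDoc, Rect, TextExtractor

    PDFNet.Initialize(license_key)
    doc = PDFDoc(pdf_path)
    doc.InitSecurityHandler()
    page = doc.GetPage(page_number)

    # Enlarge the crop box before extraction, as suggested in the answer.
    x1, y1, x2, y2 = max_crop_box()
    page.SetCropBox(Rect(x1, y1, x2, y2))

    extractor = TextExtractor()
    extractor.Begin(page)
    return extractor.GetAsText()
```

A call would look like extract_all_text("input.pdf", "your-license-key"), where both arguments are placeholders. The crop box change is made only on the in-memory document, since the sketch never saves it.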
