tesseract PDF line offset in 4.0.0 alpha.

Janpieter Sollie

Jul 8, 2017, 2:24:55 AM
to tesseract-ocr
Hello everyone,

I discovered tesseract a few days ago and am experimenting with it to make searchable PDFs.
But I have a few problems, and maybe one of you can help:
- when generating PDFs, all spaces are converted to newlines. This does not happen when generating a .txt file. Why? Is there a tesseract parameter that controls this behaviour?
- when generating PDFs, the selectable text sits one line below where the visible text is placed. Is there an offset parameter I need to set to 0?

thanks!

DJArty

Feb 19, 2018, 7:54:09 AM
to tesseract-ocr
Which PDF viewer/renderer exactly are you using? Did you try another one?
