Scan Tailor and Tesseract OCR


Markus Hofinger

Jun 4, 2016, 5:45:03 PM
to scantailor-devel
Hi all,

and thank you for Scan Tailor! It is truly an amazing program for archiving purposes!

I used it in combination with Tesseract OCR to archive an ancient book for private purposes...
The result is like what you know from Google Books: a PDF file with (grayscale PNG) images of the book and invisible, searchable text aligned exactly in front of them. So one can read the book in original quality and still search and copy passages.

I don't know if you guys know Tesseract OCR, but it is a truly awesome open-source OCR program that comes with pre-trained, language-specific data files. It even supports old typefaces such as Gothic/Fraktur fonts.
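
For anyone who wants to try the combination, here is a minimal sketch of how such a pipeline could look. It is not necessarily the exact workflow used for the book above. It assumes that the `tesseract` command-line tool is installed and that the processed pages are TIFF files in one directory (the `out` folder name and the helper names `build_tesseract_cmd` and `ocr_pages` are only illustrative). It relies on Tesseract's `pdf` output config, which writes a PDF containing the page image with an invisible text layer.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_tesseract_cmd(image: Path, out_dir: Path, lang: str = "eng") -> list[str]:
    """Build the command that turns one page image into a searchable PDF."""
    # Tesseract appends ".pdf" to the output base name itself.
    out_base = out_dir / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def ocr_pages(scan_dir: Path, out_dir: Path, lang: str = "eng") -> list[Path]:
    """Run Tesseract on every TIFF in scan_dir (assumed layout) and
    return the paths of the per-page PDFs it should have written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pdfs = []
    for image in sorted(scan_dir.glob("*.tif")):
        # check=True raises CalledProcessError if Tesseract fails on a page.
        subprocess.run(build_tesseract_cmd(image, out_dir, lang), check=True)
        pdfs.append(out_dir / f"{image.stem}.pdf")
    return pdfs


if __name__ == "__main__":
    # "out" and "pdf" are placeholder directory names for this sketch.
    for pdf in ocr_pages(Path("out"), Path("pdf"), lang="eng"):
        print(pdf)
```

This yields one PDF per page; merging them into a single file is left out here. The language code for Fraktur differs between Tesseract versions, so check `tesseract --list-langs` to see which language data is installed before changing `lang`.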

So my suggestion would be to build an easy interface to Tesseract OCR into Scan Tailor, maybe resulting in "Scan Tailor and OCR" ;-)
This could boost interest in Scan Tailor and reduce the effort needed to digitize pretty much everything. :)
By the way, both projects are hosted on GitHub :)


Regards Max