Understanding tesseract functionality?

sesivets

May 5, 2015, 4:07:26 AM
to tesser...@googlegroups.com
Hello
I'm looking for a C++ OCR library to recognize difficult-to-parse text in PDF files, and I'm wondering whether Tesseract OCR is used for this kind of thing.

Basically, some PDF files are corrupted or have non-standard encoding, and I can't parse them using existing parsing tools built in C++. What I would normally do is convert each PDF page, one at a time, into an image file and then re-print it as a PDF file. I would then run Adobe's OCR Text Recognition function on it and go on to parse the resulting PDF file.
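
To make the workflow concrete, here is a rough C++ sketch of the per-page fallback I have in mind. Every name in it (PageImage, PageHandlers, extract_text, and the three callbacks) is a hypothetical placeholder I made up for illustration; none of it is Tesseract's API or any PDF library's API. The ocr callback marks the spot where an OCR library would plug in.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical placeholder type: a rasterized PDF page.
struct PageImage {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> pixels;  // raw raster data for one page
};

// Hypothetical callbacks standing in for the real tools.
struct PageHandlers {
    // Tries to parse the page text directly; returns false if the page
    // is corrupted or uses an encoding the parser cannot handle.
    std::function<bool(int page, std::string& out)> parse_text;
    // Renders a single page to an image.
    std::function<PageImage(int page)> rasterize;
    // Runs OCR on a page image (assumption: an OCR library goes here).
    std::function<std::string(const PageImage&)> ocr;
};

// Extracts text one page at a time, falling back to OCR only for
// pages that the normal parser rejects.
std::vector<std::string> extract_text(int page_count, const PageHandlers& h) {
    std::vector<std::string> pages;
    if (page_count > 0) {
        pages.reserve(static_cast<std::size_t>(page_count));
    }
    for (int page = 0; page < page_count; ++page) {
        std::string text;
        if (h.parse_text(page, text)) {
            pages.push_back(text);
        } else {
            pages.push_back(h.ocr(h.rasterize(page)));
        }
    }
    return pages;
}
```

With something like this, only the pages that fail to parse would pay the cost of rasterizing and OCR, and the intermediate step of re-printing to PDF would no longer be needed.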

Can Tesseract be used for this kind of thing? I need an OCR library in C++ that I can incorporate into my programs, and I'm unsure whether Tesseract is such a library.

Thanks