--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f85d93e3-ea49-47bc-aab9-5af9b4a268b1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hello,

if you are referring to some code ("inspecting the code I think found some pieces..."), please include a reference or link to it.

Tesseract is able to OCR everything that leptonica is able to open, or everything that you or a programmer can convert to a leptonica PIX structure ;-)

I did not have a chance to test leptonica 1.71, but 1.70 was not able to open PDF. So the answer to your question 1 is no: leptonica/tesseract do not support OCR of PDFs, neither multi-page nor single-page. But they do support multi-page TIFF.
Regarding your question 2: I am not aware of any such initiative. Tesseract OCRs images, and PDF is not an image format but a document format (i.e. a request to OCR PDF is the same as a request to OCR odt, doc, docx, html, etc.).
On Tuesday, 5 August 2014 09:25:35 UTC+2, zdenop wrote: "[...] leptonica/tesseract do not support OCR of PDFs, neither multi-page nor single-page. But they do support multi-page TIFF."
I have tried this twice, but the approach failed (as far as I remember, I got the messages described at http://stackoverflow.com/questions/5083492/problem-with-tesseract-and-tiff-format ). I will try to investigate why it failed (or what I did wrong) and, if the problem persists, post it as a regular bug report. Currently, I am unsure what really happened.
My current investigation showed that Leptonica cannot convert a multi-page PDF input into a multi-page TIFF.
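Since the PDF has to be rasterized before tesseract can read it, one possible workaround is to do that step with an external tool. Below is a minimal Python sketch, assuming Ghostscript is installed and reachable as `gs` (on Windows the executable usually has a different name) and that `tesseract` is on the PATH. The function names, the `tiffgray` device, and the 300 dpi default are my own choices for illustration, not something tested in this thread.

```python
import subprocess
from pathlib import Path


def ghostscript_command(pdf_path, tiff_path, dpi=300):
    """Build a Ghostscript call that renders all PDF pages into one TIFF file.

    Assumption: the executable is named "gs". Because the output file name
    contains no %d page pattern, the TIFF device writes a multi-page file.
    """
    return [
        "gs",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-sDEVICE=tiffgray",          # 8-bit grayscale TIFF output
        f"-r{dpi}",                   # rendering resolution in dpi
        f"-sOutputFile={tiff_path}",
        str(pdf_path),
    ]


def tesseract_command(tiff_path, output_base):
    """Build the tesseract call; tesseract appends ".txt" to output_base."""
    return ["tesseract", str(tiff_path), str(output_base)]


def ocr_pdf(pdf_path, workdir="."):
    """Rasterize a PDF to a multi-page TIFF, then OCR it with tesseract."""
    workdir = Path(workdir)
    stem = Path(pdf_path).stem
    tiff_path = workdir / (stem + ".tif")
    output_base = workdir / stem
    # check=True raises CalledProcessError if either tool fails.
    subprocess.run(ghostscript_command(pdf_path, tiff_path), check=True)
    subprocess.run(tesseract_command(tiff_path, output_base), check=True)
    return Path(str(output_base) + ".txt")
```

Calling `ocr_pdf("scan.pdf")` (a hypothetical file name) should leave `scan.tif` and `scan.txt` in the working directory. If tesseract still rejects the TIFF, a different compression or bit depth on the Ghostscript side may be worth trying.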