Hi all,
I would like to update what I know about Tesseract.
I need an OCR program that can recognize text in scanned documents.
These are in JPG or multipage PDF format.
Pages may be upside down.
They might also contain images, tables, and headings.
Can I recognize those pages out of the box with Tesseract?
Can Tesseract also recognize tables and headings?
A few years ago, you had to preprocess the images first.
Is that still the case?
Greetings,
Simon
Hello sir,
I have read your project description.
I recently worked on an OCR project very similar to yours.
In that project, the OCR recognized text, numbers, and symbols in PDF drafts.
Tesseract OCR alone gave good results, but to improve accuracy I preprocessed the images extracted from the PDFs with OpenCV.
In the end, the OCR results were very good.
You can test the OCR at this link with the attached PDF files.
I am sure I can help you with your project.
Thanks.
Kind Regards.
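
As a rough illustration of the kind of preprocessing mentioned above, here is a minimal sketch of Otsu binarization, which turns a grayscale scan into pure black and white before OCR. It is not the code from that project: it is written in plain Python so it runs without extra libraries, and the function names and the list-of-rows image format are assumptions made for this example. In practice, OpenCV's cv2.threshold with the cv2.THRESH_OTSU flag does the same job much faster.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold (0-255) for a flat list of grayscale values."""
    # Build a 256-bin histogram of the gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray levels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = t
    return best_t


def binarize(image):
    """Map every pixel above the Otsu threshold to 255 and the rest to 0.

    `image` is a list of rows, each row a list of ints in 0..255
    (a hypothetical format chosen for this sketch).
    """
    t = otsu_threshold([p for row in image for p in row])
    return [[255 if p > t else 0 for p in row] for row in image]


# Tiny example: dark "ink" values near 10, bright "paper" values near 200.
sample = [[10, 12, 200],
          [11, 205, 210]]
cleaned = binarize(sample)
```

Whether binarization helps depends on the scans. Recent Tesseract versions do their own thresholding internally, so this is one possible step to try, not a required one.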
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/a6d49be8cccb40c287d480a9e0053807%40hohenems.at.
Hi Misti,
Thanks for the info.
Will have a look at that.
Yes, getting a good picture as a blind person isn't all that easy.
Which output format would best preserve formatting, headings, and other structure? hOCR?
Greetings,
Simon
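
For reference, requesting hOCR output from the Tesseract command line could look like the sketch below. It only builds the argument list and does not run anything; the helper name and the file names are hypothetical. It assumes a Tesseract installation that includes the osd traineddata, since --psm 1 asks for automatic page segmentation with orientation and script detection. It does not settle whether hOCR is the best format for this use case.

```python
import shlex


def build_tesseract_command(image_path, output_base, lang="eng", output_format="hocr"):
    """Build an argument list for the tesseract CLI (hypothetical helper).

    The trailing word is a Tesseract config file name: "hocr" writes
    <output_base>.hocr, while "txt", "tsv", or "pdf" select other outputs.
    """
    return [
        "tesseract",
        image_path,     # input image, e.g. one scanned page as JPG
        output_base,    # output path without extension
        "-l", lang,     # recognition language
        "--psm", "1",   # automatic page segmentation with OSD
        output_format,
    ]


cmd = build_tesseract_command("scan.jpg", "scan")
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once Tesseract is installed and on the PATH.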