I am a newbie...
Is there a standard way to extract text from a PDF using tesseract-ocr?
Thanks
I would not recommend that, as it resamples the image. The pdfimages
program extracts the raster images embedded in a PDF. You can then feed
these to tesseract. If the text is actually stored as text, rather than
as images, then pdftotext will extract it directly and no OCR is needed.
Someone else may have a better solution though.
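For what it is worth, here is a rough sketch of that workflow in Python. It assumes the pdftotext, pdfimages and tesseract command-line tools are installed and on the PATH. The function names and the "page" image prefix are made up for illustration, and the "use OCR only if pdftotext returns nothing" rule is my own assumption, not an established convention.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def pdftotext_cmd(pdf, txt):
    # pdftotext <input.pdf> <output.txt>
    return ["pdftotext", str(pdf), str(txt)]


def pdfimages_cmd(pdf, prefix):
    # pdfimages <input.pdf> <image-root>; writes files named <image-root>-NNN.<ext>
    return ["pdfimages", str(pdf), str(prefix)]


def tesseract_cmd(image, out_base):
    # tesseract <image> <output-base>; writes <output-base>.txt
    return ["tesseract", str(image), str(out_base)]


def extract_text(pdf):
    """Try the embedded text first; fall back to OCR on the embedded images."""
    for tool in ("pdftotext", "pdfimages", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)

        # Step 1: the PDF may already store its text as text.
        txt = tmp / "layer.txt"
        subprocess.run(pdftotext_cmd(pdf, txt), check=True)
        text = txt.read_text(encoding="utf-8", errors="replace")
        if text.strip():
            return text

        # Step 2: no usable text, so pull out the raster images and OCR each one.
        subprocess.run(pdfimages_cmd(pdf, tmp / "page"), check=True)
        pages = []
        for image in sorted(tmp.glob("page-*")):
            base = image.with_suffix("")
            subprocess.run(tesseract_cmd(image, base), check=True)
            ocr_file = base.with_suffix(".txt")
            pages.append(ocr_file.read_text(encoding="utf-8", errors="replace"))
        return "\n".join(pages)
```

Calling extract_text("scan.pdf") (a hypothetical file name) would return the text as one string. Because the images are taken out of the PDF as stored rather than rendered, this avoids the resampling problem mentioned above.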
Hi
Thanks
Unless your PDF is composed of images, this is not the way to go. PDF
is a document format, not an image format. Use a tool like pdftotext.
James