New to command line and tesseract. Errors for PDF

Jacob Stoker

Mar 24, 2016, 12:35:39 PM
to tesseract-ocr
Hello tesseract-ocr world!

So I'm running tesseract. I have everything installed as far as I can tell. I'm on OS X El Capitan.

I run the command:

tesseract mm.pdf mm pdf

hoping to take the file "mm.pdf" and OCR it into a searchable PDF. Instead I get an error: pixReadMem: size < 12

I assume this means that the text wasn't large enough? Do I just have to rescan the document with different settings?


For my own curiosity, how does tesseract know where the file is in the first place, given that the command was so short?

zdenko podobny

Mar 24, 2016, 12:38:32 PM
to tesser...@googlegroups.com
A PDF (mm.pdf) is not an image file; it is a document file.
Tesseract accepts only image files as input (TIFF, PNG, JPEG, and so on, depending on your Leptonica build).

Zdenko
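
Following on from the reply above, the pixReadMem message comes from Leptonica failing to read the input as an image, so rescanning with different settings should not be needed. One possible workaround is to render the PDF to an image first and then run the same tesseract command on that image. The sketch below is only an illustration: it assumes pdftoppm (from Poppler) is installed, which nobody in this thread mentioned, and the function name ocr_pdf_first_page is made up for this example. It handles only the first page.

```shell
#!/bin/sh
# Sketch: rasterize page 1 of a PDF to PNG, then OCR it into a searchable PDF.
# Assumption: pdftoppm (Poppler) and tesseract are both on PATH.
# ocr_pdf_first_page is a hypothetical helper name, not part of tesseract.

ocr_pdf_first_page() {
    input_pdf=$1      # e.g. mm.pdf, looked up relative to the current directory
    output_base=$2    # e.g. mm-ocr (tesseract appends .pdf itself)

    # Fail early if the input file cannot be found.
    if [ ! -f "$input_pdf" ]; then
        echo "no such file: $input_pdf" >&2
        return 1
    fi

    # Fail early if either tool is missing.
    for tool in pdftoppm tesseract; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            return 127
        fi
    done

    # Render only page 1 at 300 DPI into <output_base>-page.png.
    pdftoppm -r 300 -png -f 1 -l 1 -singlefile "$input_pdf" "${output_base}-page" || return 1

    # Same command form as in the original post, but with an image as input.
    tesseract "${output_base}-page.png" "$output_base" pdf
}
```

Usage would look like `ocr_pdf_first_page mm.pdf mm-ocr`, which should leave the result in mm-ocr.pdf. A different output name is used on purpose: with "mm" as the output base, tesseract would write mm.pdf and overwrite the input. As for how tesseract finds the file: a bare name like mm.pdf is a relative path, so it is looked up in the directory the command is run from.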

