Re: Preprocess Image

Hongguo An

Jun 4, 2018, 1:22:43 PM

to tesseract-ocr

Can anybody help? Thanks in advance.

On Thursday, May 31, 2018 at 12:57:20 PM UTC-7, Hongguo An wrote:

Hi:
When I OCR the above image, the date 09/02/2017 is always misread as 0G/02/2017.

This is Tesseract 4 running on Linux. The command line is:
tesseract stdin stdout -l eng --psm 11 --oem 1 -c textonly_pdf=1 -c tessedit_create_pdf=1 | pdftotext -layout - -

Is there any way to preprocess the image so the date is recognized correctly, preferably using ImageMagick's convert?

Thanks
Hongguo An

ShreeDevi Kumar

Jun 4, 2018, 1:50:50 PM

to tesser...@googlegroups.com

Take a look at http://www.fmwconcepts.com/imagemagick/textcleaner/ and the other scripts by Fred on that site.

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fd0e766e-fba2-43a7-91ea-51de94f621b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Art Rhyno.

Jun 5, 2018, 7:59:20 PM

to tesser...@googlegroups.com

Maybe try a Gaussian blur and upsize a bit? Something like:

convert -blur 2x10 -resize 110%

art
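
For reference, the blur-and-upscale suggestion above could be combined with the command line from the original post roughly as follows. This is only a sketch, not a tested fix for the 09 vs 0G misread: the function name preprocess_and_ocr is made up for illustration, png:- asks convert to write a PNG to standard output so it can be piped, and the blur and resize values are simply the ones suggested above and will probably need tuning per image.

```shell
# Hypothetical helper (the name is an assumption, not from the thread).
# Blurs and slightly upscales the image with ImageMagick, then feeds the
# result to the Tesseract pipeline quoted in the original post.
preprocess_and_ocr() {
    # $1: path to the input image
    convert "$1" -blur 2x10 -resize 110% png:- |
        tesseract stdin stdout -l eng --psm 11 --oem 1 \
            -c textonly_pdf=1 -c tessedit_create_pdf=1 |
        pdftotext -layout - -
}
```

It would be called as something like preprocess_and_ocr scan.png, where scan.png stands in for the real file. On ImageMagick 7 the same options should work with magick in place of convert.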
