Re: Preprocess Image


Hongguo An

Jun 4, 2018, 1:22:43 PM
to tesseract-ocr
Can anybody help? Thanks in advance.

On Thursday, May 31, 2018 at 12:57:20 PM UTC-7, Hongguo An wrote:

Hi,
When I try to OCR the above image, the date 09/02/2017 is always misread as 0G/02/2017.


This is Tesseract 4 running on Linux. The command line is:

tesseract stdin stdout -l eng --psm 11 --oem 1 -c textonly_pdf=1 -c tessedit_create_pdf=1 | pdftotext -layout - -


Is there any way to pre-process the image to make it work (preferably using convert)?


Thanks

Hongguo An

ShreeDevi Kumar

Jun 4, 2018, 1:50:50 PM
to tesser...@googlegroups.com
and other scripts by Fred

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fd0e766e-fba2-43a7-91ea-51de94f621b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Art Rhyno.

Jun 5, 2018, 7:59:20 PM
to tesser...@googlegroups.com

Maybe try a Gaussian blur and upsize a bit? Something like:

convert -blur 2x10 -resize 110%

art
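
For anyone trying this, here is a minimal sketch of how the blur-and-upsize suggestion might be combined with the Tesseract options from the original question. It is untested against the image in this thread, so treat the blur and resize values as starting points. The function name preprocess_and_ocr and the file name scan.png are hypothetical, and the PDF options from the original command are left out to keep the pipeline short.

```shell
# Hypothetical helper: blur and slightly upscale an image, then OCR it.
# "$1" is the path of the input image (for example scan.png, a placeholder).
preprocess_and_ocr() {
    # The input file comes first, followed by the operators suggested above.
    # "png:-" makes convert write the result as PNG to standard output,
    # which tesseract then reads through its "stdin" input name.
    convert "$1" -blur 2x10 -resize 110% png:- |
        tesseract stdin stdout -l eng --psm 11 --oem 1
}

# Example call (scan.png is a placeholder file name):
# preprocess_and_ocr scan.png
```

If the date is still misread, adjusting the -resize percentage or the blur sigma one at a time makes it easier to see which change helps.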
