Hi: When I try to OCR the above image, the date 09/02/2017 is always misread as 0G/02/2017.
This is Tesseract 4 running on Linux; the command line is:
tesseract stdin stdout -l eng --psm 11 --oem 1 -c textonly_pdf=1 -c tessedit_create_pdf=1 | pdftotext -layout - -
Is there any way to pre-process the image so the date is recognized correctly, preferably using ImageMagick's convert?
Thanks
Hongguo An
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fd0e766e-fba2-43a7-91ea-51de94f621b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Maybe try a Gaussian blur and upsize a bit? Something like:
convert input.png -blur 2x10 -resize 110% output.png
art
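
For reference, here is a minimal sketch of how the suggested pre-processing might be chained into the original command line. The function name ocr_preprocessed and the file name input.png are hypothetical, and the blur and resize values are simply the ones suggested above, not tuned settings; whether they actually fix the 9/G confusion would have to be checked on the real image. ImageMagick's -blur takes radius x sigma, and png:- writes the result to stdout so it can be piped into tesseract stdin.

```shell
#!/bin/sh
# Sketch only: blur and upscale with ImageMagick, then run the same
# tesseract | pdftotext pipeline as in the original post.
# Assumes convert, tesseract and pdftotext are on PATH.

ocr_preprocessed() {
    # $1: path to the input image (hypothetical, e.g. input.png)
    convert "$1" -blur 2x10 -resize 110% png:- |
        tesseract stdin stdout -l eng --psm 11 --oem 1 \
            -c textonly_pdf=1 -c tessedit_create_pdf=1 |
        pdftotext -layout - -
}

# Usage: ocr_preprocessed input.png
```

On ImageMagick 7 the convert command is considered legacy and magick is the preferred entry point; the options shown are the same.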