OCR broken characters in images using Tesseract

harsha

Nov 21, 2019, 11:38:19 AM
to tesseract-ocr
Hi,

[image: test1.png]

I am trying to OCR images and am struggling with images like test1.png.

[image: test2.png]

After applying image processing, the best I could reach is test2.png, which at least makes Tesseract give somewhat related results.

I have tried Tesseract with all possible psm values on test2.png, and the best result I could get is 27837 " with --psm 6.
Is there a way I can improve my output on this kind of image? I have lots of them in my dataset.
Any help would be appreciated.


Thanks
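
For reference, trying every page segmentation mode on one image can be scripted rather than done by hand. The following is a minimal sketch, not the poster's actual procedure. It assumes Tesseract 4.x, where --psm accepts values from 0 to 13. The helper names psm_modes and psm_sweep are hypothetical. Modes 0 to 2 are skipped because 0 and 1 rely on orientation and script detection data, and 2 does not run OCR.

```shell
#!/bin/sh
# Sketch only: psm_modes and psm_sweep are illustrative names, not tesseract features.

# Print the page segmentation modes to try, one per line.
# Assumption: Tesseract 4.x, where --psm accepts 0..13; modes 0-2 are skipped.
psm_modes() {
    mode=3
    while [ "$mode" -le 13 ]; do
        echo "$mode"
        mode=$((mode + 1))
    done
}

# Run tesseract once per mode and print each result on a labelled line.
psm_sweep() {
    image=$1
    for mode in $(psm_modes); do
        printf 'psm %s: ' "$mode"
        # "-" sends the recognized text to stdout; newlines are flattened
        # so that each mode occupies a single line of output.
        tesseract "$image" - --psm "$mode" 2>/dev/null | tr '\n' ' '
        echo
    done
}
```

Calling `psm_sweep test2.png` would then print one line per mode, which makes it easy to compare results such as the one from --psm 6 against the others.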

Shree Devi Kumar

Nov 22, 2019, 12:17:32 AM
to tesseract-ocr
convert test1.png -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle miff:- | textcleaner -f 25 -o 10 - result.png
convert -units PixelsPerInch result.png -resample 300  result1.png
tesseract result1.png -
27627
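
The three commands above can be wrapped in a small script so that the number of despeckle passes becomes a parameter. This is only a sketch under assumptions: the function names despeckle_flags and clean_and_ocr are hypothetical, and it assumes ImageMagick's convert, the textcleaner script (presumably Fred Weinhaus's ImageMagick script), and tesseract are all on the PATH. Since ImageMagick applies operators in the order given, each -despeckle flag should amount to one more pass of the same filter, so ten flags means ten passes.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of ImageMagick or tesseract.

# Print N copies of the -despeckle flag on one line, separated by spaces.
despeckle_flags() {
    count=$1
    flags=""
    i=0
    while [ "$i" -lt "$count" ]; do
        flags="$flags -despeckle"
        i=$((i + 1))
    done
    # Drop the leading space left by the first concatenation.
    echo "${flags# }"
}

# Clean an image and OCR it, mirroring the three commands from the post.
# Usage: clean_and_ocr INPUT [PASSES]   (PASSES defaults to 10, as in the post)
clean_and_ocr() {
    input=$1
    passes=${2:-10}
    # The flags are left unquoted on purpose so the shell splits them into
    # separate arguments for convert.
    convert "$input" $(despeckle_flags "$passes") miff:- \
        | textcleaner -f 25 -o 10 - result.png
    convert -units PixelsPerInch result.png -resample 300 result1.png
    tesseract result1.png -
}
```

With this in place, `clean_and_ocr test1.png` repeats the original pipeline, and `clean_and_ocr test1.png 5` tries fewer passes, which may help when tuning for a larger set of similar images.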


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/31a12bb1-4552-4a26-9fd7-6da0e54b5b90%40googlegroups.com.


--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

lucmaa

Nov 23, 2019, 1:23:12 AM
to tesseract-ocr
Hi Shree,
Why is the -despeckle option repeated so many times in the convert command?


Shree Devi Kumar

Nov 23, 2019, 1:43:13 AM
to tesseract-ocr
