Question about training Tesseract


morteza neishaboori

Jun 27, 2014, 6:47:55 AM
to tesser...@googlegroups.com
Hello,
I want to train Tesseract to detect words in images like the one at the link below.

I tried, but it was not successful. I would be happy if somebody could give me some hints on whether this is possible at all with Tesseract.

Paul

Jul 1, 2014, 4:22:46 PM
to tesser...@googlegroups.com
This paper suggests a binarization approach that might be helpful with your imagery. Unfortunately, you would need to implement it yourself as a preprocessing step,
since Tesseract only uses Otsu's method for binarization. That explains the poor results.
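To illustrate why a single global threshold can fail on this kind of imagery, here is a minimal pure-Python sketch of Otsu's method. It is not Tesseract's code. It assumes an 8-bit grayscale image given as a list of rows of ints in 0..255, and the function names are hypothetical. Otsu picks the one threshold that maximizes the between-class variance of the whole histogram, so when the background brightness varies across the page, no single value separates text from background everywhere.

```python
def otsu_threshold(pixels):
    """Return the gray level t maximizing between-class variance.

    Pixels <= t form the dark class, pixels > t the bright class.
    `pixels` is a flat iterable of ints in 0..255 (assumed 8-bit grayscale).
    """
    pixels = list(pixels)
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_dark = 0.0
    weight_dark = 0
    best_t, best_between = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no pixels in the dark class yet
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # every pixel is now in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        # Between-class variance (up to a constant factor 1/total^2).
        between = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if between > best_between:
            best_between = between
            best_t = t
    return best_t


def binarize_global(image, t):
    """Apply one threshold to the whole image: 0 = ink (dark), 255 = background."""
    return [[0 if p <= t else 255 for p in row] for row in image]


# Usage: a 10x20 page whose left half is dim (60) and right half bright (220),
# with one dark "ink" pixel on each side (20 on the left, 150 on the right).
page = [[60] * 10 + [220] * 10 for _ in range(10)]
page[5][3] = 20
page[5][15] = 150
t = otsu_threshold(p for row in page for p in row)
result = binarize_global(page, t)
```

On a page like `page`, the single threshold lands between the two background levels, so the dim background is merged with the ink and the fainter ink on the bright side is lost. This is the behaviour that local (adaptive) methods try to avoid.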

zdenko podobny

Jul 1, 2014, 5:04:31 PM
to tesser...@googlegroups.com
Well, Leptonica also provides some binarization methods (see the source code[1]). Some explanation can be found on the web[2]. Of course, there are other binarization methods with published code; e.g. C++ source code for Niblack and Wolf can be found on Christian Wolf's page[3].
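As a rough illustration of the local methods mentioned above, here is a minimal pure-Python sketch of Niblack's thresholding, T(x, y) = m(x, y) + k * s(x, y). Here m and s are the mean and standard deviation of a window around each pixel. This is not Leptonica's or Christian Wolf's code. The window size of 15 and k = -0.2 (a commonly quoted value for dark text on a light background) are assumptions, and the brute-force window loop is chosen for clarity, not speed. Wolf's method is a refinement of this same idea.

```python
import math


def niblack_binarize(image, window=15, k=-0.2):
    """Niblack local thresholding: 0 = ink, 255 = background.

    `image` is a list of rows of grayscale ints. Each pixel is compared with
    mean + k * stddev of a (window x window) neighbourhood, clipped at the
    image borders. Parameters are illustrative defaults, not tuned values.
    """
    h, w = len(image), len(image[0])
    r = window // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            n = len(vals)
            mean = sum(vals) / n
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
            threshold = mean + k * std
            # Strictly darker than the local threshold counts as ink, so a
            # perfectly uniform background window stays white.
            row.append(0 if image[y][x] < threshold else 255)
        out.append(row)
    return out


# Usage: the same unevenly lit page as a global-threshold example would use.
page = [[60] * 10 + [220] * 10 for _ in range(10)]
page[5][3] = 20     # ink on the dim side
page[5][15] = 150   # faint ink on the bright side
binary = niblack_binarize(page, window=5)
```

Because each pixel is judged against its own neighbourhood, the ink on both the dim and the bright half is kept. A known weakness of plain Niblack is that noise in large empty regions can turn black, which is one motivation for variants such as Sauvola and Wolf.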

IMO, in this case it would be worth looking at page segmentation; there is an (older) presentation of Leptonica's possibilities[4].
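To give a flavour of what page segmentation involves, here is a very small sketch of one classic technique, the horizontal projection profile. It is a swapped-in illustration and not what Leptonica does internally. It assumes an already binarized image (0 = ink, 255 = background) and reports the row bands that contain ink, i.e. candidate text lines that could be cropped and passed to the OCR engine one at a time. The function name and the `min_ink` parameter are hypothetical.

```python
def text_line_bands(binary, min_ink=1):
    """Return (first_row, last_row) pairs for runs of rows containing ink.

    `binary` is a list of rows with 0 = ink and 255 = background.
    A row belongs to a text line if it has at least `min_ink` ink pixels.
    """
    bands = []
    start = None
    for y, row in enumerate(binary):
        ink = sum(1 for p in row if p == 0)
        if ink >= min_ink and start is None:
            start = y                     # a new band of ink rows begins
        elif ink < min_ink and start is not None:
            bands.append((start, y - 1))  # band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(binary) - 1))  # band runs to the last row
    return bands


# Usage: rows 1-2 and row 5 contain ink.
sample = [[255] * 5 for _ in range(8)]
sample[1][2] = sample[2][1] = sample[5][4] = 0
lines = text_line_bands(sample)
```

Real page layouts (multiple columns, images, skew) need much more than this, which is why a dedicated segmentation step such as Leptonica's is worth studying.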

Zdenko


