How do I use a custom word list (whitelist only) to recognise a single word from an image?


Erikas Rudinskas

Sep 22, 2018, 12:32:32 PM
to tesseract-ocr
Hi,

I am trying to find out how to define my own word list for Tesseract. I want it to use only the words I define and pick the most likely one.

So I have a small image with a single word in it. I process it with this command to get a pure black-on-white image:

$ convert image.png -colorspace gray -auto-level -threshold 60% -type bilevel -depth 8 newimage.png
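If ImageMagick isn't available, a roughly equivalent preprocessing step can be sketched in Python with Pillow. This is an approximation under assumptions, not an exact reproduction of `convert`: Pillow's grayscale conversion and `ImageOps.autocontrast` are close to, but not identical to, ImageMagick's `-colorspace gray` and `-auto-level`. The helper name `binarize` is hypothetical.

```python
from PIL import Image, ImageOps


def binarize(img: Image.Image, threshold_pct: float = 60.0) -> Image.Image:
    """Hypothetical helper approximating `-colorspace gray -auto-level -threshold 60%`."""
    gray = img.convert("L")                  # 8-bit grayscale
    stretched = ImageOps.autocontrast(gray)  # stretch darkest -> 0, lightest -> 255
    cutoff = 255 * threshold_pct / 100.0
    # Like ImageMagick's -threshold: values above the cutoff become white, the rest black.
    return stretched.point(lambda p: 255 if p > cutoff else 0)


# Usage (assumes image.png exists):
#   binarize(Image.open("image.png")).save("newimage.png")
```

The result stays an 8-bit grayscale image that only contains the values 0 and 255, which is what `-type bilevel -depth 8` aims for.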

Then I try to extract a single word from that image:

$ tesseract newimage.png -psm 8 stdout

It returns a single word (which is great), but the word is slightly wrong:

Expectation: nieko
Result: flieko

I've spent 5+ hours looking for documentation or a tutorial on how to set a whitelist dictionary for word recognition. Any tips?
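One workaround is to post-process Tesseract's output. This is a different technique from configuring Tesseract's own dictionary. You keep your own whitelist and replace the recognised word with the closest whitelist entry by edit distance. Below is a minimal sketch, assuming the OCR result is already available as a string (for example from `tesseract newimage.png -psm 8 stdout`). The word list and the function names `levenshtein` and `closest_word` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def closest_word(ocr_text: str, whitelist: list[str]) -> str:
    """Return the whitelist entry with the smallest edit distance to the OCR output."""
    word = ocr_text.strip().lower()
    return min(whitelist, key=lambda w: levenshtein(word, w.lower()))


WHITELIST = ["nieko", "niekas", "viskas"]  # hypothetical word list
print(closest_word("flieko", WHITELIST))  # nieko
```

Here "flieko" is 2 edits away from "nieko" (substitute f with n, delete l), so it snaps to the expected word. Because this always returns *some* whitelist entry, you may want to reject matches whose distance is large relative to the word length.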

Nicolle Alexandre

Dec 2, 2018, 11:43:28 PM
to tesseract-ocr
How did you do it? I am having the same problem.