I am trying to find a way to define my own word list for Tesseract. I want it to use only my defined words and pick the most likely one.
I have a small image containing a single word. I preprocess it with this command to get a pure black-on-white image:
$ convert -colorspace gray -auto-level -threshold 60% -type bilevel -depth 8 image.png newimage.png
Then I try to extract a single word from that image:
$ tesseract newimage.png -psm 8 stdout
and it returns a single word (which is great), but slightly incorrect:
Expectation: nieko
Result: flieko
I've spent 5+ hours looking for documentation or a tutorial on how to set a whitelist dictionary for word recognition, without success. Any tips?
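One possible workaround, sketched here under stated assumptions, skips Tesseract's dictionary entirely and post-processes its output: take the word Tesseract returns and snap it to the entry in your own list with the smallest Levenshtein (edit) distance. This is not a Tesseract feature. The helper names `levenshtein` and `closest_word` are hypothetical, and it assumes bash (it uses arrays and substring expansion).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: snap an OCR result to the closest word in a user-defined list.
# This is post-processing outside Tesseract, not a Tesseract option.

# levenshtein A B: print the edit distance (insertions, deletions, substitutions)
# between strings A and B, using the two-row dynamic-programming table.
levenshtein() {
  local a=$1 b=$2
  local la=${#a} lb=${#b}
  local -a prev curr
  local i j cost del ins sub
  # Row 0: distance from the empty prefix of A to each prefix of B.
  for ((j = 0; j <= lb; j++)); do prev[j]=$j; done
  for ((i = 1; i <= la; i++)); do
    curr[0]=$i
    for ((j = 1; j <= lb; j++)); do
      if [[ ${a:i-1:1} == "${b:j-1:1}" ]]; then cost=0; else cost=1; fi
      del=$(( prev[j] + 1 ))
      ins=$(( curr[j-1] + 1 ))
      sub=$(( prev[j-1] + cost ))
      # Keep the cheapest of the three operations.
      curr[j]=$del
      (( ins < curr[j] )) && curr[j]=$ins
      (( sub < curr[j] )) && curr[j]=$sub
    done
    prev=("${curr[@]}")
  done
  echo "${prev[lb]}"
}

# closest_word WORD CANDIDATE...: print the candidate with the smallest edit
# distance to WORD; on a tie, the earlier candidate wins.
closest_word() {
  local word=$1; shift
  local best="" best_d=-1 candidate d
  for candidate in "$@"; do
    d=$(levenshtein "$word" "$candidate")
    if (( best_d < 0 || d < best_d )); then
      best=$candidate
      best_d=$d
    fi
  done
  printf '%s\n' "$best"
}
```

For example, with the allowed words in a hypothetical words.txt (one per line), something like this should print nieko for the flieko result above, as long as no other entry in the list is closer to flieko (the `tr` strips the trailing newline and any form feed Tesseract may emit):

$ mapfile -t words < words.txt
$ closest_word "$(tesseract newimage.png -psm 8 stdout | tr -d '[:space:]')" "${words[@]}"

The trade-off is that it always returns some list entry, even for garbage input; checking that the winning distance is below a threshold would be a natural refinement.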