tesseract with user-words


Justin Seabrook

Sep 20, 2022, 8:36:26 AM
to tesseract-ocr
I know there are some similar posts (I've read them all!) but they don't seem to provide an answer. I'm on Windows 11 with Tesseract 5.2.0.20220712.

I was having trouble applying a user word list instead of the dawg list, so I made a very simple example: an image with one word that is not correctly detected, plus a user-words file with a single entry that is a close match.

So, here's the image, temp.png, which is a slightly blurred image of "testW0rd". Using this command:
"C:\Program Files\Tesseract-OCR\tesseract" temp.png output --psm 3
I get the result "testwurd" in output.txt.

OK, so following the instructions, I now have a file called eng.user-words with one entry, "testWord", in C:\Program Files\Tesseract-OCR\tessdata and a text file called bazaar in C:\Program Files\Tesseract-OCR\tessdata\configs with the following lines:
load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words
language_model_penalty_non_dict_word 1

When I run it again, I get the same result as before: "testwurd". It doesn't seem to be using the user-words file. Or rather, since it errors if the file is not there, it is accessing it but possibly not doing anything with it?
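
In case it helps anyone reproduce this, here is a minimal Python sketch of the setup described above. It only writes the two files and builds the command line; it does not run Tesseract. The function names write_setup and build_command are hypothetical helpers made up for illustration. The optional trailing config name relies on the Tesseract command line accepting config file names after the options; whether that changes the result here is exactly what I don't know.

```python
from pathlib import Path

# Config lines copied verbatim from the "bazaar" file in this post.
CONFIG_LINES = [
    "load_system_dawg     F",
    "load_freq_dawg       F",
    "user_words_suffix    user-words",
    "language_model_penalty_non_dict_word 1",
]


def write_setup(tessdata_dir, lang="eng", config_name="bazaar", words=("testWord",)):
    """Write <lang>.user-words and configs/<config_name> under tessdata_dir.

    Hypothetical helper: mirrors the file layout described in the post.
    """
    tessdata_dir = Path(tessdata_dir)
    configs_dir = tessdata_dir / "configs"
    configs_dir.mkdir(parents=True, exist_ok=True)

    # One word per line in the user-words file.
    words_path = tessdata_dir / f"{lang}.user-words"
    words_path.write_text("\n".join(words) + "\n", encoding="utf-8")

    config_path = configs_dir / config_name
    config_path.write_text("\n".join(CONFIG_LINES) + "\n", encoding="utf-8")
    return words_path, config_path


def build_command(tesseract_exe, image, output_base, psm=3, config_name=None):
    """Build the argument list: image, output base, options, then config name."""
    cmd = [str(tesseract_exe), str(image), str(output_base), "--psm", str(psm)]
    if config_name is not None:
        # Config file names go after the options on the command line.
        cmd.append(config_name)
    return cmd
```

With config_name left as None, build_command reproduces the command shown earlier in this post; passing config_name="bazaar" appends the config name at the end. The resulting list can be handed to subprocess.run. Note that writing under C:\Program Files usually needs an elevated prompt on Windows.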

Any ideas why this is not working? I would really appreciate some help with this from an expert.

eng.user-words
bazaar
temp.png