Hi all,
I have an issue providing a list of user words to Tesseract. I am on Windows 10.
Installed Tesseract version:
>tesseract.exe -v
tesseract v5.0.0-alpha.20191030
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5
My test image:
I have an "eng.user-words" file in the directory with the traineddata files. It contains:
The config file "bazaar" is as follows:
load_system_dawg F
load_freq_dawg F
user_words_file path/to/eng.user-words
user_words_suffix user-words
language_model_penalty_non_freq_dict_word 1
language_model_penalty_non_dict_word 1
Running this command:
"C:\Program Files\Tesseract-OCR\tesseract.exe" test.jpg stdout -l eng bazaar
gives "Bladeblabla" instead of "B1adeb1ab1a"
Likewise, this command:
"C:\Program Files\Tesseract-OCR\tesseract.exe" test.jpg stdout -l eng --user-words path/to/eng.user-words
also gives "Bladeblabla" instead of "B1adeb1ab1a".
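For anyone who wants to reproduce this, here is a minimal Python 3.9+ sketch (written for this post, not part of Tesseract) that writes the "bazaar" config from the parameters above and runs both commands through subprocess. The executable path is the one used above. Two things are assumptions: "bazaar" is written to the current working directory, and "path/to/eng.user-words" is still a placeholder. Adjust both to your setup.

```python
import subprocess
from pathlib import Path

# Executable path as used in the commands above.
TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Placeholder path, as in the post; replace it with the real location.
USER_WORDS = "path/to/eng.user-words"

# The same parameters as in the "bazaar" config file above.
BAZAAR_PARAMS = {
    "load_system_dawg": "F",
    "load_freq_dawg": "F",
    "user_words_file": USER_WORDS,
    "user_words_suffix": "user-words",
    "language_model_penalty_non_freq_dict_word": "1",
    "language_model_penalty_non_dict_word": "1",
}


def write_config(path: Path, params: dict[str, str]) -> None:
    """Write a Tesseract config file with one 'name value' pair per line."""
    text = "".join(f"{name} {value}\n" for name, value in params.items())
    path.write_text(text, encoding="utf-8")


def config_command(image: str, config: str) -> list[str]:
    """First command: the config file name is the last argument."""
    return [TESSERACT, image, "stdout", "-l", "eng", config]


def user_words_command(image: str, words_file: str) -> list[str]:
    """Second command: the word list is passed with --user-words."""
    return [TESSERACT, image, "stdout", "-l", "eng", "--user-words", words_file]


def run(cmd: list[str]) -> str:
    """Run Tesseract and return the recognized text, whitespace-stripped."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def reproduce(image: str = "test.jpg") -> list[str]:
    """Run both commands. Assumption: "bazaar" goes in the working directory."""
    write_config(Path("bazaar"), BAZAAR_PARAMS)
    return [
        run(config_command(image, "bazaar")),
        run(user_words_command(image, USER_WORDS)),
    ]
```

If the word list were applied, calling reproduce() would return "B1adeb1ab1a" twice. In my case, both entries come back as "Bladeblabla".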
What am I doing wrong?