Recognize only words from a user word list

appl...@gmail.com

Feb 13, 2017, 2:24:58 AM
to tesseract-ocr
Hi, 

I want to recognize only words from a predefined list. I read the documentation here (https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc) and tried the command below, but it did not work. It still detects words that do not appear in my list.

tesseract image.jpg output --user-words user_words.txt  conf


conf is a text file with the following contents:

load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words

When I check the parameters with the --print-parameters option, all of them are set as expected, as shown below. So I have no idea why it still outputs words outside the list.

MacBookST:Desktop Satoshi$ tesseract image.jpg output --user-words user_words.txt --print-parameters conf | grep -i user_words

Tesseract Open Source OCR Engine v3.04.01 with Leptonica

user_words_file user_words.txt A filename of user-provided words.

user_words_suffix user-words A suffix of user-provided words located in tessdata.

MacBookST:Desktop Satoshi$ tesseract image.jpg output --user-words user_words.txt --print-parameters conf | grep -i load_system_dawg

Tesseract Open Source OCR Engine v3.04.01 with Leptonica

load_system_dawg 0 Load system word dawg.

MacBookST:Desktop Satoshi$ tesseract image.jpg output --user-words user_words.txt --print-parameters conf | grep -i load_freq_dawg

Tesseract Open Source OCR Engine v3.04.01 with Leptonica

load_freq_dawg 0 Load frequent word dawg.
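
The three grep runs above could be collapsed into one pass over the parameter dump. Below is a minimal sketch, assuming each line of the dump has the form `name value description` as in the output quoted above. The helper names parse_parameters and dump_parameters are made up for illustration and are not part of Tesseract.

```python
import subprocess

# Parameter names checked in the post above.
WANTED = ("user_words_file", "user_words_suffix",
          "load_system_dawg", "load_freq_dawg")


def parse_parameters(text, wanted=WANTED):
    """Return {name: value} for the wanted parameters found in the dump.

    Assumption: fields are separated by whitespace and the value is the
    second field. A parameter with an empty value would be misread.
    """
    found = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] in wanted:
            found[parts[0]] = parts[1]
    return found


def dump_parameters(image, output_base, user_words, config):
    """Run the same command as in the post and return everything it prints."""
    cmd = ["tesseract", image, output_base,
           "--user-words", user_words, "--print-parameters", config]
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout
```

Calling parse_parameters(dump_parameters("image.jpg", "output", "user_words.txt", "conf")) should then return the four settings in one dictionary, provided tesseract is on the PATH.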


Can anyone help me? I found many questions related to this, so I hope someone has already figured it out.

Best, 
Satoshi

appl...@gmail.com

Apr 8, 2017, 1:46:00 PM
to tesseract-ocr
I still cannot restrict the vocabulary to the predefined words. Whatever I try, it recognizes words that are not in the list. I don't know why this simple problem cannot be resolved. I checked previous discussions, and many people have similar questions. If anyone has solved this, I would appreciate it if you shared the solution.
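
For reference, one fallback that does not depend on any Tesseract setting is to post-filter the recognized text against the same word list. This is only a sketch of that workaround, not a fix for the behaviour described above. It assumes the list has one word per line, and the function names and the cutoff value are hypothetical choices for illustration.

```python
import difflib


def load_words(path):
    """Read one allowed word per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def snap_to_list(ocr_text, allowed, cutoff=0.8):
    """Keep only tokens that match the allowed list, exactly or approximately.

    Tokens with no sufficiently close match are dropped. `cutoff` is a
    similarity threshold for difflib, not a Tesseract parameter.
    """
    kept = []
    for token in ocr_text.split():
        if token in allowed:
            kept.append(token)
            continue
        # Replace a near miss with the closest allowed word, if any.
        close = difflib.get_close_matches(token, allowed, n=1, cutoff=cutoff)
        if close:
            kept.append(close[0])
    return kept
```

It would be applied to the text Tesseract writes out, for example snap_to_list(open("output.txt").read(), load_words("user_words.txt")). A lower cutoff accepts more distant matches at the cost of more wrong substitutions.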

Satoshi

On Monday, February 13, 2017 at 2:24:58 AM UTC-5, appl...@gmail.com wrote: