setting user-words in api?


Jochen Naumann

Jul 3, 2019, 11:29:59 AM
to tesseract-ocr
Hi, I can set the user-words file on the command line with the tesseract tool, but how do I set it using the API?
I searched for it in the source code but could not find it. I would appreciate any help.

Quan Nguyen

Jul 3, 2019, 12:16:03 PM
to tesseract-ocr

Jochen Naumann

Jul 3, 2019, 1:59:07 PM
to tesser...@googlegroups.com
Thanks, I already tried api->SetVariable("user_words_suffix", "user-words");
It did not work, whereas specifying it in a config file and using the command-line tesseract tool does work.
I used a file monitor tool to check whether the process tries to open a user-words file: my program does not, but the tesseract tool does.
But I am using 4.1, where this is fixed.
Do you have a working example?



--
You received this message because you are subscribed to a topic in the Google Groups "tesseract-ocr" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/tesseract-ocr/vN6jRopxB5Y/unsubscribe.
To unsubscribe from this group and all its topics, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6d3c039b-1d58-427f-b53a-5ef8a3639c40%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Zdenko Podobny

Jul 3, 2019, 2:31:17 PM
to tesser...@googlegroups.com
If the command line works for you, the easiest way is to follow the tesseract executable code[1]:
IMO you need to use the variable user_words_file; AFAIR user_words_suffix specifies only the file extension...
Then it should work[2], i.e. tesseract will load the user words (the effect on recognition is another topic).


Jochen Naumann

Jul 4, 2019, 7:10:46 AM
to tesser...@googlegroups.com
user_words_file also does not work; the file is not loaded (checked with the file monitor).


Shree Devi Kumar

Jul 5, 2019, 12:38:57 AM
to tesser...@googlegroups.com
I have made a wiki page for using user_patterns with API. Please see https://github.com/tesseract-ocr/tesseract/wiki/APIExample-user_patterns

You can try similarly for user_words.
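
For readers following along, here is a minimal C++ sketch of the approach suggested above, applied to user words: write a config file containing user_words_file, then hand that file to Init() through its configs argument. The reasoning, as far as I know, is that user_words_file is an init-time parameter, so setting it with SetVariable() after Init() has no effect. The helper names WriteUserWordsConfig and InitWithConfig and the file names are hypothetical, and the Init() call assumes the 8-argument overload found in Tesseract 4.x/5.x, so please check it against your installed headers.

```cpp
#include <fstream>
#include <string>

// Writes a one-line Tesseract config file that points at a user-words file.
// Both paths are hypothetical examples; adjust them to your project.
bool WriteUserWordsConfig(const std::string& config_path,
                          const std::string& words_path) {
  std::ofstream out(config_path);
  if (!out) return false;
  out << "user_words_file " << words_path << "\n";
  return static_cast<bool>(out);
}

// The Tesseract part is only compiled where the headers are installed,
// so the file-writing helper above can still be built on its own.
#if __has_include(<tesseract/baseapi.h>)
#include <tesseract/baseapi.h>

// Passes the config file to Init() so that init-time parameters are read
// while the engine (and its dictionaries) is being set up.
// Assumption: the 8-argument Init() overload of Tesseract 4.x/5.x.
inline bool InitWithConfig(tesseract::TessBaseAPI& api,
                           const std::string& config_path) {
  std::string path = config_path;  // Init() takes a non-const char**
  char* configs[] = {&path[0]};
  return api.Init(nullptr, "eng", tesseract::OEM_LSTM_ONLY, configs, 1,
                  nullptr, nullptr, false) == 0;
}
#endif
```

A caller would first run WriteUserWordsConfig("my.config", "my.user-words") and then InitWithConfig(api, "my.config") before SetImage(). As far as I can tell, a relative user_words_file path is resolved against the working directory of the process, so an absolute path may be easier to verify with a file monitor.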




--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Jochen Naumann

Jul 5, 2019, 4:25:39 AM
to tesser...@googlegroups.com
Thanks, Shree. I appreciate your help!
I tried your example and it works with your image, but it does not work with the attached image test2.jpg. Tesseract always reads the O as 0, although I provided the following pattern: L9143CO\d\d\d
I added the user_words_file parameter to the config file, but the setting is ignored (the file monitor shows that my.patterns is accessed, but the tesseract API never tries to open a file called my.user-words).

my config file:

user_patterns_file my.patterns
user_words_file my.user-words
lstm_use_matrix 1

Have a nice day.

test2.jpg

Shree Devi Kumar

Jul 5, 2019, 6:08:02 AM
to tesser...@googlegroups.com
I haven't tried user_words yet.
Pre-processing the image gets you better results.

It works with the modified image and the following pattern:

\A\d\d\d\d\A\A\d\d\d
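
To make the difference between the two patterns in this thread concrete, here is a rough C++ sketch that translates a Tesseract user pattern into a std::regex and tests a string against it. It assumes the character classes documented in Tesseract's trie.h (\c letter, \d digit, \n letter or digit, \p punctuation, \a lowercase, \A uppercase, \* repetition) and approximates them with ASCII ranges, whereas Tesseract itself consults the unicharset. PatternToRegex and MatchesPattern are hypothetical names, and the sample strings in the note below are made up, not read from the attached image.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Appends one literal character, escaping regex metacharacters.
void AppendLiteral(std::string& re, char ch) {
  static const std::string kSpecial = ".^$|()[]{}*+?\\";
  if (kSpecial.find(ch) != std::string::npos) re += '\\';
  re += ch;
}

// Translates a Tesseract user pattern into an ECMAScript regex.
// ASCII-only approximation of the classes described in trie.h.
std::string PatternToRegex(const std::string& pattern) {
  std::string re;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i] == '\\' && i + 1 < pattern.size()) {
      const char cls = pattern[++i];
      switch (cls) {
        case 'c': re += "[A-Za-z]"; break;     // letter
        case 'd': re += "[0-9]"; break;        // digit
        case 'n': re += "[A-Za-z0-9]"; break;  // letter or digit
        case 'p': re += "[[:punct:]]"; break;  // punctuation
        case 'a': re += "[a-z]"; break;        // lowercase
        case 'A': re += "[A-Z]"; break;        // uppercase
        case '*': re += "*"; break;            // repeat previous item
        default: AppendLiteral(re, cls); break;  // e.g. "\\" is a backslash
      }
    } else {
      AppendLiteral(re, pattern[i]);
    }
  }
  return re;
}

// Returns true if the whole text is covered by the user pattern.
bool MatchesPattern(const std::string& pattern, const std::string& text) {
  return std::regex_match(text, std::regex(PatternToRegex(pattern)));
}
```

Under this approximation both L9143CO\d\d\d and \A\d\d\d\d\A\A\d\d\d accept a string such as L9143CO123 and reject L9143C0123, because the seventh position must be an uppercase letter in both. The second pattern is simply the more general one, since it does not fix the leading characters.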




test2.png