What can you expect from the --user-words option?


Youcef

Jun 1, 2017, 11:44:59 AM
to tesseract-ocr
Hi,

I have searched the internet extensively for information about the --user-words option, but without success.

I am working on a simple case with the Spanish trained data, running:

api/tesseract -l spa --psm 6 test.png output  tessdata/configs/unlv;

I expect the following output from my image:

numero de documento

but instead I am getting

mumsro ne odcumento

It is a little frustrating because those words do not exist in Spanish. So I define a spa.user-words file like this:

documento
numero

and run the following command line:

api/tesseract --user-words spa.user_words -l spa --psm 6 test.png output  tessdata/configs/unlv;
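
For reproducibility, here is a minimal sketch of how such a word list can be written as a plain-text file with one word per line. The file name spa.user_words is taken from the command above (note that it differs from the spa.user-words spelling in the text), and the one-word-per-line layout is an assumption based on the listing shown earlier:

```shell
# Write the two expected words, one per line, to the file passed to --user-words.
# The file name follows the command above; adjust it if yours differs.
printf '%s\n' documento numero > spa.user_words

# Print the file to confirm it holds exactly the two words and nothing else.
cat spa.user_words
```

Whatever name is chosen, the path given to --user-words has to match the file that actually exists on disk.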


But I still get bad OCR output:

numero de documento

Am I using the --user-words option correctly? Can I get the results I want with this option?
Many thanks

PS: I have also unpacked spa.traineddata, added 'documento' and 'numero' to spa.freq-dawg, and recombined it, but without any improvement.

