Setting user_patterns does not seem to affect output. (Alternatively, is there any way to limit tesseract to output strings of a specific length?)


David Orshan

Mar 2, 2015, 12:32:17 PM
to tesser...@googlegroups.com
Hi,

I want to have tesseract recognize images that I know contain a single word that is 8 characters long.

I found a few mentions of user_patterns here: http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data, and it seems to be the solution I need. I tried following the instructions, but I can't get the patterns file to affect my output. As a sanity check, I set user_patterns to contain only the pattern "\d\d\d\d\d\d\d\d", which I thought should restrict the output to eight digits, but it has no effect: I'm getting outputs that are 4 characters long and contain only letters. I also tried setting language_model_penalty_non_dict_word to 1.0 in an attempt to force tesseract to accept my user-defined dictionary, but that didn't work either.
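For anyone trying to reproduce this setup, the steps from the linked manual page can be sketched in Python. This is a minimal sketch that assumes the Tesseract 3.x conventions: the patterns file sits in tessdata as `<lang>.user-patterns`, a config file in `tessdata/configs/` sets `user_patterns_suffix user-patterns` as a space-separated "variable value" line, and the config name is passed as the last command-line argument. `write_pattern_config` and `tesseract_command` are hypothetical helper names. The sketch only writes the files and builds the argument list; it does not check whether tesseract actually honors the patterns, which is the open question in this thread.

```python
from pathlib import Path


def write_pattern_config(tessdata_dir, lang, config_name, patterns, extra_params=None):
    """Write a user-patterns file and a config file that loads it.

    Hypothetical helper. Assumes the Tesseract 3.x layout: patterns in
    <tessdata>/<lang>.user-patterns, config files in <tessdata>/configs/.
    """
    tessdata = Path(tessdata_dir)
    config_dir = tessdata / "configs"
    # Creates tessdata/ and tessdata/configs/ if they do not exist yet.
    config_dir.mkdir(parents=True, exist_ok=True)

    # One pattern per line, e.g. r"\d\d\d\d\d\d\d\d" for eight digits.
    patterns_path = tessdata / f"{lang}.user-patterns"
    patterns_path.write_text("\n".join(patterns) + "\n")

    # Config lines are "variable value" pairs separated by a space.
    params = {"user_patterns_suffix": "user-patterns"}
    params.update(extra_params or {})
    config_path = config_dir / config_name
    config_path.write_text("".join(f"{k} {v}\n" for k, v in params.items()))
    return patterns_path, config_path


def tesseract_command(image, outbase, lang, config_name):
    """Build a 3.x-style command: tesseract image outbase -l lang config."""
    return ["tesseract", image, outbase, "-l", lang, config_name]
```

For example, passing `{"language_model_penalty_non_dict_word": "1.0"}` as `extra_params` adds the penalty setting mentioned above to the same config file.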

Does anybody have any idea what I could be doing wrong? Alternatively, is there any other way to limit tesseract to strings that are a certain length?
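On the length question, one workaround that does not depend on user_patterns taking effect is to filter tesseract's plain-text output afterwards and keep only tokens of the expected length. A minimal sketch, assuming the output has already been read into a string; `candidates_of_length` is a hypothetical helper. Note that it only discards wrong-length results; it does not steer recognition toward the right length.

```python
import re


def candidates_of_length(ocr_text, length, char_pattern=r"\S"):
    """Return whitespace-separated tokens of exactly `length` characters,
    where every character matches the single-character regex `char_pattern`."""
    # Builds e.g. "(?:\S){8}"; fullmatch rejects tokens that are too long or short.
    token_re = re.compile(rf"(?:{char_pattern}){{{length}}}")
    return [tok for tok in ocr_text.split() if token_re.fullmatch(tok)]
```

Calling `candidates_of_length(text, 8, r"\d")` keeps only eight-digit tokens, which mirrors the intent of the "\d\d\d\d\d\d\d\d" pattern; an empty result signals that the OCR output should be treated as a failure.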

Thanks for the help