Setting User_Patterns does not seem to affect output. (Alternatively, is there any way to limit tesseract to output strings of a specific length?)

David Orshan

Mar 2, 2015, 12:32:17 PM

to tesser...@googlegroups.com

Hi,

I want to have tesseract recognize images that I know contain a single word that is 8 characters long.

I found a few mentions of user_patterns here: http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data, which seems to be the solution I need. I followed the instructions, but I can't get the file to affect my output. As a sanity check, I set the user_patterns file to contain only the pattern "\d\d\d\d\d\d\d\d", which I expected to produce digit-only output, but it has no effect (I'm getting outputs that are 4 characters long and contain only letters). I also tried setting language_model_penalty_non_dict_word to 1.0 in an attempt to force tesseract to accept my user-defined dictionary, but that didn't work either.

Does anybody have any idea what I could be doing wrong? Alternatively, is there any other way to limit tesseract to strings that are a certain length?
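
To make the length requirement concrete, one fallback that does not depend on tesseract's configuration at all is to check the recognized text after the fact. Below is a minimal sketch in Python, assuming the OCR results are already available as plain strings. The helper name pick_valid and the alphanumeric pattern are hypothetical and only for illustration; they are not part of tesseract.

```python
import re

# Assumption: a valid result is exactly 8 letters or digits.
# Adjust the character class to match the real data.
EIGHT_CHARS = re.compile(r"[A-Za-z0-9]{8}")


def pick_valid(candidates, pattern=EIGHT_CHARS):
    """Return the first candidate that fully matches the pattern, else None.

    `candidates` is any iterable of OCR output strings, for example the
    results of running the same image through different settings.
    """
    for text in candidates:
        # OCR output often carries spaces or a trailing newline; drop them.
        cleaned = "".join(text.split())
        if pattern.fullmatch(cleaned):
            return cleaned
    return None
```

This only rejects bad results, it does not steer recognition, so it would complement a working user_patterns setup rather than replace it.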

Thanks for the help
