Hi,
I want to have tesseract recognize images that I know contain a single word that is 8 characters long.
I found a few mentions of user_patterns here:
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
This seems to be the solution I need, so I tried following the instructions, but I can't get the file to affect my output. As a sanity check, I set user_patterns to contain only the pattern "\d\d\d\d\d\d\d\d", which I thought should force digit-only output, but it has no effect (I'm getting outputs that are 4 characters long and contain only letters). I also tried changing language_model_penalty_non_dict_word to 1.0 in an attempt to force tesseract to accept my user-defined dictionary, but that didn't work either.
Does anybody have any idea what I could be doing wrong? Alternatively, is there any other way to limit tesseract to strings that are a certain length?
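To make the constraint concrete, here is a minimal sketch of the check I could fall back on after the fact in Python. It only post-filters text that has already been recognized and does not touch tesseract itself; the helper names (matches_expected, pick_first_valid) are made up for illustration, and the candidate strings are assumed to come from wherever the OCR output ends up.

```python
import re

# Hypothetical helpers, not part of tesseract: they only validate strings
# that some OCR step has already produced.

def matches_expected(text, length=8, char_class=r"\d"):
    """Return True if text (ignoring surrounding whitespace) consists of
    exactly `length` characters from `char_class`."""
    pattern = re.compile(r"(?:%s){%d}" % (char_class, length))
    return pattern.fullmatch(text.strip()) is not None

def pick_first_valid(candidates, length=8, char_class=r"\d"):
    """Return the first candidate that satisfies the constraint, or None."""
    for candidate in candidates:
        if matches_expected(candidate, length, char_class):
            return candidate.strip()
    return None
```

This would at least let me reject the 4-letter results, but it cannot steer the recognition itself, which is what I was hoping user_patterns would do.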
Thanks for the help