instructions to create a custom dictionary.
You can ereate local variables for the pipelines within the template by
prefixing the variable name with a “$" Sign. Variable names have to be
eomposed of alphanumeric characters and the underseore. In the example
below I have used a few variations that work for variable names.
and I was expecting it to contain _only_ words from the custom dictionary (e.g., "local", "variable", etc.).
Am I misunderstanding how custom dictionaries are supposed to work? Are the words in a custom dictionary merely a "hint" rather than a constraint on which words can be emitted in the OCR output?
Here are the steps I used to regenerate a new eng.traineddata file:
$ combine_tessdata -u tessdata/eng.traineddata /tmp/eng.
$ wordlist2dawg eng.wordlist eng.word-dawg eng.unicharset
(eng.wordlist contains the word list mentioned above: "local", "variables", etc.)
$ combine_tessdata /tmp/eng.
$ mv eng.traineddata ~/tmp/tessdata/eng.traineddata
And here is how I called tesseract:
$ tesseract --tessdata-dir /tmp ocrimage ocrimage
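To make the mismatch concrete, the words in the OCR output that are absent from the word list can be listed with a small shell helper. This is only a rough sketch: `non_dict_words` is a hypothetical name, and it assumes the call above wrote its text to ocrimage.txt and that eng.wordlist has one word per line.

```shell
# Hypothetical helper: print each word of an OCR text file that does not
# appear in a word list (one word per line, compared case-insensitively).
# Usage: non_dict_words OCR_TEXT WORDLIST
non_dict_words() {
    # 1. Turn every run of non-alphanumeric characters into a newline.
    # 2. Lowercase, drop empty lines, and de-duplicate.
    # 3. Keep only the words with no exact whole-line match in the list.
    tr -cs '[:alnum:]' '\n' < "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | grep -v '^$' \
        | sort -u \
        | grep -vixFf "$2"
}
```

For example, `non_dict_words ocrimage.txt eng.wordlist` would print words such as "ereate" if they are not in the list. Note that it splits on underscores and punctuation, so it is a quick check rather than an exact tokenizer.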