Hi all,
I would like to know whether I can simply unpack, modify and repack the files inside a traineddata file and get better OCR results (I am using Tesseract 3.04). More specifically, I want to add characters (like the section sign §) and new words to the dictionary. Do I need to retrain Tesseract, or will it "just use" the new traineddata file?
Or is it a mix: do some of the files below require retraining while others do not?
- deu.bigram-dawg
- deu.freq-dawg
- deu.inttemp
- deu.normproto
- deu.number-dawg
- deu.params-model
- deu.pffmtable
- deu.punc-dawg
- deu.shapetable
- deu.traineddata
- deu.unicharambigs
- deu.unicharset
- deu.word-dawg
I ran a short test in which I extended the word list with words from one document, and I did not see any significant improvement. Maybe I am doing something wrong.
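A minimal sketch of this kind of word-list update, assuming the Tesseract 3.x command-line tools `combine_tessdata`, `dawg2wordlist` and `wordlist2dawg` are on the PATH and are run in a copy of the tessdata directory. The helper names (`unpack_commands`, `append_words`, `repack_commands`, `run`) are hypothetical. The sketch only rebuilds `deu.word-dawg`; it leaves the classifier files (`deu.inttemp`, `deu.shapetable`, etc.) untouched.

```python
import shlex
import subprocess
from pathlib import Path


def unpack_commands(prefix: str) -> list[list[str]]:
    """Commands to unpack <prefix>.traineddata and dump its word DAWG (hypothetical helper)."""
    return [
        # Split the traineddata into <prefix>.<component> files.
        ["combine_tessdata", "-u", f"{prefix}.traineddata", f"{prefix}."],
        # Convert the binary word DAWG into a plain-text word list.
        ["dawg2wordlist", f"{prefix}.unicharset", f"{prefix}.word-dawg",
         f"{prefix}.wordlist"],
    ]


def repack_commands(prefix: str) -> list[list[str]]:
    """Commands to rebuild the word DAWG and put it back (hypothetical helper)."""
    return [
        # Rebuild the DAWG from the (edited) word list.
        ["wordlist2dawg", f"{prefix}.wordlist", f"{prefix}.word-dawg",
         f"{prefix}.unicharset"],
        # Overwrite only the word-dawg component inside the traineddata.
        ["combine_tessdata", "-o", f"{prefix}.traineddata", f"{prefix}.word-dawg"],
    ]


def append_words(wordlist: str, new_words: list[str]) -> None:
    """Append new words, one per line, keeping the file newline-terminated."""
    path = Path(wordlist)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    if text and not text.endswith("\n"):
        text += "\n"
    text += "".join(word + "\n" for word in new_words)
    path.write_text(text, encoding="utf-8")


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first error."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    prefix = "deu"  # assumption: working in a copy of the tessdata directory
    run(unpack_commands(prefix))
    # append_words(f"{prefix}.wordlist", ["Paragraph"])  # only after a real unpack
    run(repack_commands(prefix))
```

With `dry_run=True` (the default) the script only prints the command lines, so the sequence can be checked before any file is overwritten.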
Thanks in advance and best regards,
Nikolai KROT