Is modifying traineddata without retraining possible?

Nikolai Krot

Nov 1, 2017, 3:41:54 PM

to tesseract-ocr

Hi all,

I would like to know whether I can simply unpack, modify, and repack the files in a traineddata and get an improvement in OCR (I am using tesseract 3.04). More specifically, I want to add characters (such as the section sign §) and new words to the dictionary. Do I need to retrain tesseract, or will it "just use" the new traineddata file?

Or is it a mixed situation, where some of the files below require retraining and others do not?

deu.bigram-dawg
deu.freq-dawg
deu.inttemp
deu.normproto
deu.number-dawg
deu.params-model
deu.pffmtable
deu.punc-dawg
deu.shapetable
deu.traineddata
deu.unicharambigs
deu.unicharset
deu.word-dawg

I ran a short test in which I extended the word list with words taken from one document, but I do not see any significant improvement. Maybe I am doing something wrong.
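To make the question concrete, the unpack-modify-pack cycle for the word list could look roughly like the sketch below. It is only a sketch, assuming the standard training tools combine_tessdata, dawg2wordlist and wordlist2dawg are installed; the helper names merge_wordlists and repack_commands and the file name wordlist.txt are hypothetical, chosen for illustration. The script only builds and prints the command lines, it does not run them.

```python
from typing import Iterable, List


def merge_wordlists(existing: Iterable[str], extra: Iterable[str]) -> List[str]:
    """Return the existing words followed by the new ones, skipping blanks and duplicates."""
    seen = set()
    merged = []
    for word in list(existing) + list(extra):
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


def repack_commands(lang: str, wordlist: str = "wordlist.txt") -> List[List[str]]:
    """Build argv lists for one unpack, edit, repack cycle of the word DAWG.

    The wordlist file name is an assumption; edit that file between the
    dawg2wordlist and wordlist2dawg steps (for example with merge_wordlists).
    """
    traineddata = f"{lang}.traineddata"
    unicharset = f"{lang}.unicharset"
    word_dawg = f"{lang}.word-dawg"
    return [
        # 1. Unpack all components next to the traineddata file (deu.word-dawg, ...).
        ["combine_tessdata", "-u", traineddata, f"{lang}."],
        # 2. Dump the compiled word DAWG back to a plain text word list.
        ["dawg2wordlist", unicharset, word_dawg, wordlist],
        # 3. Recompile the (edited) word list into a DAWG.
        ["wordlist2dawg", wordlist, word_dawg, unicharset],
        # 4. Overwrite just that component inside the traineddata file.
        ["combine_tessdata", "-o", traineddata, word_dawg],
    ]


if __name__ == "__main__":
    for argv in repack_commands("deu"):
        print(" ".join(argv))
```

Note that this touches only deu.word-dawg; it leaves deu.unicharset and the classifier files (deu.inttemp, deu.normproto, deu.pffmtable, deu.shapetable) unchanged, and whether that is enough is exactly what I am asking.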

Thanks in advance and best regards,
Nikolai KROT
