Is modifying traineddata without retraining possible?


Nikolai Krot

Nov 1, 2017, 3:41:54 PM
to tesseract-ocr
Hi all,

I would like to know whether I can simply unpack, modify, and repack the files in a traineddata file and get an improvement in OCR (I am using Tesseract 3.04). More specifically, I want to add new characters (such as the section sign §) and new words to the dictionary. Do I need to retrain Tesseract, or will it "just use" the new traineddata file?

Or is it a mixed situation, where some of the files below require retraining and others do not?
  1. deu.bigram-dawg
  2. deu.freq-dawg
  3. deu.inttemp
  4. deu.normproto
  5. deu.number-dawg
  6. deu.params-model
  7. deu.pffmtable
  8. deu.punc-dawg
  9. deu.shapetable
  10. deu.traineddata
  11. deu.unicharambigs
  12. deu.unicharset
  13. deu.word-dawg
I did a short test in which I extended the word list with words from one document, but I do not see any significant improvement. Maybe I am doing something wrong.
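
To make the unpack-modify-pack cycle for the word list concrete, it could be sketched roughly as below. This is only an illustration: it assumes the standard Tesseract 3.x training tools (combine_tessdata, dawg2wordlist, wordlist2dawg) are installed, and the helper function and the wordlist.*.txt file names are hypothetical. The script only assembles and prints the commands; it does not run them.

```python
import shlex


def wordlist_update_commands(lang, merged_wordlist="wordlist.merged.txt"):
    """Return the commands for one unpack-modify-pack cycle of the word DAWG.

    Nothing is executed here. `merged_wordlist` is a hypothetical text file
    that holds the old words plus the new ones, one word per line.
    """
    return [
        # 1. Unpack every component of <lang>.traineddata into <lang>.* files.
        ["combine_tessdata", "-u", f"{lang}.traineddata", f"{lang}."],
        # 2. Dump the existing word DAWG to a plain-text word list.
        ["dawg2wordlist", f"{lang}.unicharset", f"{lang}.word-dawg",
         "wordlist.old.txt"],
        # (Manual step: merge wordlist.old.txt and the new words
        #  into merged_wordlist.)
        # 3. Rebuild the word DAWG from the merged list.
        ["wordlist2dawg", merged_wordlist, f"{lang}.word-dawg",
         f"{lang}.unicharset"],
        # 4. Overwrite only the word-dawg component inside the traineddata.
        ["combine_tessdata", "-o", f"{lang}.traineddata", f"{lang}.word-dawg"],
    ]


if __name__ == "__main__":
    for cmd in wordlist_update_commands("deu"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them and to work on a backup copy of deu.traineddata, since the last step modifies the file in place.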

Thanks in advance and best regards,
Nikolai KROT
