Tesseract 4: Training / Evaluating: What are your confidence values with your own Tesseract 4 model?


kolomiyets

May 4, 2017, 8:04:34 AM
to tesseract-ocr
Hi

I am trying to train my own Tesseract model (version 4, by replacing the top layer as described in the tutorial). Besides some unexplained OCR problems (see https://github.com/tesseract-ocr/tesseract/issues/734#issuecomment-299132760), when I compare the output of my model with that of one of the standard models, I observe quite large differences.

I trained a model down to the 0.005 convergence level (below the default value of 0.01) and then evaluated it on a small sample of the data it was trained on. The confidence values produced by my model are between 40 and 55 (even for very frequent and unambiguous words), whereas a standard model achieves between 80 and 95, with 50 to 70 for visually ambiguous words.
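For anyone who wants to run the same comparison, here is a minimal sketch of how per-word confidences could be summarised. It assumes the confidences are read from Tesseract's TSV output (the `conf` and `text` columns); the helper names and the sample rows are made up for illustration and are not my actual results.

```python
import csv
import io
from statistics import mean

# Column layout of Tesseract's TSV output.
TSV_COLUMNS = ["level", "page_num", "block_num", "par_num", "line_num",
               "word_num", "left", "top", "width", "height", "conf", "text"]


def word_confidences(tsv_text):
    """Return (word, confidence) pairs for the word rows of a TSV dump."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Non-word rows (page, block, line) carry conf == -1 and no text.
        if text and conf >= 0:
            pairs.append((text, conf))
    return pairs


def mean_confidence(tsv_text):
    """Average word confidence, or None if no words were recognised."""
    confs = [conf for _, conf in word_confidences(tsv_text)]
    return mean(confs) if confs else None


# Hypothetical sample (illustrative values only, not real measurements).
SAMPLE_TSV = "\n".join("\t".join(fields) for fields in [
    TSV_COLUMNS,
    ["1", "1", "0", "0", "0", "0", "0", "0", "100", "50", "-1", ""],
    ["5", "1", "1", "1", "1", "1", "5", "5", "30", "20", "47", "the"],
    ["5", "1", "1", "1", "1", "2", "40", "5", "50", "20", "52", "model"],
])

if __name__ == "__main__":
    print(word_confidences(SAMPLE_TSV))
    print(mean_confidence(SAMPLE_TSV))
```

Running the same summary over the TSV output of the custom model and of a standard model on identical images gives directly comparable numbers.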

I was wondering whether you achieve confidence levels close to those of the tessdata models. If so, how did you achieve this? Are the standard Tesseract models overfitted? (Try to OCR a common but misspelled word ;)

Cheers,
Alex
