Can I incrementally train Tesseract 3.03 using single-page TIFFs with ground truth?



Ilya Z


Apr 24, 2015, 9:58:46 AM

to tesser...@googlegroups.com

I have a set of English single-page TIFF document images that come with ground truth (GT) files. Each TIFF has a single rectangular zone of text, and each GT file is a UTF-8 text file containing the correct text.

I built Tesseract 3.03 from source and ran it on this set using the English model that came out of the box. The results were mixed, so the question I am trying to answer is this:

Can I incrementally train Tesseract using a part of this corpus to get better accuracy?
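To tell whether any retraining actually improves accuracy, it helps to score Tesseract's output against the GT files before and after. Below is a minimal sketch of a character error rate (CER) measurement, assuming a hypothetical layout where each GT file `NAME.txt` in one directory matches a Tesseract output file `NAME.txt` in another (for example, produced by running `tesseract NAME.tif ocr_dir/NAME`). Whitespace is collapsed before comparison because Tesseract's line breaks will rarely match the GT exactly; that normalization is an assumption, not a requirement.

```python
from pathlib import Path


def normalize(text: str) -> str:
    # Collapse all runs of whitespace (including newlines) to single spaces.
    return " ".join(text.split())


def levenshtein(a: str, b: str) -> int:
    # Edit distance: minimum insertions, deletions and substitutions turning a into b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def corpus_cer(pairs) -> float:
    # pairs: iterable of (ocr_text, gt_text). Returns total edits / total GT characters,
    # so long pages weigh more than short ones.
    edits = total = 0
    for ocr, gt in pairs:
        ocr, gt = normalize(ocr), normalize(gt)
        edits += levenshtein(ocr, gt)
        total += len(gt)
    return edits / total if total else 0.0


def load_pairs(gt_dir: str, ocr_dir: str):
    # Hypothetical layout: GT and OCR files share the same file name in two directories.
    # GT files whose OCR counterpart is missing are skipped.
    for gt_path in sorted(Path(gt_dir).glob("*.txt")):
        ocr_path = Path(ocr_dir) / gt_path.name
        if ocr_path.exists():
            yield (ocr_path.read_text(encoding="utf-8"),
                   gt_path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    import sys
    print(f"CER: {corpus_cer(load_pairs(sys.argv[1], sys.argv[2])):.4f}")
```

Running it once on the stock model and again on a retrained model over the same held-out pages gives a like-for-like comparison.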

I've been reading https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3, but it's unclear to me whether incremental training is possible. Is it? How would I have to modify the training procedure so that it includes the previously trained data and extends it with whatever comes from the new data?

Thx
