Can I incrementally train Tesseract 3.03 using single-page TIFFs with ground truth?

Ilya Z

Apr 24, 2015, 9:58:46 AM
to tesser...@googlegroups.com
I have a set of English single-page TIFF document images that come with ground truth (GT) files. Each TIFF has a single rectangular zone of text, and each GT file is a UTF-8 text file containing the correct text.

I built Tesseract 3.03 from source and ran it on this set using the stock English model. Results were mixed, so the question I am trying to answer is this:

Can I incrementally train Tesseract using a part of this corpus to get better accuracy?
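
To make "a part of this corpus" concrete, here is a minimal sketch of how the pages could be divided into a training portion and a held-out portion, so that any accuracy gain is measured on pages that were not used for training. The function name `split_corpus`, the 80/20 ratio, and the page identifiers are assumptions for illustration only; nothing here is part of Tesseract.

```python
import random


def split_corpus(page_ids, train_fraction=0.8, seed=0):
    """Split page identifiers into (train, held_out) lists.

    Hypothetical helper: page_ids are base names shared by each TIFF
    and its ground truth file. The split is reproducible for a given seed.
    """
    ids = sorted(page_ids)              # fixed starting order
    random.Random(seed).shuffle(ids)    # seeded, so repeatable
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


if __name__ == "__main__":
    pages = ["page%03d" % i for i in range(10)]  # made-up names
    train, held_out = split_corpus(pages)
    print(len(train), "training pages,", len(held_out), "held-out pages")
```

Sorting before the seeded shuffle keeps the split independent of the order in which the files happen to be listed.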

I've been reading https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 but it's unclear to me whether incremental training is possible. Is it? If so, how would I have to modify the training procedure so that it starts from the previously trained data and extends it with whatever comes from the new data?
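
For context on what "better accuracy" would mean here: a minimal sketch of a character error rate (edit distance between the OCR output and the ground truth, divided by the ground truth length). This is a plain Levenshtein implementation with no dependencies; the function names are my own and this is only one possible metric, not something Tesseract provides.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(ocr_text, truth_text):
    """Edit distance normalized by ground truth length.

    Assumption: an empty ground truth yields 0.0 if the OCR output is
    also empty, and 1.0 otherwise.
    """
    if not truth_text:
        return 0.0 if not ocr_text else 1.0
    return edit_distance(ocr_text, truth_text) / len(truth_text)


if __name__ == "__main__":
    # Made-up strings, not real OCR output.
    print(char_error_rate("Tesseraot", "Tesseract"))
```

Comparing this number on the same held-out pages before and after training would show whether the new data helped.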

Thx

