Training wrongly recognized text using Tesseract OCR training libraries

thomas

Jul 24, 2015, 3:35:31 AM

to tesseract-ocr

I used Tesseract to recognize text, and some of it is recognized incorrectly, so I need to retrain it. I read the articles from here and here. From the discussions there, I understood that mftraining and cntraining can accept a maximum of only 64 tr files.

One tr file and one box file are produced from one jpg file.
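To make that step concrete, here is a minimal sketch of how I understand the per-image commands in the Tesseract 3.x training workflow. It only builds the command lines and does not run them. The helper name `training_commands` and the file name `eng.arial.exp0.jpg` are made up for illustration, and the `batch.nochop makebox` and `box.train` config names are the ones from the 3.x training instructions, so they may not match other versions.

```python
from pathlib import Path


def training_commands(image_path):
    """Build the two Tesseract 3.x command lines for one training image.

    Hypothetical helper: returns (make_box, make_tr) as argument lists.
    """
    image = Path(image_path)
    # Output base name without the image extension, e.g. "eng.arial.exp0".
    base = str(image.with_suffix(""))
    # Step 1: generate the box file (to be corrected by hand afterwards).
    make_box = ["tesseract", str(image), base, "batch.nochop", "makebox"]
    # Step 2: generate the tr file from the image and its box file.
    make_tr = ["tesseract", str(image), base, "box.train"]
    return make_box, make_tr


if __name__ == "__main__":
    for cmd in training_commands("eng.arial.exp0.jpg"):
        print(" ".join(cmd))
```

Each image therefore ends up with one box file and one tr file sharing the same base name.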

So this means a single training run accepts at most 64 jpg files, and the final output of that run is one traineddata file (for example, eng.traineddata) covering those 64 jpg files.

If I have 200 files to train on, how can I produce a single eng.traineddata file covering all 200 files?