how to make box file for tesseract

Jingjing Lin

Jun 12, 2019, 3:07:30 PM
to tesseract-ocr
I'm very confused about how to prepare the text for further training of Tesseract. I don't think the page below gives any useful information about this:
Shouldn't there be a step where we supply the correct text? Or can Tesseract just do everything for us? That would be kind of insane, since Tesseract makes mistakes.

Mox Betex

Jun 12, 2019, 5:28:52 PM
to tesseract-ocr
You don't have to create .box files manually.

Use OCR-D's ocrd-train for training: https://github.com/OCR-D/ocrd-train

Put your tif/gt.txt file pairs in the data/ground-truth folder. When you run make training, it will generate the box files for you.
For every tif image, write the correct text in the matching gt.txt file, nothing else.
See https://github.com/OCR-D/ocrd-train/blob/master/ocrd-testset.zip for an example of tif/gt.txt files.
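
Before running make training, it can help to check that every image really has its transcription. Below is a minimal Python sketch of such a check. It is not part of ocrd-train: the function name check_ground_truth is made up for illustration, and it assumes each image name.tif is paired with a file called name.gt.txt in the same folder, so compare that against the files in the example zip before relying on it.

```python
from pathlib import Path


def check_ground_truth(ground_truth_dir):
    """Return (missing, empty) for a folder of tif/gt.txt pairs.

    missing: names of .tif files that have no matching .gt.txt file
    empty:   names of .gt.txt files that contain only whitespace

    Assumption: name.tif is paired with name.gt.txt in the same folder.
    """
    root = Path(ground_truth_dir)
    missing, empty = [], []
    for tif in sorted(root.glob("*.tif")):
        # Build the expected transcription name: name.tif -> name.gt.txt
        gt = tif.with_name(tif.stem + ".gt.txt")
        if not gt.exists():
            missing.append(tif.name)
        elif not gt.read_text(encoding="utf-8").strip():
            empty.append(gt.name)
    return missing, empty
```

Calling check_ground_truth("data/ground-truth") returns two lists. If both are empty, every tif has a non-blank gt.txt next to it. The check only looks at file names and blank content; whether the text in each gt.txt is actually correct is still up to you.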
