Tesseract makebox config with known lines of text

a.f...@sheffield.ac.uk

Jul 15, 2020, 6:37:36 AM
to tesseract-ocr
I'm running "tesseract $X $X batch.nochop makebox" in a loop over my image files to produce box files, which I then correct and re-use for training. I have two questions.

Is there a way to make makebox produce the line-by-line box format (rather than character-by-character) that newer versions of tesseract accept as training data? (I'm using tesseract 4.0.0 in a docker container.)

I have a TSV file (which I could transform into some other format) with the correct string for the text in each image file, but it does not have the pixel locations. Is there any way to tell tesseract makebox to use those strings and "make them fit" the image?
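
To make "transform into some other format" concrete, here is a minimal sketch of the kind of conversion I could do on my side. It assumes a two-column TSV with no header row (image filename, then the correct text); the function name write_ground_truth and the ".gt.txt" suffix are illustrative assumptions, not something tesseract itself requires as far as I know.

```python
import csv
from pathlib import Path


def write_ground_truth(tsv_path, out_dir):
    """Write one <image-stem>.gt.txt file per TSV row and return the paths.

    Assumed layout (hypothetical): column 0 is the image filename,
    column 1 is the correct text for that image, no header row.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps any quote characters in the text untouched.
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            image_name, text = row[0], row[1]
            target = out_dir / (Path(image_name).stem + ".gt.txt")
            target.write_text(text.strip() + "\n", encoding="utf-8")
            written.append(target)
    return written
```

Calling write_ground_truth("strings.tsv", "ground-truth") would leave one small text file next to each image name, which still carries no pixel locations; that is the part I am hoping tesseract can supply.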

Thanks,
Adam