Easy training?

Dimitry Khanukaev

Aug 6, 2018, 3:25:07 AM

to tesseract-ocr

Hi, is there a way to do easy training along the following lines:

- I know the font of the program messages that need to be recognized

- I know the background

- Even the number of messages is limited

Could I:
- Just pass in training pairs (a screenshot of the error message + the text on that screenshot).

- Train Tesseract on a number of those pairs

and then use Tesseract with the training result, expecting that the images (i.e. screenshots) whose texts I supplied will be recognized very accurately? (Down to the exact texts that I trained it with.)

In other words, is this a conceptually wrong way of thinking?

I know my images and I know the exact text on them. Can I just tell Tesseract to train against the images so that it gives me the texts that are paired with them?

That is, without getting deep into boxes and stuff :) A kind of training light? :)

Thank you for any help.

Dimitry Khanukaev

Aug 6, 2018, 3:49:16 AM

to tesseract-ocr

I've attached an example of an error message.

By passing pairs to Tesseract I mean training it on pairs: images like this one + the text that should be recognized from each image.

test.png

Raniem AROUR

Sep 4, 2018, 9:04:35 AM

to tesseract-ocr

I was struggling just like you until I found this GitHub repository: https://github.com/OCR-D/ocrd-train

It will make your life super easy. All you need to do is put your images in TIFF format, with the text for each image in a file that has the same name as the image and the extension .gt.txt. It will take care of all the rest for you. (You might need to update the Makefile to match your local machine.)
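If it helps, here is a rough Python sketch of how I would lay out the pairs before running the Makefile. It only uses the standard library and does not call Tesseract or ocrd-train. The function names and the example file names are made up for illustration, and it assumes you have already converted your screenshots to .tif yourself. It writes one .gt.txt file per image name and then lists any .tif that still has no transcription.

```python
from pathlib import Path


def write_ground_truth(pairs, out_dir):
    """Write one <image stem>.gt.txt file per entry in `pairs`.

    `pairs` maps an image file name (hypothetical, e.g. "error_01.png")
    to the exact text shown on that image. Returns the written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, text in pairs.items():
        stem = Path(image_name).stem  # "error_01.png" -> "error_01"
        gt_path = out / f"{stem}.gt.txt"
        # Strip stray whitespace and end the file with a single newline.
        gt_path.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(gt_path)
    return written


def missing_pairs(data_dir):
    """Return the stems of .tif images that have no matching .gt.txt file."""
    d = Path(data_dir)
    return sorted(
        p.stem for p in d.glob("*.tif")
        if not (d / f"{p.stem}.gt.txt").exists()
    )
```

For example, write_ground_truth({"error_01.png": "File not found"}, "my-ground-truth") creates my-ground-truth/error_01.gt.txt, and missing_pairs("my-ground-truth") should come back empty once every image has its text. The directory name here is a placeholder; check the repository's README for where it expects the files.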

Whether to train from scratch or fine-tune depends on your language, your data and the problem you are trying to solve. For me, fine-tuning is what I need because I am happy with the current performance but need to build on it.
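One way to judge whether the result is good enough is to compare what Tesseract returns against the texts you already know. Below is a minimal sketch of that check in plain Python. It assumes you have collected (expected, recognized) string pairs by whatever means you run recognition, and the function names are my own, not part of Tesseract or ocrd-train. It counts exact matches and computes an overall character error rate from the edit distance.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def score(results):
    """Summarize a list of (expected, recognized) text pairs.

    Returns (number of exact matches, character error rate), where the
    error rate is total edits divided by total expected characters.
    Leading and trailing whitespace is ignored on both sides.
    """
    cleaned = [(e.strip(), r.strip()) for e, r in results]
    exact = sum(1 for e, r in cleaned if e == r)
    total_edits = sum(edit_distance(e, r) for e, r in cleaned)
    total_chars = sum(len(e) for e, _ in cleaned)
    cer = (total_edits / total_chars) if total_chars else 0.0
    return exact, cer
```

If every screenshot comes back as an exact match, the error rate is 0.0. Keep in mind that scoring on the same images you trained with only tells you the model memorized them, so it is worth holding a few screenshots back for this check.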

All the useful details you might need can be found in this answer

Thanks to @ShreeShrii for providing support on every matter.

Regards
