How to Use the QT Box Editor to Train Tesseract?

Swagato Barman Roy

Sep 18, 2017, 7:33:18 AM

to tesseract-ocr

I am using Tesseract to read and extract text from a Singapore employment pass like this one.

Out of the box the accuracy is poor, so I have to train Tesseract using the box editor, and that is where I am having difficulty. I compiled and installed the editor from the source code on GitHub (Ubuntu 16.04) without problems, but I cannot get the hang of the graphical interface. The image is displayed, along with some detected letters to its left.

I believe I have to mark a small box around each letter to show Tesseract what that letter looks like, but that is where I am stuck. Should there be a small box for every character in the image, with one row in the table per box? Or should I mark every occurrence of the same letter in the image, so that a single row maps to all of them?
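For context, the editor reads and writes a Tesseract `.box` file, which, as far as I understand the format, is plain text with one line per character box in the form `<char> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. The Python sketch below assumes that format; `Box`, `parse_box_lines` and `to_top_left` are hypothetical helper names, not part of Tesseract or the editor, and the sample lines are made up. Under that assumption, each occurrence of a letter is its own line (its own row), which the repeated "S" illustrates.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """One line of a box file: a symbol and its bounding box (assumed format)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(text: str) -> List[Box]:
    """Parse '<char> <left> <bottom> <right> <top> <page>' lines into Box objects."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the last five fields are always the numbers.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            raise ValueError(f"malformed box line: {line!r}")
        char, *nums = parts
        left, bottom, right, top, page = (int(n) for n in nums)
        boxes.append(Box(char, left, bottom, right, top, page))
    return boxes


def to_top_left(box: Box, image_height: int) -> Tuple[int, int, int, int]:
    """Convert bottom-left-origin coordinates to (left, top, right, bottom) with a top-left origin."""
    return (box.left, image_height - box.top, box.right, image_height - box.bottom)


if __name__ == "__main__":
    # Hypothetical sample: "S", "G", "S" on one text line; the two S's are separate rows.
    sample = "S 10 100 30 130 0\nG 32 100 52 130 0\nS 54 100 74 130 0\n"
    boxes = parse_box_lines(sample)
    print(len(boxes))                      # 3
    print(Counter(b.char for b in boxes))  # Counter({'S': 2, 'G': 1})
    print(to_top_left(boxes[0], 200))      # (10, 70, 30, 100)
```

The top-left conversion is only there because most image viewers and libraries put the origin at the top-left, so box coordinates look "upside down" if compared directly.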

I imagine the GUI is not conceptually difficult to use, but any simple tutorial or guide would be a great help. Thanks in advance.
