How to Use the QT Box Editor to Train Tesseract?

Swagato Barman Roy

Sep 18, 2017, 7:33:18 AM
to tesseract-ocr
I am using Tesseract to read and extract text from a Singapore employment pass, which looks like this:



The results on the first attempt were poor, so I need to train Tesseract using the box editor. That is where I am having difficulty. I compiled and installed the editor from its source code on GitHub (Ubuntu 16.04) without problems, but I cannot get the hang of the graphical interface. The image is displayed, along with some detected letters to its left.

I believe I have to mark a small box around each letter to tell Tesseract what that letter looks like, but that is the part I am struggling with. Should there be a separate box for every character in the image, each with its own row in the table? Or should I mark the same letter everywhere in the image, with a single row mapping to all of its occurrences?
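My current understanding, which may well be wrong, is that the box file saved for Tesseract 3.x-style training has one line per character occurrence, in the form `<char> <left> <bottom> <right> <top> <page>`, with pixel coordinates measured from the bottom-left corner of the image. If that is right, a repeated letter would get one row per occurrence rather than a single shared row. Here is a minimal sketch that parses such a file to illustrate the layout; `Box`, `parse_box_file` and the sample coordinates are my own hypothetical names and made-up numbers, not part of Tesseract or the box editor.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """One character occurrence (assumed Tesseract 3.x box-file layout)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text: str) -> List[Box]:
    """Parse box-file text, assuming one line per character occurrence:
    '<char> <left> <bottom> <right> <top> <page>', origin at bottom-left."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split from the right so the first field is kept as the character
        char, left, bottom, right, top, page = line.rsplit(None, 5)
        boxes.append(Box(char, int(left), int(bottom), int(right), int(top), int(page)))
    return boxes


# Hypothetical box file for the word "PASS": the letter S appears twice,
# so it has two separate rows with different coordinates.
SAMPLE = """P 12 30 28 52 0
A 30 30 47 52 0
S 49 30 63 52 0
S 65 30 79 52 0
"""

if __name__ == "__main__":
    boxes = parse_box_file(SAMPLE)
    counts = Counter(b.char for b in boxes)
    print(len(boxes), counts["S"])  # prints: 4 2
```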

I imagine the GUI is not conceptually difficult to use, but any simple tutorial or guide would be a great help. Thanks in advance.