I am using Tesseract to read and extract text from a Singapore employment pass, which looks like this.
The results on the first attempt are poor, so I have to train it using the box editor, and that is where I am stuck. I compiled and installed it correctly (Ubuntu 16.04) from the source code on
GitHub, but I cannot get the hang of the graphical interface. The image is displayed, along with some detected letters to its left.
I believe I have to mark a small box around each letter to tell Tesseract what that letter looks like, but I am not sure how to go about it. Should there be one small box for every character in the image, each with its own row in the table? Or should I mark the same letter everywhere it appears in the image, with a single row mapping to all of its occurrences?
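For context on the "one row per character or one row per letter" question: as far as I understand the Tesseract box-file format, the table in a box editor mirrors a plain-text `.box` file with one line per character occurrence, written as `symbol left bottom right top page`, with coordinates measured from the bottom-left corner of the image. The sketch below is only an illustration under that assumption. The sample coordinates are made up, and `Box` and `parse_box_lines` are hypothetical helper names, not part of Tesseract or any box editor.

```python
from collections import Counter
from typing import List, NamedTuple


class Box(NamedTuple):
    """One line of a box file: a symbol and its bounding box on a page."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(text: str) -> List[Box]:
    """Parse box-file text, skipping blank or malformed lines."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # not a "symbol left bottom right top page" line
        symbol, *numbers = parts
        boxes.append(Box(symbol, *(int(n) for n in numbers)))
    return boxes


# Made-up sample: the letter "S" appears twice, so it gets two separate rows.
SAMPLE = """\
S 10 40 22 58 0
I 25 40 29 58 0
S 33 40 45 58 0
"""

boxes = parse_box_lines(SAMPLE)
counts = Counter(box.symbol for box in boxes)
print(len(boxes), "rows;", counts["S"], "of them are for the letter S")
```

If that reading of the format is right, a repeated letter shows up as several rows, one per place it appears in the image, rather than as a single row shared by all occurrences.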
I imagine the GUI is not conceptually difficult to use, but any simple tutorial or guide would be great. Thanks in advance.