Training tesseract on non-letter symbols


Piotr Gryta

May 10, 2016, 3:13:36 AM
to tesseract-ocr
Hi everyone,
I am developing a Java application to vectorize a raster image. One of the steps is symbol recognition, and I was hoping to train Tesseract to find the symbols and return their pixel coordinates.
My questions are:
1) Is it possible to build a dictionary of symbols, to avoid detecting the letters contained in the English dictionary?
2) What steps should I perform?
I managed to create box files for my training image, but later I get an "Empty page!" error.
I would be glad for any suggestions,
Piotrek

Here is a sample image of tree symbols that I would like to train on, to check whether it works:
https://gyazo.com/85a1db80f92f2df44625875bcf20d37d

Tom Morris

May 10, 2016, 11:52:52 AM
to tesseract-ocr
This sounds more like a job for OpenCV or some other machine vision library.

Tom
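
For illustration only, here is a minimal sketch of what a machine-vision approach to this task could look like: brute-force template matching in plain Java, returning the pixel coordinates of each place a symbol occurs. It assumes the page and the symbol are already binarized into 2D int arrays (0 = background, 1 = ink). The class and method names are hypothetical and are not part of Tesseract or OpenCV. OpenCV's Java bindings offer a much faster and more tolerant equivalent in `Imgproc.matchTemplate`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: slides a small symbol template over a binarized image
// and reports where it fits. Illustrative only; not a Tesseract or OpenCV API.
class SymbolMatcher {

    /**
     * Returns the {x, y} top-left pixel coordinates of every window in
     * `image` that differs from `template` in at most `maxMismatches` pixels.
     * Both arrays are indexed as [row][column].
     */
    static List<int[]> findMatches(int[][] image, int[][] template, int maxMismatches) {
        List<int[]> hits = new ArrayList<>();
        int th = template.length;
        int tw = template[0].length;
        // Try every window position where the template fits entirely inside the image.
        for (int y = 0; y + th <= image.length; y++) {
            for (int x = 0; x + tw <= image[0].length; x++) {
                if (countMismatches(image, template, x, y, maxMismatches) <= maxMismatches) {
                    hits.add(new int[] {x, y});
                }
            }
        }
        return hits;
    }

    /** Counts differing pixels in one window, stopping early once `limit` is exceeded. */
    static int countMismatches(int[][] image, int[][] template, int x0, int y0, int limit) {
        int mismatches = 0;
        for (int ty = 0; ty < template.length; ty++) {
            for (int tx = 0; tx < template[0].length; tx++) {
                if (image[y0 + ty][x0 + tx] != template[ty][tx]) {
                    mismatches++;
                    if (mismatches > limit) {
                        return mismatches; // already worse than the allowed tolerance
                    }
                }
            }
        }
        return mismatches;
    }
}
```

Calling `SymbolMatcher.findMatches(image, template, 0)` returns only exact matches. A small positive tolerance allows for scanning noise, at the cost of possible false positives. Symbols that vary in scale or rotation would need a more robust method than this sketch.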