Tess4J training for new symbols.


Alex

Feb 13, 2016, 8:38:24 AM
to tesseract-ocr
How would I go about training Tesseract on new Unicode symbols for use with Tess4J? I need Tesseract to detect a division symbol (÷), but it currently detects it as a plus sign.

Thanks.

Quan Nguyen

Feb 13, 2016, 5:13:46 PM
to tesseract-ocr
You'd train Tesseract and then use the resulting .traineddata file with Tess4J.

Tom Morris

Feb 14, 2016, 1:59:55 PM
to tesseract-ocr
I don't know whether the Tess4J wrapper supports multiple languages, or which language you're using as a base, but you might consider training the division sign and any other symbols you need into an entirely separate language, and then running OCR with the Tess4J equivalent of -l eng+mylang (substituting your own base language for eng).

There is the equation "language" with code equ, but it apparently doesn't include that style of division sign, which surprised me a little. You might try -l eng+equ first, though, and see whether the "wavy division sign" or some other symbol is close enough that it gets detected reliably in place of the plus sign, before going to the trouble of doing your own training.
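
As a rough illustration of the -l eng+equ suggestion, here is a minimal Java sketch that joins language codes with "+" and builds the matching tesseract command line. The class name OcrLanguages, its two helper methods, and the image path equation.png are made up for this example. In Tess4J the same combined string would presumably be passed to Tesseract.setLanguage(...), but whether the wrapper accepts combined languages is an assumption that has not been checked here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tesseract or Tess4J.
class OcrLanguages {
    // Joins Tesseract language codes with '+', the separator the -l option expects.
    static String combine(String... codes) {
        if (codes.length == 0) {
            throw new IllegalArgumentException("at least one language code is required");
        }
        return String.join("+", codes);
    }

    // Builds the argument list for: tesseract <image> stdout -l <languages>
    static List<String> buildCommand(String imagePath, String... codes) {
        return new ArrayList<>(
                Arrays.asList("tesseract", imagePath, "stdout", "-l", combine(codes)));
    }

    public static void main(String[] args) {
        // "equation.png" is a placeholder path; replace it with a real image.
        List<String> command = buildCommand("equation.png", "eng", "equ");
        System.out.println(String.join(" ", command));
        // To actually run it (requires tesseract on the PATH):
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Keeping the extra symbols in their own language means the base language can be swapped by changing only the first argument.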

I've attached the characters which are included in the equ training.
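
To check whether a given symbol is present in a unicharset file, something like the following Java sketch may help. It assumes the usual layout, where the first line holds the entry count and every later line begins with the symbol followed by its properties. The class name UnicharsetCheck and the sample lines are invented for illustration and are not the contents of the attachment.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Tesseract or Tess4J.
class UnicharsetCheck {
    // Returns the first whitespace-delimited token of every entry line.
    // Assumption: line 1 is the entry count, so it is skipped.
    // Placeholder entries such as NULL end up in the set as well.
    static Set<String> symbols(List<String> lines) {
        Set<String> result = new LinkedHashSet<>();
        for (int i = 1; i < lines.size(); i++) {
            String trimmed = lines.get(i).trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed.split("\\s+")[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines; the trailing property fields are abbreviated.
        List<String> sample = Arrays.asList("3", "NULL 0 NULL 0", "+ 0 0", "= 0 0");
        Set<String> found = symbols(sample);
        System.out.println("contains +: " + found.contains("+"));
        System.out.println("contains \u00F7: " + found.contains("\u00F7"));
    }
}
```

For a real file, the lines could be loaded with Files.readAllLines(Paths.get("equtmp.unicharset"), StandardCharsets.UTF_8) and passed to symbols(...).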

Tom
equtmp.unicharset