Characters omitted when alone but recognized correctly when next to each other


Sergio Mendoza

Mar 8, 2016, 2:23:46 AM
to tesseract-ocr

I have been using Tesseract (tess-two, to be more precise) to build an Android app that recognizes certain non-conventional symbols. The purpose is to identify a symbol and redirect the user to the description of that symbol.

To do this, I packaged those symbols as a font and ran the corresponding training for each of them.



The symbols are recognized almost perfectly, whether they appear alone in the image or next to each other, except for two (the ones below).





Neither of these symbols is recognized when it appears alone, BUT both are recognized correctly when placed next to any other symbol.


For example:


Not recognized

_



Correctly recognized


_  b


_ y _




The problem is that they are not merely mistaken for other symbols; they are ignored completely. This happens when I call:


TessBaseAPI baseApi;
...
String text = baseApi.getUTF8Text();



The returned string is always empty, as if Tesseract did not even detect the black blobs in the first place.
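One thing that may be worth checking (an assumption, not a confirmed cause) is the page segmentation mode: Tesseract's layout analysis may discard or mis-segment an image that contains a single isolated blob, and Tesseract provides a single-character mode (PSM_SINGLE_CHAR, value 10) meant for that case. The Java sketch below is a hypothetical helper, `SymbolOcrFallback.firstNonEmpty`, which is not part of tess-two. It tries a list of page segmentation modes in order and returns the first non-empty result. The OCR call is passed in as a function so the retry logic can be shown without Android.

```java
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper (not part of tess-two): tries several Tesseract page
// segmentation modes in order and returns the first non-empty OCR result.
final class SymbolOcrFallback {
    // Standard Tesseract page segmentation mode values; tess-two's
    // TessBaseAPI.PageSegMode constants use the same numbers.
    static final int PSM_SINGLE_BLOCK = 6;
    static final int PSM_SINGLE_CHAR = 10;

    private SymbolOcrFallback() {}

    /**
     * @param recognizeWithMode runs OCR using the given page segmentation mode
     *                          and returns the raw text (may be null or empty)
     * @param modes             page segmentation modes to try, in order
     * @return the first trimmed, non-empty result, or "" if every mode fails
     */
    static String firstNonEmpty(IntFunction<String> recognizeWithMode, List<Integer> modes) {
        for (int mode : modes) {
            String text = recognizeWithMode.apply(mode);
            // Treat null and whitespace-only output as "nothing recognized".
            if (text != null && !text.trim().isEmpty()) {
                return text.trim();
            }
        }
        return "";
    }
}
```

In the app, the function passed in would presumably call `baseApi.setPageSegMode(mode)`, pass the image again with `baseApi.setImage(...)` (assuming this is needed so a result computed under the previous mode is not reused), and return `baseApi.getUTF8Text()`. Listing a block mode first and PSM_SINGLE_CHAR second means the single-character mode is only used when the first attempt returns nothing. Whether mode 10 actually recovers these two symbols would still need to be tested against the trained data.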

Does anyone know how I could fix this?





