To test several ways of improving character recognition, I've split my image into individual characters and I send them to Tesseract one at a time (the characters are fixed-width).
I set the page segmentation mode to 10 (treat the image as a single character), recognize each character, and then join the results. This gives better accuracy than recognizing the entire image at once.
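A rough sketch of the split-recognize-join loop described above might look like the following. It assumes the line image is a grayscale grid (0 = ink, 255 = background) that starts at column 0, with every glyph occupying exactly `char_width` columns. The names `split_fixed_width`, `recognize_line` and `recognize_char` are hypothetical. `recognize_char` stands in for a call to Tesseract with page segmentation mode 10. The `pad` option (a blank margin around each cell) is also an assumption. It is a knob to experiment with, not something shown here to fix ':' or '-'.

```python
from typing import Callable, List

# Grayscale image as rows of pixels: 0 = black ink, 255 = white background.
Image = List[List[int]]


def split_fixed_width(line: Image, char_width: int, pad: int = 0,
                      background: int = 255) -> List[Image]:
    """Cut one text line into equal-width character cells.

    Assumes a fixed-width font starting at column 0. A trailing partial
    cell (narrower than char_width) is dropped. `pad` surrounds each cell
    with a margin of `background` pixels (hypothetical option, off by default).
    """
    if char_width <= 0:
        raise ValueError("char_width must be positive")
    width = len(line[0]) if line else 0
    cells = []
    for x0 in range(0, width - char_width + 1, char_width):
        cell = [row[x0:x0 + char_width] for row in line]
        if pad:
            blank_row = [background] * (char_width + 2 * pad)
            cell = ([blank_row[:] for _ in range(pad)]
                    + [[background] * pad + r + [background] * pad for r in cell]
                    + [blank_row[:] for _ in range(pad)])
        cells.append(cell)
    return cells


def recognize_line(line: Image, char_width: int,
                   recognize_char: Callable[[Image], str], pad: int = 0) -> str:
    """Recognize each cell separately and join the results, left to right."""
    return "".join(recognize_char(cell)
                   for cell in split_fixed_width(line, char_width, pad))
```

For a quick check without Tesseract, a toy recognizer that returns `"#"` for any cell containing ink makes `recognize_line(line, 3, toy)` return one symbol per cell. In real use it would be replaced by the single-character Tesseract call.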
The problem is that some symbols, such as ':' and '-', are not recognized at all. This can be reproduced by loading the attached image into Tesseract.
If I instead load, for example, the full line that contains the ':' symbol, the symbol is recognized, but other accuracy problems appear.
I would like to know whether I can tweak the configuration so that these symbols are recognized as single characters.
OS: Windows 10
Output of tesseract -v:
tesseract 3.05.00dev
leptonica-1.73
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
(Note: I've also posted this question on Stack Overflow, with no responses so far: http://stackoverflow.com/questions/38607576/single-symbol-recognition-in-tesseract)
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.