Hi,
Tesseract is at least detecting the blobs for each character correctly. One trick is to use the coordinates of each character to extract individual images, invert the colours, and run the recognition in single character mode (-psm 10). I think you have to dig into the API to get the character coordinates, or use the makebox option (e.g. tesseract license.png license makebox). Once each character is isolated, Tesseract usually recognizes it. This is not something I would recommend for a lot of text, but it may be worthwhile in this case.
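As a rough sketch of the makebox route: each line of the resulting box file has the form "symbol left bottom right top page", with the origin at the bottom-left corner of the image. Most imaging libraries crop from the top-left instead, so the y values need flipping. The names below (CharBox, parse_box_line, parse_box_file) and the sample coordinates are made up for illustration; only the box line layout comes from Tesseract.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One character and its crop rectangle, with the origin at the top-left."""
    char: str
    left: int
    top: int
    right: int
    bottom: int


def parse_box_line(line: str, image_height: int) -> CharBox:
    """Convert one box-file line into a top-left-origin crop rectangle."""
    # Split from the right so the symbol field is kept whole.
    symbol, left, bottom, right, top, _page = line.strip().rsplit(None, 5)
    left, bottom, right, top = int(left), int(bottom), int(right), int(top)
    # Box files measure y upwards from the bottom edge, so flip both y values.
    return CharBox(symbol, left, image_height - top, right, image_height - bottom)


def parse_box_file(text: str, image_height: int) -> List[CharBox]:
    """Parse every non-empty line of a box file."""
    return [parse_box_line(line, image_height)
            for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical box-file content for a 100-pixel-high image.
    sample = "A 10 20 30 60 0\nB 35 20 55 60 0\n"
    for box in parse_box_file(sample, image_height=100):
        print(box)
```

Each (left, top, right, bottom) rectangle can then be handed to whatever cropping tool you prefer, and every crop run through tesseract with -psm 10.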
art
| name | value | description |
| editor_image_word_bb_color | 7 | Word bounding box colour |
| editor_image_blob_bb_color | 4 | Blob bounding box colour |
| editor_image_text_color | 2 | Correct text colour |
> I have one question about this. When using -psm 10, what background color should be used?
Hi Alex,
I did a simple invert, but I have never dug very deeply into why it seems to make a difference for single characters. I usually also add a margin around the character when I extract it, but OpenCV is probably the way to go in this case. Good luck!
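To make the invert-plus-margin idea concrete, here is a minimal sketch in plain Python that treats a cropped character as rows of 8-bit grayscale values. The helper names (invert, add_margin) are hypothetical, and the white fill value is an assumption: it presumes the character ends up dark on a light background after inversion. In practice an imaging library such as OpenCV would do the same thing on real image data.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale pixels, 0 = black, 255 = white


def invert(image: Image) -> Image:
    """Swap light and dark: each pixel p becomes 255 - p."""
    return [[255 - pixel for pixel in row] for row in image]


def add_margin(image: Image, margin: int, fill: int = 255) -> Image:
    """Surround the image with `margin` pixels of `fill` on every side.

    fill=255 (white) is an assumption, see the note above.
    """
    width = len(image[0]) + 2 * margin
    padded = [[fill] * width for _ in range(margin)]      # top border
    for row in image:
        padded.append([fill] * margin + list(row) + [fill] * margin)
    padded.extend([fill] * width for _ in range(margin))  # bottom border
    return padded


if __name__ == "__main__":
    # Hypothetical 2x2 "character": light strokes on a dark background.
    glyph = [[255, 0],
             [0, 255]]
    for row in add_margin(invert(glyph), margin=1):
        print(row)
```

The margin width is something to experiment with; the thread only establishes that some spacing around the character seems to help.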
art
Hi Alex,
Yes, some spacing around the character seems to help.
art