--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4722674d-27a1-4b8e-8c5a-9e07dbe3ca7d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Which version of tesseract are you using?
ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Wed, Apr 25, 2018 at 8:29 PM, Youcef <youcef...@gmail.com> wrote:
Hi,
Tesseract seems to post-process its predictions.
Here is what I get after running OCR on images (same font and size, generated with text2image):
- an image containing "12345678I" => `123456781`
- an image containing "GLOTHUVFI" => `GLOTHUVFI`
- an image containing "12345678H" => `12345678H`
- an image containing "GLOTHUVFH" => `GLOTHUVFH`
- an image containing "12345678A" => `123456784`
- an image containing "GLOTHUVFA" => `GLOTHUVFA`
It looks like Tesseract doesn't like a word made of digits with a single letter at the end. In fact, if the letter looks like a digit ("I" and "A" look like "1" and "4" respectively), it replaces it with the closest digit.
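To make the pattern easier to see, here is a minimal Python sketch that compares each expected string with the OCR output listed above and reports which characters were substituted. The pairs are copied from the results above; the helper name `substitutions` is just illustrative, and the comparison assumes both strings have the same length.

```python
# (expected text, OCR output) pairs copied from the results listed above.
results = [
    ("12345678I", "123456781"),
    ("GLOTHUVFI", "GLOTHUVFI"),
    ("12345678H", "12345678H"),
    ("GLOTHUVFH", "GLOTHUVFH"),
    ("12345678A", "123456784"),
    ("GLOTHUVFA", "GLOTHUVFA"),
]


def substitutions(expected, actual):
    """Return (position, expected_char, actual_char) for every mismatch.

    Assumes both strings have the same length, as in the results above.
    """
    return [
        (i, e, a)
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]


for expected, actual in results:
    print(expected, "->", actual, substitutions(expected, actual))
```

Only the two digit-prefixed strings ending in "I" and "A" report a mismatch, and in both cases it is the final character.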
I have tried tuning the following parameters, without any change in the result:
- segment_penalty_dict_frequent_word
- language_model_penalty_chartype
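For anyone who wants to reproduce this, here is a rough Python sketch of one way to pass such parameters on the command line using Tesseract's `-c VAR=VALUE` option. The image name, output base, and the values `0.0` are placeholders and not the ones actually used; whether these variables have any effect may depend on the Tesseract version and engine.

```python
import shlex


def tesseract_command(image, output_base, variables):
    """Build a tesseract command line with one `-c name=value` per variable."""
    cmd = ["tesseract", image, output_base]
    for name, value in variables.items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Placeholder values, chosen only for illustration.
variables = {
    "segment_penalty_dict_frequent_word": 0.0,
    "language_model_penalty_chartype": 0.0,
}

# "sample.png" and "out" are hypothetical file names.
cmd = tesseract_command("sample.png", "out", variables)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine where the `tesseract` binary is installed.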
Thanks for any help.
Regards