How to recognize a single hyphen with psm=6

Ankhbayar Gansukh

Oct 4, 2024, 1:16:46 PM
to tesseract-ocr
Hello,
I am new to Tesseract and would really appreciate it if someone could help me.

Tesseract 5 successfully recognizes a single hyphen with psm=7.
Example: tesseract --psm 7 single_hyphen_char.jpg stdout

With psm=6 it does not recognize the single hyphen.
Example: tesseract --psm 6 single_hyphen_char.jpg stdout

I cannot use psm 7 because the image can be multiline.

Any advice would be greatly appreciated!
Ankhbayar

Attachment: single_hyphen_char.jpg

Zdenko Podobny

Oct 5, 2024, 1:57:30 PM
to tesser...@googlegroups.com
I’m having trouble understanding the issue you're encountering. You mentioned that your input could be multiline, but the example you provided contains only a single character. I’m also surprised that even psm 7 works in this case.

The problem is that a hyphen, when presented as a single character, can sometimes be interpreted as a vertical line or graphical element and may be ignored. Typically, in a multiline context, it is recognized correctly.

Zdenko

Ankhbayar Gansukh

Oct 6, 2024, 11:24:06 AM
to tesseract-ocr
Hi Zdenop, thank you for your reply.
The input images are dynamic: they can be multiline or single-line. I attached only the image I have a problem with.
Yes, the example image somehow works fine with psm 7.
In a multiline context (psm 6), hyphens are recognized fine when they appear together with other characters, but a hyphen on its own is not recognized.
I wonder whether there is any way to train Tesseract to recognize a single hyphen correctly, or any Tesseract option that would help.

Thanks,
Ankhbayar
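
One possible workaround for the dynamic-input case described in this thread, offered as a sketch rather than a confirmed fix: run --psm 6 first and retry with --psm 7 only when the first pass returns no text. It uses only the invocation already shown here (tesseract --psm N image stdout). The function name ocr_with_fallback and the OCR_BIN variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Tesseract: try --psm 6 (single uniform
# block of text) and fall back to --psm 7 (single text line) when the
# first pass yields nothing but whitespace.
# OCR_BIN is an assumed variable so the binary can be swapped out.
OCR_BIN="${OCR_BIN:-tesseract}"

ocr_with_fallback() {
    local image="$1" text psm
    for psm in 6 7; do
        # Same command form as in the thread: tesseract --psm N image stdout
        text="$("$OCR_BIN" --psm "$psm" "$image" stdout 2>/dev/null)"
        # Treat output made only of whitespace (including a form feed) as empty
        if [ -n "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
            printf '%s\n' "$text"
            return 0
        fi
    done
    # Neither mode produced any text
    return 1
}
```

Usage: ocr_with_fallback single_hyphen_char.jpg. Because psm 7 treats the whole image as one text line, the retry only makes sense when psm 6 produced nothing at all; it does not help with a lone hyphen line inside a larger multiline image.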