OCR inconsistencies


Jamiel Impoy

Jul 12, 2023, 10:37:25 AM
to tesseract-ocr
Hello,

For Redact 5.3.1, there is a strange edge case in which the OCR does not recognize words that are surrounded by an equal number of dashes on each side. For example, " --- Hidden Text --- " is not recognized by the OCR, but " --- Hidden text -- " is recognized.

I was wondering if anyone has run into this problem or has a solution to it. This seems like a weird edge case, but someone might have seen it before.

Zdenko Podobny

Jul 13, 2023, 3:02:07 AM
to tesser...@googlegroups.com
Hello,

I am not sure what you meant by "Redact 5.3.1", but please provide a test case to reproduce the problem.
For me tesseract works:

tesseract incon.png -
--- Hidden text --


Zdenko
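
For anyone who wants to repeat the check above from a script, here is a minimal Python sketch. The helper names `build_command` and `run_ocr` are hypothetical, and it assumes the `tesseract` executable is on the PATH. The optional `psm` argument maps to the standard `--psm` option and `hocr` is the standard config name for hOCR output; whether a different page segmentation mode changes the result for these images is untested here.

```python
import subprocess


def build_command(image, outputbase="-", psm=None, configs=()):
    """Build a tesseract argument list (hypothetical helper).

    An outputbase of "-" sends the result to standard output, as in
    `tesseract incon.png -`.
    """
    cmd = ["tesseract", str(image), outputbase]
    if psm is not None:
        # --psm selects the page segmentation mode.
        cmd += ["--psm", str(psm)]
    # Config names such as "hocr" go last on the command line.
    cmd += list(configs)
    return cmd


def run_ocr(image, psm=None, configs=()):
    """Run tesseract and return what it wrote to standard output.

    Assumes tesseract is installed; raises CalledProcessError on failure.
    """
    cmd = build_command(image, "-", psm=psm, configs=configs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `run_ocr("incon.png")` would correspond to the command shown above, and `run_ocr("incon.png", configs=["hocr"])` would return hOCR markup instead of plain text.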


incon.png

Jamiel Impoy

Jul 14, 2023, 1:22:40 AM
to tesseract-ocr
Here are some test cases
four.tif

Jamiel Impoy

Jul 14, 2023, 1:23:07 AM
to tesseract-ocr
I apologize, I meant "Tesseract 5.3.1". Some test cases are linked here. None of them show up in the hOCR output.
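
One way to confirm whether a phrase is really missing from the hOCR output is to list the recognized words directly. The sketch below uses only the Python standard library and assumes the words are wrapped in `span` elements with class `ocrx_word`, which is how Tesseract's hOCR output marks them. The names `HocrWordParser`, `hocr_words`, and `contains_phrase` are made up for illustration.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect the text of every span with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0  # greater than 0 while inside a word span
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested markup inside a word (for example <strong>).
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buf).strip()
            if text:
                self.words.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def hocr_words(hocr_text):
    """Return the recognized words in document order."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


def contains_phrase(hocr_text, phrase):
    """Case-sensitive substring check against the space-joined words."""
    return " ".join(phrase.split()) in " ".join(hocr_words(hocr_text))
```

Because `contains_phrase` is a plain substring test on the joined words, it can also match inside a longer word; for a stricter check, compare against the list from `hocr_words` directly.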

On Thursday, July 13, 2023 at 2:02:07 AM UTC-5 zdenop wrote:
four.jpg