Phantom characters


Jason Shepherd

Dec 31, 2023, 4:53:59 AM
to tesseract-ocr
I'm using pytesseract and tesseract v5.3.3 to read text from some images, and I sometimes get weird phantom characters. I've tried image preprocessing such as increasing the image size, erosion, and thresholding, but nothing gets rid of this random character that spawns from nothing. Attached are two image examples (the left side is processed, the right is the original with bounding rectangles drawn). The blue rectangle to the right of "KB PNG" is a '_' being detected even though that space is completely blank. Any ideas on getting rid of this?
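For readers hitting the same symptom, one common workaround is to drop low-confidence words after OCR instead of trying to remove them in preprocessing. The following is a minimal sketch, not something proposed in this thread: it assumes the phantom '_' comes back with a low confidence score (unconfirmed here), and the names `drop_phantom_words`, `ocr_words`, and the `min_conf` threshold of 50 are hypothetical. It relies on `pytesseract.image_to_data` with `Output.DICT`, which returns parallel lists including `text`, `conf`, `left`, `top`, `width`, and `height`.

```python
def drop_phantom_words(data, min_conf=50.0):
    """Filter the dict returned by pytesseract.image_to_data(..., Output.DICT).

    Returns a list of (text, confidence, (left, top, width, height)) tuples
    for words at or above min_conf. The threshold is an assumption; tune it
    against your own images.
    """
    kept = []
    rows = zip(data["text"], data["conf"], data["left"],
               data["top"], data["width"], data["height"])
    for text, conf, left, top, width, height in rows:
        conf = float(conf)  # some pytesseract versions return strings
        if conf < 0:
            continue  # -1 marks block/paragraph/line rows, not words
        if not text.strip():
            continue  # whitespace-only entries carry no text
        if conf < min_conf:
            continue  # likely noise, e.g. a stray '_' in a blank region
        kept.append((text, conf, (left, top, width, height)))
    return kept


def ocr_words(image_path, min_conf=50.0):
    """Run Tesseract through pytesseract and keep only confident words."""
    # Imported here so drop_phantom_words works without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    return drop_phantom_words(data, min_conf)
```

Printing the confidence of each detected word first shows whether the phantom character is actually separable this way; if it scores as high as the real text, a threshold will not help and the image itself needs a closer look.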

Zdenko Podobny

Jan 1, 2024, 12:28:38 PM
to tesser...@googlegroups.com
Please post:
  1. The original image (without preprocessing)
  2. The image used for OCR (preprocessed)
  3. The output from the tesseract executable (not from a tesseract wrapper), together with the parameters/options used
Otherwise, nobody can reproduce the problem, and therefore nobody can suggest a solution.
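As a rough illustration of item 3, the sketch below calls the tesseract executable directly (bypassing pytesseract) and captures the exact command line together with its output, so both can be pasted into a report. The helper names `build_tesseract_command` and `run_and_report` and the file name in the usage line are hypothetical; `stdout` as the output base, `-l`, and `--psm` are standard tesseract command-line options.

```python
import shlex
import subprocess


def build_tesseract_command(image_path, psm=None, lang=None, extra=()):
    """Build the argument list for: tesseract IMAGE stdout [options]."""
    cmd = ["tesseract", str(image_path), "stdout"]  # "stdout" prints the text
    if lang:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    cmd += list(extra)  # any further options, passed through unchanged
    return cmd


def run_and_report(image_path, **options):
    """Run tesseract and return the command line plus everything it printed."""
    cmd = build_tesseract_command(image_path, **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return "\n".join(["$ " + shlex.join(cmd), result.stdout, result.stderr])
```

For example, `print(run_and_report("processed.png", psm=6, lang="eng"))` would show the command followed by the recognized text and any warnings, assuming `tesseract` is on the PATH and the hypothetical file exists.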

Zdenko


