
OCR adds extra characters from image file


Farokh Irani

Jan 27, 2025, 10:56:33 AM
to tesseract-ocr
I have a small .TIF file with only around 28 characters. It's 300 DPI, B&W, no compression.
The image contains the text 04-50288 2, but after OCR I get 0464-502882.
I've tried different --psm values (6, 7, 11, and 13), and all of them produce the same output.
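A minimal Python sketch for running that comparison side by side, assuming the tesseract command-line tool (4.x or later) is installed and on the PATH. The names build_tesseract_cmd and compare_psm_modes, and the file name sample.tif, are hypothetical and only for illustration.

```python
import subprocess

# The page segmentation modes tried above.
PSM_MODES = (6, 7, 11, 13)


def build_tesseract_cmd(image_path: str, psm: int) -> list:
    """Build a tesseract command line that prints the recognised text."""
    # Passing "stdout" as the output base makes tesseract write the text
    # to standard output instead of creating a .txt file.
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def compare_psm_modes(image_path: str, modes=PSM_MODES) -> dict:
    """Run tesseract once per PSM value and collect the stripped output."""
    results = {}
    for psm in modes:
        proc = subprocess.run(
            build_tesseract_cmd(image_path, psm),
            capture_output=True,  # collect stdout/stderr instead of printing
            text=True,            # decode output as str
            check=True,           # raise if tesseract exits with an error
        )
        results[psm] = proc.stdout.strip()
    return results
```

Calling compare_psm_modes("sample.tif") returns a dict mapping each PSM value to the text tesseract produced, which makes it easy to confirm whether every mode really yields the same string.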

Any ideas on how I can fix this?

Thanks!

Sara Elshobaky

Jan 28, 2025, 1:41:35 AM
to tesser...@googlegroups.com
I'm also facing the same problem. 
- Which model are you using? 
- Is it from the original tessdata models or a new one you tuned?
- Also, is the original model from the tessdata folder, or from the tessdata/scripts folder?


Farokh Irani

Jan 28, 2025, 7:54:03 AM
to tesseract-ocr
I'm using everything as provided in the download.

I had some success by enlarging the image slightly when I cropped it and converted it from PDF to TIF, but the problem still occurs with other images.
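The enlargement step can be illustrated in plain Python. This is a minimal sketch, assuming the page has already been decoded into a list of rows of 0/1 pixels; upscale_nearest is a hypothetical helper, not part of Tesseract. In practice an imaging library would do the resize; the sketch only shows what an integer nearest-neighbour enlargement does to a black-and-white bitmap: every character gets more pixels in each dimension, and no grey levels are introduced.

```python
def upscale_nearest(bitmap: list, factor: int) -> list:
    """Enlarge a bi-level bitmap by an integer factor (nearest neighbour).

    Each source pixel becomes a factor x factor block, so the result stays
    pure black and white.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in bitmap:
        # Repeat each pixel horizontally...
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # ...then repeat the widened row vertically, copying it each time
        # so the output rows are independent lists.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged
```

For example, upscale_nearest([[0, 1], [1, 0]], 2) turns the 2x2 bitmap into the 4x4 bitmap [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]].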
