Poor recognition of scanned typewriter-produced pages

James Head

Jun 5, 2025, 1:37:55 AM
to tesseract-ocr
I have installed Tesseract (tesseract-ocr-w64-setup-5.5.0.20241111.exe) and gImageReader 3.4.2 on Windows 11 to recognise some 300 dpi scans I made of a typewritten account written by a family member many years ago.
The results are poor, though: much of the output is gibberish.
I have looked through the guide, but can't see why the output should be so poor.
I was getting better results with FineReader OCR back in the late 1990s.
Can anybody point out where I am going wrong?
[Attachment: Screenshot 2025-06-04 222309.png]

Zdenko Podobny

Jun 5, 2025, 5:18:18 AM
to tesser...@googlegroups.com
Hello,

To help troubleshoot this issue, please first try reproducing it with Tesseract alone, without gImageReader. That will show whether the problem lies with gImageReader or with Tesseract itself.

In addition, see the sections of the Tesseract documentation on image preprocessing for guidance on improving results. Alternatively, consider sharing the original scanned image so others can analyze it and suggest solutions.

Kind regards,


Zdenko


On Thu, 5 June 2025 at 7:37, 'James Head' via tesseract-ocr <tesser...@googlegroups.com> wrote: