Poor recognition of scanned typewriter-produced pages

James Head

Jun 5, 2025, 1:37:55 AM

to tesseract-ocr

I have installed Tesseract (tesseract-ocr-w64-setup-5.5.0.20241111.exe) and gImageReader 3.4.2 on Windows 11 to recognise some 300 dpi scans I made of a typewritten account written by a family member many years ago.

The results don't look very good though. I see a lot of gibberish in the output.

I have looked through the guide, but can't see why the output should be so poor.

I was getting better results when I used FineReader OCR back in the late 1990s.

Can anybody point out where I am going wrong?

Zdenko Podobny

Jun 5, 2025, 5:18:18 AM

to tesser...@googlegroups.com

Hello,

To help troubleshoot this issue, please first try reproducing it using only Tesseract. This isolates whether the problem is with gImageReader or Tesseract.
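
As a rough illustration of running Tesseract on its own, without gImageReader, here is a minimal Python sketch that builds and optionally runs a plain command-line call. The file names `page01.png` and `page01_out`, the `eng` language, and page segmentation mode 6 are assumptions for illustration only; `--dpi 300` simply mirrors the scan resolution mentioned in the question.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng", psm=6, dpi=300):
    """Return the argument list for a plain Tesseract CLI run."""
    return [
        "tesseract",
        image_path,
        output_base,          # Tesseract appends .txt to this base name
        "-l", lang,           # assumed language; adjust to the document
        "--psm", str(psm),    # assumed layout mode: one uniform block of text
        "--dpi", str(dpi),    # resolution of the scan
    ]


def run_tesseract(image_path, output_base, **options):
    """Run Tesseract if it is on PATH; return the completed process or None."""
    if shutil.which("tesseract") is None:
        return None
    command = build_tesseract_command(image_path, output_base, **options)
    return subprocess.run(command, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Hypothetical file names; replace them with a real scan.
    print(" ".join(build_tesseract_command("page01.png", "page01_out")))
```

If the text produced this way is just as poor, the problem probably lies in the image or the Tesseract settings rather than in gImageReader.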

In addition, refer to relevant sections of the Tesseract documentation focusing on image preprocessing for guidance on improving results. Alternatively, you might consider sharing the original source image so others can analyze it for potential solutions.
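
One preprocessing step often tried on faint or uneven typewritten scans is binarization. The following is only a sketch of Otsu's thresholding method in pure Python, assuming the scan has already been loaded as a flat list of 8-bit grayscale values (loading the image is left out and would need an image library). Whether it helps with these particular pages is not known; the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold (0-255) for an iterable of 8-bit gray values."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += histogram[level]
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the dark class
        sum_bg += level * histogram[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (unnormalized); larger means better separation.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels, threshold):
    """Map values at or below the threshold to 0 (ink), others to 255 (paper)."""
    return [0 if value <= threshold else 255 for value in pixels]
```

The binarized values would then be written back to an image file and passed to Tesseract in place of the raw scan, so the two results can be compared.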

Kind regards,

Zdenko
