Extremely bad tesseract OCR results for webpage text and code

Claudia

Mar 8, 2026, 3:28:12 PM

to tesseract-ocr

Occasionally I need an OCR tool to convert a *.png image to text.

It is important to note that the source is a *.png screenshot of a webpage.
So this is NOT a fuzzy, blurred scan from a newspaper or magazine, and not text detection in a photo.
It's just a very simple, clear source that needs to be converted to text.

I considered using tesseract directly, or a GUI built on top of tesseract, for this job.
I used OCRGet as the GUI.

Surprisingly, the output is extremely bad.
Have a look at the screenshots of the source and the output below.

Hardly any brackets are recognized.
Many characters are misrecognized.
No indentation is preserved.

Is this task really too difficult?
I cannot believe this.

Does anyone have suggestions to improve recognition quality?

_sample Python script for OCR.png

Python script output with tesseract OCRGet.png

hosam

Mar 9, 2026, 2:04:10 AM

to tesser...@googlegroups.com

Fine-tune the OCR model.

Claudia

Mar 9, 2026, 4:45:14 AM

to tesseract-ocr

What do you mean by "fine-tuning the OCR model"?
I cannot see any corresponding options in OCRGet.
Shouldn't tesseract auto-detect the best settings for such an extremely easy OCR task?
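
Independent of any GUI, the tesseract command-line program accepts a page segmentation mode (--psm) and config variables (-c) on each call. The following is only a minimal sketch of passing such options from Python, assuming tesseract is installed and on PATH. The function names and the file name "screenshot.png" are hypothetical, and whether these particular settings improve the result for this screenshot is untested.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=6, keep_spaces=True):
    """Return the argument list for one tesseract call that prints text to stdout."""
    # "stdout" as the output base makes tesseract print the text instead of writing a file.
    # --psm 6 tells tesseract to assume a single uniform block of text.
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]
    if keep_spaces:
        # Config variable that asks tesseract to keep runs of spaces between words,
        # which may help with indentation (an assumption, not verified here).
        cmd += ["-c", "preserve_interword_spaces=1"]
    return cmd


def ocr_screenshot(image_path, **options):
    """Run tesseract on image_path and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file name; replace with the actual screenshot.
    print(" ".join(build_tesseract_command("screenshot.png")))
```

Screenshots are rendered at screen resolution, so the characters are often only a few pixels tall. Enlarging the image in an image editor before running OCR may be worth trying as well.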
