tesseract-ocr


Mekuriaw Aze

Oct 20, 2024, 3:16:15 PM
to tesseract-ocr
Dear all,
How can this image be converted to text with tesseract-ocr? Please help.
Details:

Tesseract Version: tesseract v5.0.0-alpha.20210506
Language Pack: Geez (Ethiopic)
Image Characteristics:
Background color: Brown 
Text Color: White
Image resolution: 1255x523
Image format: PNG
King_004.png

Ger Hobbelt

Oct 21, 2024, 8:20:08 AM
to tesseract-ocr
Hi,

Hm, I've seen that photo on this mailing list before...
Anyway, your sample image shows a book page that is unreadable even for humans because of its low resolution, let alone for any OCR engine.

Please do search the mailing list (and the tesseract documentation site) for more info: the number one factor here is a high-quality image at sufficient resolution, so that a single line of text has about 30px D-height; for your script, that's about 30px x-height.
There's published research, with charts, on OCR quality / effectiveness vs. pixels per text line in the source image.
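
To make that rule of thumb concrete, here is a rough sketch of the arithmetic in Python. The function names and the 8px measurement are made up for illustration; only the 30px target and the 1255x523 image size come from this thread. It estimates how large a new capture would need to be, and it does not suggest that upscaling the existing file would help, since resizing cannot bring back detail that was never captured.

```python
import math

TARGET_HEIGHT_PX = 30  # rule-of-thumb character height mentioned above


def capture_scale(measured_height_px, target_height_px=TARGET_HEIGHT_PX):
    """Factor by which the capture resolution must grow to reach the target.

    Returns 1.0 when the characters are already tall enough.
    """
    if measured_height_px <= 0:
        raise ValueError("measured character height must be positive")
    return max(1.0, target_height_px / measured_height_px)


def required_size(width_px, height_px, measured_height_px):
    """Pixel dimensions a new capture of the same page would need."""
    scale = capture_scale(measured_height_px)
    return math.ceil(width_px * scale), math.ceil(height_px * scale)


if __name__ == "__main__":
    # Hypothetical measurement: characters about 8 px tall in the 1255x523 image.
    print(capture_scale(8))
    print(required_size(1255, 523, 8))
```

Measure the real character height in an image viewer and substitute it for the 8px guess.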

Then, once you have tackled that issue (with a new capture from the camera), the next step is image cleanup, i.e. preprocessing; only after that will you get any decent result, with tesseract or any other OCR system out there.
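
As one small illustration of what such preprocessing can look like for white text on a brown background, the sketch below converts RGB pixels to grayscale, thresholds them, and inverts the result so the text ends up dark on a light background, which is what tesseract 4 and later expect. It is dependency-free on purpose: the nested-list pixel format, the function names, and the fixed threshold of 128 are all assumptions for illustration. In practice you would use an imaging library and probably an adaptive threshold.

```python
def to_gray(rgb):
    """Luma approximation (ITU-R BT.601 weights) for one RGB pixel."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize_light_on_dark(pixels, threshold=128):
    """Turn light-text-on-dark-background pixels into black text on white.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    Bright pixels (the text) become 0, dark pixels (the background) become 255.
    The fixed threshold is an assumption; real images usually need tuning.
    """
    return [
        [0 if to_gray(p) >= threshold else 255 for p in row]
        for row in pixels
    ]


if __name__ == "__main__":
    brown = (139, 69, 19)    # hypothetical background colour
    white = (255, 255, 255)  # text colour
    sample = [[brown, white, brown],
              [white, white, brown]]
    for row in binarize_light_on_dark(sample):
        print(row)
```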

Take care,

Ger

