too many unrecognized words


Dariush Mazlumi

Aug 6, 2024, 2:53:17 PM
to tesseract-ocr
Hello,
I have a PDF file that I would like to make searchable, and I found this OCR engine. However, it fails to detect many parts that are obvious and clear. I'm attaching one page, along with the processed .tif image and the .txt output, as an example, hoping someone can diagnose the issue, which might also help improve the product.
Thanks
4.jpg
4 -l fas.txt.processed.tif
4 -l fas.txt.txt

Mekuriaw Aze

Aug 7, 2024, 3:40:58 AM
to tesser...@googlegroups.com
Hello
My problem is similar: how do I convert an image to text with OCR?
My image is in an ancient Ethiopian language. I have the image, but I need help converting it.
Best regards,
Mekuriaw


Dariush Mazlumi

Aug 7, 2024, 12:56:52 PM
to tesseract-ocr
Hello Mekuriaw,
I have no idea how to fix it. If I did, I probably wouldn't be asking for help here :)
Besides, I should mention that my document is in Persian. The circumstances may be different for you, since you mentioned your language is ancient. Perhaps you should make a new post of your own on this matter?
Good luck