Handling text scans and cleaning

Ajinkya Bobade

Apr 8, 2025, 12:09:52 AM
to tesser...@googlegroups.com
I have noticed that text cleaning is the most difficult part of an OCR pipeline. I have struggled a lot with it: without properly cleaned text, OCR simply fails in terms of accuracy. To handle text cleaning separately, I created a GitHub repo that uses AI to clean up all the text in an image. Once the text is cleaned, we can run our own custom OCR models on it. I have personally seen OCR accuracy shoot up to 99% on a properly preprocessed and cleaned image.

Here is the GitHub link: https://github.com/ajinkya933/ClearText
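
To make the "clean first, then OCR" idea concrete, here is a minimal sketch of a classical cleaning step. It is not what ClearText does (the repo uses AI); it swaps in plain Otsu binarization purely as an illustration. The function names `otsu_threshold` and `binarize` are hypothetical, and the image is assumed to be a list of rows of 0-255 grayscale values.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 0-255 grayscale values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # no foreground pixels left
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map each pixel to 0 or 255 using the image's Otsu threshold."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


# Tiny assumed example: dark values on the left, bright values on the right.
sample = [[30, 40, 200],
          [35, 210, 220]]
cleaned = binarize(sample)
```

The cleaned image could then be handed to whichever OCR engine you prefer, for example Tesseract. How much this helps depends heavily on the input; uneven lighting or noisy scans usually need more than a single global threshold.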

Regards 
Ajinkya

Zdenko Podobny

Apr 9, 2025, 12:56:46 AM
to tesser...@googlegroups.com

On Tue, Apr 8, 2025 at 6:09, Ajinkya Bobade <ajinkya...@gmail.com> wrote:

Ajinkya Bobade

Apr 9, 2025, 1:15:22 AM
to tesseract-ocr
Thank you, I just saw from your link that it has been posted!
I'm so glad to hear this news.

Ajinkya