Regarding Tesseract Alternative

Devansh Varshney

Jun 16, 2025, 2:12:42 PM
to Lector Users
Hi,

I am a contributor to LibreOffice, and so far I have used a Tesseract-based extension: https://extensions.libreoffice.org/en/extensions/show/99360

Here is the GitHub repo: https://www.github.com/varshneydevansh/TejOCR

What I have noticed is that Tesseract is not very reliable at preserving the structure of the document in its OCR output, and the recognition accuracy also feels a bit off.

I am looking for alternatives, given the recent advances in image processing.

It could be based on docTR: https://www.github.com/mindee/doctr

or it could be a solution derived from a VLM: https://www.huggingface.co/Qwen/Qwen2-VL-2B-Instruct
https://www.huggingface.co/blog/smolvlm#:~:text=such%20as%20a%20laptop!%20You,which%20use%20a%20similar%20approach

https://www.huggingface.co/microsoft/Florence-2-large#:~:text=OCR%20with%20Region

https://www.huggingface.co/AI-Safeguard/Ivy-VL-llava

What I am looking for is the best approach that uses as little compute as possible and can run entirely locally, at least on a CPU.
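To make "accuracy" and "compute" comparable across candidates, here is a rough sketch of the kind of measurement I have in mind. It assumes each engine can be wrapped as a plain Python callable that takes an image path and returns text. The names `benchmark`, `character_error_rate`, and `fake_engine` are hypothetical helpers for illustration, not part of Tesseract, docTR, or any of the models above, and `fake_engine` simply returns a fixed string in place of a real OCR call.

```python
import time
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(predicted: str, reference: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not reference:
        return 0.0 if not predicted else 1.0
    return levenshtein(predicted, reference) / len(reference)


def benchmark(ocr_fn: Callable[[str], str], image_path: str, reference: str) -> dict:
    """Run one OCR callable once and report its error rate and wall-clock time."""
    start = time.perf_counter()
    predicted = ocr_fn(image_path)
    elapsed = time.perf_counter() - start
    return {"cer": character_error_rate(predicted, reference), "seconds": elapsed}


def fake_engine(image_path: str) -> str:
    """Hypothetical stand-in for a real engine; ignores the image entirely."""
    return "Helo world"


if __name__ == "__main__":
    print(benchmark(fake_engine, "page.png", "Hello world"))
```

Swapping `fake_engine` for a thin wrapper around each real engine would give one error-rate and one timing figure per candidate on the same pages. This only scores the recognized text, not the layout, so structure preservation would still need a separate check.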

Regards,
Devansh
