A few offline OCR tools: need help to test, train and deploy correctly

श्रीमल्ललितालालितः

Sep 11, 2022, 3:52:12 PM
to sanskrit-programmers
I've come across a few OCR projects on GitHub and elsewhere while searching for Python ML projects for OCR.

1.
It claims to be on par with Google Vision. I was unable to test it on macOS, since the TensorFlow server failed to install. The directions available elsewhere were for Linux only.
I'll be trying it soon on a Manjaro VM.
If anyone has any experience with this OCR, please share it here.

2.
This is good for English and Chinese. It claims to support Hindi, but my experience with it on scanned documents has not been good.

3.
I haven't tried this yet. Reading the details suggests that it may work for single lines only, though I'm not sure. A blog by the author is here.

4.
You can test it here.
It works well for Hindi, but results for Sanskrit are poor. It may need training.


I'm tired of online solutions like Google Vision or Google Drive. Having offline solutions would let me run OCR on whatever PDFs I have without depending on the Internet, and I could use the data to edit/train/publish/blog/search/research, etc.

I'd like to request affluent people/programmers/coders to test these options and write up their experience/guide for others like us. If someone is able to use/create training data for any of these OCRs and get results similar to or better than Google Vision, that would be far better.
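
To make "similar/better results" comparable across these OCRs, one common approach is to measure the character error rate (CER) of each tool's output against a hand-corrected transcription of the same page. Below is a minimal sketch in plain Python, not something any of the projects above ships. The function names `edit_distance` and `character_error_rate` are my own, and the sketch assumes that counting Unicode code points (after NFC normalization) is good enough; for Devanagari that means a matra or virama counts as a separate character.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the prefix of `a` seen so far
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    # Normalize both sides so that equivalent Unicode encodings compare equal.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical strings: a hand-typed line and what an OCR returned for it.
    ground_truth = "धर्मक्षेत्रे कुरुक्षेत्रे"
    ocr_output = "धर्मक्षेत्रे कुरुक्षेत्र"
    print(f"CER: {character_error_rate(ground_truth, ocr_output):.3f}")
```

A lower CER is better, and 0.0 means the OCR output matches the reference exactly. Running the same set of pages through each OCR and through Google Vision, then comparing the CER values, would give a rough but repeatable basis for the comparison.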

One could then use tkinter, etc. to create a usable GUI, or a locally deployable solution with a web UI, for elderly/non-tech-savvy Sanskrit scholars.

