Hi all,
Regarding Patrick’s question about easy OCR, I suspect he’s particularly looking for a tool that can handle multi-page PDFs in one go, which could be especially helpful for digitization projects like UTA’s Resource Library for Dharmaśāstra Studies <https://sites.utexas.edu/sanskrit/resources/dharmasastra/>.
If Patrick or anyone else is interested, feel free to reach out to me directly. I’m looking for a few volunteers to test a new drag-and-drop interface I’m building to streamline access to Google Vision OCR, which is currently best in class and handles multi-page inputs well.
Kind regards,
Tyler
Multi-page PDFs can be OCRed in other ways too:
1) There are many online tools available that convert PDF files into text, doc, markdown, etc.
2) Create a notebook in Google Colab to do the conversion. I have made such a notebook and used it to convert some PDF files into text.
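For anyone curious what option 2 might look like, here is a minimal sketch of a notebook cell. It is not the notebook mentioned above; it assumes the open-source route of `pdf2image` (which needs poppler installed) plus `pytesseract` (which needs Tesseract and its Sanskrit `san` language data), and the function names are made up for illustration. Each PDF page is rendered to an image, OCRed on its own, and the results are joined with page markers.

```python
from typing import Callable, Iterable, List


def ocr_pages(pages: Iterable[object], ocr_page: Callable[[object], str]) -> str:
    """Run ocr_page on each page in order and join the text with page markers."""
    chunks: List[str] = []
    for number, page in enumerate(pages, start=1):
        text = ocr_page(page).strip()
        chunks.append(f"--- page {number} ---\n{text}")
    return "\n\n".join(chunks)


def ocr_pdf_with_tesseract(pdf_path: str, lang: str = "san", dpi: int = 300) -> str:
    """Hypothetical helper: OCR a multi-page PDF with Tesseract.

    Assumes pdf2image, pytesseract, poppler, and the Tesseract language
    data for `lang` are already installed in the notebook environment.
    """
    # Imported lazily so ocr_pages() stays usable without these packages.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per PDF page
    return ocr_pages(pages, lambda img: pytesseract.image_to_string(img, lang=lang))
```

In a notebook, something like `text = ocr_pdf_with_tesseract("scan.pdf")` (file name hypothetical) would return the text of all pages as one string, which can then be written to a `.txt` file. The per-page function is passed in as a parameter, so a different OCR engine could be swapped in without changing the loop.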
On Sat, May 10, 2025 at 8:00 AM <indology-request@list.indology.info> wrote:
---------- Forwarded message ----------
From: Patrick Olivelle <j...@austin.utexas.edu>
To: Indology <indology@list.indology.info>
Date: Fri, 9 May 2025 22:07:05 +0000
Subject: [INDOLOGY] OCR
Dear Friends:
I am wondering whether with the advance of AI technology we have easy OCR software to read Devanāgarī, easy enough to be used by someone like me!! We have the one prepared by Andrew Ollett, which he generously gave us. But that requires computer knowledge far beyond my reach. Is there one where you can just drop the Devanāgarī scan and out pops a searchable file? This is probably a long shot, but I thought I would ask.
With thanks and best wishes,
Patrick Olivelle
I am pleased to announce several significant updates to the Dharmamitra platform that will be of interest to researchers in our field.
New OCR Capabilities
As Tyler Neill mentioned, Dharmamitra now features fast OCR processing powered by the Gemini engine. Users can upload PDF files up to 100 MB in size, with automatic conversion to IAST or Wylie transliteration if needed. We are also working on our own specialized OCR engine for Sanskrit.
Enhanced Translation Tools
The platform's translator now includes an "upload image" input option, allowing researchers to move directly from screenshots of texts to translations.
Updated Chrome Extension
We have significantly overhauled our Google Chrome extension, which can be found here. The Chrome extension makes it possible to use Dharmamitra seamlessly when browsing GRETIL etexts, BuddhaNexus, DSBC, etc.
These developments aim to facilitate the work of translators and philological researchers.
We extend our gratitude to the Tsadra Foundation for their significant support in making these advances possible.
Best regards,
Hi all,
Yes, I just launched that yesterday (Google Cloud Vision, individual billing required).
And so too did Sebastian launch a nearly identical service on Dharmamitra (Google Gemini, billing covered by grant money). Big day for OCR!
Copying messages from the Indology list below. In general, would it help to cross-post such announcements on BV-Parishat (I recently joined) and/or this list? I’m not sure how much cross-pollination happens naturally through shared membership.