All of DLI OCR-ed with drive?


विश्वासो वासुकिजः (Vishvas Vasuki)

Jul 1, 2024, 4:48:44 AM
to sanskrit-programmers, Martin Gluckman, tyler....@gmail.com, Suhas M सुहासो महेशसूनुः कविः बहुभाषाज्ञः भूतशास्त्रज्ञः


says:

Having this kind of technology at your fingertips opens up new possibilities. Sanskrit Research Institute (sri.auroville.org, led by Martin Gluckman), applied Google Drive API to its mirror of the 550k-item (31TB) Digital Library of India (dli.sanskritdictionary.com) and put MySQL-based search in front of it.

How do I access this resource? Is it possible to download the OCR output for particular texts? (That would save a lot of boring work.)


("At a smaller scale, the Jain Quantum project processed the 17k-item Jain eLibrary, again with Google Drive API, and also made it searchable, this time with a custom combination of fuzzy matching and trigram search. " is well known.)
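For readers unfamiliar with the trigram search mentioned in that quote, here is a minimal sketch of the general idea in Python. It is not the Jain Quantum implementation, whose details are not given here; the names `trigrams`, `similarity` and `TrigramIndex`, the Jaccard scoring, and the 0.3 threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of character trigrams of a padded, lowercased string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the two trigram sets (1.0 means identical sets)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


class TrigramIndex:
    """Hypothetical inverted index from trigram to document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.titles = {}

    def add(self, doc_id, title):
        self.titles[doc_id] = title
        for t in trigrams(title):
            self.postings[t].add(doc_id)

    def search(self, query, threshold=0.3):
        # Collect every document sharing at least one trigram with the query,
        # then rank the candidates by trigram similarity.
        candidates = set()
        for t in trigrams(query):
            candidates |= self.postings.get(t, set())
        scored = [(similarity(query, self.titles[d]), d) for d in candidates]
        return [d for score, d in sorted(scored, reverse=True) if score >= threshold]


index = TrigramIndex()
index.add(1, "Tattvartha Sutra")
index.add(2, "Kalpa Sutra")
index.add(3, "Uttaradhyayana")

# A misspelled query still ranks the intended title first.
results = index.search("Tatvartha Sutra")
```

Because matching works on overlapping three-character windows, a query with a small spelling or OCR error still shares most of its trigrams with the intended title, which is why this approach suits noisy OCR text.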


Aside regarding handwriting recognition:

Handwritten text recognition (HTR) for manuscripts has been worked on recently, especially in the Tibetan space (see several presentations at this event in Vienna last year). Software options for this kind of training for bespoke material include not only the relatively well-known Transkribus system but also Kraken, in the lineage of OCRopus.



--
Vishvas /विश्वासः
