/ Arun/ news
As far as I know, there are two options for high-quality Sanskrit OCR: SanskritOCR and Google OCR.
I was curious how these two tools compare on a sample page, and whether one is clearly better than the other. Here I'll describe how I ran this comparison and what I found.
A fair analysis is time-consuming: it has to cover a variety of page sizes, scan qualities, genres, and so on, and for each case it has to measure both the number and the severity of the errors produced.
So rather than make a firm pronouncement about how one tool compares to the other, I focused on a single page that I found particularly challenging and looked at how each tool handled it. Specifically, I used page 28 of [this edition] of the Raghuvamsa, with the commentary by Mallinatha.
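As a rough illustration of measuring both quantities, here is a minimal sketch that assumes a hand-corrected reference transcription of the page is available. It treats the number of errors as the word-level Levenshtein distance and the severity as the character-level distance. That split, the helper names `edit_distance` and `error_profile`, and the Harvard-Kyoto sample strings are hypothetical choices for illustration, not the procedure used in this comparison.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences: strings or lists of words."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def error_profile(reference: str, ocr_output: str) -> tuple[int, int]:
    """Return (number, severity) under a hypothetical definition:
    number   = word-level edit distance (roughly, how many words are wrong),
    severity = character-level edit distance (how badly they are wrong).
    """
    number = edit_distance(reference.split(), ocr_output.split())
    severity = edit_distance(reference, ocr_output)
    return number, severity


reference = "rAmo vanaM gacchati"  # made-up HK sample, not from the test page
print(error_profile(reference, "rAmo vanaM gacchali"))  # (1, 1)
print(error_profile(reference, "rAmo vanaM gcchl"))     # (1, 4)
```

Both outputs contain one wrong word, but the second garbles it much more badly, which is exactly the kind of difference a simple error count would hide.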
This is a reasonable test because the page contains both large type for the main poem and smaller type (with long compounds) for the commentary, and the print quality ranges from perfectly clear to quite ugly.
The two tools produce roughly the same number of errors, but when measured by edit distance, SanskritOCR's errors are roughly twice as severe as Google OCR's.
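To compare tools at the page level by edit distance, one option is to sum character-level Levenshtein distances over lines and normalize by the length of the reference (a character error rate). This sketch assumes the OCR output has already been aligned line by line with a hand-corrected reference; the function names and the toy Harvard-Kyoto lines are hypothetical, and the numbers they produce are not the results reported here.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def total_edit_distance(reference_lines: list[str], ocr_lines: list[str]) -> int:
    """Sum of per-line distances; assumes lines are already aligned (hypothetical)."""
    if len(reference_lines) != len(ocr_lines):
        raise ValueError("reference and OCR output must have the same number of lines")
    return sum(levenshtein(r, o) for r, o in zip(reference_lines, ocr_lines))


def character_error_rate(reference_lines: list[str], ocr_lines: list[str]) -> float:
    """Total edit distance divided by the number of reference characters."""
    n_chars = sum(len(r) for r in reference_lines)
    return total_edit_distance(reference_lines, ocr_lines) / max(n_chars, 1)


# Toy data in Harvard-Kyoto transliteration (made up, not from the test page).
reference = ["rAmo vanaM gacchati", "sItA ca"]
tool_a = ["rAmo vanaM gacchali", "sItA ca"]
tool_b = ["rAmo vanaM gcchl", "sIta ca"]

dist_a = total_edit_distance(reference, tool_a)
dist_b = total_edit_distance(reference, tool_b)
print(dist_a, dist_b)                           # 1 5
print(character_error_rate(reference, tool_b))
print(dist_b / dist_a)                          # 5.0
```

Normalizing by reference length makes pages of different sizes comparable, while the ratio of the two totals gives a relative severity figure of the "twice as severe" kind.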
Since the page I chose is particularly hard, I expect Google OCR to maintain this quality, or do even better, on easier pages.
Both tools make errors that can at first look like valid Sanskrit. Google OCR seems more prone to this: because its errors are less severe, its mistakes stay closer to real Sanskrit. Better linguistic modeling might help catch these errors.
My recommendations on this basis:
--
You received this message because you are subscribed to the Google Groups "sanskrit-programmers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sanskrit-program...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Kaushal Trivedi