Hi,
For some time I have been working on parsing Apte's dictionary. It’s been a fun project, and I thought it might interest you: some time last year I asked on Google Groups about the availability of a compound-expanded version of Apte's dictionary, and you were kind enough to provide input.
In the present project, the dictionary is treated as a list of recursively structured objects (aka Terms). While the work isn’t complete, I felt this was a good point to draw the attention of people active in the field. The code can now generate full compound words and annotate them with location information from the dictionary. I’ve uploaded the data to a Google Spreadsheet, and it would be great if you could take a look! Three-word compounds are still missing, but adding them should not take much more work. For now, I’m more interested in ensuring there are no spurious or incorrect entries (for example, संधि/णत्व/षत्व mistakes).
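To give a rough idea of what "recursively structured" means here, below is a minimal sketch, not the project's actual code. The class name `Term`, its fields, the `expand` function, and the location strings are all assumptions made for illustration. The sketch also joins members by plain concatenation, so it applies no संधि/णत्व/षत्व rules at all.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Term:
    """Hypothetical recursive dictionary entry (illustration only)."""
    text: str                      # headword, or a compound member listed under it
    location: str                  # hypothetical location tag in the dictionary
    children: List["Term"] = field(default_factory=list)


def expand(term: Term, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (full word, location) for a term and all nested compound members.

    Members are joined by plain concatenation; no sandhi is applied.
    """
    word = prefix + term.text
    yield word, term.location
    for child in term.children:
        # Recursion is what would give three-word (and longer) compounds.
        yield from expand(child, word)


# Made-up sample entry: a headword with two compound members.
root = Term("rāja", "entry-1", [
    Term("putra", "entry-1/a"),
    Term("puruṣa", "entry-1/b"),
])

result = list(expand(root))
for word, location in result:
    print(word, location)
```

Because each member is itself a Term, nesting one level deeper is all a three-word compound would need in this sketch. A real implementation would also have to apply the sandhi rules at each join.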
The larger goal is, of course, to fully parse the dictionary to make it computationally more accessible. I’m looking forward to your feedback!
Sumant
Thanks for the shoutout! I also appreciate the link you provided; I wasn’t aware of that feature.
To put it to some use, I’ve created a Chrome extension designed to make searching Apte's dictionary online more efficient. Here’s how we can use it: