Suggestions for small projects in compilers?

Ramakrishna Upadrasta

Sep 17, 2017, 11:21:48 PM
to sanskrit-p...@googlegroups.com
Namaste to Shri Vishwas and All,

Greetings from IITH! 

I have been part of this group for a long time (thanks to early communication with Shri Vishwas), but have generally been silent. I am a faculty member at IITH and work on compilers and programming languages.

I am wondering whether it would be possible to put together some parsing/code-generation problems involving Samskrita tools, so that they can be posted as mini-project problems for students at IITH. Each would be a one-month project for students who already have some familiarity with software and with tools like lex/yacc, ANTLR, etc.

To give an idea of the scope: these are 3rd-year CS students at IITH, and in past years the entire batch has written all the parts of a COOL --> LLVM compiler (scanning and parsing using ANTLR, semantic analysis, and code generation to LLVM) over the span of a semester.

For example, the recent mail by Shri Chetan Pandey seems ripe for this: the issues in it could possibly be worked on by students. I am just thinking aloud in raising this question. If this can be taken up this year, it would be great.

namaste
Ramakrishna

विश्वासो वासुकिजः (Vishvas Vasuki)

Sep 18, 2017, 11:31:05 AM
to sanskrit-programmers
आर्यं रामकृष्णं नमस्करोमि। खलु भवान् अवगच्छत्येव संस्कृतम्? (I salute the honorable Ramakrishna. You do understand Sanskrit, I presume?)

One relevant (and useful) project that comes to mind:

One can write finite state transducers (FSTs) for natural language processing (translation, declension, analysis) using lttoolbox technologies. A Java/Scala library that can efficiently compile FST specifications *and run the FST for a given input* would be quite useful, whether written from scratch or obtained by fixing an existing one. Currently, the latter part does not work (see https://github.com/vedavaapi/sanskrit-lttoolbox for context).
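To make "running the FST for a given input" concrete, here is a minimal Python sketch of a toy, nondeterministic transducer. It is only an illustration: the class `ToyFST`, its methods, the tag names, and the transliterated entries are all hypothetical, and nothing here reflects lttoolbox's dictionary format or API.

```python
from collections import defaultdict
from itertools import count


class ToyFST:
    """Toy finite state transducer (hypothetical, not lttoolbox's design)."""

    def __init__(self):
        self._ids = count(1)            # fresh state ids; 0 is the start state
        self.start = 0
        self.finals = set()
        # (state, input symbol) -> list of (emitted output, next state)
        self.arcs = defaultdict(list)

    def add_entry(self, surface, analysis):
        """Add one path that reads `surface` and emits `analysis` on its last arc."""
        state = self.start
        for i, ch in enumerate(surface):
            is_last = i == len(surface) - 1
            nxt = next(self._ids)
            self.arcs[(state, ch)].append((analysis if is_last else "", nxt))
            state = nxt
        self.finals.add(state)

    def run(self, text):
        """Return every output produced by a path that consumes all of `text`."""
        configs = [(self.start, "")]    # live (state, output so far) pairs
        for ch in text:
            configs = [(dst, out + emitted)
                       for state, out in configs
                       for emitted, dst in self.arcs.get((state, ch), [])]
        return sorted({out for state, out in configs if state in self.finals})


# Illustrative entries only; the tags are made up for this sketch.
fst = ToyFST()
fst.add_entry("rAmaH", "rAma<nom><sg>")
fst.add_entry("vanam", "vana<nom><sg>")
fst.add_entry("vanam", "vana<acc><sg>")  # neuter nominative and accusative coincide

print(fst.run("rAmaH"))  # ['rAma<nom><sg>']
print(fst.run("vanam"))  # ['vana<acc><sg>', 'vana<nom><sg>']
print(fst.run("rAma"))   # [] because no path ends in a final state
```

The sketch does not share prefixes or minimize states, which is exactly the kind of work an efficient compiler for FST specifications would have to do.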

Another idea is to "compile" numbers represented in the kaTapayAdi system to decimals and (more interestingly) vice versa, with the additional constraint of favoring valid Sanskrit vocabulary.
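The easier direction (kaTapayAdi to decimal) could look roughly like the Python sketch below. It assumes Devanagari input and one commonly cited variant of the rules: only the last consonant of a conjunct counts, a consonant without a vowel is ignored, a standalone vowel counts as zero, and the digits are read right to left. The function name `katapayadi_to_decimal` is hypothetical, and regional variants (for example, extra letters used in some traditions) are not handled.

```python
# Consonant groups for one common variant of the kaTapayAdi scheme.
DIGIT_GROUPS = {
    1: "कटपय",
    2: "खठफर",
    3: "गडबल",
    4: "घढभव",
    5: "ङणमश",
    6: "चतष",
    7: "छथस",
    8: "जदह",
    9: "झध",
    0: "ञन",
}
CONSONANT_DIGIT = {c: d for d, group in DIGIT_GROUPS.items() for c in group}

VIRAMA = "\u094d"  # marks a consonant that carries no vowel
# Independent vowel letters (U+0905 to U+0914) count as zero.
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0905, 0x0915)}


def katapayadi_to_decimal(word: str) -> int:
    """Decode a Devanagari word into the number it encodes (sketch only)."""
    digits = []
    for i, ch in enumerate(word):
        following = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANT_DIGIT:
            # A consonant followed by a virama is either a non-final member
            # of a conjunct or a bare final consonant; both are skipped.
            if following != VIRAMA:
                digits.append(CONSONANT_DIGIT[ch])
        elif ch in INDEPENDENT_VOWELS:
            digits.append(0)
        # Vowel signs, anusvara, visarga, etc. carry no value.
    if not digits:
        raise ValueError("no kaTapayAdi digits found in input")
    # Digits are written least significant first, so reverse them.
    return int("".join(str(d) for d in reversed(digits)))


print(katapayadi_to_decimal("राघवाय"))            # 1442
print(katapayadi_to_decimal("आयुरारोग्यसौख्यम्"))  # 1712210
```

The reverse direction is the harder problem: each digit admits several consonants and any vowel, so a number expands into many candidate strings, and picking valid Sanskrit words among them needs a lexicon or a morphological analyzer.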


Vishvas /विश्वासः

prabhat kumar singh

Dec 21, 2017, 11:03:29 PM
to sanskrit-programmers
Dear Ramakrishna Ji,

Would you be open to discussing projects on creating an open-source training corpus for Sanskrit?

A training corpus is an important element in building AI-based tools. Much of the recent progress in AI and ML is attributed to the availability of better data. Such corpora have been built for all first-world languages, like English, French, German, Russian, and even Hebrew, but nothing of that sort has been done for Sanskrit. We should take steps.
As some early projects, I suggest directly importing well-known corpora like ImageNet and tagging them in Sanskrit. This would help the general public train Sanskrit-based models, and could also help in building compilers using NLP techniques.

As a disclosure, I have my own interest in such a data set. I have been working in this direction for years and have been only marginally successful due to lack of data. I recently built a translator system for English>Sanskrit, which is 0.02% complete and is a work in progress.
You can test it here: https://xn--l1b8d.xn--h2brj9c/

I believe that if we unite and put together a plan, the program could be funded by the government as well, since this is essential for the betterment of language tools, especially for Sanskrit.

Looking forward to your reply.
Namo Namah,
Prabhat