I need to translate. Malayalam is my mother tongue (ente mathrubhasha).
I don't find Malayalam in the Google Translate language list. Is there anything
available, or is there room for an initiative on this?
I have a Malayalam-English database (had to search for it in my old files).
Linguistic translation like this is quite difficult to deal with.
It needs some sense of grammar, so I think automating this will be a tough task.
On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf<vare...@gmail.com> wrote:
>
> I am so much interested to make this happen... I am always interested
> in linguistics...
> Can anybody tell me what things we primarily need?
How about ...
1) 50+ years of research (actually, 2000 if you consider Panini)
2) An extremely large corpus ... if you want to make a practical system
3) A large and talented team, good at computational linguistics
4) A very practical theory that can model language effectively for
your purposes (seriously lacking for even small use cases in even
major languages)
5) Since you want to do MT, you need one more theory to handle the
target language ... maybe even an IL (interlingua) model if you go that
route instead of direct translation.
There you go ...
Regards
Rajeev J Sebastian
It took 50+ years of research in MT to get this far, yes. Please look
at some of the academic journals to understand this. I think even the
elementary AI textbook by Rich & Knight has a chapter on why MT is not
possible in the short term.
Regards
Rajeev J Sebastian
On Fri, Jul 24, 2009 at 7:02 PM, JAGANADH G<jaga...@gmail.com> wrote:
>
>
> On Fri, Jul 24, 2009 at 5:29 PM, Rajeev J Sebastian
> <rajeev.s...@gmail.com> wrote:
>>
>> On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf<vare...@gmail.com> wrote:
>> >
>> > I am so much interested to make this happen... I am always interested
>> > in linguistics...
>> > Can anybody tell me what things we primarily need?
>>
>> How about ...
>>
>> 1) 50+ years of research (actually, 2000 if you consider Panini)
>
> It is history? If you work hard, you can drop the zero from it.
Huh?
>>
>> 2) Extremely large corpus ... if you want to make a practical system
>
> Only if you adopt a corpus-based model. That is not going to be practical
> right now in the case of English to Malayalam translation.
It is not practical to make *anything* without a corpus. Even if you
use a non-corpus-based methodology to perform translation, you still
need a large corpus to *validate* that your method works for more than
toy examples. This is the biggest problem facing any NLP work for
Indic languages, and a corpus is something that some glorified
institutions in India neither build up nor share, most probably because
all their systems are capable of is translating toy examples.
>>
>> 3) Large and talented team good in computational linguistics
>
> Where is it? We can build this up.
Best of luck.
>>
>> 4) a very practical theory that can model language effectively for
>> your purposes (seriously lacking for even small use cases in even
>> major languages)
>
> A perfect grammar for Malayalam is required, especially in syntax and
> morphology. Malayalam really lacks such studies.
I don't think any language has such an in-depth model that could be
used for generic MT. There are, of course, special-case models ...
which can be used for special cases.
>>
>> 5) since you want to do MT, you need one more theory to handle the
>> target language ... maybe even an IL model if you go that route
>> instead of direct translation.
>
> First of all we need a good English to Malayalam dictionary in e-format,
> one which gives the exact meaning, POS, etc. Not one that just says
> Science - ശാസ്ത്രം, തര്ക്കശാസ്ത്രം.
A POS-tagged dataset is just one component of a complete corpus.
Regards
Rajeev J Sebastian
hi all,
Machine translation is one of the toughest language computing problems, and newer ideas and thoughts are coming up every year. The Ministry of Communications and Information Technology is spending a lot of money on the project (along with some other projects). An MT system for Malayalam is being developed by Tamil University, Thanjavur. From what I understand, they are using a corpus-based approach, tailored for a set of sentences rather than a generic algorithm.
When I talked to a friend, he pointed out that we need to think about deviations from the base grammar rules when designing a system for real translation. I think whatever we do, the translation process will remain the same (remove all agglutination, identify the key words and their POS, and use that information to translate). Sandhi splitting and POS tagging are the important steps to tackle, in my view.
Maybe Jagan, Santhosh, Rajeev and all can add more to this. From what I understand, a normal rule-based system won't work that well for Malayalam, since rules are not much followed in the normal writing scheme (both are the right kind of approach).
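To make the steps above concrete (undo agglutination, look up the key word and its POS, then translate), here is a rough Python sketch. Everything in it is an assumption for illustration only: the words are romanized toy forms, the suffix list and the tiny lexicon are made up, and the greedy suffix stripping is only a stand-in for a real sandhi splitter and POS tagger.

```python
from dataclasses import dataclass

# Hypothetical toy data (romanized Malayalam), NOT a real resource.
SUFFIXES = {"kal": "PLURAL", "kku": "DATIVE"}   # suffix -> grammatical tag
LEXICON = {"kutti": ("NOUN", "child")}          # stem -> (POS, English gloss)


@dataclass
class Analysis:
    surface: str      # word as written
    stem: str         # what is left after stripping suffixes
    pos: str          # POS from the lexicon, or "UNK"
    features: tuple   # suffix tags, in surface order
    gloss: str        # English gloss, or the stem itself if unknown


def split_suffixes(word):
    """Greedily strip known suffixes from the end of the word."""
    stem, features = word, []
    stripped = True
    while stripped:
        stripped = False
        for suffix, tag in SUFFIXES.items():
            if stem.endswith(suffix) and len(stem) > len(suffix):
                stem = stem[: -len(suffix)]
                features.insert(0, tag)   # keep tags in surface order
                stripped = True
                break
    return stem, tuple(features)


def analyse(word):
    """Split the word, then look the stem up for its POS and gloss."""
    stem, features = split_suffixes(word)
    pos, gloss = LEXICON.get(stem, ("UNK", stem))
    return Analysis(word, stem, pos, features, gloss)


def rough_gloss(sentence):
    """Word-by-word gloss; a real system would also reorder and inflect."""
    parts = []
    for word in sentence.split():
        a = analyse(word)
        parts.append("+".join((a.gloss,) + a.features))
    return " ".join(parts)
```

With this toy data, rough_gloss("kuttikal kuttikalkku") gives "child+PLURAL child+PLURAL+DATIVE". A word that is not in the lexicon comes back tagged "UNK", which is exactly where the deviations from the base grammar rules would show up.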
JAGANADH G wrote: As I promised, I prepared my notes on how to start on this project. Please find them at
http://jaganadhg.freeflux.net/blog/archive/2009/08/29/on-development-of-an-open-source-machine-translation-system-for-english-to-indian-languages.html
http://jaganadhg.freeflux.net/blog/archive/2009/08/30/on-english-to-indian-language-mt-ii.html
I think I already posted the link in this group.
<<BnTable.pm is for handling grammar and BnSonshi.pm is for Sandhi. >>
So what about making two databases, one for Malayalam grammar and one for sandhi?
Malayalam is more complex than Bengali (I am not sure!), but I think we can follow the directory structure of this project. Please provide deeper info about it and suggestions.
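Just to show what I mean by two databases, here is a small Python sketch using sqlite3. The table names, columns and the few romanized rules in it are all my own assumptions for illustration; they are not taken from BnTable.pm or BnSonshi.pm, and real Malayalam sandhi needs many more rules than a single suffix match.

```python
import sqlite3


def build_rule_store():
    """Create an in-memory store with one table per rule type (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE grammar_rules (
            suffix TEXT PRIMARY KEY,   -- e.g. plural marker
            tag    TEXT NOT NULL       -- grammatical meaning of the suffix
        );
        CREATE TABLE sandhi_rules (
            surface    TEXT PRIMARY KEY,  -- what appears at the join in writing
            left_part  TEXT NOT NULL,     -- what to restore on the stem
            right_part TEXT NOT NULL      -- the suffix that was joined
        );
        """
    )
    # Toy rows only: pusthakam + kal is written pusthakangal.
    conn.executemany(
        "INSERT INTO grammar_rules VALUES (?, ?)",
        [("kal", "PLURAL"), ("kku", "DATIVE")],
    )
    conn.executemany(
        "INSERT INTO sandhi_rules VALUES (?, ?, ?)",
        [("ngal", "m", "kal")],
    )
    conn.commit()
    return conn


def split_sandhi(conn, word):
    """Undo one sandhi join at the end of the word, longest rule first."""
    rows = conn.execute(
        "SELECT surface, left_part, right_part FROM sandhi_rules "
        "ORDER BY length(surface) DESC"
    )
    for surface, left, right in rows:
        if word.endswith(surface) and len(word) > len(surface):
            return word[: -len(surface)] + left, right
    return word, None


def tag_suffix(conn, suffix):
    """Look up the grammatical tag for a suffix, or None if unknown."""
    row = conn.execute(
        "SELECT tag FROM grammar_rules WHERE suffix = ?", (suffix,)
    ).fetchone()
    return row[0] if row else None
```

With these toy rows, split_sandhi(conn, "pusthakangal") returns ("pusthakam", "kal"), and tag_suffix(conn, "kal") then gives "PLURAL". Keeping the two tables separate means the sandhi rules can grow without touching the grammar table.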