english malayalam translator

Varewoolf

Jul 22, 2009, 11:51:11 PM
to ilug...@googlegroups.com
Is there any good online English-Malayalam translator? I am not
talking about a transliterator; I am asking about direct translation.

I need to translate, e.g., "Malayalam is my mother tongue" --> "malayalam
ente mathrubhasha".

I don't find Malayalam in the Google Translate language list. Is there anything
available, or is there room for an initiative on this?

aniyathikkutty

Jul 23, 2009, 12:04:17 AM
to ilug...@googlegroups.com

Varewoolf

Jul 23, 2009, 12:12:03 AM
to ilug...@googlegroups.com
Thanks for the info, but I'm afraid you didn't get what I said. I said I need
a translator, not a transliterator. That means I need to convert an
English sentence to Malayalam. See:

"I am a human" should be converted to "njan oru manushyan". I
hope you get the idea now.




Wolfy

http://jeevanism.wordpress.com

aniyathikkutty

Jul 23, 2009, 12:17:59 AM
to ilug...@googlegroups.com
@vare...@gmail.com


Hi...
I am sorry.
I just thought that if you write enikariyamayirunnu, then it shows എനികരിയമയിരൂന്നു.
That's why I suggested those links.
--

JAGANADH G

Jul 23, 2009, 3:23:17 AM
to ilug...@googlegroups.com
No such application is available.
It is not an easy job to develop one.

Let us wait and hope that someone like Google will bring one out soon.

Jaganadh G
--
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog

Varewoolf

Jul 23, 2009, 11:47:07 PM
to ilug...@googlegroups.com
Instead of waiting for the messiah, why can't we give it a try?

Santhosh സന്തോഷ് VS

Jul 24, 2009, 12:18:38 AM
to ilug...@googlegroups.com
I have a Malayalam-English database (had to search from the old files).
It is quite difficult to deal with such linguistic translation.
It needs some sense of grammar to deal with; I think automating this will be a tough task.
-- 
http://techplex.wordpress.com
 

JAGANADH G

Jul 24, 2009, 1:46:30 AM
to ilug...@googlegroups.com


2009/7/24 Santhosh സന്തോഷ് VS <everlov...@gmail.com>

> I have a database of malayalam - english (had to search from the old files)
> It is quite difficult to deal with such lingustic translation.
> It needs some sense of grammer to deal with, i think automating this will be a tough task.

It is not only a software problem but also a linguistic one.
It will take time. If anybody is willing to host the project and provide linguistic resources as well as help, I can guide.
 

Varewoolf

Jul 24, 2009, 7:49:42 AM
to ilug...@googlegroups.com
I am very much interested in making this happen... I have always been interested
in linguistics.
Can anybody tell me what we need first?

Rajeev J Sebastian

Jul 24, 2009, 7:59:01 AM
to ilug...@googlegroups.com
On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf<vare...@gmail.com> wrote:
>
> i am so much interested to make this happen... i am always interested
> in linguistics...
> anybody tell me wat r the things we need primarily??

How about ...

1) 50+ years of research (actually, 2000 if you consider Panini)
2) Extremely large corpus ... if you want to make a practical system
3) Large and talented team good in computational linguistics
4) a very practical theory that can model language effectively for
your purposes (seriously lacking for even small use cases in even
major languages)
5) since you want to do MT, you need one more theory to handle the
target language ... maybe even an IL model if you go that route
instead of direct translation.

There you go ...


Regards
Rajeev J Sebastian

JAGANADH G

Jul 24, 2009, 9:32:23 AM
to ilug...@googlegroups.com
On Fri, Jul 24, 2009 at 5:29 PM, Rajeev J Sebastian <rajeev.s...@gmail.com> wrote:

> On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf<vare...@gmail.com> wrote:
> >
> > i am so much interested to make this happen... i am always interested
> > in linguistics...
> > anybody tell me wat r the things we need primarily??
>
> How about ...
>
> 1) 50+ years of research (actually, 2000 if you consider Panini)

Is that history? If you work hard, you can remove a zero from it.

> 2) Extremely large corpus ... if you want to make a practical system

That holds only if you adopt a corpus-based model. That is not practical right now for English to Malayalam translation.

> 3) Large and talented team good in computational linguistics

Where is such a team? We can build one up.

> 4) a very practical theory that can model language effectively for
> your purposes (seriously lacking for even small use cases in even
> major languages)

A perfect grammar for Malayalam is required, especially for syntax and morphology. Malayalam really lacks such studies.

> 5) since you want to do MT, you need one more theory to handle the
> target language ... maybe even an IL model if you go that route
> instead of direct translation.

First of all we need a good English to Malayalam dictionary in electronic format, one that gives the exact meaning, POS, etc., not one that just lists Science - ശാസ്ത്രം, തര്‍ക്കശാസ്ത്രം and the like.
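
A dictionary entry of the kind asked for here (one sense per record, each with its own POS) might be modelled roughly as below. This is only a sketch in Python: the class names `Sense` and `Entry`, the `lookup` helper, the tag "NOUN" and the English glosses are all assumptions made for illustration, not an existing dictionary format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    pos: str          # part-of-speech tag, e.g. "NOUN" (tag set is an assumption)
    malayalam: str    # one target-language equivalent for this sense
    gloss: str = ""   # short English note telling the senses apart (illustrative)


@dataclass
class Entry:
    headword: str
    senses: List[Sense] = field(default_factory=list)

    def lookup(self, pos: str) -> List[str]:
        """Return the Malayalam equivalents recorded for the given POS."""
        return [s.malayalam for s in self.senses if s.pos == pos]


# The headword from the message above, with each equivalent kept as a
# separate sense instead of one undifferentiated list.
science = Entry("science", [
    Sense("NOUN", "ശാസ്ത്രം", "science in general"),
    Sense("NOUN", "തര്‍ക്കശാസ്ത്രം", "logic as a discipline"),
])

print(science.lookup("NOUN"))
```

Keeping each equivalent as its own record means a later stage can choose between senses by POS or gloss instead of receiving a bare comma-separated list.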

Varewoolf

Jul 24, 2009, 11:37:45 AM
to ilug...@googlegroups.com
I want to ask Rajeev J: did you check the Google Translate page? Do you
think they took 50 years to list those languages?

jinesh kj

Jul 24, 2009, 11:38:08 AM
to ilug...@googlegroups.com, smc-d...@googlegroups.com
hi all,

I have some comments, which I will post later when I have time, but I think it's better to move the discussion to smc-discuss.

cheers

Jinesh K J
--
My Feelings,Expressions-
http://logbookofanobserver.blogspot.com

My scribblings-
http://logbookofanobserver.wordpress.com

SMC : My computer, My language http://smc.org.in
സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ

Rajeev J Sebastian

Jul 24, 2009, 3:04:51 PM
to ilug...@googlegroups.com
On Fri, Jul 24, 2009 at 9:07 PM, Varewoolf<vare...@gmail.com> wrote:
>
> i want  to ask Rajiv J .. did u chk Google translator page ?? so u
> think they took 50 yrs to enlist those languages??

It took 50+ years of research in MT to get this far, yes. Please look
at some of the academic journals to understand this. I think even
the elementary AI textbook by Rich & Knight has a chapter on why MT is not
possible in the short term.


Regards
Rajeev J Sebastian

Rajeev J Sebastian

Jul 24, 2009, 3:11:06 PM
to ilug...@googlegroups.com
On Fri, Jul 24, 2009 at 7:02 PM, JAGANADH G<jaga...@gmail.com> wrote:
>
>
> On Fri, Jul 24, 2009 at 5:29 PM, Rajeev J Sebastian
> <rajeev.s...@gmail.com> wrote:
>>
>> On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf<vare...@gmail.com> wrote:
>> >
>> > i am so much interested to make this happen... i am always interested
>> > in linguistics...
>> > anybody tell me wat r the things we need primarily??
>>
>> How about ...
>>
>> 1) 50+ years of research (actually, 2000 if you consider Panini)
>
> It is history ? If you can work hard you can reduce the zero from it.

Huh ?

>>
>> 2) Extremely large corpus ... if you want to make a practical system
>
> Only if you adopt copus based model. That is not going to practical in right
> now in the case of English to Malayalam translation

It is not practical to make *anything* without a corpus. Even if you
use a non-corpus-based methodology to perform translation, you still
need a large corpus to *validate* that your method works for more than
toy examples. This is the biggest problem facing any NLP work for
Indic languages: a large corpus is something that some glorified
institutions in India neither build up nor share, most probably because
all their systems are capable of is translating toy examples.

>>
>> 3) Large and talented team good in computational linguistics
>
> Where is it? We can build up this

Best of Luck.

>>
>> 4) a very practical theory that can model language effectively for
>> your purposes (seriously lacking for even small use cases in even
>> major languages)
>
> A perfect grammar for Malayalam is required. Especially in Sysntax and
> Morphology. Malayalam really lacks such studies.

I don't think any language has such an in-depth model that could be
used for generic MT. There are of course, special case models ...
which can be used for special cases.

>>
>> 5) since you want to do MT, you need one more theory to handle the
>> target language ... maybe even an IL model if you go that route
>> instead of direct translation.
>
> First of all we need a good English to Malayalam dict in e-format.  Which
> gives excat meaning POS, etc. Not like one saying Science - ശാസ്ത്രം,
> തര്‍ക്കശാസ്ത്രം like.

POS tagged dataset is just one component of a complete corpus.

Regards
Rajeev J Sebastian

JAGANADH G

Jul 25, 2009, 1:01:15 AM
to ilug...@googlegroups.com
On Sat, Jul 25, 2009 at 12:41 AM, Rajeev J Sebastian <rajeev.s...@gmail.com> wrote:

> On Fri, Jul 24, 2009 at 7:02 PM, JAGANADH G<jaga...@gmail.com> wrote:
> >
> > On Fri, Jul 24, 2009 at 5:29 PM, Rajeev J Sebastian
> > <rajeev.s...@gmail.com> wrote:
> >>
> >> On Fri, Jul 24, 2009 at 5:19 PM, Varewoolf<vare...@gmail.com> wrote:
> >> >
> >> > i am so much interested to make this happen... i am always interested
> >> > in linguistics...
> >> > anybody tell me wat r the things we need primarily??
> >>
> >> How about ...
> >>
> >> 1) 50+ years of research (actually, 2000 if you consider Panini)
> >
> > It is history ? If you can work hard you can reduce the zero from it.
>
> Huh ?
>
> >>
> >> 2) Extremely large corpus ... if you want to make a practical system
> >
> > Only if you adopt copus based model. That is not going to practical in right
> > now in the case of English to Malayalam translation
>
> It is not practical to make *anything* without a corpus. Even if you
> use a non-corpus based methodology to perform translation, you still
> need a large corpus to *validate* that your method works for more than
> toy examples. This is the biggest problem that faces any NLP work for
> Indic languages, and one that some glorified institutions in India
> neither builds up nor shares, most probably because all their systems
> are capable of are translating toy examples.

I know that there are non-free systems under development which are more advanced than the Google Translate service (English-Hindi). But I don't know when they will be released.

> >>
> >> 3) Large and talented team good in computational linguistics
> >
> > Where is it? We can build up this
>
> Best of Luck.
>
> >>
> >> 4) a very practical theory that can model language effectively for
> >> your purposes (seriously lacking for even small use cases in even
> >> major languages)
> >
> > A perfect grammar for Malayalam is required. Especially in Sysntax and
> > Morphology. Malayalam really lacks such studies.
>
> I don't think any language has such an in-depth model that could be
> used for generic MT. There are of course, special case models ...
> which can be used for special cases.

The Sanskrit grammar is a perfect model.

> >>
> >> 5) since you want to do MT, you need one more theory to handle the
> >> target language ... maybe even an IL model if you go that route
> >> instead of direct translation.
> >
> > First of all we need a good English to Malayalam dict in e-format.  Which
> > gives excat meaning POS, etc. Not like one saying Science - ശാസ്ത്രം,
> > തര്‍ക്കശാസ്ത്രം like.
>
> POS tagged dataset is just one component of a complete corpus.

A POS-tagged corpus is one variety of corpus.

> Regards
> Rajeev J Sebastian




Varewoolf

Jul 26, 2009, 12:59:43 PM
to ilug...@googlegroups.com
So what might be the next step?

JAGANADH G

Jul 27, 2009, 12:56:28 AM
to ilug...@googlegroups.com
If you are really interested, drop me a mail. Are you familiar with Perl programming?

jinesh kj

Jul 27, 2009, 2:03:37 AM
to ilug...@googlegroups.com, smc-d...@googlegroups.com
hi all,

Machine translation is one of the toughest language computing problems, and newer ideas and thoughts come up every year. The Ministry of Communications and Information Technology is spending a lot of money on the project (along with some other projects). An MT system for Malayalam is being developed by Tamil University, Thanjavur. From what I understand, they are using a corpus-based approach, tailored to a set of sentences rather than a generic algorithm.

When I talked to a friend, he pointed out that we need to think about deviations from the base grammar rules when designing a system for real translation. I think whatever we do, the translation process will remain the same: undo the agglutination, identify the key words and their POS, and translate using that information. Sandhi splitting and POS tagging are the important steps to tackle, in my view.

Maybe Jagan, Santhosh, Rajeev and all can add more to this. From what I understand, a normal rule-based system won't work that well for Malayalam, since the rules are not followed much in normal writing (a "both are right" kind of approach).

cheers

Jinesh K J

JAGANADH G

Jul 27, 2009, 2:53:05 AM
to smc-d...@googlegroups.com, ilug...@googlegroups.com
On Mon, Jul 27, 2009 at 11:33 AM, jinesh kj <jine...@gmail.com> wrote:
> hi all,
>
> Machine Translation is one of the toughest Language computing problems and newer ideas and thoughts are coming up every year. Ministry of Communication Information Technology is spending lot of money on the project(along with some other projects). M.T. System for Malayalam is being developed by Tamil University, Tanchavoor. From what i understand, they are using a corpus based approach, tailored for a set of sentences than a generic algorithm.

Yes, I know about this. The Thanjavur people are working on Tamil <-> Malayalam machine translation. They are customizing the anusaarak approach developed by the Akshar Bharati group. In the original developers' view, that system is a language acquisition system rather than an MT system. The system's algorithm has its own advantages and limitations. A group of C-DAC people is also involved in English to Indian languages translation (including Malayalam). I don't know whether any of these systems are open or not, which is why I was not mentioning the names.
 

> When i talked to a friend, he pointed out somethings like, we need to think of the deviations from base grammer rules, when designing a system for real translation. I think whatever we do, translation process will remain same(remove all agglutination, identify key words, their POS and using that information, translate). Sandhi splitting and POS tagging are the important steps to tackle in my view.

More clearly: source language sentence -> parsing (for pattern identification) -> conversion to the target language syntactic pattern -> target language text generation. This is the broad block view of an MT system. Whether a POS tagger should be there depends on your design.
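
The block view above can be sketched as three small functions. This is a toy illustration in Python, not any of the systems mentioned in this thread: the tables `POS`, `LEXICON` and `TRANSFER`, the tag names, and the romanized Malayalam words are all assumptions, sized to handle only the example sentence from earlier in the thread.

```python
# Hypothetical English POS table and English -> Malayalam (romanized) word list.
POS = {"i": "PRON", "am": "COP", "a": "DET", "human": "NOUN"}
LEXICON = {"i": "njan", "a": "oru", "human": "manushyan"}

# Transfer rules: source syntactic pattern -> target pattern, written as
# indices into the source token list. Here the copula (index 1) is dropped.
TRANSFER = {("PRON", "COP", "DET", "NOUN"): [0, 2, 3]}


def parse(sentence):
    """Stage 1: tokenize and identify the source pattern."""
    tokens = sentence.lower().split()
    pattern = tuple(POS[t] for t in tokens)
    return tokens, pattern


def transfer(tokens, pattern):
    """Stage 2: rearrange the tokens into the target-language pattern."""
    return [tokens[i] for i in TRANSFER[pattern]]


def generate(tokens):
    """Stage 3: produce target-language text word by word."""
    return " ".join(LEXICON[t] for t in tokens)


def translate(sentence):
    tokens, pattern = parse(sentence)
    return generate(transfer(tokens, pattern))


print(translate("I am a human"))  # njan oru manushyan
```

Any word or pattern missing from the tables raises a `KeyError`, which is the toy-example limitation discussed earlier in the thread: the structure is simple, but the tables and rules are where the real work lies.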
The harder part in Indian language to Indian language translation (from my experience) is morphological analysis as well as sandhi splitting. Some sort of heuristics is required for sandhi splitting. Computerizing Kerala Paniniyam will not solve the problem. Even for Sanskrit, where extensive sandhi rules exist, people engaged in Sanskrit computing call it a baffling problem. A sandhi splitter is a required component of a morphological analyzer, and a sandhi splitter in turn requires a morphological analyzer (a kind of deadlock).
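
One simple heuristic of the kind mentioned above is to try every split point, undo a candidate sandhi change, and keep the splits whose parts are known words. The Python sketch below assumes a tiny romanized word list and a single rule type (a glide "y" or "v" inserted between two vowels, as in mazha + alla -> mazhayalla); the names `LEXICON`, `GLIDES` and `split_sandhi` are hypothetical, and real sandhi needs many more rules and works on Malayalam script, not romanization.

```python
# Toy romanized lexicon; stands in for a real dictionary or morphological analyzer.
LEXICON = {"mazha", "alla", "thiru", "onam"}
GLIDES = ("y", "v")      # glides assumed to be inserted at vowel-vowel junctions
VOWELS = set("aeiou")


def split_sandhi(word):
    """Return all (left, right) candidate splits whose parts are in LEXICON."""
    candidates = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        # Case 1: plain concatenation, nothing changed at the junction.
        if left in LEXICON and right in LEXICON:
            candidates.append((left, right))
        # Case 2: a glide was inserted between two vowels; drop it and retry.
        if (right[0] in GLIDES and len(right) > 1
                and left[-1] in VOWELS and right[1] in VOWELS
                and left in LEXICON and right[1:] in LEXICON):
            candidates.append((left, right[1:]))
    return candidates


print(split_sandhi("mazhayalla"))  # [('mazha', 'alla')]
print(split_sandhi("thiruvonam"))  # [('thiru', 'onam')]
```

The lexicon check is where the circularity shows up: deciding whether a candidate part is a valid word form is itself a job for the morphological analyzer, which in turn wants its input already split.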

> May be Jagan, Santhosh Rajeev and all can add more to this. From what i understand, a normal rules based system wont work that well for malayalam since rules are not much followed in the normal writing scheme(both are right kind of approach).

If somebody is really interested, we can build a small system within one year. I will share the plan within a day or two.
 

Visakh

Jul 27, 2009, 11:52:03 AM
to Free Software Users Group, Thiruvananthapuram
Hi,

On Jul 27, 9:56 am, JAGANADH G <jagana...@gmail.com> wrote:
> If you are really interested drop me a mail. Are you familier with Perl
> programming ?

If you are planning to create a new team and project for this, I
suggest that you do all communications related to it on a public forum
like a mailing list. Then others will be able to follow the ideas and
progress of the project. And they will be able to join the project
after assessing it and themselves.

Sir, I like your attitude in trying to build an expert team by
mentoring. But you will reduce the chance of that succeeding if you
choose to keep all communication personal. Just so that you know, a
few of us are following your discussions silently, but intently.

Regards,
Gokul Das

Varewoolf

Jul 27, 2009, 12:28:00 PM
to ilug...@googlegroups.com, smc-d...@googlegroups.com
I have read these mails. Oops, I don't have that much knowledge about
MT, corpora and such things, but I am more than ready to do any volunteer work
to make this happen. I have a good command of Malayalam and
English. So how would this translation actually work?
Show me the path, I will walk it.

JAGANADH G

Jul 28, 2009, 1:31:38 AM
to ilug...@googlegroups.com
Actually, that is all I intended by asking for a mail; I did not mean to keep it personal. Sorry for the miscommunication. OK, I am preparing a draft document to work this out. Will get back soon.
Jagan

jeevachaithanyan sivanandan

Jul 28, 2009, 1:19:19 PM
to Free Software Users Group, Thiruvananthapuram
So what should we do first? Getting an e-dictionary is the basic step,
right?


JAGANADH G

Aug 29, 2009, 11:03:07 AM
to ilug...@googlegroups.com
My initial documentation on this topic is available on my blog: http://jaganadhg.freeflux.net/blog

jeevanism

Oct 19, 2009, 1:49:42 PM
to ilug...@googlegroups.com
Any new activity?

JAGANADH G

Oct 19, 2009, 4:13:47 AM
to ilug...@googlegroups.com

jeevanism

Oct 19, 2009, 4:21:57 PM
to ilug...@googlegroups.com
Thanks a lot for the initiation...


<<BnTable.pm is  for handling grammar and BnSonshi.pm is for Sandhi. >>

So what about making two databases, one for Malayalam grammar and one for sandhi?

Malayalam is more complex than Bengali (I am not sure!), but I think we can follow the directory structure of this project. Please provide more detailed info about it, and suggestions.


JAGANADH G

Oct 19, 2009, 6:03:07 AM
to ilug...@googlegroups.com
On Tue, Oct 20, 2009 at 1:51 AM, jeevanism <jeev...@gmail.com> wrote:

> <<BnTable.pm is  for handling grammar and BnSonshi.pm is for Sandhi. >>
>
> so wat abt making  two db for malayalam grammar and Sandhi..??
>
> Malayalam is more complex than Bengali( i am not sure!!)..  but  i think we can follow the directory/structure of this project.. please provide  deeper info about it and suggestions..

Before going into the *.pm files, you have to create the English-Malayalam dictionary.