GSoC Idea: Translation Tools

Shagun Sodhani

Feb 28, 2014, 3:57:17 AM
to build...@googlegroups.com
Hi

I am a third-year B.Tech student in Computer Science and Engineering at the Indian Institute of Technology, Roorkee, India. I am interested in participating in GSoC, and I find your project titled "Translation Tools" very interesting and relevant.

I have been coding in Python for many years. I have also worked in the field of Natural Language Processing, and I believe I can take this project a step forward by translating strings in an "intelligent" way, so that the translation is not just word by word but keeps the meaning of the sentence intact. If time permits, we can also add a check that validates the grammar of user input sentences. We could add sentiment analysis as well, so that the output sentence retains the meaning of the input; otherwise a sarcastic sentence could lose its real meaning when translated. The user would input a sentence (which might contain a few grammar mistakes) and a language of their choice, and we would return the best possible translation of that sentence.
   
Could you guide me on how to start on this project and whom I can contact to discuss this idea further?

Cheers!

Rahul Ahuja

Feb 28, 2014, 5:38:02 AM
to build...@googlegroups.com
Hi Shagun

>I have been coding in Python for many years. I have also worked in the field of Natural Language Processing, and I believe I can take this project a step forward by translating strings in an "intelligent" way, so that the translation is not just word by word but keeps the meaning of the sentence intact. If time permits, we can also add a check that validates the grammar of user input sentences. We could add sentiment analysis as well, so that the output sentence retains the meaning of the input; otherwise a sarcastic sentence could lose its real meaning when translated. The user would input a sentence (which might contain a few grammar mistakes) and a language of their choice, and we would return the best possible translation of that sentence.

I am not sure if sentiment analysis is required for translating. However, we could use the grammar fixes that you are talking about. Please elaborate on the approach you plan to take to correct grammar for different languages.

>Could you guide me on how to start on this project and whom I can contact to discuss this idea further?

Start by exploring this group for already discussed ideas and post your thoughts / ideas here to discuss further. 

Shagun Sodhani

Feb 28, 2014, 5:57:50 AM
to Rahul Ahuja, build...@googlegroups.com
Hi

Thanks for your feedback on my mail. I have tried to clarify the points you raised:

>I am not sure if sentiment analysis is required for translating. 

I agree that it is not needed for word-level translation, since a word has no defined sentiment on its own. But when the word is used in a sentence, it gets a sentiment/context attached to it, and the same word can then have different translations. I do not know which of the many alternatives the Google/Bing API returns, but we can improve the utility of the app by returning the most appropriate word given the context and sentiment of the entire sentence.

>Please elaborate on the approach you plan to take to correct grammar for different languages.

We will obtain the translation of the entire sentence in the desired language. Then we will break the sentence down into lexemes (words) and get a translation for each word. Next we will put the words in the correct places according to the grammar of the target language. E.g., a simple rule of sentence formation in English is subject + verb + object: I (subject) + am playing (verb) + football (object). Translated word by word into, say, Hindi, the sentence keeps the same structure, while the correct structure for Hindi is subject + object + verb: Main (subject) + football (object) + khelta hun (verb). In this way we will map the translated words to the words in the translated sentence in accordance with the grammar rules. The more rules we consider, the more accurate our app will be.
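
To make the reordering step concrete, here is a minimal Python sketch. It assumes the roles (subject, verb, object) are already known for each chunk; in practice they would have to come from a parser. The dictionary, the role tags and the function name are all made up for illustration and cover only the example sentence above.

```python
# Hypothetical role-tagged chunks for "I am playing football".
# A real system would obtain these roles from a parser.
TAGGED_SENTENCE = [("I", "subject"), ("am playing", "verb"), ("football", "object")]

# Hypothetical word-by-word dictionary (English -> romanised Hindi),
# containing only the words from the example above.
WORD_TRANSLATIONS = {"I": "Main", "am playing": "khelta hun", "football": "football"}

# Word order per language; only the two orders mentioned in this mail.
WORD_ORDER = {
    "en": ["subject", "verb", "object"],
    "hi": ["subject", "object", "verb"],
}


def translate_and_reorder(tagged_chunks, order_lang):
    """Translate each chunk, then arrange the chunks in the order used by order_lang."""
    by_role = {role: WORD_TRANSLATIONS[chunk] for chunk, role in tagged_chunks}
    return " ".join(by_role[role] for role in WORD_ORDER[order_lang])


# Keeping the English order gives the naive word-by-word result.
print(translate_and_reorder(TAGGED_SENTENCE, "en"))
# Applying the Hindi order rule moves the verb to the end.
print(translate_and_reorder(TAGGED_SENTENCE, "hi"))
```

Each additional language would need its own entry in WORD_ORDER, and real sentences need many more rules than a single ordering of three roles.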

Cheers! 

--
You received this message because you are subscribed to a topic in the Google Groups "BuildmLearn" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/buildmlearn/BVxUFmQQ-K0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to buildmlearn...@googlegroups.com.
To post to this group, send email to build...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/buildmlearn/d85d4858-71a1-4fa5-a7f5-8e33c52f004b%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Rahul Ahuja

Feb 28, 2014, 6:05:57 AM
to build...@googlegroups.com, Rahul Ahuja
Hi Shagun
 
>I agree that it is not needed for word-level translation, since a word has no defined sentiment on its own. But when the word is used in a sentence, it gets a sentiment/context attached to it, and the same word can then have different translations.

I disagree. We wouldn't need sentiment analysis (which indicates feeling) for words or for sentences either. All we need is grammar correction, if you can provide that.

 
>We will obtain the translation of the entire sentence in the desired language. Then we will break the sentence down into lexemes (words) and get a translation for each word. Next we will put the words in the correct places according to the grammar of the target language. E.g., a simple rule of sentence formation in English is subject + verb + object: I (subject) + am playing (verb) + football (object). Translated word by word into, say, Hindi, the sentence keeps the same structure, while the correct structure for Hindi is subject + object + verb: Main (subject) + football (object) + khelta hun (verb). In this way we will map the translated words to the words in the translated sentence in accordance with the grammar rules. The more rules we consider, the more accurate our app will be.

Correct. You can formulate these rules for English and Hindi because you know these languages. So my question is how many languages do you know? 
This isn't a scalable approach since the rules of English and Hindi may not apply to all languages. 

Rahul

Shagun Sodhani

Feb 28, 2014, 6:26:01 AM
to Rahul Ahuja, build...@googlegroups.com
>I disagree. We wouldn't need sentiment analysis (which indicates feeling) for words or for sentences either.

OK, let's take an example: "That man is a snake". If I do a word-by-word translation, I reach the word "snake". The possible options (these results are from Google Translate) are:

साँप
snake, reptile, serpent, reptilian, rattlesnake, ophidian
नाग
snake, rattlesnake, viper, ophidian
भुजंग
serpent, snake
पाजी
cad, twerp, skunk, reprobate, stinker, snake
बदमाश
punk, skunk, scoundrel, villain, hoodlum, snake

If I do not use context/sentiment analysis, I end up using the word साँप, while a better choice would be पाजी or बदमाश.

And this is just one example. There would be sentences where context and semantics could make all the difference. But it's just a suggestion, just another feature the app could have.
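
As a rough illustration of what I mean by using context, here is a small Python sketch. It is only a toy heuristic, not a real sentiment or context model: the sense labels, the cue-word list and the function name are my own assumptions, and the candidate words are taken from the list above.

```python
# Candidate Hindi translations for "snake", grouped by a hand-assigned sense label.
# The labels are an assumption made for this example only.
CANDIDATES = {
    "snake": {"animal": "साँप", "person": "बदमाश"},
}

# Hypothetical cue words suggesting that the sentence describes a person.
PERSON_CUES = {"man", "woman", "he", "she", "boss", "friend"}


def pick_translation(word, sentence):
    """Pick a candidate for `word` based on the other words in the sentence."""
    senses = CANDIDATES[word]
    # Lower-case the context words and strip surrounding punctuation.
    context = {w.strip('.,!?"').lower() for w in sentence.split()} - {word}
    sense = "person" if context & PERSON_CUES else "animal"
    return senses[sense]


print(pick_translation("snake", "That man is a snake"))
print(pick_translation("snake", "The snake is in the garden"))
```

A fixed cue list like this will obviously not generalise; it only shows where the context of the sentence would enter the choice of word.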

>All we need is grammar correction, if you can provide that.

Yes, I believe I can help with this work.

>This isn't a scalable approach since the rules of English and Hindi may not apply to all languages.

Yes! I agree it's not scalable, given that we would need experts for all the languages. Maybe this approach cannot be perfected even for the languages that we speak every day. This is where "learning" can come in. We can code a classifier and train it with sentences from different languages so that it can "learn" to classify a given sentence as correct or wrong. I am not sure of its scalability. Another viable option would be to combine the results of different language translation APIs to get the 'best' translation. Either way, taking the grammar syntax into account is important.
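
One simple way to combine API results would be a majority vote, sketched below in Python. This is only one possible combination rule and an assumption on my part. The candidate strings are invented stand-ins for the outputs of different translation services; no real API is called here.

```python
from collections import Counter


def combine_translations(candidates):
    """Return the most frequent candidate after light normalisation, or None if empty."""
    # Normalise so that differences in case or surrounding whitespace do not split votes.
    normalised = [c.strip().lower() for c in candidates if c.strip()]
    if not normalised:
        return None
    # On a tie, Counter keeps the candidate that was seen first.
    best, _count = Counter(normalised).most_common(1)[0]
    return best


# Hypothetical outputs from three different translation services.
api_results = [
    "Main football khelta hun",
    "main football khelta hun ",
    "Main khelta hun football",
]
print(combine_translations(api_results))
```

The function returns the normalised (lower-cased) form of the winning candidate. A vote only helps when at least two services agree, so a grammar check would still be needed to break ties sensibly.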

Rahul Ahuja

Feb 28, 2014, 12:41:16 PM
to build...@googlegroups.com, Rahul Ahuja

Hi Shagun
 
>OK, let's take an example: "That man is a snake". If I do a word-by-word translation, I reach the word "snake". The possible options (these results are from Google Translate) are:
>And this is just one example. There would be sentences where context and semantics could make all the difference. But it's just a suggestion, just another feature the app could have.

Well, that's true, but most software uses an informative/instructive tone of language, not a sentimental one.

Your point would be valid if the target audience of the tool were different, i.e. conversational translation. Our tool is going to be used for localisation of strings (used in apps) and not for creating translated subtitles for a movie :-)
 
>We can code a classifier and train it with sentences from different languages so that it can "learn" to classify a given sentence as correct or wrong. I am not sure of its scalability. Another viable option would be to combine the results of different language translation APIs to get the 'best' translation.
 
I think you need to have some semantics available for all the languages that you are using, i.e. the grammar rules that apply to each language. It is difficult to work these out yourself. You'll need a good source that has this information.

>Either way, taking the grammar syntax into account is important.

Yes, it is. How you plan to solve this problem is something we would be interested in reading in your proposal.

Shagun Sodhani

Feb 28, 2014, 1:39:50 PM
to Rahul Ahuja, build...@googlegroups.com
>Our tool is going to be used for localisation of strings (used in apps) and not for creating translated subtitles of a movie :-)

Point taken :)

>You'll need a good source that has this information.

Do you mean an NLP library (related to grammar structure) or a data set (related to the classifier approach)?

>How you plan to solve this problem is something we would be interested in reading in your proposal.

I will try to address this point in detail in my proposal.

Also, is there any other requirement/expectation to be eligible to work on this particular idea?

Cheers!