I’m helping a friend with a translation project. The work is very manual: it means reading lots of strings and translating them one by one. I noticed that many of the strings contain very common recurring phrases, such as “The answer is:”. So I got the idea that we could automate a large part of the translation job by parsing the entire text and finding the most common recurring phrases and sentences. If I translate these manually, we can then insert/replace them in the text and save a lot of time.
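The “translate once, replace everywhere” idea can be sketched as a small glossary lookup. This is a minimal sketch, assuming the manual translations are kept in a Python dict; `apply_glossary` is a hypothetical helper name and the German strings are placeholders, not real project data.

```python
import re

def apply_glossary(text, glossary):
    """Replace every known source phrase in `text` with its manual translation.

    `glossary` maps source phrases to their translations. Longer phrases are
    tried first, so "Click here to continue" wins over "Click here" when both
    are in the glossary.
    """
    if not glossary:
        return text
    # Longest-first alternation; re.escape keeps ":" and "." literal.
    phrases = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))
    # Each match is replaced by the translation of exactly the phrase matched.
    return pattern.sub(lambda m: glossary[m.group(0)], text)


# Hypothetical glossary: the German translations are placeholders only.
glossary = {"The answer is:": "Die Antwort ist:", "Click here": "Hier klicken"}
print(apply_glossary("The answer is: 42. Click here for more.", glossary))
# prints: Die Antwort ist: 42. Hier klicken for more.
```

Note that plain substring replacement leaves the rest of a string untouched, so the result is often mixed-language ("Hier klicken for more"). It works best as a pre-fill that a human translator then finishes, or when a glossary entry matches a whole string.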
Problem is, I know nothing about this topic, or whether it’s even possible.
Summary of what I want to do:
Write a Python script that parses an XML file containing a bunch of strings and then, in an unsupervised manner, finds common phrases such as “The answer is:” and “Click here”.
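A cheap first step is to parse the file and count the strings that are repeated verbatim. This is a minimal sketch using only the standard library (`xml.etree.ElementTree` and `collections.Counter`); the `<resources>/<string>` layout in the sample is a hypothetical stand-in for the real file, and `load_strings` / `repeated_strings` are illustrative names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def load_strings(xml_text):
    """Return the stripped, non-empty text of every element in the document.

    Assumption: the translatable strings are stored as element text. If the
    real file keeps them in attributes or only in certain tags, adjust this.
    """
    root = ET.fromstring(xml_text)  # for a file on disk: ET.parse(path).getroot()
    return [el.text.strip() for el in root.iter() if el.text and el.text.strip()]

def repeated_strings(strings, min_count=2):
    """Whole strings that occur at least `min_count` times, most frequent first."""
    return [(s, n) for s, n in Counter(strings).most_common() if n >= min_count]


# Hypothetical sample file layout.
sample = """<resources>
  <string name="a">The answer is:</string>
  <string name="b">Click here</string>
  <string name="c">The answer is:</string>
  <string name="d">Welcome back</string>
</resources>"""

strings = load_strings(sample)
print(repeated_strings(strings))
# prints: [('The answer is:', 2)]
```

This only catches exact duplicates; phrases that recur inside longer, otherwise different strings need n-gram counting.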
I browsed through the NLTK textbook and found some material on frequency plots and co-occurrence, but it seems to work only for individual words or bigrams. Does the problem get too complex with whole sentences?
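Counting longer phrases is the same operation as counting bigrams, just with n above 2: a string of L tokens yields at most L − n + 1 n-grams for each length n, so the work grows linearly with the amount of text. NLTK’s `nltk.util.ngrams` and `nltk.FreqDist` do this counting; the sketch below uses only the standard library so it runs without NLTK installed. The helper names, the thresholds (`min_n=2`, `max_n=6`, `min_count=2`) and the heuristic of dropping any n-gram that only ever occurs inside a longer one with the same count are assumptions for illustration, not an established recipe.

```python
import re
from collections import Counter

def tokenize(text):
    # Runs of word characters or of punctuation, so "is:" -> "is", ":".
    # This is the same pattern NLTK's wordpunct_tokenize uses.
    return re.findall(r"\w+|[^\w\s]+", text)

def count_ngrams(strings, min_n=2, max_n=6):
    """Count every contiguous token sequence of length min_n..max_n."""
    counts = Counter()
    for s in strings:
        tokens = tokenize(s)
        for n in range(min_n, max_n + 1):
            counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

def frequent_phrases(strings, min_n=2, max_n=6, min_count=2):
    """Most frequent phrases, dropping any n-gram that only ever occurs
    inside a longer n-gram with the same count (e.g. "answer is" when it
    always appears as "The answer is :")."""
    counts = count_ngrams(strings, min_n, max_n)
    subsumed = set()
    for gram, c in counts.items():
        for part in (gram[:-1], gram[1:]):  # the two shorter n-grams it contains
            if counts.get(part) == c:
                subsumed.add(part)
    kept = [(" ".join(g), c) for g, c in counts.items()
            if c >= min_count and g not in subsumed]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


strings = ["The answer is: 42", "The answer is: 7",
           "Click here to continue", "Click here to exit"]
print(frequent_phrases(strings))
# prints: [('Click here to', 2), ('The answer is :', 2)]
```

Tokens are re-joined with spaces, so “The answer is:” comes back as “The answer is :”. The output also shows that a frequent n-gram (“Click here to”) is not always a sensible unit to translate on its own, so the list still needs a human pass before it goes into the glossary.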
Thanks for any helpful info.
--
You received this message because you are subscribed to the Google Groups "nltk-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nltk-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.