Predictive Text?

Jack Donn

Aug 3, 2015, 8:33:03 AM

to berkeleylm-discuss

Hello,

For the past couple of weeks I have been reading through your library and trying out a few things. Thank you very much for sharing it.

My intention is to use your library to create a predictive text program, i.e. one that predicts the most probable next word given an unfinished sentence. Sadly, I have been unable to find an efficient way to do this. After becoming familiar with your library and reading much of the documentation along with the papers, I have come to the conclusion that this may not be possible, and I just wanted to confirm that here (in case I have missed something).

From what I understand, the data structure you use to store the language models is designed to work efficiently in the opposite direction, i.e. starting with the last word in the N-gram and then finding its preceding words. This is extremely efficient for your intended use and I can see why you have designed it this way; however, I fear it means I will not be able to use your library to create a predictive text program.
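To make the lookup-direction point concrete, here is a minimal toy sketch in Java. It is not BerkeleyLM's actual storage format; the class `LookupDirectionSketch` and its methods are hypothetical names invented for illustration. It stores the same N-gram counts twice, once keyed by last word and once keyed by context, and shows that listing the continuations of a context needs a scan over every last-word entry in the first layout but a single lookup in the second.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration only; hypothetical names, not BerkeleyLM's data structure. */
final class LookupDirectionSketch {
    // "Reverse" layout: last word -> (preceding context -> count).
    private final Map<String, Map<List<String>, Integer>> byLastWord = new HashMap<>();
    // "Forward" layout: context -> (next word -> count).
    private final Map<List<String>, Map<String, Integer>> byContext = new HashMap<>();

    /** Stores one N-gram (at least one word long) in both layouts. */
    void add(List<String> ngram, int count) {
        List<String> context = new ArrayList<>(ngram.subList(0, ngram.size() - 1));
        String last = ngram.get(ngram.size() - 1);
        byLastWord.computeIfAbsent(last, k -> new HashMap<>()).put(context, count);
        byContext.computeIfAbsent(context, k -> new HashMap<>()).put(last, count);
    }

    /** Reverse layout: every last-word entry has to be visited. */
    Map<String, Integer> nextWordsViaReverse(List<String> context) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Map<List<String>, Integer>> e : byLastWord.entrySet()) {
            Integer count = e.getValue().get(context);
            if (count != null) {
                result.put(e.getKey(), count);
            }
        }
        return result;
    }

    /** Forward layout: a single map lookup returns all continuations. */
    Map<String, Integer> nextWordsViaForward(List<String> context) {
        Map<String, Integer> found = byContext.get(context);
        return found != null ? found : Collections.<String, Integer>emptyMap();
    }
}
```

Both methods return the same continuations for a given context; they differ only in how much of the index they have to touch to find them.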

I have found one way of making such predictions: the method getDistributionOverNextWords(NgramLanguageModel<W> lm, List<W> context) from the class edu.berkeley.nlp.lm.NgramLanguageModel.StaticMethods. I have tested this method with the English Google Books binaries you kindly provided, but on my machine (3.2 GHz Intel Core i5, 32 GB RAM) it still takes an average of 17 seconds to find the most probable next word in a sentence. As mentioned in the JavaDoc comments, some of the methods in this file (NgramLanguageModel.java) are inefficient convenience methods, with more efficient equivalents to be found in the implementing classes. However, despite many hours of searching, I have not managed to find a more efficient method that returns the most probable next word in a sentence. I imagine one does not exist, as this is not the intended use of your library, but as a last resort I thought it would be worth asking here before giving up completely.

Thank you very much for your time. I look forward to your response. Please let me know if you require any more information or code samples from me.

Jack Donn

Adam Pauls

Aug 3, 2015, 12:37:55 PM

to berkeleylm-discuss

Hi Jack,

You are right that the system doesn't support that use efficiently. The method you found is inefficient because it must loop over all words in the vocabulary, and that is unlikely to get much faster. Sorry!
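The loop over the vocabulary can be sketched as follows. This is a self-contained illustration, not the library's code: `NgramScorer` is a hypothetical stand-in for a language model's N-gram scoring call, and `toyModel` is a made-up three-entry table. The point it shows is that finding the best next word this way costs one model query per vocabulary word, so the running time grows with the vocabulary size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch with hypothetical names; not the BerkeleyLM API. */
final class BruteForceNextWord {
    /** Hypothetical stand-in for a model call that scores one full N-gram. */
    interface NgramScorer {
        double logProb(List<String> ngram);
    }

    /** Scores context + w for every w in the vocabulary and keeps the best. */
    static String mostProbableNext(NgramScorer lm, List<String> vocab, List<String> context) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String candidate : vocab) {
            List<String> ngram = new ArrayList<>(context);
            ngram.add(candidate);
            double score = lm.logProb(ngram); // one model query per vocabulary word
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    /** Made-up toy table; unseen N-grams get a fixed low score. */
    static NgramScorer toyModel() {
        Map<List<String>, Double> table = new HashMap<>();
        table.put(Arrays.asList("thank", "you"), Math.log(0.7));
        table.put(Arrays.asList("thank", "the"), Math.log(0.2));
        table.put(Arrays.asList("thank", "cat"), Math.log(0.1));
        return ngram -> table.getOrDefault(ngram, -20.0);
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("cat", "the", "you");
        System.out.println(mostProbableNext(toyModel(), vocab, Arrays.asList("thank")));
    }
}
```

With a three-word toy vocabulary the loop is trivial; with a vocabulary the size of a web-scale or book-scale corpus, the same loop issues a query for every word on every prediction.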

Adam

--
You received this message because you are subscribed to the Google Groups "berkeleylm-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to berkeleylm-disc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jack Donn

Aug 3, 2015, 2:12:10 PM

to berkeleylm-discuss

No problem. Thank you for confirming my assumptions and saving me from further digging.

Jack
