Hello,
For the past couple of weeks I have been reading through your library and trying out a few things. Thank you very much for sharing it.
I intend to use your library to build a predictive text program, i.e., one that predicts the most probable next word given an unfinished sentence. Unfortunately, I have been unable to find an efficient way to do this. After becoming familiar with the library and reading much of the documentation and the accompanying papers, I have concluded that this may not be possible, and I wanted to confirm that here in case I have missed something.
From what I understand, the data structure you designed for storing language models is optimized for the opposite direction: it starts with the last word of an N-gram and then looks up the words that precede it. This is extremely efficient for your intended use, and I can see why you designed it this way. However, I fear it means I will not be able to use your library for a predictive text program.
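To check that I have understood the direction correctly, here is a toy sketch of what I mean. It is my own illustration, assuming a simple hash-map trie with no smoothing or backoff; the class `ReversedTrieDemo` and its methods are hypothetical and are not BerkeleyLM's actual implementation or API. Storing each n-gram last word first makes scoring a complete n-gram a single walk down the trie, but nothing is indexed by context, so finding the continuations of a context means trying every possible last word.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: NOT BerkeleyLM's real data structure or API.
// Each n-gram is stored with its words reversed (last word first).
public class ReversedTrieDemo {

    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        Double logProb; // non-null only on nodes that end a stored n-gram
    }

    private final Node root = new Node();

    // Store [the, cat, sat] under the path sat -> cat -> the.
    void add(List<String> ngram, double logProb) {
        Node node = root;
        for (int i = ngram.size() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(ngram.get(i), k -> new Node());
        }
        node.logProb = logProb;
    }

    // Fast direction: walk the n-gram backwards from the root.
    // Returns null if the n-gram is not stored.
    Double score(List<String> ngram) {
        Node node = root;
        for (int i = ngram.size() - 1; i >= 0 && node != null; i--) {
            node = node.children.get(ngram.get(i));
        }
        return node == null ? null : node.logProb;
    }

    // Slow direction: there is no index from a context to its next words,
    // so every candidate last word (each child of the root) is tried in turn.
    List<String> continuations(List<String> context) {
        List<String> result = new ArrayList<>();
        for (String candidate : root.children.keySet()) {
            List<String> ngram = new ArrayList<>(context);
            ngram.add(candidate);
            if (score(ngram) != null) {
                result.add(candidate);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        ReversedTrieDemo lm = new ReversedTrieDemo();
        lm.add(List.of("the", "cat", "sat"), -1.2);
        lm.add(List.of("the", "cat", "ran"), -2.0);
        lm.add(List.of("a", "dog", "sat"), -1.5);
        System.out.println(lm.score(List.of("the", "cat", "sat")));  // -1.2
        System.out.println(lm.continuations(List.of("the", "cat"))); // [ran, sat]
    }
}
```

With a vocabulary the size of the Google Books one, the loop in `continuations` is the expensive part, which is why I suspect next-word prediction is a poor fit for this layout.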
I have found one way of making such predictions: the method getDistributionOverNextWords(NgramLanguageModel<W> lm, List<W> context) in the class edu.berkeley.nlp.lm.NgramLanguageModel.StaticMethods. I have tested this method with the English Google Books binaries you kindly provided, but on my machine (3.2 GHz Intel Core i5, 32 GB RAM) it still takes an average of 17 seconds to find the most probable next word in a sentence. As the JavaDoc comments mention, some of the methods in this file (NgramLanguageModel.java) are inefficient convenience methods, with more efficient equivalents in the implementing classes. However, despite many hours of searching, I have not found a more efficient method that returns the most probable next word. I imagine one does not exist, since this is not the library's intended use, but as a last resort I thought it worth asking here before giving up completely.
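For comparison, here is a minimal sketch of the kind of lookup predictive text needs, under my own assumptions: a separate forward index, built once offline from the n-grams, that maps each context to its highest-scoring next word. The class `ForwardIndexDemo` and its methods are hypothetical and are not part of BerkeleyLM. It ignores smoothing: it returns the best stored continuation of the longest matching context, backing off to shorter contexts, rather than the exact argmax of the backed-off model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of BerkeleyLM: a forward index from each
// context to its single best next word, built once from stored n-grams.
public class ForwardIndexDemo {

    private final Map<List<String>, String> bestNext = new HashMap<>();
    private final Map<List<String>, Double> bestScore = new HashMap<>();
    private final int maxContext; // longest context length that was indexed

    ForwardIndexDemo(int maxContext) {
        this.maxContext = maxContext;
    }

    // The context is every word except the last; keep only the best next word.
    void add(List<String> ngram, double logProb) {
        List<String> context = List.copyOf(ngram.subList(0, ngram.size() - 1));
        String next = ngram.get(ngram.size() - 1);
        Double current = bestScore.get(context);
        if (current == null || logProb > current) {
            bestScore.put(context, logProb);
            bestNext.put(context, next);
        }
    }

    // Try the longest usable suffix of the sentence first, then shorter ones,
    // ending with the empty (unigram) context. Each step is one hash lookup.
    String predict(List<String> sentence) {
        int start = Math.max(0, sentence.size() - maxContext);
        for (int i = start; i <= sentence.size(); i++) {
            String next = bestNext.get(sentence.subList(i, sentence.size()));
            if (next != null) {
                return next;
            }
        }
        return null; // nothing indexed at all
    }

    public static void main(String[] args) {
        ForwardIndexDemo index = new ForwardIndexDemo(2);
        index.add(List.of("the", "cat", "sat"), -1.2);
        index.add(List.of("the", "cat", "ran"), -2.0);
        index.add(List.of("cat", "sat"), -0.9);
        index.add(List.of("the"), -1.0);
        System.out.println(index.predict(List.of("I", "saw", "the", "cat"))); // sat
        System.out.println(index.predict(List.of("a", "cat")));            // sat
        System.out.println(index.predict(List.of("unknown", "words")));    // the
    }
}
```

The trade-off is that the index needs one entry per distinct context, which for the Google Books n-grams could be large, whereas each query avoids any scan over the vocabulary.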
Thank you very much for your time. I look forward to your response; please let me know if you need any more information or code samples from me.
Jack Donn