--
You received this message because you are subscribed to the Google Groups "berkeleylm-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to berkeleylm-disc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
This happens with out-of-vocabulary (OOV) words. You need to include an <UNK> token in your training data to handle them properly.
On Tue, Jul 15, 2014 at 8:33 AM, Roman Prokofyev <roman.p...@gmail.com> wrote:
Hello,

I'm trying to compute log probabilities on a test collection and noticed that around 15 sentences out of 57k (Brown corpus) get a log probability of -Infinity. Why is that? Isn't the stupid backoff model smoothed as well, or only Kneser-Ney?

Thanks.
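Below is a minimal sketch of how an OOV word produces -Infinity and how mapping it to <UNK> avoids that. It does not use BerkeleyLM's API. The class name, the toy counts, and the helper mapOov are hypothetical. It scores unigrams only, because an unseen word is where stupid backoff bottoms out: the relative frequency is 0 and log(0) is -Infinity.

```java
import java.util.List;
import java.util.Map;

public class UnkMappingSketch {
    // Hypothetical toy unigram counts; "<UNK>" reserves some mass for unseen words.
    static final Map<String, Long> COUNTS = Map.of("the", 6L, "cat", 2L, "<UNK>", 2L);
    static final long TOTAL = 10L;

    // Hypothetical helper: replace any word missing from the vocabulary with <UNK>.
    public static String mapOov(String word) {
        return COUNTS.containsKey(word) ? word : "<UNK>";
    }

    // Unigram log probability (natural log). An unseen word has count 0,
    // so Math.log(0.0) returns -Infinity.
    public static double unigramLogProb(String word) {
        long c = COUNTS.getOrDefault(word, 0L);
        return Math.log((double) c / TOTAL);
    }

    // Sum of unigram log probabilities, optionally mapping OOV words to <UNK> first.
    public static double sentenceLogProb(List<String> words, boolean mapUnk) {
        double sum = 0.0;
        for (String w : words) {
            sum += unigramLogProb(mapUnk ? mapOov(w) : w);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> sentence = List.of("the", "dog"); // "dog" is OOV here
        System.out.println(sentenceLogProb(sentence, false)); // -Infinity
        System.out.println(sentenceLogProb(sentence, true));  // finite: log(0.6) + log(0.2)
    }
}
```

A single -Infinity term makes the whole sentence sum -Infinity. This is why only a handful of sentences out of 57k are affected.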
Why would i ever be that large?
On Wed, Jul 16, 2014 at 2:02 AM, Roman Prokofyev <roman.p...@gmail.com> wrote:
I just did more debugging and found that there were no OOV words. The problem turned out to be in StupidBackoffLm.java, line 72:

pow(alpha, i - startPos)

alpha is set to 0.4, and if i > 120 or so, pow underflows to 0, so the logarithm becomes -Infinity.

It is probably better to expand the log calculation so this situation can never happen:

log(x/y * pow(a, b)) = log(x/y) + log(pow(a, b)) = log(x/y) + b*log(a)

This way the backoff term is just b*log(0.4); for b = 140 it is only about -128.

I could create a patch, but I don't know how that's done on Google Code.
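Below is a minimal sketch contrasting the two formulations above. The class and method names are hypothetical, and this is not the actual StupidBackoffLm code. One assumption to flag: in double precision, pow(0.4, b) only underflows to 0 for b of roughly 800 or more. The sketch therefore assumes the product is narrowed to a float, as if scores were stored as floats. A float underflows to 0 from about b = 114 when the relative frequency is 1, which is roughly consistent with the threshold reported above.

```java
public class BackoffUnderflowSketch {
    static final double ALPHA = 0.4;

    // Direct formulation: relFreq * alpha^b, narrowed to float (assumption), then log.
    // For large b the float product underflows to 0.0f and Math.log returns -Infinity.
    public static float logScoreDirect(double relFreq, int backoffSteps) {
        float score = (float) (relFreq * Math.pow(ALPHA, backoffSteps));
        return (float) Math.log(score);
    }

    // Log-space formulation: log(x/y * a^b) = log(x/y) + b * log(a).
    // No tiny intermediate product is formed, so the result stays finite.
    public static float logScoreLogSpace(double relFreq, int backoffSteps) {
        return (float) (Math.log(relFreq) + backoffSteps * Math.log(ALPHA));
    }

    public static void main(String[] args) {
        int b = 140;
        System.out.println(logScoreDirect(1.0, b));   // -Infinity
        System.out.println(logScoreLogSpace(1.0, b)); // about -128.28
    }
}
```

For small b the two methods agree to within float rounding. Only the log-space version stays finite as b grows.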
On Wednesday, July 16, 2014 3:51:12 AM UTC+2, Adam Pauls wrote:
This happens with out-of-vocabulary (OOV) words. You need to include an <UNK> token in your training data to handle them properly.
if (probContextOrder >= map.getMaxNgramOrder() - 1) { // Jesus addition to avoid errors from trying to look up 6-grams
    System.out.println("Max ngramorder:" + map.getMaxNgramOrder());
    System.out.println("I broke!");
    break;
}
Thanks, looking forward to the fix!