berkeleylm-discuss

1–30 of 52

miluette, … Gena Kukartsev (3 messages)

3/7/17

NaN-probabilities

Sentence boundary symbols are not included in the vocabulary (I am not sure why), and that causes NaNs.

Roman Prokofyev, … Gena Kukartsev (7 messages)

3/6/17

NaN probability

These NaNs are definitely because of OOV (out-of-vocabulary) words: beginning-of-sentence and end-of-

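Both NaN threads point at the same cause: the sentence boundary symbols are missing from the model's vocabulary. A cheap guard is to pad each sentence with the boundary markers and check every token against the vocabulary before scoring. The Python sketch below is library-independent; the `<s>`/`</s>` spellings, the toy vocabulary, and the `pad`/`find_oov` helpers are assumptions for illustration, not BerkeleyLM's API.

```python
# Assumed boundary markers (conventional ARPA spellings); check what your model uses.
START, END = "<s>", "</s>"


def pad(tokens):
    """Wrap a tokenized sentence in start/end-of-sentence markers."""
    return [START] + list(tokens) + [END]


def find_oov(tokens, vocab):
    """Return the padded tokens missing from vocab, in order, without repeats."""
    missing = []
    for tok in pad(tokens):
        if tok not in vocab and tok not in missing:
            missing.append(tok)
    return missing


# Hypothetical vocabulary that lacks the start marker.
vocab = {"the", "cat", "sat", "</s>"}
print(find_oov(["the", "cat", "sat"], vocab))  # ['<s>']
```

A non-empty result means the model has no entry for those tokens, so any score that depends on them is suspect.
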
李若冰, … jeremy (8 messages)

11/19/16

google ngram test: LogProbability is NaN

How can you solve this problem? On Monday, April 27, 2015 at 3:07:05 AM UTC-4, 李若冰 wrote: I've met a problem that, I

2/22/16

Can't make arpa file

I want to make an n-gram model but get this error. Any suggestions?

Jack Donn, Adam Pauls (3 messages)

8/3/15

Predictive Text?

No problem, thank you for confirming my assumptions and saving me from further digging. Jack On

Nicolas Hernandez, Adam Pauls (3 messages)

6/3/15

maven repository update

Dear Adam, could you ask your (old?) colleagues about that, please? There is also a version of the

4/29/15

How to use CompressedNgramMap?

Hi, I was trying to load a 3.8-billion-token, 12GB LM in ARPA format using readArrayEncodedLmFromArpa

Matthew Hatem, Adam Pauls (2 messages)

4/4/15

Plan for when Google Code shuts down?

Yes, I will migrate to GitHub at some point. On Fri, Apr 3, 2015 at 7:32 AM, Matthew Hatem <mhatem

Sidharth Kamboj, Adam Pauls (2 messages)

3/25/15

Character-level n-grams and ARPA file generation

Yes, you would need to preprocess, nothing special is implemented. On Wed, Mar 25, 2015 at 1:22 PM,

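Since nothing special is implemented for characters, one way to preprocess is to rewrite the training text so that each character is its own whitespace-separated token, then train as usual. A minimal Python sketch; mapping real spaces to `_` is an assumption (any symbol that never occurs in the corpus works), and `to_char_tokens` is a hypothetical helper.

```python
def to_char_tokens(line, space_symbol="_"):
    """Turn one line of text into space-separated character tokens.

    Spaces inside the line become space_symbol (a hypothetical placeholder) so
    word boundaries survive as tokens; leading/trailing whitespace is dropped.
    Only the ASCII space is mapped; tabs and other whitespace are left as-is.
    """
    chars = [space_symbol if ch == " " else ch for ch in line.strip()]
    return " ".join(chars)


print(to_char_tokens("hi there"))  # h i _ t h e r e
```

Running every line of the corpus through this yields a file that a word-level n-gram trainer will treat as character n-grams.
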
한정수, Adam Pauls (2 messages)

3/23/15

ArrayIndexOutOfBoundsException

Can you show me the exact command you ran, along with the full exception? On Sun, Mar 22, 2015 at 11:

한정수, Adam Pauls (3 messages)

3/19/15

Thanks a lot!!! On Wednesday, March 18, 2015 at 5:20:18 PM UTC+9, 한정수 wrote: there is vocab_cs.gz in

3/14/15

Re: Questions....

1. Yes, the backoff and smoothing used by these methods are different. I recommend reading http://www

qhdj...@gmail.com, Adam Pauls (2 messages)

3/1/15

How much memory is needed to load the Google web 1T model?

It should basically be 10GB. I'm sure the JVM has some extra overhead, but probably on the order

2/20/15

Hi, I made (and tested) a small change so that BerkeleyLM is smarter about gzipped files — instead of

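The snippet is cut off, so the details of that change are not shown here. One common way to handle gzipped input robustly is to sniff the two-byte gzip magic number (0x1f 0x8b, from RFC 1952) instead of relying on a `.gz` file name. A Python sketch of that general idea, not the patch itself; `open_text_maybe_gzipped` is a hypothetical helper.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream (RFC 1952)


def open_text_maybe_gzipped(path, encoding="utf-8"):
    """Open path for reading text, decompressing only if it is actually gzip.

    The decision is based on the file's first two bytes, not its extension.
    """
    with open(path, "rb") as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


# Usage (hypothetical file name):
# with open_text_maybe_gzipped("model.arpa") as f:
#     for line in f:
#         ...
```
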
iesus.c...@gmail.com, Adam Pauls (8 messages)

1/28/15

-Infinity logprobs and unnormalized probs - KneserNey GoogleNgram

Thanks again for your time and help. I've been trying to debug myself. And well, the issue

fanc...@gmail.com

1/22/15

how to compute the probability of a sentence?

I am new to language models. I want to compute the probability of a sentence using a language model.

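For anyone new to language models: a sentence's probability is normally computed with the chain rule, multiplying the probability of each word given its history (including an end-of-sentence marker), and in practice adding log probabilities instead. A toy Python sketch with an unsmoothed bigram model, using log10 as ARPA files do; the corpus, the helper names, and the `<s>`/`</s>` markers are assumptions. A trained, smoothed model would back off for unseen n-grams, whereas this toy returns negative infinity.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over <s>/</s>-padded sentences."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
    return contexts, bigrams


def sentence_log10_prob(sent, contexts, bigrams):
    """Chain rule: sum of log10 P(w_i | w_{i-1}), including the end marker.

    Returns -inf as soon as a bigram is unseen (no smoothing in this toy).
    """
    padded = ["<s>"] + sent + ["</s>"]
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        count = bigrams[(prev, word)]
        if count == 0:
            return float("-inf")
        total += math.log10(count / contexts[prev])
    return total


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
contexts, bigrams = train_bigram(corpus)
print(sentence_log10_prob(["the", "cat", "sat"], contexts, bigrams))  # about -0.301, i.e. log10(0.5)
```
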
Kunal Singhal, Adam Pauls (2 messages)

12/30/14

Encoding start of a sentence

lm.getWordIndexer().getStartSymbol(). On Tuesday, December 30, 2014 6:05:34 AM UTC-8, Kunal Singhal

ivan masli, Adam Pauls (2 messages)

12/6/14

Generate BerkeleyLM Jar

I added a note on the front page of Google Code. Basically, you have to build the "export"

Munazza Jannisar Khan

10/15/14

How to generate n-grams using regular expressions in Java and store them in a CSV file?

I have a class named WordListBuilder that reads input from a text file in any language. Output is to

Joseph Turian, … Adam Pauls (14 messages)

9/19/14

Sample code for Google N-gram with stupid backoff?

Yes, this is fixed at head in SVN. On Wed, Sep 17, 2014 at 7:27 AM, Oren Melamud <oren.melamud@

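Stupid backoff (Brants et al., 2007) scores a word by its relative frequency after the longest context it has been seen with, multiplying by a fixed factor (0.4 in that paper) each time the context has to be shortened; the scores are not normalized probabilities. A small in-memory Python sketch of just that scoring rule, not BerkeleyLM's implementation; the helper names and toy corpus are hypothetical.

```python
from collections import Counter

ALPHA = 0.4  # fixed backoff factor used by Brants et al. (2007)


def count_ngrams(tokens, max_order):
    """Count every n-gram of order 1..max_order, keyed by tuple."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_unigrams, max_order):
    """Score word after context, multiplying by ALPHA per dropped context word."""
    keep = max_order - 1
    context = tuple(context)[-keep:] if keep > 0 else ()  # longest usable history
    penalty = 1.0
    while context:
        hits = counts[context + (word,)]
        if hits > 0:
            return penalty * hits / counts[context]
        penalty *= ALPHA
        context = context[1:]  # drop the oldest word
    return penalty * counts[(word,)] / total_unigrams


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, 3)
print(stupid_backoff("sat", ["the", "cat"], counts, len(tokens), 3))  # 1.0
print(stupid_backoff("mat", ["the", "cat"], counts, len(tokens), 3))  # 0.4 * 0.4 * (1/6), about 0.0267
```

Because each backoff step only multiplies by a constant, no backoff weights have to be estimated, which is what makes the scheme cheap for very large count collections.
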
iesus.c...@gmail.com, Adam Pauls (12 messages)

9/16/14

Google N-gram language model of order 4

Hi, So, just to let you know, I was able to get the language model after all! :D These were the

Tapan Sharma, Adam Pauls (3 messages)

9/10/14

How to use this library

I want to use it for ngram frequency calculation. On Wed, Sep 10, 2014 at 9:08 AM, Adam Pauls <

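If the goal is only n-gram frequency counts rather than a smoothed language model, plain counting is enough. A Python sketch that counts n-grams of one order and lists them as tab-separated n-gram/count lines, most frequent first; it assumes one whitespace-tokenized sentence per line, the helper names are hypothetical, and it is independent of BerkeleyLM.

```python
from collections import Counter


def ngram_frequencies(lines, n):
    """Count n-grams of exactly order n, one sentence per input line."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def format_counts(counts):
    """Render 'ngram<TAB>count' lines, most frequent first, ties alphabetical."""
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{gram}\t{c}" for gram, c in rows]


text = ["the cat sat", "the cat ran"]
for row in format_counts(ngram_frequencies(text, 2)):
    print(row)
```
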
Roman Prokofyev, … Adam Pauls (12 messages)

9/7/14

-Infinity log probability

Please update from the latest SVN and see if your problems are fixed. On Sat, Jul 19, 2014 at 2:55 PM

Dina Kayumova, … Adam Pauls (17 messages)

9/7/14

Sample code for Google N-gram with Kneser-Ney language model

Sorry, getting back to debugging. I've lost context here. Can you give me a dataset and command

Mohammad Sadegh Rasooli, Adam Pauls (2 messages)

7/25/14

How to train a new Ngram Model from raw text

https://code.google.com/p/berkeleylm/source/browse/trunk/examples/make-kneserney-arpa-from-raw-text.

Oren Melamud, Adam Pauls (4 messages)

7/21/14

memory consumption

Ok. Thanks for the quick reply. On Monday, July 21, 2014 5:55:20 PM UTC+3, Adam Pauls wrote:

Adam Pauls, Roman Prokofyev (2 messages)

7/16/14

Re: Exceptions trying to create model from Google Books N-gram

No, 40 was actually a count separated by a tab, but I figured this out: there were again out of

pz, … Roman Prokofyev (15 messages)

7/14/14

Google books ngrams

Ok, I think I got it, the library seems to be hard-coded to build 3gram models, meaning that I need

Andrew Nystrom, Adam Pauls (5 messages)

6/30/14

Including quotes, semicolons, dashes, etc (not just sentence markers). On Monday, 30 June 2014 14:47:

Andrew Nystrom, Adam Pauls (3 messages)

6/27/14

Comparing Scores Across Models

Thanks, Adam! I appreciate your quick responses! On Thursday, 26 June 2014 10:06:26 UTC-5, Adam Pauls
