berkeleylm-discuss
1–30 of 52
miluette
, …
Gena Kukartsev
3
3/7/17
NaN-probabilities
Sentence boundary symbols are not included in the vocabulary (I am not sure why), and that causes NaNs
Roman Prokofyev
, …
Gena Kukartsev
7
3/6/17
NaN probability
These NaNs are definitely because of OOV (out-of-vocabulary) words: beginning-of-sentence and end-of-
李若冰
, …
jeremy
8
11/19/16
google ngram test: LogProbability is NaN
How can you solve this problem? On Monday, April 27, 2015 at 3:07:05 AM UTC-4, 李若冰 wrote: I've met a problem that, I
cao quy vũ
2/22/16
Can't make arpa file
I want to make an n-gram model but get this error. Any suggestions?
Jack Donn
,
Adam Pauls
3
8/3/15
Predictive Text?
No problem, thank you for confirming my assumptions and saving me from further digging. Jack On
Nicolas Hernandez
,
Adam Pauls
3
6/3/15
maven repository update
Dear Adam, could you ask your (old?) colleagues about that, please? There is also a version of the
YikJiun Lee
4/29/15
How to use CompressedNgramMap?
Hi I was trying to load a 3.8 billion tokens, 12GB LM in ARPA format using readArrayEncodedLmFromArpa
Matthew Hatem
,
Adam Pauls
2
4/4/15
Plan for when Google Code shuts down?
Yes, I will migrate to github at some point. On Fri, Apr 3, 2015 at 7:32 AM, Matthew Hatem <mhatem
Sidharth Kamboj
,
Adam Pauls
2
3/25/15
Character level n-grams and arpa file generation
Yes, you would need to preprocess, nothing special is implemented. On Wed, Mar 25, 2015 at 1:22 PM,
한정수
,
Adam Pauls
2
3/23/15
Arrayindexoutofbound exception.
Can you show me the exact command you ran, along with the full exception? On Sun, Mar 22, 2015 at 11:
한정수
,
Adam Pauls
3
3/19/15
vocab_cs.gz
Thanks a lot!!! On Wednesday, March 18, 2015 at 5:20:18 PM UTC+9, 한정수 wrote: there is vocab_cs.gz in
Adam Pauls
3/14/15
Re: Questions....
1. Yes, the backoff and smoothing used by these methods are different. I recommend reading http://www
qhdj...@gmail.com
,
Adam Pauls
2
3/1/15
How much memory is needed to load the Google web 1T model?
It should basically be 10GB. I'm sure the JVM has some extra overhead, but probably on the order
Matt Post
2/20/15
gzip checking
Hi, I made (and tested) a small change so that BerkeleyLM is smarter about gzipped files — instead of
iesus.c...@gmail.com
,
Adam Pauls
8
1/28/15
-Infinity logprobs and unnormalized probs - KneserNey GoogleNgram
Thanks again for your time and help. I've been trying to debug myself. And well, the issue
fanc...@gmail.com
1/22/15
how to compute the probability of a sentence?
I am new to language models. I want to compute the probability of a sentence using a language model.
Kunal Singhal
,
Adam Pauls
2
12/30/14
Encoding start of a sentence
lm.getWordIndexer().getStartSymbol(). On Tuesday, December 30, 2014 6:05:34 AM UTC-8, Kunal Singhal
ivan masli
,
Adam Pauls
2
12/6/14
Generate BerkeleyLM Jar
I added a note on the front page of Google Code. Basically, you have to build the "export"
Munazza Jannisar Khan
10/15/14
How to generate N-Grams using Regular Expressions in java and storing them in Csv file?
I have a class named WordListBuilder that reads input from a text file in any language. Output is to
Joseph Turian
, …
Adam Pauls
14
9/19/14
Sample code for Google N-gram with stupid backoff?
Yes, this is fixed at head in SVN. On Wed, Sep 17, 2014 at 7:27 AM, Oren Melamud <oren.melamud@
iesus.c...@gmail.com
,
Adam Pauls
12
9/16/14
Google N-gram language model of order 4
Hi, So, just to let you know, I was able to get the language model after all! :D These were the
Tapan Sharma
,
Adam Pauls
3
9/10/14
How to use this library
I want to use it for ngram frequency calculation. On Wed, Sep 10, 2014 at 9:08 AM, Adam Pauls <
Roman Prokofyev
, …
Adam Pauls
12
9/7/14
-Infinity log probability
Please update from the latest SVN and see if your problems are fixed. On Sat, Jul 19, 2014 at 2:55 PM
Dina Kayumova
, …
Adam Pauls
17
9/7/14
Sample code for Google N-gram with Kneser-Ney language model
Sorry, getting back to debugging. I've lost context here. Can you give me a dataset and command
Mohammad Sadegh Rasooli
,
Adam Pauls
2
7/25/14
How to train a new Ngram Model from raw text
https://code.google.com/p/berkeleylm/source/browse/trunk/examples/make-kneserney-arpa-from-raw-text.
Oren Melamud
,
Adam Pauls
4
7/21/14
memory consumption
Ok. Thanks for the quick reply. On Monday, July 21, 2014 5:55:20 PM UTC+3, Adam Pauls wrote:
Adam Pauls
,
Roman Prokofyev
2
7/16/14
Re: Exceptions trying to create model from Google Books N-gram
No, 40 was actually a count separated by a tab, but I figured this out; there were again out of
pz
, …
Roman Prokofyev
15
7/14/14
Google books ngrams
Ok, I think I got it, the library seems to be hard-coded to build 3gram models, meaning that I need
Andrew Nystrom
,
Adam Pauls
5
6/30/14
Stopwords
Including quotes, semicolons, dashes, etc (not just sentence markers). On Monday, 30 June 2014 14:47:
Andrew Nystrom
,
Adam Pauls
3
6/27/14
Comparing Scores Across Models
Thanks, Adam! I appreciate your quick responses! On Thursday, 26 June 2014 10:06:26 UTC-5, Adam Pauls