Gensim
1–30 of 3731
Welcome to the mailing list of Gensim, topic modelling for humans. Please read the FAQ before asking. Supporting Gensim helps us support you: https://github.com/sponsors/piskvorky
Danilo Tomasoni, Gordon Mohr (3 messages), Aug 28
Hard limit on vocab size?
Glad it's sorted. If you *did* want to cap the number of words loaded, you can supply a `limit`
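For context on the `limit` mentioned above: Gensim's `KeyedVectors.load_word2vec_format()` accepts a `limit` argument that stops reading after that many vectors. Because such files are typically sorted from most to least frequent word, this keeps the most common words. The following is a minimal stdlib-only sketch of that idea for the plain-text word2vec format, not Gensim's own implementation; `read_word2vec_text` is a hypothetical helper name.

```python
import io


def read_word2vec_text(stream, limit=None):
    """Read vectors in word2vec text format, stopping after `limit` words.

    Format: a header line "<vocab_size> <dim>", then one line per word:
    "<word> <v1> <v2> ... <v_dim>".
    """
    total, dim = (int(x) for x in stream.readline().split())
    # Read every row listed in the header, or only the first `limit` rows.
    n_rows = total if limit is None else min(limit, total)
    vectors = {}
    for _ in range(n_rows):
        word, *values = stream.readline().split()
        if len(values) != dim:
            raise ValueError(f"{word!r}: expected {dim} values, got {len(values)}")
        vectors[word] = [float(v) for v in values]
    return vectors


sample = io.StringIO(
    "4 3\n"
    "the 0.1 0.2 0.3\n"
    "of 0.4 0.5 0.6\n"
    "and 0.7 0.8 0.9\n"
    "rare 1.0 1.1 1.2\n"
)
top_words = read_word2vec_text(sample, limit=2)
print(list(top_words))  # ['the', 'of']
```

With the real library, the equivalent call would look like `KeyedVectors.load_word2vec_format(path, binary=True, limit=500000)`, where the number is only an example value.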
Jaden Rodriguez, Gordon Mohr (2 messages), Aug 23
Fix Proposals and Troubles with Source
Without more details, unsure what specific source errors you're having. A general guide to
Felix Goldberg, Gordon Mohr (2 messages), Aug 22
Noob question - how to train a doc2vec model using a built-in corpus?
The Gensim project source code (https://github.com/RaRe-Technologies/gensim/) contains in its `docs/
Jonathan Peters, Aug 1
Negative log_perplexity
Hello, I created an LDA model from control data and I am trying to calculate the perplexity of my
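On the negative value: per Gensim's documentation, `LdaModel.log_perplexity(chunk)` returns a per-word likelihood bound rather than a perplexity, and it logs the perplexity as 2^(-bound) at INFO level. Since the bound is a log-likelihood, negative values are normal. A small sketch of that conversion (the helper name is illustrative, not part of Gensim):

```python
def perplexity_from_bound(per_word_bound):
    """Turn a per-word likelihood bound, such as the value returned by
    LdaModel.log_perplexity(), into the perplexity Gensim logs: 2 ** (-bound)."""
    return 2.0 ** (-per_word_bound)


# A lower (more negative) bound corresponds to a higher perplexity.
for bound in (-7.0, -8.0):
    print(bound, perplexity_from_bound(bound))
# -7.0 128.0
# -8.0 256.0
```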
Jeff Winchell, Gordon Mohr (2 messages), Jul 19
Need tokenizer/preprocessor for popular pretrained embeddings models
I made a feature-request item in our issue-tracker for this – https://github.com/RaRe-Technologies/
pradeep t, Gordon Mohr (5 messages), Jul 7
Add custom words to GoogleNews-vectors-negative300.bin pretrained model
Thank you so much for the updates On Fri, Jul 7, 2023 at 2:52 AM Gordon Mohr <goj...@gmail.com
Danilo Tomasoni, Gordon Mohr (5 messages), Jul 4
Load of FastText binary format with mmap='r'
thank you very much!! it works! On Friday, June 30, 2023 at 19:59:27 UTC+2, Gordon Mohr wrote
Thanos Tasakos, Gordon Mohr (5 messages), Jun 30
Gensim KeyedVector load from s3
What a legend! I needed to also monkey-patch the numpyio module, to use smart_open instead of open,
pradeep t, Gordon Mohr (2 messages), Jun 29
Pretrained model for doc2vec
I don't know of any I'd recommend, & that work with recent Gensim versions. (When I'
Laura, Gordon Mohr (2 messages), Jun 27
Doc2vec with small corpus
That approach seems within the realm of reason - but ultimately whether it's better for your
Peter Mayhew, Gordon Mohr (11 messages), Jun 13
Saving Wikidump corpus into Memory map
Note that even training the exact same corpus twice won't result in the *same* vectors.
jeff yang, Gordon Mohr (4 messages), May 31
Is there any way to adjust the weight of the node?
I'm not really sure why one would want to "reduce the density around a node". Do you
TRIXIA MAY BELGA, May 29
LDA topics for Clustering
My goal is to cluster the resulting LDA topics to reduce dimensionality. However I am not sure what
Yan Xu, Gordon Mohr (4 messages), May 18
Add the similarity threshold to gensim.models.keyedvectors.KeyedVectors.most_similar
That's a good point, given the extra memory required to return the list-of-(word, score) tuples.
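As a rough illustration of the proposed threshold: the sketch below does not assume that Gensim's `most_similar()` has any such parameter. `most_similar_above` is a hypothetical stdlib-only function that scores every vocabulary word against a query word by cosine similarity and keeps only the (word, score) pairs at or above the cutoff, so words below it never enter the result list.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar_above(vectors, query_word, threshold):
    """Hypothetical thresholded most_similar: return (word, score) pairs with
    score >= threshold, best first, excluding the query word itself."""
    query = vectors[query_word]
    scored = ((w, cosine(query, v)) for w, v in vectors.items() if w != query_word)
    # Only pairs that pass the cutoff are materialized in the result list.
    hits = [(w, s) for w, s in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


toy = {
    "cat": [1.0, 0.0],
    "kitten": [0.9, 0.1],
    "dog": [0.6, 0.8],
    "car": [0.0, 1.0],
}
print([w for w, _ in most_similar_above(toy, "cat", 0.5)])  # ['kitten', 'dog']
```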
Fred R, Gordon Mohr (2 messages), May 9
How to get context words in gensim word2vec models
Can you clarify with a bit more detail what you mean by "context words"? I ask because once
nicolas valderrama, Gordon Mohr (3 messages), Apr 25
"Lazily" add documents to TfIdf
Oh, we didn't know this was possible. I'm glad I asked here before making any changes. Thanks a
Gabriel L, …, Gordon Mohr (12 messages), Apr 20
Implementation of Correlated Topic Model
I can understand why you might prefer techniques that exist over those that are purely imaginary,
Danilo Tomasoni, Gordon Mohr (16 messages), Apr 12
Very different performances if streaming data or reading data from disk
On Wednesday, April 12, 2023 at 5:36:04 AM UTC-7 danilot.l...@gmail.com wrote: Performance in my
Tedo Vrbanec, …, Benedict Holland (8 messages), Mar 20
Doc2Vec loss function
For me, dynamic stopping is what I am looking for. As for the reviewer, I am not sure. :) Dana
Oliver Gordon, Gordon Mohr (2 messages), Mar 15
GPL being violated
Note: Gensim is licensed under the "Lesser" GPL (aka "LGPL" https://www.gnu.org/
Tedo Vrbanec, Gordon Mohr (2 messages), Mar 9
GloVe native support in Gensim?
Other than the current ability to load GloVe vectors, GloVe-style training hasn't been planned (
日出間健本社総合企画部, Mar 3
Why does increasing the number of topics increase perplexity?
I'm trying to calculate the perplexity using the LDA model log_perplexity. The official
santosh.b...@gmail.com, Gordon Mohr (2 messages), Feb 22
citations for evaluate_word_analogies()
As far as I know, the analogy-solving evaluation was introduced with the original word2vec papers
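The analogy evaluation mentioned above answers questions of the form "a is to b as c is to ?" by vector arithmetic: it looks for the word whose vector is closest to b - a + c, skipping the three input words. Gensim's `evaluate_word_analogies()` works along these lines via `most_similar(positive=[b, c], negative=[a])`; the stdlib sketch below only illustrates the arithmetic on hand-made toy vectors and is not Gensim's code.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def solve_analogy(vectors, a, b, c):
    """Return the word d that best completes 'a is to b as c is to d',
    using the vector-offset (b - a + c) method with cosine similarity."""
    unit = {w: normalize(v) for w, v in vectors.items()}
    query = normalize([vb - va + vc for va, vb, vc in zip(unit[a], unit[b], unit[c])])
    best_word, best_score = None, -math.inf
    for word, vec in unit.items():
        if word in (a, b, c):  # the input words themselves are excluded
            continue
        score = sum(x * y for x, y in zip(query, vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Toy 3-d vectors: dimensions loosely mean (royal, male, female).
toy = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "crown": [1.0, 0.0, 0.0],
}
print(solve_analogy(toy, "man", "king", "woman"))  # queen
```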
Tedo Vrbanec, Gordon Mohr (4 messages), Feb 21
ModuleNotFoundError: No module named 'ot'
I see. Thank you! On Sunday, February 19, 2023 at 20:26:47 UTC+1, user Gordon Mohr wrote:
Tedo Vrbanec, Gordon Mohr (3 messages), Feb 21
Word Mover's Distance in Gensim (semi-normalized results?)
Thank you very much, Gordon! On Sunday, February 19, 2023 at 20:58:25 UTC+1, user Gordon Mohr
Gabriela Zuniga, Gordon Mohr (2 messages), Feb 13
word2vec api to keyedvectors
If you have any instance of `KeyedVectors`, you can save its full-word vectors out, into the same
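For the export described above, `KeyedVectors.save_word2vec_format(path, binary=False)` writes vectors in the original word2vec format that other tools can load. A stdlib-only sketch of the plain-text variant (not Gensim's implementation; `write_word2vec_text` is a hypothetical name):

```python
import io


def write_word2vec_text(vectors, stream):
    """Write a {word: [floats]} mapping in word2vec text format:
    a "<count> <dim>" header, then "<word> <v1> ... <v_dim>" per line."""
    if not vectors:
        raise ValueError("nothing to write")
    dim = len(next(iter(vectors.values())))
    stream.write(f"{len(vectors)} {dim}\n")
    for word, vec in vectors.items():
        # Spaces separate the fields, so a word containing one would corrupt the file.
        if " " in word:
            raise ValueError(f"{word!r} contains a space")
        if len(vec) != dim:
            raise ValueError(f"{word!r} has {len(vec)} values, expected {dim}")
        stream.write(word + " " + " ".join(repr(float(x)) for x in vec) + "\n")


out = io.StringIO()
write_word2vec_text({"apple": [0.5, 1.0], "pear": [2.0, -1.5]}, out)
print(out.getvalue(), end="")
# 2 2
# apple 0.5 1.0
# pear 2.0 -1.5
```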
Julien Petot, …, Gordon Mohr (3 messages), Feb 9
inverse operation of word2vec - vec2word
Gensim's `.most_similar()` method will accept arbitrary vectors as the 'origin' from
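Since there is no exact inverse, the usual approach is a nearest-neighbour lookup: rank the vocabulary by cosine similarity to the arbitrary vector. In Gensim that is `kv.most_similar(positive=[vec], topn=...)` or `kv.similar_by_vector(vec)`; the sketch below shows the same ranking with the standard library only, on made-up toy vectors.

```python
import math


def nearest_words(vectors, query, topn=3):
    """Rank words by cosine similarity to an arbitrary query vector."""
    q_norm = math.sqrt(sum(x * x for x in query))
    scored = []
    for word, vec in vectors.items():
        v_norm = math.sqrt(sum(x * x for x in vec))
        dot = sum(a * b for a, b in zip(query, vec))
        scored.append((word, dot / (q_norm * v_norm)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


toy = {"cat": [1.0, 0.0], "dog": [0.7, 0.7], "car": [0.0, 1.0]}
# The query need not be any word's vector; the closest word is only an approximation.
for word, score in nearest_words(toy, [0.9, 0.2]):
    print(word, round(score, 3))
```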
Olivier GRACIANNE, Gordon Mohr (6 messages), Feb 8
Segmentation fault in a new function
Hello again, I finally managed to identify what caused my segfault and thought it might be
Ronald Benz Zhang, Gordon Mohr (3 messages), Feb 1
Which corpus is used as reference for NPMI calculation in Gensim?
Thank you for your reply Gordon! :) On Thursday, February 2, 2023 at 3:53:21 AM UTC+8 Gordon Mohr
Ankit, Gordon Mohr (2 messages), Jan 13
Gensim Doc2vec error:
Are you sure the word 'senseless' appears in your data at least `min_count=5` times? These
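Why a word can be missing: Word2Vec and Doc2Vec drop every word that occurs fewer than `min_count` times while building the vocabulary, so such a word never gets a vector and looking it up typically raises a `KeyError`. A simplified stdlib sketch of that pruning step (the helper and the toy corpus are made up for illustration):

```python
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Count words across tokenized sentences and keep only those seen
    at least `min_count` times, as a stand-in for vocabulary pruning."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word: n for word, n in counts.items() if n >= min_count}


corpus = [["a", "senseless", "act"]] + [["a", "common", "act"]] * 5
vocab = build_vocab(corpus, min_count=5)
print(sorted(vocab))         # ['a', 'act', 'common']
print("senseless" in vocab)  # False
```

Lowering `min_count` (for example `min_count=1`) keeps rare words, though vectors learned from only a few occurrences tend to be of poor quality.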