Hi Seva,
this model is pretty big, how long did you wait for it to finish loading? I also noticed that reading even a much smaller model can take some time.
Can you see memory being occupied, using a system monitor, while loading?
Best, Christoph
The link you provided (<https://github.com/ncbi-nlp/BioSentVec/wiki>) shows the pre-trained vectors being loaded as either vectors-only (via `KeyedVectors.load_word2vec_format()`) or as a `sent2vec` model (via `model = sent2vec.Sent2vecModel(); model.load_model('model.bin')`) – `sent2vec` being a separate package unrelated to `gensim`. There's no suggestion there that the files are loadable as a plain FastText model.

As the file you're trying to load (as viewed at <https://ftp.ncbi.nlm.nih.gov/pub/lu/Suppl/BioSentVec/>) is 26GB on disk, I wouldn't expect any useful loading success, from any library, on a 16GB machine. (Even if it could succeed by using lots of virtual memory, that could take a lot of time, and then be uselessly slow once loaded.)
Hi Seva,

It looks like you're already using the latest version of Gensim, right? If that's the case, do you mind opening a reproducible report (with full context) on GitHub, https://github.com/RaRe-Technologies/gensim/issues?

We'll try to have a look at why Gensim is behaving so differently compared to FB's fastText.
On Wednesday, April 24, 2019 at 9:22:24 PM UTC+2, Gordon Mohr wrote:
The link you provided (<https://github.com/ncbi-nlp/BioSentVec/wiki>) shows the pre-trained vectors being loaded as either vectors-only (via `KeyedVectors.load_word2vec_format()`) or as a `sent2vec` model (via `model = sent2vec.Sent2vecModel(); model.load_model('model.bin')`) – `sent2vec` being a separate package unrelated to `gensim`.
There's no suggestion there that the files are loadable as a plain FastText model.
As the file you're trying to load (as viewed at <https://ftp.ncbi.nlm.nih.gov/pub/lu/Suppl/BioSentVec/>) is 26GB on disk, I wouldn't expect any useful loading success, from any library, on a 16GB machine. (Even if it could succeed by using lots of virtual-memory, that could take a lot of time, and then be uselessly slow once loaded.)
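A back-of-envelope estimate makes the memory argument concrete. The vocabulary size and bucket count below are illustrative guesses, not values read from this model's metadata; the structure, though, is how a non-quantized fastText `.bin` is laid out: an input matrix with one float32 row per vocabulary word and per n-gram hash bucket, plus an output matrix with one row per vocabulary word.

```python
# Sketch: minimum RAM needed for the dense arrays of a fastText model.
# vocab_size and n_buckets below are illustrative, not the model's real values.
def fasttext_ram_gb(vocab_size, n_buckets, dim, bytes_per_float=4):
    input_rows = vocab_size + n_buckets   # word vectors + n-gram bucket vectors
    output_rows = vocab_size              # output (prediction) matrix
    return (input_rows + output_rows) * dim * bytes_per_float / 1024**3

# With dim=200 as in BioWordVec, a large biomedical vocabulary plus the
# default 2M buckets already approaches the 26GB on-disk size:
print(round(fasttext_ram_gb(vocab_size=16_000_000, n_buckets=2_000_000, dim=200), 1))  # → 25.3
```

Whatever the exact parameters, a model whose dense matrices approach its 26GB file size cannot reside in 16GB of physical RAM without heavy swapping.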
But this contradicts Seva's report of "When I load ithe model with fastText interface (from https://github.com/facebookresearch/fastText/tree/master/python) it takes aprox. 90 secs to load." (which I take at face value; I'm not familiar with this particular model or its format)
Without evidence that load-by-Facebook's-Fasttext was successfully usable for something, that report is suspect.
(What kind of interface loads a model but then can't even return the vectors for individual words, as was also reported?) The attempt may have loaded garbage, or errored in a way that wasn't recognized.
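The kind of evidence meant here is easy to gather: after loading, probe the model for a vector and check it is sane. The helper below is a hedged sketch (the probe word, dimensionality, and `get_word_vector` call reflect Facebook's fastText Python bindings as commonly used; the model path is a placeholder):

```python
import math

def looks_usable(model, probe_word='protein', dim=200):
    """Return True only if the model yields a finite, correctly-sized vector."""
    vec = list(model.get_word_vector(probe_word))
    return len(vec) == dim and all(math.isfinite(x) for x in vec)

# Usage (assumes Facebook's fastText package and the downloaded .bin):
#   import fastText as fasttext
#   m = fasttext.load_model('path/to/BioWordVec_PubMed_MIMICIII_d200.bin')
#   print(looks_usable(m))
```

If a check like this fails, or raises, then a fast "load" proved nothing about the model actually being usable.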
import fastText as fasttext
ftModel = fasttext.load_model(r'path/to/model/' + 'BioWordVec_PubMed_MIMICIII_d200.bin')
Hi Gordon,

On Thursday, April 25, 2019 at 9:57:40 PM UTC+2, Gordon Mohr wrote:
Without evidence that load-by-Facebook's-Fasttext was successfully usable for something, that report is suspect.

What kind of evidence would you prefer/need? Would it not be easier to just try it for yourself and skip this back and forth?

(What kind of interface loads a model but then can't even return the vectors for individual words, as was also reported?) The attempt may have loaded garbage, or errored in a way that wasn't recognized.

Where was this reported? Not by me.
Nice!
Bummer. Which users?
Sean Bethard
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gensim/a287947f-b2b5-4847-802b-21ccb415fda1%40googlegroups.com.