texts = [[word for word in document.lower().split() if word not in stoplist] for document in documents]
and
stoplist = set('for a of the and to in'.split())
documents = ["Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management system",
"System and human system engineering testing of EPS",
"Relation of user perceived response time to error measurement",
"The generation of random binary unordered trees",
"The intersection graph of paths in trees",
"Graph minors IV Widths of trees and well quasi ordering",
"Graph minors A survey"]
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
But I want to change only the `documents` part. I want to read from a file in a memory-friendly way, because the program is KILLED every time in the terminal :S

dictionary = corpora.Dictionary(texts)

is correct, but it depends on

texts = [[word for word in document.lower().split() if word not in stoplist] for document in documents]

and
stoplist = set('for a of the and to in'.split())
documents = ["Human machine interface for lab abc computer applications", "A survey of user opinion of computer system response time", "The EPS user interface management system", "System and human system engineering testing of EPS", "Relation of user perceived response time to error measurement", "The generation of random binary unordered trees", "The intersection graph of paths in trees", "Graph minors IV Widths of trees and well quasi ordering", "Graph minors A survey"]
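
One possible way to make the `documents` part memory friendly is sketched below. It assumes the corpus is a UTF-8 text file with one document per line. The helper names `iter_texts` and `build_dictionary` are hypothetical and not part of gensim. The idea is that `corpora.Dictionary` accepts any iterable of token lists, so a generator can stand in for the in-memory `texts` list.

```python
# Sketch: stream tokenized documents from disk instead of holding them in a list.
# Assumption: the corpus file is UTF-8 text with one document per line.

stoplist = set('for a of the and to in'.split())


def iter_texts(path, stoplist=stoplist):
    """Yield one list of lowercase tokens per line, skipping stop words."""
    with open(path, encoding='utf-8') as handle:
        for line in handle:
            yield [word for word in line.lower().split() if word not in stoplist]


def build_dictionary(path):
    """Build a gensim Dictionary in a single streamed pass over the file."""
    # Imported here so that iter_texts itself has no gensim dependency.
    from gensim import corpora
    return corpora.Dictionary(iter_texts(path))
```

With a hypothetical file name, `dictionary = build_dictionary('mycorpus.txt')` would then replace the `texts`/`documents` lines above; only one line of the file is held in memory at a time.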
On Thu, 14 Feb 2019 at 21:03, Gordon Mohr <> wrote:
Note this tutorial assumes you will be stepping through all its code in order, so that the variables from earlier steps are still available. About 8 blocks/paragraphs up from <https://radimrehurek.com/gensim/tut1.html#corpus-streaming-one-document-at-a-time>, there's a line which assigns to `dictionary`:

dictionary = corpora.Dictionary(texts)

If you are adapting this code for other uses, you'll have to make sure there are similarly, appropriately-initialized variables available. (You might do this, in the same style as the tutorial, by preparing a `dictionary` variable before defining your iterable class. Or you might further enhance that class with an `__init__()` method that takes arguments and does such preparation in the class, as in the `TxtSubdirsCorpus` example in this longer article about iterators & iterable objects: <https://rare-technologies.com/data-streaming-in-python-generators-iterators-iterables/>.)

- Gordon
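
As a rough illustration of the second option, here is a minimal sketch of an iterable class whose `__init__()` prepares its own `dictionary`. The class name `StreamedCorpus`, the `_tokens` helper, and the one-document-per-line file layout are assumptions made for this example; this is not the tutorial's code nor the `TxtSubdirsCorpus` class from the article.

```python
class StreamedCorpus:
    """Iterable yielding one bag-of-words vector per line of a text file.

    Assumption: the file is UTF-8 text with one document per line.
    """

    def __init__(self, path, dictionary=None):
        self.path = path
        if dictionary is None:
            # Build the dictionary in one streamed pass, so `dictionary`
            # never has to exist as a global variable.
            from gensim import corpora
            dictionary = corpora.Dictionary(self._tokens())
        self.dictionary = dictionary

    def _tokens(self):
        # Re-open the file on every call so the corpus can be iterated repeatedly.
        with open(self.path, encoding='utf-8') as handle:
            for line in handle:
                yield line.lower().split()

    def __iter__(self):
        for tokens in self._tokens():
            yield self.dictionary.doc2bow(tokens)
```

Because the dictionary lives on the instance, `__iter__` no longer depends on a global name, which is what caused the `NameError`. Something like `corpus = StreamedCorpus('mycorpus.txt')` (hypothetical file name) could then be passed to a model, with only one line in memory at a time.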
On Thursday, February 14, 2019 at 8:58:50 AM UTC-8, Tansu Taşçıoğlu wrote:

Hello,

I am trying to apply LSA and LDA with Gensim to my own corpus. I followed the instructions at https://radimrehurek.com/gensim/tutorial.html under the title 'Corpus Streaming - One Document at a Time', but I get an error:

line 12, in __iter__
    yield dictionary.doc2bow(line.lower().split())
NameError: name 'dictionary' is not defined

I will apply LSA and LDA to Turkish Wikipedia for my master's thesis. I used Python's file-reading methods on other text files, and because those files are small I did not get a memory error. However, the Wikipedia file is huge and the program is killed :S Does anybody know how I can solve this problem?

Thank you.