I'm doing topic modelling in Gensim. I can successfully get the document index and similarity percentage for each document.
Here is what I'm trying:
from gensim import corpora, models, similarities

documents = ["Say to other what you feel",
             "Speak truth from your heart and tell people",
             "what this book say and tell about lying"]
texts = ...  # tokenized documents with common words removed (code omitted)
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2)
corpus_lsi = lsi[corpus_tfidf]
index = similarities.MatrixSimilarity(lsi[corpus])
doc = "Always tell people what in your heart"
vec_bow = dictionary.doc2bow(doc.lower().split())
vec_lsi = lsi[vec_bow]
sims = index[vec_lsi]
print(list(enumerate(sims)))  # pair each score with its position in documents

Output:
[(0, 0.74419993), (1, 0.99159265), (2, 0.35600105)]
(first value: index number in the documents array; second value: similarity percentage)

I want a result like this instead:
[(myid_123, 0.74419993), (abc_1, 0.99159265), (id_3, 0.35600105)]
(first value: string id of the document; second value: similarity percentage)

I tried something like this, but it does not work:
documents = {"myid_123": "Say to other what you feel",
             "abc_1": "Speak truth from your heart and tell people",
             "id_3": "what this book say and tell about lying"}

How can I assign my own ids to the documents? Is this possible in Gensim? If so, how? Do you have an example?
Every document has other things attached to it (for example likes, comments, data, etc.) that are saved in a database. That's why I want to attach a custom id to every document, so that later on I can find the related records for it.
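
To make the goal concrete, here is a minimal sketch of the mapping I am after. It assumes the similarity index returns one score per document in the same order the corpus was built, so the ids can be kept in a plain Python list on the side. The names `doc_ids`, `raw_texts`, `sims_by_id` and `ranked` are my own, and `sims` is hard-coded from the output above as a stand-in for `index[vec_lsi]`, so the snippet runs without Gensim.

```python
# Custom ids mapped to document text (dicts preserve insertion order in Python 3.7+).
documents = {
    "myid_123": "Say to other what you feel",
    "abc_1": "Speak truth from your heart and tell people",
    "id_3": "what this book say and tell about lying",
}

doc_ids = list(documents.keys())      # position i -> custom id
raw_texts = list(documents.values())  # same order; these would be tokenized and indexed

# Stand-in for `sims = index[vec_lsi]`: scores copied from the output above.
sims = [0.74419993, 0.99159265, 0.35600105]

# Pair each score with the custom id at the same position.
sims_by_id = list(zip(doc_ids, sims))

# Optionally rank by similarity, most similar first.
ranked = sorted(sims_by_id, key=lambda pair: pair[1], reverse=True)

print(sims_by_id)
print(ranked)
```

With this, the custom id could be used directly as the key for the database lookup, though I don't know whether Gensim offers a built-in way to do the same thing.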