Expose TextRank as API - gensim.summarization.summarizer

furte...@gmail.com

Jan 4, 2019, 10:04:07 AM
to Gensim
Dear list,

I'm using gensim to create text summaries. I would like to extract the TextRank scores that gensim.summarization.summarizer calculates during summarization.
This would help me visualize the important blocks of a text.

Unfortunately, these scores are not exposed through an API function. I can only access most_important_docs, via the following code:

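import gensim.summarization.summarizer

# _build_corpus is a private helper; `sentences` comes from the summarizer's own preprocessing
corpus = gensim.summarization.summarizer._build_corpus(sentences)
most_important_docs = gensim.summarization.summarizer.summarize_corpus(corpus, ratio=1)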

most_important_docs then contains a list of lists of tuples, which seem to identify words in the corpus, something like this:
<class 'list'>: [[(3, 1), (4, 1)], [(3, 1), (7, 1)], [(3, 1)], [(3, 1)], [(3, 1)], [(3, 1), (5, 1)], [(3, 1), (6, 1)], [(0, 1)], [(1, 1), (2, 1)], [(8, 1)]]
I'm not able to make sense of this encoding or to map it back to the original sentences.
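For context, the tuples look like gensim's usual bag-of-words format, where each document is a list of (token_id, count) pairs. That format drops word order, so a sentence cannot be rebuilt from its tuples alone; the original sentence has to be kept alongside each document. Below is a minimal sketch of that idea in plain Python, without gensim. The helper names build_corpus and decode, the id assignment in first-seen order, and the sample sentences are all assumptions made for illustration, not gensim's own code.

```python
from collections import Counter


def build_corpus(tokenized_sentences):
    """Return (corpus, id2token) in a bag-of-words layout of (token_id, count) pairs.

    Hypothetical helper: ids are assigned in first-seen order, which is an
    assumption of this sketch and not necessarily what gensim does.
    """
    token2id = {}
    corpus = []
    for tokens in tokenized_sentences:
        counts = Counter(tokens)
        for token in counts:
            token2id.setdefault(token, len(token2id))
        corpus.append(sorted((token2id[t], n) for t, n in counts.items()))
    id2token = {i: t for t, i in token2id.items()}
    return corpus, id2token


def decode(bow, id2token):
    """Turn (token_id, count) pairs back into (token, count) pairs."""
    return [(id2token[i], n) for i, n in bow]


sentences = ["the cat sat", "the dog sat down", "a bird flew"]
corpus, id2token = build_corpus([s.split() for s in sentences])

# Lists are not hashable, so each document is converted to a tuple
# before it is used as a lookup key for its original sentence.
bow_to_sentence = {tuple(bow): s for bow, s in zip(corpus, sentences)}

for bow in corpus:
    print(bow, "->", bow_to_sentence[tuple(bow)])
```

One caveat with this lookup: two sentences with identical word counts produce the same key, so only one of them survives in the mapping.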

Would it be possible to expose a function that returns each sentence together with its TextRank score?
Or is there another way to obtain this?

Thanks in advance!

Best regards,
Philip Gillißen

Radim Řehůřek

Feb 3, 2019, 11:51:20 AM
to Gensim
Hi Philip,

Unfortunately, I'm not familiar with the summarizer subpackage (it's an external contribution), but if you manage to refactor the API to be cleaner, more flexible, or more modular, we'll welcome your PR!

Note that we expect PRs to be clearly motivated, so if you do, please include a motivating example from your use case or workflow.
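As a rough illustration of what such a motivating example could show, here is a minimal, self-contained sketch of TextRank-style sentence scoring that returns every sentence with its score. It is not gensim's implementation: the function name score_sentences, the word-overlap similarity, the whitespace tokenization, and the sample sentences are all assumptions chosen to keep the example short.

```python
import math


def similarity(a, b):
    """Length-normalized word overlap between two token lists (a simplification)."""
    overlap = len(set(a) & set(b))
    norm = math.log(len(a) + 1) + math.log(len(b) + 1)
    return overlap / norm if norm > 0 else 0.0


def score_sentences(sentences, damping=0.85, iterations=50):
    """Return [(sentence, score), ...] from a PageRank-style power iteration.

    Hypothetical helper for illustration only; sentences are graph nodes and
    edge weights are the pairwise similarities defined above.
    """
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    if n == 0:
        return []
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge between j and i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return list(zip(sentences, scores))


sentences = [
    "gensim builds topic models",
    "gensim builds text summaries",
    "topic models need text",
    "bananas are yellow",
]
ranked = score_sentences(sentences)
for sentence, score in sorted(ranked, key=lambda pair: pair[1], reverse=True):
    print(f"{score:.4f}  {sentence}")
```

Returning scores in the original sentence order, rather than only the top-ranked sentences, is what makes the result usable for highlighting important blocks in a text.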

Cheers,
Radim