Extractive summarization with rephrasing


Jyothish Vidyadharan

Apr 29, 2020, 8:57:56 AM
to Gensim
Hello,
   I am interested in doing extractive summarization. From what I have read, extractive summarization just extracts the important sentences in a text and reproduces them verbatim, in the order they appear in the original. I am aware that the `gensim.summarization.summarizer.summarize` function does this. But I came across a different way of doing extractive summarization here: https://towardsdatascience.com/a-quick-introduction-to-text-summarization-in-machine-learning-3d27ccf18a9f
Essentially, they have:

Source text: Joseph and Mary rode on a donkey to attend the annual event in Jerusalem. In the city, Mary gave birth to a child named Jesus.

Extractive summary: Joseph and Mary attend event Jerusalem. Mary birth Jesus.
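The example still counts as "extractive" because every summary word is taken from the source in its original order, and nothing is paraphrased. The sketch below checks that property. It assumes naive lowercase word tokenization, and the helper names are hypothetical rather than taken from any library:

```python
import re

def tokens(text):
    # Naive tokenization: lowercase runs of word characters, punctuation dropped
    return re.findall(r"\w+", text.lower())

def is_word_subsequence(summary, source):
    # True if every summary token appears in the source, in the same order.
    # `w in it` advances the shared iterator, which enforces the ordering.
    it = iter(tokens(source))
    return all(w in it for w in tokens(summary))

source = ("Joseph and Mary rode on a donkey to attend the annual event in Jerusalem. "
          "In the city, Mary gave birth to a child named Jesus.")
summary = "Joseph and Mary attend event Jerusalem. Mary birth Jesus."
print(is_word_subsequence(summary, source))  # True
```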


Can this kind of extractive summarization be done with gensim? If yes, please tell me how this can be done.

Gordon Mohr

Apr 29, 2020, 2:30:29 PM
to Gensim
Gensim's only summarization algorithm is the extractive approach in `gensim.summarization`, which selects a subset of full sentences. Its typical results and code flexibility are disappointing enough that there's been discussion of removing or replacing it: https://github.com/RaRe-Technologies/gensim/issues/2592
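Sentence-level extraction of the kind described above can be sketched in a few lines. This minimal illustration uses simple word-frequency scoring, a tiny stopword list and a naive regex sentence splitter. It is not gensim's algorithm, and the function names are hypothetical. It scores each sentence, keeps the top `n`, and returns them verbatim in their original order:

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones)
STOPWORDS = {"a", "an", "and", "the", "in", "on", "to", "of", "is", "was", "it"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def content_words(sentence):
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def extractive_summary(text, n=1):
    sentences = split_sentences(text)
    # Document-wide frequency of each content word
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(s):
        # Average document frequency of the sentence's content words
        ws = content_words(s)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    # Pick the n best-scoring sentences, then restore document order
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(top))

source = ("Joseph and Mary rode on a donkey to attend the annual event in Jerusalem. "
          "In the city, Mary gave birth to a child named Jesus.")
print(extractive_summary(source, n=1))  # In the city, Mary gave birth to a child named Jesus.
```

Unlike the word-level example in the question, every selected sentence is copied whole.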

- Gordon

Jyothish Vidyadharan

Apr 30, 2020, 2:32:28 AM
to Gensim
Is there some other python package which can do the kind of extractive summarization I mentioned in my original post?

Gordon Mohr

Apr 30, 2020, 1:38:01 PM
to Gensim
I don't know of one offhand. I'd have to Google to find candidates, and since your original article didn't name the precise algorithm used in that example, it could be hard to find an exact match.

It did look like that example used some part-of-speech (POS) based word elision, and the 'SpaCy' library is very often used for POS labeling in Python. So while I haven't heard of SpaCy itself offering extractive summarization, it does perform one of the necessary steps. If any Python library does that kind of summarization, and does it well on a modern foundation, I'd expect it to use SpaCy. So searching for SpaCy-driven extractive summarization is what I'd try.
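The POS-elision idea above can be sketched as follows. The sketch assumes spaCy's Universal POS tags (`token.pos_`) and keeps only proper nouns, nouns and verbs, which is an arbitrary choice. The keep-set, the helper names and the model name `en_core_web_sm` are all assumptions. Because the article doesn't state its rule, the output will not exactly match its example:

```python
# Keep-set of Universal POS tags: an illustrative assumption, not the article's rule
KEEP_POS = {"PROPN", "NOUN", "VERB"}

def elide(tagged, keep=KEEP_POS):
    """Drop every (word, pos) pair whose tag is not in `keep`; order is preserved."""
    return " ".join(word for word, pos in tagged if pos in keep)

def pos_summary(text, model="en_core_web_sm", keep=KEEP_POS):
    """Tag `text` with spaCy, then elide words sentence by sentence.
    Requires spaCy and the named model to be installed."""
    import spacy  # imported here so elide() can be used without spaCy
    nlp = spacy.load(model)
    parts = []
    for sent in nlp(text).sents:
        tagged = [(tok.text, tok.pos_) for tok in sent if not tok.is_punct]
        parts.append(elide(tagged, keep) + ".")
    return " ".join(parts)

# Hand-tagged first sentence of the example (tags are illustrative;
# spaCy's actual tags may differ)
tagged = [("Joseph", "PROPN"), ("and", "CCONJ"), ("Mary", "PROPN"),
          ("rode", "VERB"), ("on", "ADP"), ("a", "DET"), ("donkey", "NOUN"),
          ("to", "PART"), ("attend", "VERB"), ("the", "DET"),
          ("annual", "ADJ"), ("event", "NOUN"), ("in", "ADP"),
          ("Jerusalem", "PROPN")]
print(elide(tagged))  # Joseph Mary rode donkey attend event Jerusalem
```

Compared with the article's "Joseph and Mary attend event Jerusalem", this rule keeps "rode" and "donkey" and drops "and". Matching it would need a different selection rule.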

- Gordon