KEA & MAUI Keyphrase Selection


iamge...@gmail.com

Aug 12, 2016, 7:35:12 AM
to Kea and Maui Support
Hi Alyona Medelyan,

I have been using KEA to extract keyphrases from a set of documents, and so far it has worked fine. Recently I added the phrase "one time" to my vocabulary. The phrase is unique to certain documents, so it should help me infer what kind of document I am looking at when I later go through the ".key" files.

The issue is that a few documents do not contain the phrase "one time" but do contain phrases such as "one hundred days", "times", or "timely". KEA seems to pick these up as well and includes "one time" as a keyphrase in the ".key" files of those documents. My idea of identifying documents from the ".key" file therefore fails, because documents without the phrase "one time" also have it in their ".key" file.

As I said, for my purposes the phrase "one time" is a unique one. So my question is: can I add the phrase to the vocabulary in such a way that both words are kept as a single entity and it is picked up only from documents where it appears as "one time", and not as "one hundred days", "times", or "timely"?

I read about this issue on your MAUI blog, so I downloaded MAUI, set it up, and ran the same test on the same dataset with the same requirement. Unfortunately, I got the same result.

I would appreciate your help with this.

Thank you,
Kushal B Kusram

Alyona

Aug 12, 2016, 8:08:00 AM
to Kea and Maui Support
Hi Kushal,

As a quick fix, did you try removing 'one' and 'time' from the stopwords list?

Cheers,
Alyona

kushal...@outlook.com

Aug 16, 2016, 12:53:54 AM
to Kea and Maui Support
Hi Alyona,

I tried removing 'one' from the stopwords list as you suggested (note: 'time' is not in the stopwords list). However, once 'one' was removed, "one time" stopped appearing in any of the ".key" files.

Kushal B Kusram

Alyona

Aug 16, 2016, 3:12:33 PM
to Kea and Maui Support
Did you try the NoStemmer setting?
Or maybe I misunderstood you. Do you mean that you want to use a specific keyword that you purposely put into some documents, but because that keyword is quite common, Kea picks it up from more documents than needed? This doesn't sound like the intended use for Kea. I think you are better off using something more specific, e.g. "onetimedocument", and then simply checking whether the text contains that specific keyword.
Sorry I'm a bit confused about what's happening here.
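
For readers following along, here is a toy illustration of why the stopwords list and the stemmer setting matter here. It is not Kea's actual code: the stopword set, the `toy_stem` suffix rule, and `normalize` are all hypothetical stand-ins. It only shows how dropping the stopword 'one' and stemming the rest could make "one time", "times", and "timely" collapse to the same normalized form.

```python
# Toy sketch only: NOT Kea's or Maui's implementation.
# The stopword set and the suffix rule below are assumptions for illustration.
STOPWORDS = {"one", "a", "the", "of"}


def toy_stem(word: str) -> str:
    """Strip a trailing 'ly' or 's' (a crude stand-in for a real stemmer)."""
    for suffix in ("ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalize(phrase: str) -> str:
    """Lowercase, drop stopwords, then stem each remaining word."""
    words = [w for w in phrase.lower().split() if w not in STOPWORDS]
    return " ".join(toy_stem(w) for w in words)


for phrase in ("one time", "times", "timely", "one hundred days"):
    print(f"{phrase!r} -> {normalize(phrase)!r}")
```

Under these assumptions the first three phrases share one normalized form, which would be consistent with the false matches on "times" and "timely". The sketch does not account for "one hundred days".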

kushal...@outlook.com

Aug 17, 2016, 2:22:27 AM
to Kea and Maui Support
Let me explain the whole problem from the top; this may help clear up the confusion.
I have a set of legal documents that fall into two categories: Fixed legal documents and Ongoing legal documents.
The only way I can identify a document as "Fixed" is by looking for the phrase "one time", along with a few other phrases. Some documents, however, have "one time" as their only distinguishing phrase. The other category, "Ongoing legal documents", has its own set of phrases, which KEA is able to identify. What I want to achieve is to use KEA to pick up keyphrases for me and then read the ".key" file to decide whether the document is an ongoing one or a fixed one. Currently, though, "one time" appears in all the ".key" files, because these legal documents contain phrases such as "one hundred days", "time", or "timely". When I removed the word "one" from "one hundred days", or removed "time", I started to get the results I wanted, as KEA then selected the phrase "one time" only where "one" appeared. Hence I concluded that it is picking the phrase up from "one hundred days", "time", and "timely".

Alyona

Aug 18, 2016, 3:01:17 PM
to Kea and Maui Support
As I mentioned before, this doesn't sound like the intended use for Kea. If you are after checking the exact match of 'one time', why not just do a simple check?
Removing 'one' and 'time' from StopwordsEnglish and choosing 'NoStemmer' can be an alternative solution, but I would still not recommend it.
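
A minimal sketch of what such a simple check could look like, outside of Kea entirely. The function name `contains_phrase` is hypothetical, and the matching rule (whole words, any whitespace between them, case-insensitive) is an assumption about what "exact match" should mean for these documents.

```python
import re


def contains_phrase(text: str, phrase: str) -> bool:
    """Return True if `phrase` occurs in `text` as whole words.

    Words may be separated by any run of whitespace (including line breaks),
    and matching is case-insensitive. Hypothetical helper, not part of Kea.
    """
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_phrase("This is a one time payment.", "one time"))
print(contains_phrase("Due within one hundred days, in a timely manner.", "one time"))
```

Because of the word boundaries, "times", "timely", and "one hundred days" do not count as matches, so the result could be used alongside the ".key" files rather than through the vocabulary.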

kushal...@outlook.com

Aug 21, 2016, 3:40:17 AM
to Kea and Maui Support
Well, this was just one use case. In other scenarios Kea is used as it was designed and is doing really well. I wanted to know whether I could use the same Kea framework for this as well, and I guess that's not possible. Hence, I have come up with an alternative solution.

Thank you,

Kushal B Kusram
