Maui on very short sentences -- poor results?

Gautam Shine

Jun 2, 2016, 10:17:27 PM
to Kea and Maui Support
Hi,

I'm trying to use Maui to extract keywords from news headlines. I used the standalone jar file and train/run commands from this tutorial: https://www.airpair.com/nlp/keyword-extraction-tutorial.

I assembled a training set of ~15k samples for this by scraping. Here's an example:

.txt file:
Denzel and Pauletta Washington to host fundraiser for African American museum

.key file:
Denzel Washington
Washington
museums

However, most .key/.txt pairs are qualitatively worse than this, often containing just one keyword where a human would list three. The underlying tag set is high quality and human-annotated, but I had to whittle the tags down to those that appear in the headline, which reduces the quality (every example has at least one keyword in its .key file).
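
For readers who want to reproduce this kind of filtering, here is a minimal sketch in Python. It is not the script used to build the training set described above. It assumes a tag is kept when every one of its words also occurs in the headline after lowercasing and a crude plural strip, which is consistent with the example pair ("museums" matches "museum", "Denzel Washington" matches non-adjacent words). The function names and the extra tag "philanthropy" are hypothetical.

```python
import re

def tokens(text):
    """Lowercase word tokens with a crude plural strip.

    Assumption: stripping a trailing 's' stands in for real stemming;
    this is not the stemmer Maui itself uses.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]

def filter_tags(headline, tags):
    """Keep only the tags whose every token also occurs in the headline."""
    headline_tokens = set(tokens(headline))
    return [t for t in tags if tokens(t) and set(tokens(t)) <= headline_tokens]

headline = ("Denzel and Pauletta Washington to host fundraiser "
            "for African American museum")
# "philanthropy" is a made-up tag that does not appear in the headline.
tags = ["Denzel Washington", "Washington", "museums", "philanthropy"]

kept = filter_tags(headline, tags)
print("\n".join(kept))  # one keyword per line, as in a .key file
```

A filter like this drops any tag that describes the story without being worded in the headline, which is one way the .key files can end up with fewer keywords than a human would assign.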

I tried to train Maui on these, but the model doesn't perform well. It usually outputs no results for similarly short inputs, and when it does output something, it's often incorrect (e.g. selecting words like 'who').

Any thoughts on what to do? Training on standard datasets (e.g. SemEval 2010) results in a model that just doesn't output anything at all for short sentences. I thought Maui could work for headlines since it worked on Twitter for this ACL 2015 paper: http://www.cs.cmu.edu/~lingwang/papers/acl2015-3.pdf. If anything, that seems like a harder problem because of Twitter's informal English.

Do I just not have enough training data? Or is the data quality perhaps too poor?

Regards,

Gautam Shine

Alyona

Jul 21, 2016, 3:04:45 PM
to Kea and Maui Support
Maui doesn't work well on short sentences because of the features it uses.
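
A rough illustration of why this happens, not Maui's actual feature code: KEA and Maui rely on features such as TF×IDF and position of first occurrence, which are designed for full documents. The sketch below assumes a plain whitespace tokenizer and computes only the term-frequency part. In a single headline every word typically occurs exactly once, so the frequency value is identical for all candidates and gives the classifier nothing to separate them with. The function name is hypothetical.

```python
from collections import Counter

def term_frequencies(text):
    """Relative frequency of each lowercased, whitespace-separated word."""
    words = text.lower().split()
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}

headline = ("Denzel and Pauletta Washington to host fundraiser "
            "for African American museum")

tf = term_frequencies(headline)
distinct_values = set(tf.values())

# Every word occurs once, so there is only one distinct frequency value.
print(len(tf), "candidate words,", len(distinct_values), "distinct TF value")
```

With term frequency flat, a stop word such as "to" or "for" looks the same as "museum" on that feature, which is in line with the incorrect outputs reported earlier in the thread.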