Using the UKB Similarity module (Mac OS Terminal)

Yassine Karimi


Jun 27, 2016, 9:19:09 PM

to ukblist

Hi All,

I'm doing my master's thesis at the CLTL (Piek Vossen’s group) and I'm currently trying to apply UKB Similarity to a set of sentences.

From the UKBSim package, I'm interested in using two scripts: similarity.pl and similarity.pre.pl.

The aim is to run these scripts on my dataset; an example is shown below:

Cost of cverage under obamacare to increase in 2015, Americans with health insurance bought under the affordable are

the price will rice 5 percent, About a quarter of counties with one or two insurers saw an increase in rates of more than 10 percent

Increases in insurance costs, there are notable increases

Premiums rising faster than eight years before, Health insurance premiums have risen more

My aim is to compare every sentence in the left column with every sentence in the right column and get a similarity score for each pair.

So I have some specific questions on this matter:

- How can I run the similarity.pl script on these two columns of sentences?

- Is there a specified format for the input data?

- I am using UKBSim in the Mac OS terminal/command line; what steps do I need to take to run the script in the terminal on the dataset (once it is in the right format)?

Thanks in advance,

Kind regards,

Yassine

Eneko Agirre


Jun 29, 2016, 10:19:28 AM

to ukb...@googlegroups.com

Hi Yassine,

thanks for your interest!

UKB similarity scripts have been developed with word similarity in mind.

The input format is related to that of the ukb_wsd and ukb_ppv scripts; see the documentation on the input format in ukb/src/README. There is also some documentation in the header of the script itself, or you can run $ ./similarity.pl --help.

If you are doing sentence similarity, I suggest you do word similarity first, align words according to maximum pairwise similarity and then compute an overall similarity measure. Please refer to section 2.2 of this paper: http://aclweb.org/anthology/S/S15/S15-2032.pdf
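As a rough illustration of the align-then-aggregate idea, here is a minimal Python sketch. It is one possible reading of the suggestion, not necessarily the formulation in section 2.2 of the paper: each word is aligned to its most similar word in the other sentence, the alignment scores are averaged, and the two directions are averaged. The names `directed_score`, `sentence_similarity` and `exact_match` are hypothetical, and `exact_match` is only a stand-in for a real word similarity (for example, a score produced by UKB).

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]


def directed_score(src: Sequence[str], tgt: Sequence[str], word_sim: WordSim) -> float:
    """Average, over words in src, of the best similarity to any word in tgt."""
    if not src or not tgt:
        return 0.0
    return sum(max(word_sim(a, b) for b in tgt) for a in src) / len(src)


def sentence_similarity(s1: Sequence[str], s2: Sequence[str], word_sim: WordSim) -> float:
    """Symmetric score: mean of the two directed alignment scores."""
    return 0.5 * (directed_score(s1, s2, word_sim) + directed_score(s2, s1, word_sim))


def exact_match(a: str, b: str) -> float:
    """Placeholder word similarity (assumption): 1.0 for identical words, else 0.0."""
    return 1.0 if a == b else 0.0


if __name__ == "__main__":
    left = "increases in insurance costs".split()
    right = "there are notable increases".split()
    print(sentence_similarity(left, right, exact_match))
```

Scoring every left-column sentence against every right-column sentence is then a double loop over the two columns, calling `sentence_similarity` once per pair.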

In fact, instead of running ukb for each pair of words, you could compute the cosine between the word embeddings which we provide on the ukb page (please refer to the corresponding papers):

Word Embeddings:

Embeddings for English WordNet 3.0 (plus gloss relations): text, binary [11]
Concatenated embedding for Text Corpora and English WordNet 3.0 (plus gloss relations): text [14]
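For reference, the cosine between two embedding vectors can be computed as in the sketch below. The helper names are hypothetical, and `parse_embedding_line` assumes that each line of the text file holds a word followed by whitespace-separated numbers; that layout is an assumption and should be checked against the actual files.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def parse_embedding_line(line: str) -> Tuple[str, List[float]]:
    """Split one line into (word, vector).

    Assumed layout: "word v1 v2 ... vn", separated by whitespace.
    """
    parts = line.split()
    return parts[0], [float(x) for x in parts[1:]]


if __name__ == "__main__":
    # Toy vectors for illustration only, not taken from the real embeddings.
    _, vec_a = parse_embedding_line("premium 0.5 0.5 0.0")
    _, vec_b = parse_embedding_line("cost 0.5 0.0 0.5")
    print(cosine(vec_a, vec_b))
```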

I hope this was helpful

best

eneko

On 06/28/2016 at 03:19 AM, Yassine Karimi wrote:

--
You received this message because you are subscribed to the Google Groups "ukblist" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ukblist+u...@googlegroups.com.
To post to this group, send email to ukb...@googlegroups.com.
Visit this group at https://groups.google.com/group/ukblist.
For more options, visit https://groups.google.com/d/optout.

--

Eneko Agirre
Euskal Herriko Unibertsitatea
Universidad del Pais Vasco
University of the Basque Country
http://ixa2.si.ehu.eus/eneko

Yassine Karimi


Jun 29, 2016, 1:28:00 PM

to ukb...@googlegroups.com

Dear Mr. Agirre,

I will follow the instructions and see if I can get it to work.

Thank you very much for your help.

Kind regards,

Yassine
