Using the UKB Similarity module (Mac OS Terminal)

Yassine Karimi

Jun 27, 2016, 9:19:09 PM
to ukblist
Hi All,

I'm doing my master's thesis at the CLTL (Piek Vossen's group) and I'm currently trying to apply UKB Similarity to a set of sentences.
From the UKBSim package I'm interested in using the two programs  and

The aim is to run these scripts on my dataset, of which you can see an example below:

Cost of cverage under obamacare to increase in 2015,     Americans with health insurance bought under the affordable are 
the price will rice 5 percent,                           About a quarter of counties with one or two insurers saw an increase in rates of more than 10 percent
Increases in insurance costs,                            there are notable increases 
Premiums rising faster than eight years before,          Health insurance premiums have risen more      

My aim is to compare every sentence in the left column with every sentence in the right column and get a similarity score for each pair.
So I have some specific questions on this matter:
     - How can I run the script on these two columns of sentences?
     - Is there a specified format for the input data?
     - I am using UKBSim in the Mac OS terminal/command line; what steps do I need to take to run the script on the dataset (in the right format) from the terminal?

Thanks in advance,

Kind regards,

Eneko Agirre

Jun 29, 2016, 10:19:28 AM

Hi Yassine,

thanks for your interest!

UKB similarity scripts have been developed with word similarity in mind.

The input format is related to that of the ukb_wsd and ukb_ppv scripts; see the documentation about the input format in ukb/src/README. There is also some documentation in the header of the script itself, or run $ ./ --help.

If you are doing sentence similarity, I suggest you do word similarity first, align words according to maximum pairwise similarity and then compute an overall similarity measure. Please refer to section 2.2 of this paper:
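As a rough illustration of the alignment idea described above, here is a minimal Python sketch. It is not taken from the paper or from the UKB scripts: the function names are hypothetical, the symmetric averaging is just one common way to turn best-match scores into a sentence score, and toy_word_sim is a placeholder that you would replace with a real word similarity (for example, scores produced by UKB or a cosine over embeddings).

```python
from typing import Callable, Sequence


def sentence_similarity(
    left: Sequence[str],
    right: Sequence[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Align each word to its most similar word in the other sentence
    and average the best-match scores in both directions."""
    if not left or not right:
        return 0.0

    def directed(src: Sequence[str], tgt: Sequence[str]) -> float:
        # For every source word keep only its best-scoring partner.
        return sum(max(word_sim(a, b) for b in tgt) for a in src) / len(src)

    # Symmetrise so that the score does not depend on argument order.
    return 0.5 * (directed(left, right) + directed(right, left))


def toy_word_sim(a: str, b: str) -> float:
    """Placeholder similarity (assumption): 1.0 for identical words, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


if __name__ == "__main__":
    s1 = "Increases in insurance costs".split()
    s2 = "there are notable increases".split()
    print(sentence_similarity(s1, s2, toy_word_sim))
```

To score every left-column sentence against every right-column sentence, the same function can be called inside two nested loops over the tokenised columns.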

In fact, instead of running ukb for each pair of words, you could compute the cosine between the word embeddings that we provide on the ukb page (please refer to the corresponding papers):

Word Embeddings:
  • Embeddings for English WordNet 3.0 (plus gloss relations): text, binary [11]
  • Concatenated embedding for Text Corpora and English WordNet 3.0 (plus gloss relations): text [14]
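For the cosine suggestion, a minimal Python sketch could look like the following. It assumes the text version of the embeddings stores one entry per line, as a token followed by space-separated numbers, possibly preceded by a "count dimension" header line; that layout is an assumption, so please check the downloaded file first. The function names and the file name in the usage note are hypothetical.

```python
import math
from typing import Dict, List


def load_text_embeddings(path: str) -> Dict[str, List[float]]:
    """Read 'token v1 v2 ...' lines into a dict (assumed file layout)."""
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) < 3:
                # Skips blank lines and a possible "count dim" header line.
                continue
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Usage would be along the lines of vectors = load_text_embeddings("embeddings.txt") followed by cosine(vectors["insurance"], vectors["premium"]), where "embeddings.txt" stands for whichever text file you downloaded. Words missing from the file need to be handled separately, for example by skipping them.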
I hope this was helpful.



On 06/28/2016 at 03:19 AM, Yassine Karimi wrote:
Eneko Agirre
Euskal Herriko Unibertsitatea
Universidad del Pais Vasco
University of the Basque Country

Yassine Karimi

Jun 29, 2016, 1:28:00 PM
Dear Mr. Agirre,

I will follow the instructions and see if I can get it to work. 

Thank you very much for your help.

Kind regards,

