Semantic similarity / relatedness of two single words


Sam Heather

Mar 18, 2015, 12:50:48 PM
to dand...@spaziodati.eu
Hi,

I'm trying to measure the semantic similarity of two single words using the SIM API, but I am always returned a similarity value of 0.

The two words I am trying are 'long' and 'length', which obviously have some similarity, yet the API always returns 0% for both measures. You can see my query here.

Can someone please tell me what I am doing wrong? Is there a better or more correct way to make this query, or does Dandelion just not support this at all?

Thanks,

Sam

Ugo Scaiella

Mar 19, 2015, 1:02:13 PM
to dand...@spaziodati.eu
Hi Sam,

I'm sorry, but the SIM API is meant for a different use case, where the inputs are short text fragments that usually mention one or more entities (e.g. tweets, short blog posts, product reviews, news headlines, web page titles, etc.).

However, we have already built several language models that we use internally and that could solve your use case, but we haven't yet created an API for them. We will evaluate this possibility in the near future; if you would like a quote for getting access sooner, don't hesitate to contact us.
Any feedback is always really appreciated!

Ciao,
-- Ugo

Sam Heather

Mar 19, 2015, 3:40:01 PM
to dand...@spaziodati.eu
Hi Ugo,

Thanks for the reply. I'm trying to use this for my final-year dissertation project, and I currently have to use a mix of your similarity API and Cortical's similarity API. I was hoping to use just one, which would improve the quality of the data and mean I was working with a single scale rather than trying to reconcile the difference between your scale and Cortical's :) Is there any chance you would consider granting free access to these models for my research, in return for an acknowledgement and thanks in the report as the semantic API I am using?

Thanks,

All the best,

Sam

Ugo Scaiella

Apr 16, 2015, 3:52:50 AM
to dand...@spaziodati.eu
Hi Sam,

Sorry for my late reply.
Unfortunately, we are still not ready to release this kind of data, but if you are still interested, I suggest taking a look at Word2Vec.
It is a relatively new algorithm that models text data by leveraging word co-occurrences, and it may suit your use case: https://code.google.com/p/word2vec/
There are also pre-trained models available.
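
As a rough illustration of how word vectors give a single-word similarity score, here is a minimal sketch in Python. It is not Dandelion's internal model, and the three-dimensional vectors below are invented toy values (an assumption for the example); with a real pre-trained Word2Vec model you would look up each word's vector from the model instead. The names toy_vectors and word_similarity are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction, so treat it as "no similarity".
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors made up for illustration only; real Word2Vec vectors
# typically have hundreds of dimensions and come from a trained model.
toy_vectors = {
    "long": [0.9, 0.1, 0.3],
    "length": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.0],
}


def word_similarity(word1, word2, vectors):
    """Score two single words by the cosine similarity of their vectors."""
    return cosine_similarity(vectors[word1], vectors[word2])


if __name__ == "__main__":
    print(word_similarity("long", "length", toy_vectors))
    print(word_similarity("long", "banana", toy_vectors))
```

With these toy values, 'long' and 'length' score much higher than 'long' and 'banana', which is the kind of single scale the question was after. Scores from a real model will depend on the corpus it was trained on.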

Ciao,
-- Ugo