Text Similarity - How to add acronyms, synonyms


Vidya Nayak

Apr 21, 2020, 1:56:02 PM
to Dandelion Support Forum
Hello,

I see that Text Similarity doesn't take into account the acronyms that are used. Is there a way to add synonym mappings or acronyms?

Regards,
Vidya

Vidya Nayak

Apr 23, 2020, 3:41:58 PM
to Dandelion Support Forum
Hi, can anybody help me with this question?

Giacomo Berardi

Apr 24, 2020, 5:29:24 AM
to Dandelion Support Forum
Hi Vidya,
it depends on several factors. Are these acronyms popular or related to some well-known entity? Are they short and ambiguous? What type of similarity (semantic or syntactic) are you performing?

Giacomo Berardi
Dandelion team

Vidya Nayak

Apr 24, 2020, 5:35:37 AM
to Dandelion Support Forum
1. Acronyms --> these can be domain-specific or company-specific ones used for certain products, processes, etc. Ex: "WAS" might be an acronym for "Websphere Application Server", "PDD" is used for "Product Document Description". So while comparing, I want to make sure some acronyms can be fed in as dictionaries that it can use when matching similarity.

2. Semantic Similarity --> I am looking for semantic similarity. For example: can "Opening store for Vegans" and "Looking for Food" be associated together based on word similarities like "Vegan", "Food", etc.?

Giacomo Berardi

Apr 24, 2020, 12:54:02 PM
to Dandelion Support Forum
For acronyms you can use Custom Spots. The documentation is here: https://dandelion.eu/docs/api/datatxt/custom-spots/v1/. With custom spots you can force specific spots to be linked to specific Wikipedia entities. They are usually used for entity extraction, but it is possible to use entity extraction parameters in Text Similarity as well, by appending the parameter name to the `nex.` prefix, so in this case `nex.custom_spots` (see https://dandelion.eu/docs/api/datatxt/sim/v1/).

Regarding semantic similarity, you can force Text Similarity to use only that by setting the parameter `bow` to `never` (custom spots will work exclusively on the semantic similarity).

cheers

Giacomo Berardi
Dandelion team
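
Editor's sketch: the two parameters mentioned in the reply above, `bow=never` and `nex.custom_spots`, could be combined in a single Text Similarity request roughly as follows. This only builds the request URL and does not call the service. The endpoint URL, the `text1`, `text2` and `token` parameter names, and the idea that `nex.custom_spots` takes the ID of a previously created custom spots list are assumptions not stated in the thread, so check them against the linked documentation. The helper name `build_sim_url` and the placeholder values are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for Text Similarity; verify against the linked docs.
SIM_ENDPOINT = "https://api.dandelion.eu/datatxt/sim/v1"


def build_sim_url(text1, text2, token, custom_spots_id=None):
    """Build a GET URL for a semantic-only similarity request (hypothetical helper)."""
    params = {
        "text1": text1,   # assumed parameter name
        "text2": text2,   # assumed parameter name
        "bow": "never",   # from the thread: use only semantic similarity
        "token": token,   # assumed authentication parameter
    }
    if custom_spots_id is not None:
        # From the thread: entity extraction parameters are passed with the
        # `nex.` prefix. That the value is a custom spots ID is an assumption.
        params["nex.custom_spots"] = custom_spots_id
    return SIM_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_sim_url(
        "Opening store for Vegans",
        "Looking for Food",
        token="YOUR_TOKEN",
        custom_spots_id="YOUR_CUSTOM_SPOTS_ID",
    ))
```

The resulting URL can be fetched with any HTTP client. The custom spots list itself (for example, one mapping "WAS" to the Wikipedia entity for WebSphere Application Server) would have to be created beforehand through the Custom Spots API.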