Text Similarity Length


L. Dee Miller

Apr 24, 2018, 1:32:13 PM
to Dandelion Support Forum
Greetings,

I was wondering how the performance of the Text Similarity API metrics depends on text length.

The documentation says that the system works "better" on texts of 5-20 words, but our project needs to compare wiki pages of 100+ words.

Would text similarity still be effective for 100+ words?

Also, are there any results that evaluate metric performance as a function of text length?

Regards,

Dr. Lee Miller

san...@dandelion.eu

Apr 26, 2018, 9:42:59 AM
to Dandelion Support Forum
Dear Dr. Lee,

In our experience, performance on longer texts can vary greatly depending on the type of texts you are comparing and on the use case. The Text Similarity API has already been used on longer texts, with very good results.
We suggest that you experiment with it: a free account gives you 1000 daily units, which should be more than enough to run some tests. Please take a look at the documentation, in particular the "bow" parameter, and try different values to see which one gives the best results for your use case.
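
As a rough illustration of such an experiment, here is a minimal Python sketch that compares the same pair of texts under each "bow" setting. The endpoint URL, the parameter names (text1, text2, lang, bow, token), the list of "bow" values, and the "similarity" response field are assumptions that should be checked against the current documentation; the helper names are hypothetical. POST is used so that long texts do not end up in the URL.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the current Text Similarity API docs.
SIM_ENDPOINT = "https://api.dandelion.eu/datatxt/sim/v1"
# Assumed accepted values for the "bow" parameter; verify before use.
BOW_VALUES = ("never", "both_empty", "one_empty", "always")


def build_sim_params(text1, text2, token, bow="never", lang="en"):
    """Build the form parameters for one similarity request (hypothetical helper)."""
    if bow not in BOW_VALUES:
        raise ValueError(f"unexpected bow value: {bow!r}")
    return {"text1": text1, "text2": text2, "lang": lang, "bow": bow, "token": token}


def fetch_similarity(params, timeout=30):
    """POST the parameters and return the 'similarity' field (assumed name), or None."""
    data = urllib.parse.urlencode(params).encode("utf-8")
    request = urllib.request.Request(SIM_ENDPOINT, data=data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("similarity")


def compare_bow_settings(text1, text2, token, fetch=fetch_similarity):
    """Score one text pair under every bow setting; one request per setting."""
    return {
        bow: fetch(build_sim_params(text1, text2, token, bow=bow))
        for bow in BOW_VALUES
    }
```

Calling compare_bow_settings(page_a, page_b, "YOUR_TOKEN") on a handful of page pairs whose similarity you can judge by hand should show which setting tracks your judgment best. The fetch argument can be swapped for a stub to try the logic without spending units.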

Let us know if you have any other questions.
Regards,

Roberto Santoro
Dandelion API team