score threshold on textMatch

Tze-John Tang

May 8, 2014, 10:55:51 PM
to sta...@clarkparsia.com
How does the score threshold affect the textMatch function? In your examples, you have:

SELECT DISTINCT ?s ?score WHERE { ?s ?p ?l. ( ?l ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ( 'mac' 0.5 50 ). }

Where 0.5 is a score threshold. Playing around with this on my data, I don't see the threshold having any effect. With values between 0 and 1, the same results are returned each time. There also does not seem to be any relation between the ?score that is returned and the score threshold. I have tried to find this in the Lucene and Jena docs, but they shed little light on it. Can someone explain how the score threshold is useful? At this point I have decided not to use it at all.

Mike Grove

May 9, 2014, 7:57:50 AM
to stardog
With a threshold of .5, you should not see results with scores lower than that threshold. If you're seeing a case where that is not happening, can you please provide the query & data so that we can take a look?

Cheers,

Mike
 


Tze-John Tang

May 9, 2014, 8:03:12 AM
to sta...@clarkparsia.com
The ?score that is coming back has values like 2.145, 16.824, 20.32. I don't understand how that relates to the threshold, which has values between 0 and 1. If I set the threshold to something larger than 1, that value is taken as the number of records I want returned.

-tj

Mike Grove

May 9, 2014, 8:18:35 AM
to stardog
On Fri, May 9, 2014 at 8:03 AM, Tze-John Tang <tzejoh...@gmail.com> wrote:
> The ?score that is coming back has values like 2.145, 16.824, 20.32. I don't understand how that relates to the threshold, which has values between 0 and 1. If I set the threshold to something larger than 1, that value is taken as the number of records I want returned.

The threshold can be any valid float or decimal value, such as .5 or 5.0. It cannot be an integer value, like 5, because we use the datatypes of the inputs to disambiguate which parameter is which. The limit can only be an integer, never a float or decimal, so that's how we tell the two apart. If you use "mac" 5 5, we don't know which is which; you'd have to write "mac" 5.0 5.
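
To make that concrete, here is a sketch based on the query from the first message, assuming the argument order shown there (search string, then threshold, then limit). The values 5.0 and 5 are only illustrative. With scores such as 2.145 or 16.824, a threshold between 0 and 1 sits below every score, which would explain why it appeared to have no effect.

```sparql
# 5.0 is a decimal literal, so it is read as the score threshold.
# 5 is an integer literal, so it is read as the result limit.
# Expected effect: at most 5 matches, none with ?score below 5.0.
SELECT DISTINCT ?s ?score WHERE {
  ?s ?p ?l .
  ( ?l ?score ) <http://jena.hpl.hp.com/ARQ/property#textMatch> ( 'mac' 5.0 5 ) .
}
```

Writing ( 'mac' 5 5 ) instead gives two integer literals, so the datatypes no longer distinguish the threshold from the limit.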

Cheers,

Mike 

Tze-John Tang

May 9, 2014, 8:22:26 AM
to sta...@clarkparsia.com
Thanks Mike. That explains it.

-tj