How to do efficient textMatch?


Ajay Kamble

Jan 29, 2016, 9:40:02 AM
to Stardog
Hi,

I want to query a specific property in a Stardog database, given a search term or phrase.

What is the most efficient way to do this in Stardog? FILTER(regex(?title, 'foo', 'i')) turned out to be very slow.

I tried textMatch, but it is confusing and I am not sure how it works. Also, the documentation does not give details on this feature.

Here is the query that I tried:
 

SELECT *
WHERE {
  ?var1 a co:item ;
    co:name ?title .
  (?title ?score) <tag:stardog:api:property:textMatch> ('foo' 0.5 20) .
}

The confusing part is that this query returns only 6 results, although there are many results for the term 'foo' (these are returned by FILTER/regex, but it is slow). I tried to use wildcards, but a leading wildcard gave an error saying that it is not supported. When I used a trailing wildcard, I got the same 6 results.

How do I use textMatch properly? What configuration is required to set up text search?

I just specified the [search.enabled=true] option when creating the database. Is that enough? Are there other ways to make queries efficient, such as index configuration?


Michael Grove

Jan 29, 2016, 9:59:11 AM
to stardog
On Fri, Jan 29, 2016 at 9:40 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:
> Hi,
>
> I want to query a specific property in a Stardog database, given a search term or phrase.
>
> What is the most efficient way to do this in Stardog? FILTER(regex(?title, 'foo', 'i')) turned out to be very slow.
>
> I tried textMatch, but it is confusing and I am not sure how it works. Also, the documentation does not give details on this feature.

What do you feel is missing from the docs?
 

> Here is the query that I tried:
>
> SELECT *
> WHERE {
>   ?var1 a co:item ;
>     co:name ?title .
>   (?title ?score) <tag:stardog:api:property:textMatch> ('foo' 0.5 20) .
> }
>
> The confusing part is that this query returns only 6 results, although there are many results for the term 'foo' (these are returned by FILTER/regex, but it is slow).

Full-text search is not exact in the way query answering is, so when combining the two it's easy to get confused. In the above query, which is similar to what's in the docs [1], you're searching the full-text index for all literals which match 'foo' and have a score above 0.5. Of whatever is returned from that set, you're taking the top 20 scoring results. Those results are then joined with the two other triple patterns in your query. This is quite a different query from using a regex to filter literals after the fact. Note that you can use the entire Lucene query syntax [2] with textMatch.
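
As a rough illustration of the Lucene query syntax mentioned above, the sketch below reuses the query from this thread with a trailing wildcard and a quoted phrase. The `co:` prefix IRI is a placeholder (the thread never gives it), and the search string is an arbitrary example rather than anything taken from the docs.

```sparql
# Placeholder namespace: substitute the real IRI behind the co: prefix.
PREFIX co: <http://example.org/co#>

SELECT ?var1 ?title ?score
WHERE {
  ?var1 a co:item ;
    co:name ?title .
  # Lucene syntax: terms starting with "foo", OR the phrase "foo bar".
  # Only hits scoring above 0.5 are kept, and at most the 20 best hits
  # are taken from the index before the join with the patterns above.
  (?title ?score) <tag:stardog:api:property:textMatch> ('foo* OR "foo bar"' 0.5 20) .
}
ORDER BY DESC(?score)
```

Ordering by `?score` is optional; it is included here only to make the ranking returned by the index visible.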
 
> I tried to use wildcards, but a leading wildcard gave an error saying that it is not supported. When I used a trailing wildcard, I got the same 6 results.

As mentioned in [3], the configuration options for search are included with the other database options [4]. Leading wildcard search is disabled by default in Lucene, and we inherit that default. You must explicitly enable it if you want to use it. But as noted in the Lucene documentation [5], it's expensive.
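
For illustration only, a leading-wildcard search would look roughly like the sketch below. It assumes leading wildcards have already been enabled through the search options in [4]; with the default settings the query is rejected, as described above. The `co:` prefix IRI is a placeholder.

```sparql
# Placeholder namespace: substitute the real IRI behind the co: prefix.
PREFIX co: <http://example.org/co#>

SELECT ?var1 ?title ?score
WHERE {
  ?var1 a co:item ;
    co:name ?title .
  # '*foo' starts with a wildcard, so Lucene cannot narrow the search by
  # term prefix; this is why the feature is off by default and expensive.
  (?title ?score) <tag:stardog:api:property:textMatch> ('*foo' 0.5 20) .
}
```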
 

> How do I use textMatch properly? What configuration is required to set up text search?
>
> I just specified the [search.enabled=true] option when creating the database. Is that enough? Are there other ways to make queries efficient, such as index configuration?

You only need to enable search to use it. As I mentioned, other search options are provided and detailed in [4], but simply enabling search is sufficient for most cases.

What queries are you trying that you feel are inefficient?

Cheers,

Mike 





Ajay Kamble

Jan 29, 2016, 10:26:04 AM
to sta...@clarkparsia.com
Hi Mike,

Thank you for the quick reply.

1. What do you think is missing from the docs?
    In my opinion, more examples covering different scenarios would help. For example, a pure full-text search query, and one combined with a normal SPARQL query. But this question is tricky for me because I am not sure what is possible with this feature.

2. About full text search:
    Perhaps it would help if I explain what I want to do. Given a search term, I want to scan a particular property (for example, title) for matches. Can I use textMatch in this scenario, or is it designed to search all data? FILTER/regex does not return in under 1 second for this kind of query.

3. Score:
    How is the score calculated? Does it consider only the match in a triple, or does it take into account occurrences across all related triples?

Michael Grove

Jan 29, 2016, 11:50:01 AM
to stardog
Ajay,

On Fri, Jan 29, 2016 at 10:26 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:
> Hi Mike,
>
> Thank you for the quick reply.
>
> 1. What do you think is missing from the docs?

Looks like your response was cut off.

Cheers,

Mike

Ajay Kamble

Jan 29, 2016, 11:51:58 AM
to Stardog
Hi Mike,

Here are my comments:

1. What do you think is missing from the docs?
    In my opinion, more examples covering different scenarios would help. For example, a pure full-text search query, and one combined with a normal SPARQL query. But this question is tricky for me because I am not sure what is possible with this feature.

2. About full text search:
    Perhaps it would help if I explain what I want to do. Given a search term, I want to scan a particular property (for example, title) for matches. Can I use textMatch in this scenario, or is it designed to search all data? FILTER/regex does not return in under 1 second for this kind of query.

3. Score:
    How is the score calculated? Does it consider only the match in a triple, or does it take into account occurrences across all related triples?

Michael Grove

Jan 29, 2016, 12:02:37 PM
to stardog
On Fri, Jan 29, 2016 at 11:51 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:
> Hi Mike,
>
> Here are my comments:
>
> 1. What do you think is missing from the docs?
>     In my opinion, more examples covering different scenarios would help. For example, a pure full-text search query, and one combined with a normal SPARQL query. But this question is tricky for me because I am not sure what is possible with this feature.

Do you find the example query that uses full-text search + an additional TriplePattern in [1] to be too generic?
 

> 2. About full text search:
>     Perhaps it would help if I explain what I want to do. Given a search term, I want to scan a particular property (for example, title) for matches. Can I use textMatch in this scenario, or is it designed to search all data? FILTER/regex does not return in under 1 second for this kind of query.

As briefly discussed in [2], only the literal values in the database are indexed. So textMatch is only matching against literals.  If you want to find the matching literals for a specific property, you then have to join the matching literals returned from textMatch with a triple pattern which will select only the literals for the property you're interested in.  You've done this in the query you provided via `?var1 co:name ?title`.

In your previous query, if you are not seeing some answers that you were expecting, it's likely that they're below the score threshold or outside the limit you've specified, which in your case was `20`.
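
One way to check whether the threshold or the limit is hiding results is to relax both and inspect the scores. The sketch below assumes a placeholder IRI for `co:`; the values `0.0` and `1000` are arbitrary choices for illustration, not recommended settings. Because the limit is applied to the index hits before they are joined with the other triple patterns, high-scoring literals from other properties can use up part of a small limit, which is one possible explanation for seeing fewer rows than expected.

```sparql
# Placeholder namespace: substitute the real IRI behind the co: prefix.
PREFIX co: <http://example.org/co#>

SELECT ?var1 ?title ?score
WHERE {
  # Restrict the matching literals to the co:name property of co:item resources.
  ?var1 a co:item ;
    co:name ?title .
  # Relaxed threshold (0.0) and a much larger limit (1000), so that
  # low-scoring hits and hits beyond the first 20 reach the join.
  (?title ?score) <tag:stardog:api:property:textMatch> ('foo' 0.0 1000) .
}
ORDER BY DESC(?score)
```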


> 3. Score:
>     How is the score calculated? Does it consider only the match in a triple, or does it take into account occurrences across all related triples?

The score is provided by Lucene [3].

Ajay Kamble

Jan 29, 2016, 10:54:19 PM
to Stardog
Is it possible to do an exact match with textMatch? For example, to match exactly 'foo'?

Does the query that I tried do a wildcard match by default?

Michael Grove

Feb 1, 2016, 7:56:13 AM
to stardog
On Fri, Jan 29, 2016 at 10:54 PM, Ajay Kamble <ajay.ri...@gmail.com> wrote:
> Is it possible to do an exact match with textMatch? For example, to match exactly 'foo'?

If you want exact match, use SPARQL.
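
A minimal sketch of what an exact match in plain SPARQL might look like, with no textMatch involved. The `co:` prefix IRI is a placeholder, and comparing on `str(?title)` is an assumption made here so that language-tagged literals are also matched.

```sparql
# Placeholder namespace: substitute the real IRI behind the co: prefix.
PREFIX co: <http://example.org/co#>

SELECT ?var1 ?title
WHERE {
  ?var1 a co:item ;
    co:name ?title .
  # Compare the lexical form only, so "foo" and "foo"@en both match,
  # while "foobar" or "Foo" do not.
  FILTER(str(?title) = "foo")
}
```

If the stored values are known to be plain string literals, writing the literal directly in the triple pattern (`?var1 co:name "foo"`) expresses the same exact match without a filter.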
 

> Does the query that I tried do a wildcard match by default?


Yes, basically. The Lucene docs [1] explain how their algorithms work in more detail.

Cheers,

Mike

 