Solr Integration Questions


altruist

Feb 13, 2013, 4:04:21 PM
to open-semant...@googlegroups.com

I find this project very interesting. Before I ask any questions, I would like to give a quick background of what I am trying to do.

  1. I collect RDF data from the web, expressed in different ontologies, and store it in Jena TDB.
  2. I then index the free-text part of it (the literals) using Lucene.
  3. I then query it using magic predicates in SPARQL.
  4. The Lucene indexes are stored separately from the TDB indexes and need to be kept in sync.


As you may know, Lucene does not provide the advanced features that Solr does, such as faceting, keyword highlighting, and auto-completion. When I saw that OSF integrates Solr with a triple store (Virtuoso, in this case), I was very interested. I have the following questions to clarify my understanding.

  1. conStruct implements the integration between Solr and Virtuoso. Can it be used as an API to replace my Lucene indexing?
  2. I assume I will need to migrate from TDB to Virtuoso in order to accomplish this?
  3. Will conStruct allow me to use magic predicates in SPARQL within Virtuoso?
  4. Will the SPARQL results from Virtuoso be represented in a Solr format, so that faceting and the other advanced features can be used in Drupal?

I would appreciate answers to these questions; any examples would be greatly appreciated as well.


Thanks.

Frederick Giasson

Feb 14, 2013, 11:07:40 AM
to open-semant...@googlegroups.com
Hi!


> 1. conStruct implements the integration between Solr and Virtuoso. Can it be used as an API to replace my Lucene indexing?

I am not sure I understand the question. conStruct sends queries to the structWSF API, and it is the structWSF API that uses Virtuoso and Solr. structWSF (Web Services Framework) is the abstraction and access layer. The only thing you have to care about is using the endpoints properly; everything else (data synchronization between Virtuoso and Solr, etc.) is managed by the endpoints themselves.


> 2. I assume I will need to migrate from TDB to Virtuoso in order to accomplish this?

Yes. The only thing you have to do is produce an RDF dump of your data and import it into structWSF using the CRUD: Create web service endpoint. All the RDF will then be properly indexed in both Virtuoso and Solr.
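
As a rough illustration of that import step, here is a minimal Python sketch that builds (but does not send) the HTTP request for one chunk of an RDF dump. The endpoint path `/ws/crud/create/`, the parameter names `document`, `mime`, `dataset` and `mode`, and the host and dataset URI are assumptions made for this example; check them against the CRUD: Create documentation of your structWSF instance before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_crud_create_request(base_url, dataset_uri, rdf_document,
                              mime="application/rdf+xml"):
    """Build (but do not send) a POST request for the CRUD: Create endpoint.

    The path and parameter names below are assumptions for illustration.
    """
    params = {
        "document": rdf_document,  # the RDF dump, or one chunk of it
        "mime": mime,              # serialization of the document
        "dataset": dataset_uri,    # target dataset (hypothetical URI)
        "mode": "full",            # assumed: index in the triple store and in Solr
    }
    return Request(
        base_url.rstrip("/") + "/ws/crud/create/",
        data=urlencode(params).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_crud_create_request(
        "http://localhost",                 # hypothetical structWSF host
        "http://localhost/datasets/demo/",  # hypothetical dataset URI
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>",
    )
    print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual import; a large dump would probably have to be split into several such requests.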


> 3. Will conStruct allow me to use magic predicates in SPARQL within Virtuoso?

What do you mean by "magic predicates"? You can access the Virtuoso SPARQL endpoint at any time.


> 4. Will the SPARQL results from Virtuoso be represented in a Solr format, so that faceting and the other advanced features can be used in Drupal?

In this case, you have to use the Search endpoint, which uses Solr. The Search endpoint has *several* options and parameters; you can do anything you want with it in terms of full-text search and faceting.

Hope it helps,

Thanks,

Fred

altruist

Feb 14, 2013, 11:25:41 AM
to open-semant...@googlegroups.com
Thank you for the answer, Fred! I am sorry I was not very clear in my previous post.

I am aware that Search and SPARQL are two different services. By "magic predicates" I meant combining free-text search and regular SPARQL in a single SPARQL query, as LARQ does, and then possibly returning the results in Solr XML form, or in any other form in which the Solr data can be leveraged.

Thanks.

Frederick Giasson

Feb 14, 2013, 11:33:37 AM
to open-semant...@googlegroups.com
Hi


> Thank you for the answer, Fred! I am sorry I was not very clear in my previous post.
>
> I am aware that Search and SPARQL are two different services. By "magic predicates" I meant combining free-text search and regular SPARQL in a single SPARQL query, as LARQ does, and then possibly returning the results in Solr XML form, or in any other form in which the Solr data can be leveraged.

If you want to use the bif:contains SPARQL predicate in Virtuoso, you will have to configure it properly. Once that is done, it will work without issues.
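
A minimal sketch of such a combined query, written as a Python helper that only builds the SPARQL text. It assumes that full-text indexing of RDF literals has already been enabled in Virtuoso; the graph URI and the choice of `foaf:Person` and `rdfs:label` are placeholders for illustration, not something taken from structWSF.

```python
def build_text_sparql(phrase, graph_uri, limit=20):
    """Return a Virtuoso SPARQL query mixing a graph pattern with bif:contains.

    The class and property used in the pattern are illustrative placeholders.
    """
    # Keep the sketch simple: refuse characters that would need escaping.
    if any(ch in phrase for ch in "\"'\\"):
        raise ValueError("quotes and backslashes are not handled in this sketch")
    return (
        "SELECT DISTINCT ?s ?label\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n"
        "  ?s a <http://xmlns.com/foaf/0.1/Person> ;\n"
        "     <http://www.w3.org/2000/01/rdf-schema#label> ?label .\n"
        # bif:contains takes the phrase in single quotes inside the literal.
        f"  ?label bif:contains \"'{phrase}'\" .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Hypothetical graph URI, for illustration only.
    print(build_text_sparql("semantic web", "http://localhost/datasets/demo/"))
```

The regular triple patterns and the free-text condition sit in the same WHERE clause, which is the closest Virtuoso counterpart to a LARQ-style magic predicate. Note that the text match is evaluated by Virtuoso's own full-text index, not by Solr.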

If you use the structWSF SPARQL endpoint, a series of result formats is supported.

Right now, you can leverage Solr's power through the Search endpoint. However, the formats supported by the Search endpoint do not include the "Solr XML format". I am not sure why you would want it, though: what matters is being able to search your content and get back what you are searching for, no? Here are the return formats supported by the Search endpoint:

  • text/xml (structXML)
  • application/json (structJSON)
  • application/rdf+xml (RDF+XML)
  • application/rdf+n3 (N3/Turtle)
  • application/iron+json (irJSON)
  • application/iron+csv (commON)

Everything is explained here: http://techwiki.openstructs.org/index.php/Search
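
To make the format selection concrete, here is a minimal Python sketch that builds a Search request and picks one of the mime types listed above through the `Accept` header. The path `/ws/search/`, the use of GET, the parameter names `query`, `datasets`, `items` and `page`, and the semicolon separator for datasets are assumptions for this example; the page linked above is the reference for the real parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Return formats listed above for the Search endpoint.
SEARCH_FORMATS = {
    "text/xml",
    "application/json",
    "application/rdf+xml",
    "application/rdf+n3",
    "application/iron+json",
    "application/iron+csv",
}


def build_search_request(base_url, query, datasets,
                         accept="application/json", items=10, page=0):
    """Build (but do not send) a request for the Search endpoint.

    Path and parameter names are assumptions for illustration.
    """
    if accept not in SEARCH_FORMATS:
        raise ValueError(f"unsupported return format: {accept}")
    params = {
        "query": query,
        "datasets": ";".join(datasets),  # assumed separator
        "items": items,                  # results per page
        "page": page,                    # offset of the first result
    }
    url = base_url.rstrip("/") + "/ws/search/?" + urlencode(params)
    return Request(url, headers={"Accept": accept})


if __name__ == "__main__":
    request = build_search_request(
        "http://localhost",                   # hypothetical structWSF host
        "solr",
        ["http://localhost/datasets/demo/"],  # hypothetical dataset URI
    )
    print(request.full_url, request.get_header("Accept"))
```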

Also, you may be interested in the structWSF-PHP-API, which helps you query the structWSF endpoints:

    https://github.com/structureddynamics/structWSF-PHP-API


Thanks,

Fred


altruist

Feb 14, 2013, 11:44:20 AM
to open-semant...@googlegroups.com
Thanks again Fred!

The reason I was looking for a Solr response from the text search was to leverage the UIs that are compatible with data returned by Solr.

Will bif:contains be used to query the Solr index too? I ask because my requirement is not a simple free-text search, but a free-text search combined with SPARQL, expressed as a predicate within the SPARQL query.

Frederick Giasson

Feb 14, 2013, 12:10:09 PM
to open-semant...@googlegroups.com
Hi!

> The reason I was looking for a Solr response from the text search was to
> leverage the UIs that are compatible with data returned by Solr.

OK, that makes sense. In this case, what would be required is to modify
the Search endpoint to accept a new mime type, namely the one of Solr's
returned resultset, and then to return that resultset as-is instead of
converting it to something else.

This can easily be done, no problem. That way, if you use the proper
mime type for your request, you will get that Solr resultset.

However, I would suggest you check what Solr returns directly on the
server; I am not sure these UIs will work perfectly with the kind of
Solr documents you will get from that Solr schema.

> Will bif:contains be used to query the Solr index too? I ask because
> my requirement is not a simple free-text search, but a free-text
> search combined with SPARQL, expressed as a predicate within the
> SPARQL query.

SPARQL only queries Virtuoso. As far as I know, there is no SPARQL
query engine for Solr.

However, this doesn't matter. If you check the documentation page for
the Search endpoint, you will see that you can do all kinds of filtering
on datasets, types, inferred types, attributes and attribute/value
pairs. The wide range of Boolean and Lucene operators is also available.
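
As a small illustration of the Boolean operators, the following Python sketch composes a query string in standard Lucene syntax. The helper names are hypothetical, and how the Search endpoint combines such a string with its dataset, type and attribute filters is not shown here; the documentation page is the reference for that.

```python
def lucene_phrase(text):
    """Quote a phrase, escaping backslashes and embedded double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def lucene_and(*clauses):
    """Join clauses so that all of them are required."""
    return "(" + " AND ".join(clauses) + ")"


def lucene_or(*clauses):
    """Join clauses so that any one of them is sufficient."""
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    query = lucene_and(lucene_phrase("semantic web"),
                       lucene_or("solr", "lucene"))
    print(query)  # ("semantic web" AND (solr OR lucene))
```

The resulting string would be passed as the text query of a Search request, with the structural constraints that a SPARQL pattern would express moved into the endpoint's filtering parameters.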


Thanks,

Fred