LDCache


Ivan Ruiz Rube

Oct 8, 2013, 5:26:50 AM
to lmf-...@googlegroups.com
Dear LMF Users,

I have a problem using the cache capabilities of LMF. I cannot retrieve resources from the external endpoints configured in my LMF installation, not even from the sample DBpedia SPARQL endpoint.

The property ldcache.enabled is activated, and I have found nothing relevant in the marmotta-main, catalina and access logs.

Any ideas?

Thanks in advance.

Regards.



Jakob Frank

Oct 8, 2013, 5:53:22 AM
to users, lmf-...@googlegroups.com
[ cross-posting to us...@marmotta.incubator.apache.org as LDCache has
moved to Apache Marmotta ]

Hi Ivan,

did you try explicitly requesting a resource from LDCache, e.g. by
querying the "live" webservice? For example:
<http://localhost:8080/LMF/cache/live?uri=http://dbpedia.org/resource/Berlin>
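A minimal Python sketch of the same request, assuming LMF runs at http://localhost:8080/LMF as in the URL above (the helper names are hypothetical). It percent-encodes the `uri` parameter; the hand-written URL above leaves it unencoded, and both forms decode to the same parameter value.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the LMF web application, taken from the example URL.
LMF_BASE = "http://localhost:8080/LMF"


def ldcache_live_url(resource_uri: str, base: str = LMF_BASE) -> str:
    """Build the URL of the LDCache "live" webservice for one resource."""
    return f"{base}/cache/live?{urlencode({'uri': resource_uri})}"


def fetch_live(resource_uri: str, base: str = LMF_BASE) -> str:
    """Request the resource through the "live" webservice and return the body as text."""
    with urlopen(ldcache_live_url(resource_uri, base), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling fetch_live("http://dbpedia.org/resource/Berlin") issues the same request as the URL above.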

Plus: which version are you currently using?

Best,
Jakob
--
DI Jakob Frank
Knowledge and Media Technologies

Salzburg Research Forschungsgesellschaft mbH
Jakob Haringer-Strasse 5/3 | 5020 Salzburg, Austria
T: +43.662.2288-419
jakob...@salzburgresearch.at
http://www.salzburgresearch.at

Ivan Ruiz Rube

Oct 8, 2013, 6:46:07 AM
to lmf-...@googlegroups.com, users
Thank you, Jakob.

I am using version 3.0.0 of LMF. I can see that the triples are correctly returned by the "live" webservice. However, when I use the SNORQL interface I cannot retrieve these triples. It seems as if the fetched triples are not stored in the cache.



Ivan Ruiz Rube

Oct 18, 2013, 6:20:53 AM
to lmf-...@googlegroups.com, users
Hello,

I have checked that when I use the webservice /{BASE}/cache/cached, the resources are correctly fetched and saved into the cache context. However, when I use the SNORQL interface, the triples are not returned. I have changed the parameters ldcache.expiry and ldcache.minexpiry, but I have not been able to get it working.

Thanks.

Sergio Fernández

Oct 18, 2013, 6:47:19 AM
to us...@marmotta.incubator.apache.org, Ivan Ruiz Rube, lmf-...@googlegroups.com
Hi Ivan,

On 18/10/13 12:20, Ivan Ruiz Rube wrote:
> I have checked that when I use the webservice /{BASE}/cache/cached the
> resources are correctly fetched and saved into the cache context. However,
> when I use the SNORQL interfaces the triples are not received.

Jakob may correct me if I'm wrong, but I think the Marmotta SPARQL
implementation is not cache-aware; i.e., SPARQL queries do not
automatically cache resources that were not stored locally before query time.

Try to browse the data:

http://host/marmotta/resource?uri=http://example.org/your/uri

which should make use of LDCache to transparently retrieve missing
resources.
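A hedged Python sketch of that request: it builds the `/resource?uri=` URL from the example above and asks for Turtle through the HTTP Accept header instead of the HTML view. The base URL, the helper names and the assumption that the service honours `Accept: text/turtle` are illustrative, not taken from this thread; urllib follows ordinary HTTP redirects if the server answers with one.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Marmotta deployed at this base URL (the thread writes http://host/marmotta).
MARMOTTA_BASE = "http://localhost:8080/marmotta"


def resource_request(resource_uri: str, base: str = MARMOTTA_BASE,
                     accept: str = "text/turtle") -> Request:
    """Build a GET request for the resource service, asking for RDF rather than HTML."""
    url = f"{base}/resource?{urlencode({'uri': resource_uri})}"
    return Request(url, headers={"Accept": accept})


def browse(resource_uri: str, base: str = MARMOTTA_BASE) -> str:
    """Fetch the resource; a missing remote resource should be pulled in by LDCache."""
    with urlopen(resource_request(resource_uri, base), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```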

If you need to query, LDPath does trigger caching at query time.

Hope this helps.

Cheers,

--
Sergio Fernández
Senior Researcher
Knowledge and Media Technologies
Salzburg Research Forschungsgesellschaft mbH
Jakob-Haringer-Straße 5/3 | 5020 Salzburg, Austria
T: +43 662 2288 318 | M: +43 660 2747 925
sergio.f...@salzburgresearch.at
http://www.salzburgresearch.at

Ivan Ruiz Rube

Oct 18, 2013, 8:27:56 AM
to Sergio Fernández, us...@marmotta.incubator.apache.org, lmf-...@googlegroups.com
Dear Sergio,

I think LD Caching is intended for SPARQL queries, too. The introduction to "Linked Data Caching" says: "Apache Marmotta can optionally be extended with a Linked Data Caching Module that transparently retrieves resources from the Linked Data Cloud when they are needed (e.g. when querying a relation to a remote resource) and caches them transparently. Linked Data Caching is integrated at the triple store level and thus available to all services accessing the triple store, including the SPARQL endpoint."

Besides, when I look at the existing resources on the DataView page, the resource with identifier http://localhost:8080/LMF/resource/Book, for example, has http://localhost:8080/resource/Book as its real URI, which results in an HTTP 404 error. This is strange. I have checked that the parameter kiwi.context is set to http://localhost:8080/LMF/

Could this be the source of the problem with LDCache?

Regards

Sebastian Schaffert

Oct 18, 2013, 8:55:25 AM
to lmf-...@googlegroups.com, Sergio Fernández, us...@marmotta.incubator.apache.org
Hi Ivan,

let me first say that a complete implementation of SPARQL over the Linked Data Cloud is obviously not possible due to conceptual discrepancies: SPARQL is a closed-world, database-inspired query language, while Linked Data is an open-world concept with the additional restriction that a resource only knows its outgoing RDF links, not its incoming ones. Even a simple query like SELECT * WHERE {?s ?p ?o} obviously cannot be answered with a conceptually complete result. What does work in certain cases (and this is what LDCache implements) is retrieving the triples whose subject is a known URI resource, so a SPARQL query like SELECT * WHERE {<http://...> ?p1 ?o1 . ?o1 ?p2 ?o2 } can in theory be answered correctly and completely. There are even some more complex cases where this might be possible (Olaf Hartig has also described this in a paper). However, we explicitly make no such guarantees, because whether the result of a given SPARQL query will be complete is not easy to answer from the user side, cannot be decided from purely syntactic characteristics, and requires expert knowledge. We therefore provide our own custom query language, LDPath, which simulates the way a user would navigate through the Linked Data Cloud by following paths starting at a certain subject. LDPath works fully together with LDCache; you can even use it on the command line without the whole Marmotta platform.
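To make the subject-anchored case concrete, here is a toy Python sketch (not LDCache or LDPath code; every name in it is hypothetical, and predicates are plain prefixed strings) of a store that dereferences a remote subject the first time a path step needs its outgoing links. A path such as owl:sameAs / rdfs:label only ever asks for outgoing links of subjects it already knows, so it can be answered this way; a pattern like {?s ?p ?o} gives it nothing to dereference.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class DereferencingStore:
    """Toy store: looks up a remote subject the first time its outgoing links are needed."""

    def __init__(self, local: Iterable[Triple], local_prefix: str,
                 fetch: Callable[[str], Iterable[Triple]]):
        self.triples: Set[Triple] = set(local)
        self.local_prefix = local_prefix  # URIs under this prefix are never fetched
        self.fetch = fetch                # stands in for an HTTP lookup of one resource
        self.cached: Set[str] = set()     # remote subjects already looked up

    def outgoing(self, subject: str, prop: str) -> List[str]:
        if not subject.startswith(self.local_prefix) and subject not in self.cached:
            self.cached.add(subject)
            self.triples.update(self.fetch(subject))
        return sorted(o for s, p, o in self.triples if s == subject and p == prop)


def follow_path(store: DereferencingStore, start: str, props: List[str]) -> List[str]:
    """Evaluate a simple path such as owl:sameAs / rdfs:label from one start resource."""
    frontier = [start]
    for prop in props:
        frontier = [o for s in frontier for o in store.outgoing(s, prop)]
    return frontier
```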

Now to your concrete question: in theory, LMF/Marmotta can work the way you describe. In practice it does not, because in the default distribution we take a "short path" for SPARQL queries by translating them directly into SQL on the underlying database. The disadvantage is that certain repository layers (like Linked Data Caching) are not triggered; the advantage is that SPARQL queries do not run over the expensive in-memory API and are therefore much faster. However, you can turn off this optimization by setting the configuration option "sparql.strategy" to "sesame" instead of "native". Just don't do it with large amounts of data and very complex SPARQL queries :-)
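A small Python sketch for checking the two options named in this thread in a configuration file. It assumes the settings are kept as `key = value` lines in a Java-style properties file and parses only a subset of that format (no escapes or line continuations); the defaults it uses for unset keys ("native" for sparql.strategy, disabled for ldcache.enabled) are assumptions of the sketch. It only reports whether SPARQL goes through the Sesame API at all, which is a precondition for LDCache being consulted, not a guarantee.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse `key = value` / `key: value` lines, skipping blanks and #/! comments.

    Deliberately minimal: escapes and line continuations are not supported.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split at the first '=' or ':', whichever comes first.
        positions = [i for i in (line.find("="), line.find(":")) if i >= 0]
        if not positions:
            props[line] = ""
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


def sparql_uses_sesame_path(props: Dict[str, str]) -> bool:
    """True if caching is enabled and SPARQL is evaluated via Sesame instead of SQL.

    Defaults for missing keys are assumptions of this sketch.
    """
    enabled = props.get("ldcache.enabled", "false").lower() == "true"
    strategy = props.get("sparql.strategy", "native").lower()
    return enabled and strategy == "sesame"
```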

Greetings,

Sebastian



Sergio Fernández

Oct 18, 2013, 9:26:03 AM
to us...@marmotta.incubator.apache.org, Ivan Ruiz Rube, lmf-...@googlegroups.com
Hi Ivan,

On 18/10/13 14:27, Ivan Ruiz Rube wrote:
> I think LD Caching is intended for SPARQL queries, too. In the introduction
> of "Linked Data Caching" appears: "Apache Marmotta can optionally be
> extended with a Linked Data Caching Module that transparently retrieves
> resources from the Linked Data Cloud when they are needed (e.g. when
> querying a relation to a remote resource) and caches them transparently.
> Linked Data Caching is integrated at the triple store level and thus
> available to all services accessing the triple store, including the SPARQL
> endpoint."

Well, this is a misinterpretation, probably caused by poor wording
in the documentation. We are aware that we need to improve it a lot to
help our users, so comments like this are welcome!

What it actually tries to say is that whatever data was previously
retrieved by the cache system can be used in your SPARQL queries.
Internally, all the cached triples are stored in a separate named
graph, so they are fully available to SPARQL.
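Because the cached triples sit in their own named graph, a standard SPARQL 1.1 query with a GRAPH clause should show which graph holds the data for a resource. A minimal Python sketch that builds such a query; the helper name is hypothetical, and since the URI of the cache graph is not given in this thread, the query leaves it as the variable ?g.

```python
def graphs_holding(resource_uri: str) -> str:
    """Build a SPARQL 1.1 query listing the named graphs that contain triples about a resource."""
    # Reject characters that are not allowed inside an IRI reference <...>.
    if any(c in resource_uri for c in '<>"{}|^`\\ '):
        raise ValueError(f"not a plain IRI: {resource_uri!r}")
    return (
        "SELECT ?g (COUNT(*) AS ?triples)\n"
        f"WHERE {{ GRAPH ?g {{ <{resource_uri}> ?p ?o }} }}\n"
        "GROUP BY ?g"
    )
```

Pasting the output of graphs_holding("http://spi-fm.uca.es/d2r4ea/resource/projects/1377") into SNORQL would list the graph(s), if any, that hold cached triples for that resource.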

What is less true, as Sebastian has already explained in some
detail, is that SPARQL can cache missing resources used in queries on
the fly. This would be possible by switching the SPARQL strategy, but
it may have a big impact on performance.

> Besides, when I see the existing resources in the DataView page appear, for
> example, the resource with identifier
> http://localhost:8080/LMF/resource/Book has
> http://localhost:8080/resource/Book as the real URI, resulting hence in a
> 404 HTTP Error. It is rare. I have checked that the parameter kiwi.context
> is set to http://localhost:8080/LMF/

The question here would be: how did you import that resource? It
just looks like a wrong base URI was used when parsing. By default Marmotta
(and hence LMF) uses the URI where it is running as the base URI for parsing files.
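One way a mismatch like /LMF/resource/Book versus /resource/Book can arise (a guess to illustrate the base-URI point, not a diagnosis of Ivan's import): relative references in an imported file are resolved against the base URI, and the result depends on both the form of the reference and whether the base ends with a slash. A Python sketch using the standard library's RFC 3986 resolution; the references are hypothetical.

```python
from typing import Dict, Sequence, Tuple
from urllib.parse import urljoin


def resolve_all(bases: Sequence[str], refs: Sequence[str]) -> Dict[Tuple[str, str], str]:
    """Resolve each relative reference against each base URI (RFC 3986 rules)."""
    return {(b, r): urljoin(b, r) for b in bases for r in refs}


BASES = ("http://localhost:8080/LMF/", "http://localhost:8080/LMF")
REFS = ("resource/Book", "/resource/Book")

for (base, ref), uri in resolve_all(BASES, REFS).items():
    print(f"{base:28} + {ref:15} -> {uri}")
```

Only the combination of a trailing-slash base and a path-relative reference yields http://localhost:8080/LMF/resource/Book; the other three give http://localhost:8080/resource/Book.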

Hope we've clarified some of your questions.

Have a nice weekend.

Cheers,


Ivan Ruiz Rube

Oct 18, 2013, 1:20:45 PM
to Sergio Fernández, us...@marmotta.incubator.apache.org, lmf-...@googlegroups.com
Sebastian, Sergio, thank you very much for your explanations.

I still don't fully understand it. When you say that SPARQL is able to cache missing resources used in queries on the fly after switching the SPARQL strategy to "sesame", is this feature implemented or not? I have just checked, using the sesame strategy, that the external triples are not available in SPARQL until the LDPath engine asks for them. It is a shame, because I thought LDCache would offer behavior similar to the SERVICE extension of SPARQL, but without having to include the endpoint URI in the query itself.

Have a nice weekend

Regards


Iván


Sergio Fernández

Oct 19, 2013, 5:13:26 AM
to lmf-...@googlegroups.com, Ivan Ruiz Rube, us...@marmotta.incubator.apache.org
Hi Ivan,

On 18/10/13 19:20, Ivan Ruiz Rube wrote:
> I don't finish to understand it well. When you say that SPARQL is able to
> lively cache missing resources used in queries by switching the SPARQL
> strategy to "sesame", is this feature implemented? or not? Because, I have
> just checked using the sesame strategy how the external triples are not
> available in SPARQL until the LDPath engine doesn't ask for them.

It should. Personally, I have not switched to that strategy for quite a long
time, so we may have an issue with it. It would be nice to have some more details
from your logs, as well as the query and the remote resource you are missing.

> It is a shame, because I thought LDCache would offer a behavior similar
> to the SERVICE extension of SPARQL, but avoiding include the endpoint URI
> in the query itself.

Well, you could get similar behaviour, but SPARQL Federated Query
works at the endpoint level, while LDCache works at the resource level. Not
all cases support both methods, and each method has its pros and cons.
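To make the contrast concrete: with SPARQL 1.1 Federated Query the remote endpoint must be named in the query, whereas LDCache only needs the resource URI. A Python sketch that builds the SERVICE form of the owl:sameAs / rdfs:label query from the example in this thread; the endpoint URL is the D2R endpoint that appears in the marmotta-main.log excerpt quoted in this thread, the helper name is hypothetical, and whether a given LMF/Marmotta version evaluates SERVICE is not covered here.

```python
def federated_label_query(local_uri: str, endpoint: str) -> str:
    """owl:sameAs / rdfs:label query where the label is fetched from a named SPARQL endpoint."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?externalResource ?name\n"
        "WHERE {\n"
        f"  <{local_uri}> owl:sameAs ?externalResource .\n"
        f"  SERVICE <{endpoint}> {{ ?externalResource rdfs:label ?name }}\n"
        "}"
    )
```

The LDCache approach needs neither the SERVICE clause nor the endpoint URL in the query, which is the convenience Ivan was hoping for; the price is that it depends on the remote resource being dereferenceable.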

Cheers,


Ivan Ruiz Rube

Oct 21, 2013, 3:37:11 AM
to Sergio Fernández, lmf-...@googlegroups.com, us...@marmotta.incubator.apache.org
Hi Sergio,

The example case is the following:

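First, I added a triple whose subject is a local resource and whose object is a remote resource:

<http://localhost:8080/LMF/resource/380014300> owl:sameAs <http://spi-fm.uca.es/d2r4ea/resource/projects/1377> .

Then I ran this SPARQL query: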

SELECT DISTINCT ?externalResource
WHERE 
  { <http://localhost:8080/LMF/resource/380014300> owl:sameAs ?externalResource 
  }

and obtained the expected result:

?externalResource
------------------------------------------------------
<http://spi-fm.uca.es/d2r4ea/resource/projects/1377>

Then I ran this new query:

SELECT DISTINCT ?externalResource ?name
WHERE 
  { <http://localhost:8080/LMF/resource/380014300> owl:sameAs ?externalResource .
     ?externalResource rdfs:label ?name
  }
  
obtaining no results.

There are no error messages in the log files. Next, I ran the following LDPath query:

    owl:sameAs / rdfs:label :: xsd:string

and obtained the expected result:

Field                                    Results
--------------------------------------------------------
http://www.w3.org/2002/07/owl#sameAs     Proyecto MPS-G3


Meanwhile, marmotta-main.log shows the following message:

09:06:04.478 INFO  o.a.m.l.s.p.AbstractHttpProvider - retrieved 10 triples for resource http://spi-fm.uca.es/d2r4ea/resource/projects/1377; expiry date: Mon Oct 21 09:06:14 CEST 2013


Finally, I ran the same query again:

SELECT DISTINCT ?externalResource ?name
WHERE 
  { <http://localhost:8080/LMF/resource/380014300> owl:sameAs ?externalResource .
     ?externalResource rdfs:label ?name
  }

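and this time obtained the expected result:

externalResource                                        name
----------------------------------------------------    -----------------
<http://spi-fm.uca.es/d2r4ea/resource/projects/1377>    "Proyecto MPS-G3"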

I am using the SESAME strategy, but it seems that remote resource retrieval is only triggered by LDPath queries, not by SPARQL.

Thank you very much.



Iván



Sergio Fernández

Oct 21, 2013, 5:08:29 AM
to Ivan Ruiz Rube, lmf-...@googlegroups.com, us...@marmotta.incubator.apache.org
Hi Ivan,

On 21/10/13 09:37, Ivan Ruiz Rube wrote:
> Then, I requested this new query:
>
> SELECT DISTINCT ?externalResource ?name
> WHERE
> { <http://localhost:8080/LMF/resource/380014300> owl:sameAs
> ?externalResource .
> ?externalResource rdfs:label ?name
> }
>
> obtaining no results.

OK. Please read the further explanation below.

> There are no error messages in the log files. In a next step, I asked for
> the following LDPath query
>
>...
>
> Meanwhile marmotta-main.log throws the following proper message:
>
> 09:06:04.306 INFO o.a.m.l.s.p.AbstractHttpProvider - retrieving resource data for http://spi-fm.uca.es/d2r4ea/resource/projects/1377 from 'SPARQL' endpoint, request URI is <http://spi-fm.uca.es/d2r4ea/sparql?query=SELECT+%3Fp+%3Fo+WHERE+%7B+%3Chttp%3A%2F%2Fspi-fm.uca.es%2Fd2r4ea%2Fresource%2Fprojects%2F1377%3E+%3Fp+%3Fo+%7D>
> 09:06:04.478 INFO o.a.m.l.s.p.AbstractHttpProvider - retrieved 10 triples for resource http://spi-fm.uca.es/d2r4ea/resource/projects/1377; expiry date: Mon Oct 21 09:06:14 CEST 2013
>
>
> Finally, I requested the same query
>
> SELECT DISTINCT ?externalResource ?name
> WHERE
> { <http://localhost:8080/LMF/resource/380014300> owl:sameAs
> ?externalResource .
> ?externalResource rdfs:label ?name
> }
>
> obtaining in this time, the expected results:
>
> externalResource                                        name
> ----------------------------------------------------    -----------------
> <http://spi-fm.uca.es/d2r4ea/resource/projects/1377>    "Proyecto MPS-G3"

That's expected, because the cache had previously retrieved the resource
and stored it in a specific named graph, so it is completely transparent
to the SPARQL query evaluation.

You can see the cached resource, for instance, by browsing it:

<http://localhost:8080/LMF/resource?uri=http://spi-fm.uca.es/d2r4ea/resource/projects/1377>

> I am using SESAME strategy, but it seems that the remote resource retrieval
> is only triggered with LDPath queries, but not with SPARQL.

I've just had a discussion with Sebastian about that, and now I'd
rephrase what we initially said: "by switching to the Sesame SPARQL
strategy, Marmotta may be able to cache missing resources referenced
in a query on the fly". Besides the performance impact we mentioned,
LDCache is not guaranteed to be triggered, because that depends on
how Sesame optimizes and evaluates the query.

In short, I would not recommend relying on that strategy for querying
Marmotta. If you need caching, query with LDPath, either for the actual
query or just to trigger the caching before using SPARQL, as you prefer.
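A hedged Python sketch of that two-step approach. The endpoint paths (`/ldpath/path` with `path` and `uri` parameters, `/sparql/select` with `query`) are assumptions about the LMF/Marmotta REST API, not something stated in this thread; the path expression also assumes the owl and rdfs prefixes are known to the server, and the `get` parameter exists only so the HTTP call can be swapped out.

```python
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: LMF base URL as used throughout this thread.
LMF_BASE = "http://localhost:8080/LMF"


def http_get(url: str) -> str:
    """Plain HTTP GET returning the response body as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def warm_then_query(start_uri: str, ldpath: str, sparql: str,
                    get: Callable[[str], str] = http_get,
                    base: str = LMF_BASE) -> str:
    """Trigger LDCache with an LDPath evaluation, then run SPARQL over the cached data."""
    # Step 1: evaluating the path from start_uri makes LDCache fetch the remote
    # resources the path visits (assumed endpoint: /ldpath/path?path=...&uri=...).
    get(f"{base}/ldpath/path?{urlencode({'path': ldpath, 'uri': start_uri})}")
    # Step 2: the fetched triples are now in the cache graph, so an ordinary
    # SPARQL query can see them (assumed endpoint: /sparql/select?query=...).
    return get(f"{base}/sparql/select?{urlencode({'query': sparql})}")
```

For Ivan's example, warm_then_query("http://localhost:8080/LMF/resource/380014300", "owl:sameAs / rdfs:label", query) with the SPARQL query quoted above should return the remote label once the resource has been cached.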

Hope we have clarified the issue. I'll update the documentation on the
web site to avoid confusion.

Cheers,


Ivan Ruiz Rube

Oct 21, 2013, 5:50:29 AM
to Sergio Fernández, lmf-...@googlegroups.com, us...@marmotta.incubator.apache.org
ok.

I will trigger the caching using an LDPath expression such as owl:sameAs / * / * / * / * :: xsd:string

:-)
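Ivan's expression follows owl:sameAs and then four wildcard steps, so evaluating it makes LDCache look up every resource reached within those hops, and each extra * can multiply the number of remote requests. A toy Python sketch (not LDPath's implementation; the function name is hypothetical, and treating values that do not start with "http" as literals is a simplification) of which resources such a walk ends up dereferencing.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def resources_dereferenced(start: str, first_prop: str, wildcard_steps: int,
                           lookup: Callable[[str], Iterable[Triple]]) -> Set[str]:
    """Walk `first_prop / * / ... / *` from start and return every resource looked up.

    `lookup` stands in for reading local data or fetching a remote resource.
    Values reached by the last step are results only; they are not looked up.
    """
    looked_up: Set[str] = set()

    def outgoing(subject: str) -> List[Tuple[str, str]]:
        looked_up.add(subject)
        return [(p, o) for s, p, o in lookup(subject) if s == subject]

    frontier = {o for p, o in outgoing(start) if p == first_prop}
    for _ in range(wildcard_steps):
        next_frontier: Set[str] = set()
        for node in frontier:
            # Skip literals, and resources already expanded (they add no new lookups).
            if node.startswith("http") and node not in looked_up:
                next_frontier.update(o for _, o in outgoing(node))
        frontier = next_frontier
    return looked_up
```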

Thank you for your attention.

Regards

Iván

