RDFLib to access Virtuoso open source edition

fabian...@gmail.com

Dec 18, 2013, 4:52:34 AM
to rdfli...@googlegroups.com
Hi all,

May I know what is currently the best way to interact with Virtuoso Open Source Edition using RDFLib, for querying and/or writing?

Is the generic approach of going through the SPARQL endpoint the current way to do it, or are there optimized, tool-specific libraries (for Virtuoso, OWLIM, etc.)?
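
For reference, the generic route amounts to sending the query to the endpoint over the SPARQL 1.1 Protocol. Here is a minimal sketch using only the Python standard library. The endpoint URL (Virtuoso's usual default on a local install) and the helper name `build_query_url` are assumptions for illustration, not something taken from this thread.

```python
from urllib.parse import urlencode

# Assumed endpoint: a local Virtuoso instance on its usual default port.
ENDPOINT = "http://localhost:8890/sparql"

def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = build_query_url(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")

# Running it needs a live endpoint, so it is left commented out:
#   import urllib.request
#   req = urllib.request.Request(
#       url, headers={"Accept": "application/sparql-results+json"})
#   body = urllib.request.urlopen(req).read()
```

Any store that exposes a SPARQL endpoint can be queried this way, which is what makes the approach generic; it says nothing about how fast a given store answers.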

The information here seems outdated: http://pythonhosted.org/virtuoso/

We ran some tests adding triples one by one on different back ends, and Virtuoso over SPARQL performed poorly compared to Berkeley DB, MySQL, and PostgreSQL.
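
One common way to reduce the per-triple overhead of a SPARQL endpoint is to group many triples into a single `INSERT DATA` update instead of sending one request per triple. The sketch below only builds the update strings; the function name, the batch size, and the example data are hypothetical, and whether batching closes the gap with the other back ends would have to be measured.

```python
def insert_data_batches(triples, batch_size=1000):
    """Yield SPARQL 1.1 'INSERT DATA' updates of up to batch_size triples each.

    `triples` is an iterable of (s, p, o) strings already written as
    N-Triples terms, e.g. '<http://example.org/a>' or '"literal"'.
    """
    batch = []
    for s, p, o in triples:
        batch.append(f"{s} {p} {o} .")
        if len(batch) == batch_size:
            yield "INSERT DATA {\n" + "\n".join(batch) + "\n}"
            batch = []
    if batch:  # flush whatever is left over
        yield "INSERT DATA {\n" + "\n".join(batch) + "\n}"

# Hypothetical example data, for illustration only
triples = [
    (f"<http://example.org/s{i}>", "<http://example.org/p>", f'"{i}"')
    for i in range(5)
]
updates = list(insert_data_batches(triples, batch_size=2))
```

Each string in `updates` would then be sent to the endpoint as one update request.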

We would think that for large uploads, using the Virtuoso bulk loader features would be the way to go.
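
As a rough illustration of that route: Virtuoso's bulk loader is driven from its SQL interface by registering files with `ld_dir`, running `rdf_loader_run()`, and then issuing a `checkpoint`. The sketch below only generates those commands as text, to be fed to `isql`. The helper name, directory, file mask, and graph IRI are assumptions, and the directory would also need to be listed under `DirsAllowed` in `virtuoso.ini`.

```python
def bulk_load_script(directory, file_mask, graph_iri):
    """Return isql commands that register files and run Virtuoso's bulk loader."""
    def sql_quote(value):
        # SQL string literal: single quotes are escaped by doubling them
        return "'" + value.replace("'", "''") + "'"

    return "\n".join([
        f"ld_dir({sql_quote(directory)}, {sql_quote(file_mask)}, {sql_quote(graph_iri)});",
        "rdf_loader_run();",
        "checkpoint;",
    ])

# Hypothetical directory and graph IRI, for illustration only
script = bulk_load_script("/data/rdf", "*.ttl", "http://example.org/graph")
print(script)
```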

As for query performance, we would expect SPARQL queries against a SPARQL endpoint (backed by a native triple store) to perform better than queries on Berkeley DB, MySQL, or PostgreSQL. Is there any information available about that?

Thank you for any help or pointers
Fabian


Marc-Antoine Parent

Dec 18, 2013, 7:38:03 AM
to rdfli...@googlegroups.com
Good day!
It is still work in progress, but mostly operational.
I have taken over William Waites' excellent virtuoso bindings for rdflib and sqlalchemy.
They can be found at 
This depends on a fork of pyodbc here:
(Use the v3-virtuoso branch)
Note: in odbc.ini, you need to refer to a Virtuoso ODBC binding library. Make sure to use one of the Unicode variants (virtodbcu.so or virtodbcu_r.so) rather than virtodbc.so or virtodbc_r.so.
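
A quick way to check that a DSN points at one of the Unicode drivers is to parse odbc.ini and look at the `Driver` entry. This is a small sketch using the standard library's `configparser`. The DSN name `VOS`, the library path, and the `Address` value in the sample are assumptions for illustration; only the driver file names come from the note above.

```python
import configparser
import os

# Unicode driver variants named in the note above
UNICODE_DRIVERS = {"virtodbcu.so", "virtodbcu_r.so"}

def uses_unicode_driver(ini_text, dsn):
    """Return True if the DSN's Driver entry is a Unicode Virtuoso driver."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    driver = parser.get(dsn, "Driver")
    return os.path.basename(driver) in UNICODE_DRIVERS

# Hypothetical odbc.ini content; adjust the path to your installation
SAMPLE = """
[VOS]
Driver = /usr/lib/odbc/virtodbcu_r.so
Address = localhost:1111
"""

ok = uses_unicode_driver(SAMPLE, "VOS")
```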
I am actively working on this, among other projects: expect continued evolution. Patches are welcome.
Cheers,
Marc-Antoine Parent

Fabian Cretton

Dec 18, 2013, 8:02:37 AM
to rdfli...@googlegroups.com
Marc-Antoine,

Thank you very much for your reply, merci.

We will have a look at your tools.

What can you say about the assumptions I made in my first post? Do they make sense?
And so, by using the Virtuoso bindings, would we get better performance, mainly for writing but also for querying?

Thanks
Fabian

Marc-Antoine Parent

Dec 18, 2013, 8:35:57 AM
to rdfli...@googlegroups.com
With pleasure!

To be 100% honest, I don't feel comfortable answering this question, as I still have a lot of benchmarking work to do myself ;-)
Still, there is no doubt that some RDF queries are much faster in Virtuoso. I benchmarked various RDF databases (in Java, back then), and Virtuoso more than held its own.
One thing you should know is that Virtuoso lets you define RDF views over relational data (very different from storing RDF in a relational table!) and, by their own benchmarks, this performs better than pure RDF. I am planning to mix this with graph data, and I may report back on the performance once I have done some more work...
Cheers,
Marc-Antoine

Fabian Cretton

Dec 19, 2013, 1:34:53 AM
to rdfli...@googlegroups.com
What you are telling me is that a native triple store is faster than storing RDF in relational tables, but that doing an RDF view on real relational data is still faster for querying? Of course, then we lose the flexibility of a triple store, but that is interesting, as I am currently having a look at D2R (SPARQL to SQL rewriting), which doesn't compete so far with native SQL performance from what I can see.

If you do your own performance tests, would you agree to notify me once they are done?

Thanks, merci
Fabian

Marc-Antoine Parent

Dec 19, 2013, 8:11:51 AM
to rdfli...@googlegroups.com
On 2013-12-19 at 01:34, Fabian Cretton <fabian...@gmail.com> wrote:

What you are telling me is that a native triple store is faster than storing RDF in relational tables, but that doing an RDF view on real relational data is still faster for querying?

No, I was not making such a general claim. I was saying this was true of Virtuoso's linked data views (which are distinct from D2R).
I thought this was in an article from OpenLink, but I cannot find it again and may well have misremembered.
I also remember reading about it in benchmark results, though.
Compare Virtuoso RV to Virtuoso TS in this article:
Note that more recent versions of the Berlin SPARQL Benchmark no longer test both cases; I am not sure why.
Also note that the Virtuoso FAQ is less sanguine about the difference.

The recent Berlin SPARQL Benchmark shows some figures comparing Virtuoso SQL and SPARQL and SPARQL in front of relational representation. However, the test workload is heavily biased in favor of relational. See also BSBM: MySQL vs Virtuoso.

of course then we lose the flexibility of a triple store,

In theory, you should be able to mix the linked data view and the triple store view. 
This is what I still have to test and benchmark.

but that is interesting as I am currently having a look at D2R (SPARQL to SQL rewriting), which doesn't compete so far with native SQL performance from what I can see.

The opposite would have surprised me, actually.

If you do your own performance tests, would you agree to notify me once they are done?

I will send the results to the list, since they are obviously of general interest.

Thanks, merci
Fabian

With pleasure,
Marc-Antoine
