rdflib and 4Store

Daniele Varrazzo

Jan 20, 2015, 11:35:31 AM
to rdfli...@googlegroups.com
Hello,

I'm studying RDF and rdflib for an experimental project, and I'm evaluating persistent storage options for our knowledge base.

Among the options, I'm evaluating 4Store, which has been easy to set up (it is packaged in Ubuntu and doesn't require all the Java/Eclipse nonsense of Jena et al.). However, it doesn't seem to be supported by rdflib out of the box. I couldn't google anything specific on the topic, which suggests a limited user base.

After some debugging, I managed to query it as a SPARQLStore after setting "SPARQLWrapper.Wrapper._returnFormatSetting = []" (otherwise it returned an empty response), but my first attempts at updating it failed (calling Graph.parse() fails on the assertion self.graph.store.formula_aware).
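For reference, the SPARQL 1.1 Protocol defines a plain way to send an update that bypasses Graph.parse() altogether: a URL-encoded POST carrying an "update" parameter. The sketch below only builds such a request with the standard library. The endpoint URL (http://localhost:8000/update/) is an assumption about how 4s-httpd is set up, the helper names are hypothetical, and only IRIs (no literals or blank nodes) are handled.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: where 4s-httpd accepts SPARQL Update; adjust for your server.
UPDATE_ENDPOINT = "http://localhost:8000/update/"


def iri_triple(s, p, o):
    """Format a triple of IRIs in N-Triples syntax (no literals or blank nodes)."""
    return "<%s> <%s> <%s> ." % (s, p, o)


def insert_data_request(triples, graph_iri, endpoint=UPDATE_ENDPOINT):
    """Build (but do not send) a SPARQL 1.1 'update via URL-encoded POST' request."""
    body = "\n".join(iri_triple(*t) for t in triples)
    update = "INSERT DATA { GRAPH <%s> {\n%s\n} }" % (graph_iri, body)
    return Request(
        endpoint,
        data=urlencode({"update": update}).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = insert_data_request(
    [("http://example.org/a", "http://example.org/p", "http://example.org/b")],
    "http://example.org/graph",
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen() would actually send it; whether 4store accepts it depends on its SPARQL 1.1 Update support.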

Before I invest too much in it: does anyone have opinions about 4Store as a quad store? Is there another one that would be advisable to use instead? I'm somewhat of a Postgres guy (slight understatement: I'm the current psycopg2 maintainer), but the SQLAlchemy wrapper I've tried seems bitrotten (it fails with a traceback suggesting it's not compatible with the current SA version, and there's no hint about which versions it supports). I'm happy to use other, more specific storage solutions, preferably ones compatible with rdflib and not requiring a whole Java-centric environment to run.

Any hint is most welcome, thank you very much.

-- Daniele

Alexey Zakhlestin

Jan 20, 2015, 1:36:25 PM
to rdfli...@googlegroups.com
The open source version of Virtuoso is the traditional choice.
It has its share of issues, but it's the most popular one.

--
Alexey Zakhlestin
CTO at Grids.by/you
https://github.com/indeyets
PGP key: http://indeyets.ru/alexey.zakhlestin.pgp.asc


Marc-Antoine Parent

Jan 20, 2015, 1:46:16 PM
to rdfli...@googlegroups.com, Daniele Varrazzo
FYI:
I’m maintaining the SQLAlchemy/RDFlib wrapper for virtuoso; you may want to know there is one and who to contact.
Virtuoso indeed has its share of issues, but those I have struggled with are mostly in the relational layer. ymmv with the RDF layer.
I have not looked at 4store in a long time, and I do not have an opinion, but the non-java rdf storage options are limited.
I direct you to Bordercloud's triple store benchmarking site:
Best,
Marc-Antoine Parent

--
http://github.com/RDFLib
---
You received this message because you are subscribed to the Google Groups "rdflib-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rdflib-dev+...@googlegroups.com.
To post to this group, send email to rdfli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rdflib-dev/fdb7f0d8-2e01-475a-9cb4-4ce7dd5a333b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sergio Fernández

Jan 21, 2015, 7:00:21 AM
to rdfli...@googlegroups.com
Hi,

I'm one of the core developers of SPARQLWrapper.

On 20 January 2015 at 17:35, Daniele Varrazzo
<daniele....@gmail.com> wrote:
> After some debugging, I managed to query it as a SPARQLStore after setting
> "SPARQLWrapper.Wrapper._returnFormatSetting = []" (otherwise it was
> returning an empty response) and updating it has failed the first attempts
> (calling Graph.parse() fails the assert self.graph.store.formula_aware).

Then it looks like 4store is not compliant with the SPARQL 1.1 Protocol...
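A quick way to see what an endpoint really returns is to bypass SPARQLWrapper and issue a "query via GET" request by hand, negotiating SPARQL JSON results through the Accept header only. The sketch below builds such a request and decodes a canned JSON results document. The endpoint URL is an assumption (4s-httpd commonly serves queries under /sparql/), and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the query endpoint of a local 4s-httpd instance.
QUERY_ENDPOINT = "http://localhost:8000/sparql/"


def select_request(query, endpoint=QUERY_ENDPOINT):
    """Build a SPARQL 1.1 'query via GET' request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Turn a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so they map to None.
        rows.append({v: binding.get(v, {}).get("value") for v in variables})
    return rows


# A canned response shaped like what a compliant endpoint sends back.
SAMPLE = """{
  "head": {"vars": ["s", "label"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/a"},
     "label": {"type": "literal", "value": "A"}},
    {"s": {"type": "uri", "value": "http://example.org/b"}}
  ]}
}"""

for row in parse_bindings(SAMPLE):
    print(row)
```

Against a live server, parse_bindings(urllib.request.urlopen(select_request(q)).read()) would raise on an empty body like the one reported above, which at least makes that failure visible.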

> Before I start investing too much in it: does anyone have opinions about
> 4Store as a quad storage? Is there any other that would be advisable to use
> instead? I'm somewhat of a Postgres guy (slight understatement: I'm the
> current psycopg2 maintainer) but the SQLAlchemy wrapper I've tried seems
> bitrotten (it fails with a traceback suggesting it's not compatible with the
> current SA version, but there's no hint about what versions it supports) and
> I'm happy to use other more specific storage solutions, with preferences to
> ones compatible with rdflib and ones not requiring a whole javacentric
> environment to run.
>
> Any hint is the most welcome, thank you very much.

Try out Marmotta, Fuseki or Virtuoso: I'm fairly confident about the
SPARQL 1.1 support of those three.

Hope that helps.

Cheers,

--
Sergio Fernández <ser...@wikier.org>

Daniele Varrazzo

Jan 21, 2015, 9:16:30 AM
to rdfli...@googlegroups.com
On Tue, Jan 20, 2015 at 4:35 PM, Daniele Varrazzo
<daniele....@gmail.com> wrote:
> the SQLAlchemy wrapper I've tried seems bitrotten (it fails with a traceback
> suggesting it's not compatible with the current SA version, but there's no
> hint about what versions it supports)

To follow up on this, just trying to run the following snippet:

import rdflib
rdflib.plugin.get('SQLAlchemy', rdflib.store.Store)(rdflib.URIRef("rdflib_test"))

with SA versions 0.5 and 0.6 I get a traceback with "ArgumentError:
schema.Column object expected" (from the __create_table_definitions()
method). With 0.7 and later I get "AttributeError: 'SQLAlchemy'
object has no attribute 'engine'" instead.

Is there some prerequisite I'm missing?

Thank you, and thank you for the other suggestions about the different
storage options too.

-- Daniele

Daniele Varrazzo

Jan 21, 2015, 11:48:17 AM
to Marc-Antoine Parent, rdflib-dev
On Tue, Jan 20, 2015 at 6:46 PM, Marc-Antoine Parent <mapa...@acm.org> wrote:
> I’m maintaining the SQLAlchemy/RDFlib wrapper for virtuoso; you may want to
> know there is one and who to contact.
> https://github.com/maparent/virtuoso-python

Thank you for the hint. I took a look at it today, but trying to
connect produces a segfault. Testing:

from rdflib.store import Store
from rdflib.plugin import get as plugin
Virtuoso = plugin("Virtuoso", Store)
store = Virtuoso("DSN=VOS;UID=dba;PWD=***;WideAsUTF16=Y")

I get a segfault with:

#0 0x00000000004fd622 in PyDict_SetItem ()
#1 0x00007fd16db03cc2 in GetConnectionInfo
(pConnectionString=pConnectionString@entry=0x7fd16f511bd0,
cnxn=cnxn@entry=0x7fd16dd6f1b0) at
/home/piro/src/rdf/pyodbc-virtuoso-2.1.9-beta14/src/cnxninfo.cpp:177
177 PyDict_SetItem(map_hash_to_info, hash, info);
...

This happens with virtuoso-0.12.6, pyodbc-virtuoso-2.1.9-beta14 (and
with 2.1.8 patched as per your link too) and the system packages
installed with ubuntu 14.04 (unixodbc 2.2.14, virtuoso open source
6.1.6).

-- Daniele

Marc-Antoine Parent

Jan 21, 2015, 1:48:06 PM
to Daniele Varrazzo, rdfli...@googlegroups.com
Hello, Daniele!

Out of curiosity, which driver are you using?
In my odbc.ini, I have
Driver      = /usr/local/virtuoso-opensource/lib/virtodbcu_r.so
The location does not matter, but I think using the « ...u_r.so » version might.
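For anyone reproducing this setup, an odbc.ini stanza along those lines could be generated as below. The DSN name VOS matches the connection string used in the thread; the Address key and the localhost:1111 value (Virtuoso's usual default ODBC port) are assumptions to check against your own installation.

```python
import configparser
import io

# Driver path as quoted above; Address is an assumed host:port for Virtuoso.
DRIVER = "/usr/local/virtuoso-opensource/lib/virtodbcu_r.so"


def odbc_ini(dsn="VOS", driver=DRIVER, address="localhost:1111"):
    """Render an odbc.ini section for a Virtuoso DSN using the ...u_r.so driver."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key case as written ("Driver", not "driver")
    config[dsn] = {"Driver": driver, "Address": address}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


print(odbc_ini())
```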

Otherwise: you mention using the patch to pyodbc. I have not relied on the patch for a long time; the pyodbc I use is this branch of my fork:
Or, if you want it as a tarball
My current environment runs on macOS, but I am also using my toolchain on Ubuntu 14.04.1 LTS,
with unixodbc 2.2.14p2-5ubuntu5 and virtuoso 7 HEAD (actually we pegged to 5bdca4da81018ef72788394db7bbc5946bd788f1, but I test HEAD regularly on my Mac).
I had used it with virtuoso 6.something, and it should still work, but I advise against it. A lot of the SPARQL 1.1 support is in 7 only, and I recommend the current HEAD, or at least 7.1 (7.0 was a nightmare). It seems the next version (7.2?) will be out any day now.
Best, 
Marc-Antoine


gjh

Jan 22, 2015, 5:00:38 AM
to rdfli...@googlegroups.com
On Wed, 2015-01-21 at 14:16 +0000, Daniele Varrazzo wrote:
> On Tue, Jan 20, 2015 at 4:35 PM, Daniele Varrazzo
> <daniele....@gmail.com> wrote:
> > the SQLAlchemy wrapper I've tried seems bitrotten

The resource for the MySQL connector changed versions. Travis CI has
removed support for Py2.5 & Py3.2. Other than that, the CI appears to be
coherent.

https://travis-ci.org/RDFLib/rdflib-sqlalchemy

I've not been able to replicate Travis' Py3 syntax error locally: "tox -e py34"
executes successfully.

Cheers

Graham


Daniele Varrazzo

Jan 22, 2015, 7:42:02 AM
to rdflib-dev
Uhm, thank you for making me insist on that. I started again
from the "illustrative unit test" in the project readme, and the
surprise is that:

x = plugin.get('SQLAlchemy', Store)(identifier=ident)

works while

plugin.get('SQLAlchemy', Store)(identifier=ident)

fails: it was the repr() call in the interactive shell that failed. Opened bug
<https://github.com/RDFLib/rdflib-sqlalchemy/issues/10>.
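One plausible reading of the earlier "'SQLAlchemy' object has no attribute 'engine'" error, consistent with this finding: the interactive shell passes the value of a bare expression to sys.displayhook(), which calls repr(), while an assignment never triggers it. The hypothetical class below reproduces that pattern with a __repr__ that reads an attribute only set by open(); it illustrates the mechanism and is not the plugin's actual code.

```python
import sys


class LazyStore:
    """Hypothetical store whose __repr__ depends on state created by open()."""

    def __init__(self, identifier=None):
        self.identifier = identifier  # note: no self.engine yet

    def open(self, url):
        self.engine = url  # a real store would create a database engine here

    def __repr__(self):
        # Raises AttributeError until open() has been called.
        return "<LazyStore engine=%r>" % (self.engine,)


x = LazyStore(identifier="rdflib_test")  # assignment: repr() is never called
try:
    # What the interactive shell does with a bare expression statement.
    sys.displayhook(LazyStore(identifier="rdflib_test"))
except AttributeError as exc:
    print("repr failed:", exc)
```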

I'll keep taking a look at this option too, thank you.

-- Daniele

Daniele Varrazzo

Jan 22, 2015, 9:18:54 AM
to Marc-Antoine Parent, rdflib-dev
On Wed, Jan 21, 2015 at 6:48 PM, Marc-Antoine Parent <mapa...@acm.org> wrote:
> Hello, Daniele!
>
> Out of curiosity, which driver are you using?
> In my odbc.ini, I have
> Driver = /usr/local/virtuoso-opensource/lib/virtodbcu_r.so
> location does not matter, but I think using the « ...u_r.so » version might
> matter.

Yes, that seems to solve the segfault. The connection fails, but at least in
a clean way. I'll try to debug it more as soon as I'm able to take
another look at Virtuoso: for a while I'll be busy with other aspects
of the project.

By the way, the connection fails silently: the error is only logged
to the Python logger, hence swallowed if no handler is
configured.

In [1]: import logging
In [2]: logging.basicConfig()
....
In [6]: store = Virtuoso("DSN=VOS;UID=dba;PWD=***;WideAsUTF16=Y")
ERROR:virtuoso.vstore:Virtuoso Connection Failed:
Traceback (most recent call last):
File "/home/piro/src/rdfgambit/env/lib/python2.7/site-packages/virtuoso-0.12.6-py2.7.egg/virtuoso/vstore.py", line 118, in open
self._connection = pyodbc.connect(dsn)
Error: ('2', '[2] [unixODBC][ (-1) (SQLDriverConnectW)')

Not a big deal, but not the most idiomatic way to fail either.
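A more conventional alternative is to let open() raise, chaining the driver error so the caller cannot miss it and the original traceback is preserved. A minimal sketch; open_store, StoreConnectionError and the simulated failing_connect are hypothetical names, not part of the virtuoso package.

```python
class StoreConnectionError(Exception):
    """Hypothetical error raised when the backend connection cannot be opened."""


def open_store(connect, dsn):
    """Open a connection via connect(dsn), raising instead of only logging on failure."""
    try:
        return connect(dsn)
    except Exception as exc:
        # Chain the driver error; the DSN is left out of the message (it holds a password).
        raise StoreConnectionError("store connection failed: %s" % exc) from exc


def failing_connect(dsn):
    # Stand-in for a driver call such as pyodbc.connect() that fails.
    raise OSError("[unixODBC] driver connect failed")


try:
    open_store(failing_connect, "DSN=VOS;UID=dba;PWD=***")
except StoreConnectionError as exc:
    print(exc, "| caused by", type(exc.__cause__).__name__)
```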


> Otherwise: you mention using the patch to pyodbc. I have not relied on the
> patch for a long time; the pyodbc I use is this branch of my fork:
> pip install -e
> git+https://github.com/maparent/pyodbc.git@v3-virtuoso#egg=pyodbc
> Or, if you want it as a tarball
> http://github.com/maparent/pyodbc/tarball/v3-virtuoso#egg=pyodbc
> My current environment is running on MacOS, but I am also using my toolchain
> on a Ubuntu 14.04.1 LTS,
> with unixodbc 2.2.14p2-5ubuntu5, virtuoso 7 head (actually we pegged to
> 5bdca4da81018ef72788394db7bbc5946bd788f1, but I test HEAD regularly on my
> mac).

Well, in this case there is a lot of out-of-date information still
flying around. Your project docs, as well as the PyPI page and the
project readme, point to this blog article:

http://river.styx.org/ww/2010/10/pyodbc-spasql/index

which suggests patching pyodbc version 2.1.8.

Further googling led me to download the package:

http://river.styx.org/ww/2010/10/pyodbc-spasql/pyodbc-virtuoso-2.1.9-beta14.tar.gz

I haven't found the hint anywhere else that the best version is the
one from your GitHub branch (from the readme the two solutions seem
equivalent): you may want to remove any reference to the 2010
article and replace it with the most up-to-date instructions.


> I had used it with virtuoso 6.something, and it should still work, but I
> advise against it. A lot of the sparql 1.1 stuff is in 7 only, and I
> recommend the current HEAD, or at least 7.1 (7.0 was a nightmare.) It seems
> the next version (7.2?) will be out any day now.

Thank you very much for this bit of information too. I took the
lazy road of using the packaged version to at least try to connect and
communicate with it, but should I start needing it for more serious
use I'll definitely switch to 7.1 or whatever the current version
is.

Cheers,

-- Daniele

gjh

Jan 22, 2015, 10:05:24 AM
to rdfli...@googlegroups.com
On Thu, 2015-01-22 at 12:41 +0000, Daniele Varrazzo wrote:
> fails: it was the repr() call in the interactive shell that failed. Opened bug
> <https://github.com/RDFLib/rdflib-sqlalchemy/issues/10>.

Noted, thanks.

My usual modus operandi:

$ ./.tox/py34/bin/python
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rdflib
INFO:rdflib:RDFLib Version: 4.2.0-dev
>>> import rdflib_sqlalchemy
>>> g = rdflib.ConjunctiveGraph('SQLAlchemy')
>>> g.open('postgresql://DBUSER:DBPASS@localhost/test', create=False)
1
>>> list(g.triples((None, None, None)))
[(rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#Property'),
rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#Class')),
(rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#Class'),
rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#Class'))]
>>>

Please bear in mind that the only "supported" back-end for RDFLib is
SleepyCat. It's a *lot* more robust and waaay faster than this
quick'n'dirty SQL lash-up which is really only intended for fooling
around when BerkeleyDB isn't available for rights reasons.

Cheers

Graham


Daniele Varrazzo

Jan 22, 2015, 10:16:16 AM
to rdflib-dev
On Thu, Jan 22, 2015 at 3:05 PM, gjh <g...@bel-epa.com> wrote:
> Please bear in mind that the only "supported" back-end for RDFLib is
> SleepyCat. It's a *lot* more robust and waaay faster than this
> quick'n'dirty SQL lash-up which is really only intended for fooling
> around when BerkeleyDB isn't available for rights reasons.

That's surprising, but I'll keep it in mind, thanks. I wanted to use
postgres as a backend so I could peek at the generated schema and
understand the internals better, as I don't have much practice with bdb.

-- Daniele

gjh

Jan 22, 2015, 10:59:29 AM
to rdfli...@googlegroups.com
The core Sleepycat key-value calls map well to a Python dict approach;
this simplifies the interface to the back-end.
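To make the dict analogy concrete: a key-value RDF store typically keeps the same triples in several indexes (subject-, predicate- and object-first), so any pattern with one bound term becomes a direct lookup. The toy class below shows that idea with plain dicts; it is an illustration in the spirit of the notes linked below, not rdflib's Sleepycat code.

```python
from collections import defaultdict


def _index():
    """Nested mapping: first term -> second term -> set of third terms."""
    return defaultdict(lambda: defaultdict(set))


class DictTripleStore:
    """Toy triple store keeping three dict indexes, one per leading term."""

    def __init__(self):
        self._spo = _index()
        self._pos = _index()
        self._osp = _index()

    def add(self, s, p, o):
        # Every triple is written to all three indexes.
        self._spo[s][p].add(o)
        self._pos[p][o].add(s)
        self._osp[o][s].add(p)

    def triples(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            # Subject bound: walk the spo index from that subject.
            for pp, objects in self._spo.get(s, {}).items():
                if p is None or pp == p:
                    for oo in objects:
                        if o is None or oo == o:
                            yield (s, pp, oo)
        elif p is not None:
            # Predicate bound, subject free: use the pos index.
            for oo, subjects in self._pos.get(p, {}).items():
                if o is None or oo == o:
                    for ss in subjects:
                        yield (ss, p, oo)
        elif o is not None:
            # Only the object bound: use the osp index.
            for ss, predicates in self._osp.get(o, {}).items():
                for pp in predicates:
                    yield (ss, pp, o)
        else:
            # Fully unbound pattern: enumerate everything from spo.
            for ss, by_predicate in self._spo.items():
                for pp, objects in by_predicate.items():
                    for oo in objects:
                        yield (ss, pp, oo)


store = DictTripleStore()
store.add("ex:a", "rdf:type", "ex:Thing")
store.add("ex:b", "rdf:type", "ex:Thing")
print(sorted(store.triples(p="rdf:type")))
```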

Some time ago, I scribbled down some anatomical notes on the SleepyCat
RDFLib model:

https://rdfextras.readthedocs.org/en/latest/store/anatomy.html

and ditto for the AbstractSQL store model:

https://rdfextras.readthedocs.org/en/latest/store/abstract_sql_store.html

both should be read after:

https://rdflib.readthedocs.org/en/latest/univrdfstore.html

There's a worked example of the SleepyCat key-value implementation using
KyotoCabinet:

https://github.com/RDFLib/rdflib-kyotocabinet/blob/master/rdflib_kyotocabinet/KyotoCabinet.py

and another for LevelDB, which is currently suffering from process-lock
problems:

https://github.com/RDFLib/rdflib-leveldb

Maybe datastore has something to offer:
http://datastore.readthedocs.org/en/latest/index.html

or there's PostgreSQL's hstore key-value implementation which also looks
promising:

http://www.postgresql.org/docs/current/static/hstore.html

Basically, if you have a key-value backend which behaves pretty much the
same as a Python dict, there's a good chance it can be pressed into
service. In practice, SleepyCat remains a hard target to beat.
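As an illustration of pressing a dict-like backend into service, the sketch below stores triples as "s^p^o" keys in the standard library's dbm.dumb file. The "^" separator is an assumption (terms must not contain it), and since dbm offers no ordered cursor, pattern lookups scan every key, whereas a BTree store such as Berkeley DB can position a cursor directly at a key prefix.

```python
import dbm.dumb
import os
import tempfile

SEP = "^"  # assumption: no RDF term in this toy store contains "^"


class DbmTripleIndex:
    """Toy s^p^o triple index stored in a dbm.dumb key-value file."""

    def __init__(self, path):
        self._db = dbm.dumb.open(path, "c")

    def add(self, s, p, o):
        # The whole triple lives in the key; the value is just a marker.
        self._db[SEP.join((s, p, o)).encode("utf-8")] = b"1"

    def triples(self, s=None, p=None, o=None):
        """Yield matching triples; None is a wildcard. Scans all keys (no prefix cursor)."""
        for key in self._db.keys():
            ss, pp, oo = key.decode("utf-8").split(SEP)
            if (s is None or ss == s) and (p is None or pp == p) and (o is None or oo == o):
                yield (ss, pp, oo)

    def close(self):
        self._db.close()


with tempfile.TemporaryDirectory() as tmp:
    index = DbmTripleIndex(os.path.join(tmp, "triples"))
    index.add("http://example.org/a", "http://example.org/p", "http://example.org/b")
    print(list(index.triples(p="http://example.org/p")))
    index.close()
```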

The original AbstractSQL back-ends were necessarily tuned for
performance. SQLA is probably similarly tunable-per-backend but that'd
be another story.

Cheers

Graham
