Are you taking note of the graph nature of RDF data when storing it in the database?

dacresni

May 26, 2008, 2:38:22 PM
to Django-RDF
There is a paper here on storing RDF data in relational databases:
http://infolab.stanford.edu/~melnik/rdf/db.html

dacresni

May 26, 2008, 2:42:21 PM
to Django-RDF
Actually, here is a better reference: http://www.databasecolumn.com/2008/01/databases-and-rdf.html

stefan

May 27, 2008, 1:06:16 AM
to Django-RDF
There are two ways to materialize a graph of statements in a Django
database with Django-RDF installed:

First, the Django-RDF triple store uses the obvious three-column
statements table, and the graph can be traversed with self-joins. A
separate table of resources records a mapping from an internal
numerical identifier to the (string) URI, so the statements table
contains only integers. This is a slight optimization, but the
self-joins are still inefficient. Don't put anything hot on top of them,
they'll melt. If you did want to serve up a high volume of RDF
results using the built-in Django-RDF triple store, you'd need to put
something like memcached in front and figure out how to deal with
staleness.
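
For illustration only, here is a minimal sketch of that layout using
Python's sqlite3 module. It is not the actual Django-RDF schema: the
table names (resource, statement), the column names and the helper
functions are all made up for the example, and it assumes every object
is a resource (literals are left out). It shows how a two-hop traversal
turns into a self-join on the statements table.

```python
import sqlite3


def build_store(triples):
    """Load (subject, predicate, object) URI triples into an in-memory store.

    Hypothetical schema: `resource` maps integer ids to URIs, and
    `statement` holds nothing but integer ids.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE resource (
            id  INTEGER PRIMARY KEY,
            uri TEXT UNIQUE NOT NULL
        );
        CREATE TABLE statement (
            subject_id   INTEGER NOT NULL REFERENCES resource(id),
            predicate_id INTEGER NOT NULL REFERENCES resource(id),
            object_id    INTEGER NOT NULL REFERENCES resource(id)
        );
    """)

    def resource_id(uri):
        # Insert the URI if it is new, then look up its integer id.
        conn.execute("INSERT OR IGNORE INTO resource (uri) VALUES (?)", (uri,))
        row = conn.execute("SELECT id FROM resource WHERE uri = ?", (uri,))
        return row.fetchone()[0]

    for s, p, o in triples:
        conn.execute(
            "INSERT INTO statement VALUES (?, ?, ?)",
            (resource_id(s), resource_id(p), resource_id(o)),
        )
    return conn


def two_hop(conn, start_uri, predicate_uri):
    """Follow the same predicate twice from start_uri.

    Each extra hop costs one more self-join on `statement`.
    """
    rows = conn.execute("""
        SELECT r.uri
        FROM statement AS s1
        JOIN statement AS s2 ON s2.subject_id = s1.object_id
        JOIN resource  AS r  ON r.id = s2.object_id
        WHERE s1.subject_id   = (SELECT id FROM resource WHERE uri = ?)
          AND s1.predicate_id = (SELECT id FROM resource WHERE uri = ?)
          AND s2.predicate_id = s1.predicate_id
        ORDER BY r.uri
    """, (start_uri, predicate_uri)).fetchall()
    return [row[0] for row in rows]
```

Every additional hop in the graph adds another join against the same
table, which is where the cost comes from on a large store.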

But, second, Django-RDF doesn't require data to be stored in the
internal triple store. You can build your whole site with Django
without ever giving a thought to Django-RDF, then point the syncvb
command at your database and it will generate an RDF ontology that
matches the graph nature of your relations. Every table generates a
concept (e.g. RDF class), every column generates a verb and every cell
value becomes the object of a statement consisting of the row
identifier (subject), the column name (verb) and the cell value
(object). Foreign key columns generate statements with resource
objects; other columns generate statements with literal objects. If
you then write SPARQL queries over the resulting ontology, the SPARQL
compiler should generate the same efficient SQL that would be
generated by the Django ORM. This works rather nicely, although the
Django ORM is better at using outer joins.
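
As a rough sketch of that row-to-statement mapping (this is not the
syncvb implementation; the function name, the "table/id" subject
naming, the "table.column" verb naming and the ("resource", ...) /
("literal", ...) tagging are all assumptions made for the example):

```python
def rows_to_triples(table, rows, pk="id", foreign_keys=None):
    """Turn relational rows into (subject, verb, object) statements.

    table        -- table name, standing in for the generated concept
    rows         -- list of dicts, one per row
    pk           -- name of the primary key column (assumed to be "id")
    foreign_keys -- dict mapping a FK column name to the table it points at
    """
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        # The row identifier is the subject of every statement for this row.
        subject = f"{table}/{row[pk]}"
        for column, value in row.items():
            # Assumption: the key itself and NULL cells produce no statement.
            if column == pk or value is None:
                continue
            verb = f"{table}.{column}"
            if column in foreign_keys:
                # Foreign key cell: the object is another resource.
                obj = ("resource", f"{foreign_keys[column]}/{value}")
            else:
                # Any other cell: the object is a literal value.
                obj = ("literal", value)
            triples.append((subject, verb, obj))
    return triples
```

For example, a row {"id": 1, "title": "Dune", "author_id": 7} in a
hypothetical book table, with author_id pointing at an author table,
would yield one literal statement for the title and one resource
statement linking book/1 to author/7.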

There are pros and cons to each. The built-in data store lets you
evolve the ontology using user input, on the fly, but it's slow. Using
Django models to store all the data and then generate the ontology
with syncvb results in better performance, but the models are harder
to evolve. I like mixing the two, which the SPARQL compiler will do
without any problems.

Thanks for the links, I'd not seen them and enjoyed browsing through.
I'm just one guy playing in my spare time, I'm not out to build a
column-oriented database or anything like that. It might not really
help anyway, since column-oriented databases excel at reads but
don't work so well for writes, and I bet a lot of Django sites do a
lot of writes, eh?

Cheers, Stefan

On May 26, 11:42 am, dacresni <vivacar...@gmail.com> wrote:
> actually, here is a better reference, http://www.databasecolumn.com/2008/01/databases-and-rdf.html