Django Newbie question (using Python/Django for Semantic Web Apps) - Query Jena Fuseki Triplestore.


Bruce Whealton

Apr 21, 2016, 5:01:35 AM
to rdflib-dev
Hello all,
        I was wondering about a couple of things related to using Django with triple stores written in completely different programming languages.
I know that rdflib includes support for SPARQL queries.  My first question is, #1) Do I need to set up any models if I am going to use SPARQL
to query Jena Fuseki (a triple store built in Java)?  For simplicity's sake in this discussion, let's assume all the data will come from Jena Fuseki.
It does not appear that there are Python drivers for the various triple stores.
       #2) If one were using a Python database to store RDF triples, which one would be a common choice?  I suppose that for relatively
small datasets, using an SQL database engine is doable and would not cause any noticeable performance problems.

Aside: I read "Programming the Semantic Web," by Toby Segaran, Colin Evans & Jamie Taylor, years ago, before SPARQL 1.1 existed.  The early chapters dealt with the start of a
triple store implementation in Python.  It's too bad the authors didn't take that work one step further and create a production
triple store in Python.

       I am definitely open to any tips on further learning in this area, such as with Django or rdflib.


      #3) I think the triple store database is very elegant and flexible.  I even prefer it over other NoSQL options.  But is it appropriate for all
applications?  Specifically, I am curious about building a Personal Information Manager (a contacts app plus much more).  It is
the one type of application where one might not want to expose all the data to the public.  Certainly, other apps I've explored and started building
are intended to be linked open data apps.  I'm just wondering whether it is appropriate to use Semantic Web technologies if any part of the data
must be kept private.

Thanks,
Bruce         

Graham Klyne

May 12, 2016, 5:05:00 PM
to rdflib-dev
If you're using Fuseki, then you would access it via HTTP, so if you can assemble and post the appropriate SPARQL queries and handle the results that come back, I don't think you'd need any special Python interface specifically for Fuseki.

You might find https://github.com/RDFLib/sparqlwrapper to be helpful.
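
For example, querying a local Fuseki instance might look something like this (a minimal sketch; the host, port, and dataset name "ds" are placeholders for whatever your Fuseki server exposes):

from SPARQLWrapper import SPARQLWrapper, JSON

# Fuseki exposes its SPARQL endpoint over plain HTTP
sparql = SPARQLWrapper("http://localhost:3030/ds/sparql")
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])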

I see no problem in using semweb technologies for private data, even if they are designed to facilitate open data sharing.  Even in an open data environment there will often be parts that should be kept private.

FWIW, I've been using RDF with Django and have found no problems.  There's no specific support in Django, but that's fine because Django doesn't force its model components onto an application.

#g

abhe...@gmail.com

Jul 26, 2016, 2:49:12 PM
to rdflib-dev
You might also be interested in RDFLib-SQLAlchemy (https://github.com/RDFLib/rdflib-sqlalchemy), which tucks a triplestore into a relational database.  The master branch includes some changes I proposed that allow the tables it uses to double as (unmanaged) Django models.  I have a Django app that is basically functional in that respect: https://github.com/aaronhelton/skjold.  From there it should be easy enough to extend into something pretty (on my to-do list, once I figure out a direction).
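
Outside of Django, opening a SQLAlchemy-backed graph looks roughly like this (a minimal sketch based on the project's README; the identifier and SQLite URI are placeholders):

from rdflib import plugin, Graph, Literal, URIRef
from rdflib.store import Store
from rdflib_sqlalchemy import registerplugins

registerplugins()  # make the "SQLAlchemy" store visible to rdflib's plugin system

ident = URIRef("my_store")
store = plugin.get("SQLAlchemy", Store)(identifier=ident)
graph = Graph(store, identifier=ident)
graph.open(Literal("sqlite:///triples.sqlite"), create=True)  # any SQLAlchemy connection URI works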

Re: privacy of semantic web data: I see no inherent issue with this. URIs need not all dereference to a publicly available entity. They should lead to entities that EXIST, but I see no reason why they can't lead to a "You are not authorized to access this resource."

Bruce Whealton

Aug 1, 2016, 5:30:48 PM
to rdflib-dev, abhe...@gmail.com
Thanks for the tips.  
I just happened to have returned to the project (or projects) I had in mind when I first posted the question.  It is interesting that, a few months after the original post, an additional reply arrived just recently.

So, I had started with a book written in 2009 called "Programming the Semantic Web."  The APIs are vastly different now.  Many times I'd read steps from the text and wonder,
"Did I miss something?"

So, here are some of my questions now, having learned much more:
1) When working with Django and rdflib, have you found it to be a good practice to save federated SPARQL query results in some kind of local database (meaning on the same server where my app runs)?
2) Alternatively, I could use a relational DB, Sleepycat, or a Java-based triple store like Jena Fuseki.  That seems like a good idea if one were relying upon data from several federated triple stores.  The idea is that a local triple store holding the combined results from several remote triple stores would be faster to query than running a federated query each time.  In other words, instead of waiting for responses to SPARQL queries against DBpedia, Freebase, and a movie triple store, I could pull it all together into one triple store on my server.  Does that make sense?  (Roughly, the pattern I'm imagining is sketched after this list.)  The problem with this approach is that one isn't getting the most recent data.
3) Graham, you said you use RDF with Django - that can mean different things: using data from SPARQL queries against triple stores over which one has no control, or using an RDF triple store as the only database serving content, i.e. no PostgreSQL or MySQL db, with a triple store as the one and only db used for the web app.
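
Something like this untested sketch is what I have in mind for question 2 (the endpoint list and the CONSTRUCT query are placeholders; as I understand it, SPARQLWrapper returns an rdflib Graph for CONSTRUCT results requested as RDF/XML):

from SPARQLWrapper import SPARQLWrapper, XML
from rdflib import Graph

local = Graph()  # the local cache; could instead be backed by Sleepycat or rdflib-sqlalchemy

for endpoint in ["http://dbpedia.org/sparql"]:  # placeholder list of remote endpoints
    sw = SPARQLWrapper(endpoint)
    sw.setQuery("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 100")
    sw.setReturnFormat(XML)
    remote = sw.query().convert()  # an rdflib Graph holding the CONSTRUCT results
    local += remote                # merge the remote triples into the local cache

print(len(local))  # total triples cached locally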

Lastly: on Ubuntu, I now have PostgreSQL and SQLite3.  I have an RDF/XML file on GitHub.  I want to import the triples from the RDF/XML file into both SQLite3 and PostgreSQL.  What do I need to do to make that happen?
In other words, first I need to do a pip3 install of the Python drivers for both dbs (actually, let's add Sleepycat too) -> then inside my app I have to import the store (or stores) I want to use -> then I want to write the triples from my RDF/XML file into the database.
I've had a hard time finding a snippet of code that describes this scenario.  Can someone show a snippet of code that one might execute inside IDLE to make this work?

I ran into problems with various syntax errors.  I know I get a graph by using rdflib.parse("http://path/to/some/file.rdf") - no, actually it is
g = Graph(store='Sleepycat')
then ->
result = g.parse("http://path/to/some/file.rdf")
I found that in the docs, but result is never used.
Isn't result holding the triples from the remote file.rdf?
No, that intuition is wrong - it doesn't work.  How do I take all the triples from my remote rdf file and save them in a database on my server?

Thanks,
Bruce

abhe...@gmail.com

Aug 1, 2016, 6:21:39 PM
to rdflib-dev
I'll have to look at the rest of your post in more detail later.

> I ran into problems with various syntax errors.  I know I get a graph by using rdflib.parse("http://path/to/some/file.rdf") - no, actually it is
> g = Graph(store='Sleepycat')
> then ->
> result = g.parse("http://path/to/some/file.rdf")
> I found that in the docs, but result is never used.
> Isn't result holding the triples from the remote file.rdf?
> No, that intuition is wrong - it doesn't work.  How do I take all the triples from my remote rdf file and save them in a database on my server?

If you're using Sleepycat, then your graph is being stored on your filesystem in a Berkeley DB.  Here's how you initialize everything:

import os
from django.conf import settings  # assumes this runs inside a Django project
from rdflib import Graph

path = os.path.join(settings.BASE_DIR, 'db')  # directory that will hold the Berkeley DB files
graph = Graph('Sleepycat')       # the first argument selects the store plugin
graph.open(path, create=True)    # create the store if it doesn't exist yet

Then when you do, e.g., graph.parse('some_file.ttl', format='turtle'), it stores the loaded triples in the Berkeley DB, which in the case of the above is located in ./db
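
A complete load-and-close, continuing from the above (a rough sketch; the file name is a placeholder):

graph.parse('some_file.ttl', format='turtle')  # triples land in the Berkeley DB under ./db
print(len(graph))  # how many triples are now stored
graph.close()      # flush and release the store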

BDB is an embedded NoSQL-type database.  If you want to store the triples in a relational database, a better bet is rdflib-sqlalchemy.
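
To loop back to the question about loading an RDF/XML file from GitHub into SQLite or PostgreSQL: open the graph as in the rdflib-sqlalchemy sketch earlier in the thread, then parse into it; only the connection URI changes between databases (a rough sketch; the URIs and credentials are placeholders):

# SQLite:
#   graph.open(Literal("sqlite:///triples.sqlite"), create=True)
# PostgreSQL (requires a driver such as psycopg2, installed via pip3):
#   graph.open(Literal("postgresql+psycopg2://user:password@localhost/mydb"), create=True)

graph.parse("https://raw.githubusercontent.com/user/repo/master/file.rdf", format="xml")
print(len(graph))  # the triples are now stored in the relational database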