Problems traversing instance data

Leo Breebaart

Jul 26, 2012, 1:28:34 PM
to surfrdf
Hi all,

I suspect I have a deep misunderstanding of how SuRF (and/or RDF)
is supposed to work. I have created a minimal example that
illustrates my problem, and I hope that a kind soul reading this
can explain to me what is going wrong.

I am trying to parse the following instance data representing a
tree with a root and two nodes:

--------------------------------------------------
:Top
    a proc:ProcedureDefinition;
    rdfs:label "Top";
    proc:step [ a proc:ProcedureCall;
                proc:definition :Sub1;
              ];
    proc:step [ a proc:ProcedureCall;
                proc:definition :Sub2;
              ]
    .

:Sub1
    a proc:ProcedureDefinition;
    rdfs:label "Sub1"
    .

:Sub2
    a proc:ProcedureDefinition;
    rdfs:label "Sub2"
    .
-----------------------------------------------------

All I want to do using SuRF is (a) find the top node, and (b)
print the labels of the children. (a) is no problem, (b) is
what's driving me insane.

My Python program contains the following relevant code:

-------------------------------------------------------
ProcedureDefinition = session.get_class(surf.ns.PROC['ProcedureDefinition'])

topdef = ProcedureDefinition.get_by(rdfs_label="Top").first()
print topdef.rdfs_label
step = topdef.proc_step.first
print step.rdfs_label
---------------------------------------------------------

The first rdfs:label print is fine: it prints out "Top", just as
I expect. The second print statement, where I am expecting it to
print out the label of whatever child node of my top instance
happened to be picked as 'first', instead prints out:

[rdflib.term.Literal(u'Top'), rdflib.term.Literal(u'Sub2'),
rdflib.term.Literal(u'Sub1')]

i.e. it's a list of *all* the labels associated with *any*
ProcedureDefinition in the store!

As I said: I don't understand at all why this is happening, and
I'd be enormously grateful for some assistance, either on why
it's not doing what I expect, but also on how I could change
things so that it does. If it helps, I've put the full files on
DropBox: <http://dl.dropbox.com/u/31054689/min-test.tar.gz>

Regards, and many thanks in advance,

--
Leo

cuu...@gmail.com

Jul 26, 2012, 5:19:10 PM
to sur...@googlegroups.com
The part that prints all three labels looks like a bug; it should have
been an empty list. Either that, or I too lack an understanding of how
BNodes work...

But to get to definition labels, you have to dig deeper:

print step.proc_definition.first.rdfs_label.first

Peteris
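
For readers following along, the shape of the data can be sketched without SuRF or rdflib at all. This is only an illustration, assuming the instance data is flattened into plain tuples; the labels "_:b1" and "_:b2" are made up to stand in for the two anonymous step nodes, and objects() and child_labels() are hypothetical helpers, not part of any library. It shows that a step carries no rdfs:label of its own, so the labels have to be reached through proc:definition:

```python
# Library-free sketch: the instance data as (subject, predicate, object) tuples.
# "_:b1" and "_:b2" are made-up labels for the two anonymous step nodes.
TRIPLES = [
    (":Top", "rdf:type", "proc:ProcedureDefinition"),
    (":Top", "rdfs:label", "Top"),
    (":Top", "proc:step", "_:b1"),
    (":Top", "proc:step", "_:b2"),
    ("_:b1", "rdf:type", "proc:ProcedureCall"),
    ("_:b1", "proc:definition", ":Sub1"),
    ("_:b2", "rdf:type", "proc:ProcedureCall"),
    ("_:b2", "proc:definition", ":Sub2"),
    (":Sub1", "rdf:type", "proc:ProcedureDefinition"),
    (":Sub1", "rdfs:label", "Sub1"),
    (":Sub2", "rdf:type", "proc:ProcedureDefinition"),
    (":Sub2", "rdfs:label", "Sub2"),
]


def objects(subject, predicate):
    """Return every object of the triples with this subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def child_labels(top):
    """Follow top -> proc:step -> proc:definition -> rdfs:label."""
    labels = []
    for step in objects(top, "proc:step"):
        for definition in objects(step, "proc:definition"):
            labels.extend(objects(definition, "rdfs:label"))
    return labels


print(objects("_:b1", "rdfs:label"))  # [] : the step itself has no label
print(child_labels(":Top"))           # ['Sub1', 'Sub2']
```

The empty list in the first print is what step.rdfs_label would be expected to give; the second print corresponds to the "dig deeper" traversal.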



Leo Breebaart

Jul 27, 2012, 3:29:00 AM
to sur...@googlegroups.com


On Thursday, 26 July 2012 23:19:10 UTC+2, Pēteris Caune wrote:

> The part that prints all three labels looks like a bug, should have
> been empty list. Either that, or I too lack understanding how BNodes
> work...
>
> But to get to definition labels, you have to dig deeper:
>
> print step.proc_definition.first.rdfs_label.first
 
Oops, you are right -- I think I was trying out so
many different things yesterday, that in the end I just
got too confused about what I was looking at, sorry.

Unfortunately, the problem remains. In my data,
every BNode step instance has a single proc:definition
instance attached to it, just as it has a single rdfs:label. But just as
rdfs:label in my example returned *all* the labels, if you try
to print or iterate over step.proc_definition, it returns a sequence
of *all* the ProcedureDefinition instances in the graph, not just the
one(s) associated with the BNode.

Note that iterating over the top level topdef.proc_step goes perfectly
fine, and gives me only the step instances actually associated with topdef.
It is just when I try to follow any predicate hanging off a BNode 'step'
instance that things seem to go haywire...

--
Leo

Cosmin Basca

Aug 2, 2012, 11:12:18 AM
to sur...@googlegroups.com
Hi Leo,

I've been trying to track down what's going on with your example. At one point, SuRF issues the following query to rdflib:

SELECT DISTINCT ?v ?c   WHERE {  _:_ac6b19ab-513e-477e-a3a6-fbc977cfab1d <http://www.w3.org/2000/01/rdf-schema#label> ?v .  OPTIONAL { ?v <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?c }  } 

(the bnode id is going to be different every time you run it)

rdflib returns all 3 labels back to surf:
(rdflib.term.Literal(u'Top'),)
(rdflib.term.Literal(u'Sub2'),)
(rdflib.term.Literal(u'Sub1'),)

However, if I look up directly all the triples that match such a pattern (ignore the OPTIONAL pattern for now), I get no results (as expected):

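# Assumes: from rdflib import BNode, URIRef
s = BNode('_:%s' % str(step.subject))
# p was not defined in the original post; rdfs:label is assumed from the query above.
p = URIRef('http://www.w3.org/2000/01/rdf-schema#label')
o = None  # None acts as a wildcard
for t in store.reader.graph.triples((s, p, o)):
    print 'TRIPLE = ', t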

This prints nothing, which is correct.

This behavior leads me to believe there is a query processing bug in rdflib which is outside the scope of surf. I will let the rdflib team know about this issue and hopefully it will get fixed soon. 

In the meantime, try to "skolemize" your blank nodes (simply turn them into resources by assigning proper URIs as their subjects); this should fix your problem. In most cases it's better to avoid BNodes, due to poor support in the storage layer. You could, for example, try loading your test data into a triple store to see if the issue persists; I'd be very interested to know that. Hope this helps, and thanks for pointing this out!

Cheers,
Cosmin  

Leo Breebaart

Aug 9, 2012, 11:25:23 AM
to sur...@googlegroups.com
> This behavior leads me to believe there is a query processing bug in rdflib which is outside the scope of surf. I will let the rdflib team know about this issue and hopefully it will get fixed soon.

Many thanks for your answer! I never expected this to actually be a bug somewhere rather than me just getting it wrong. :-)


> In the meantime, try to "skolemize" your blank nodes

Understood, I'll do just that.

Thanks again,

-- Leo

Pēteris Caune

Oct 2, 2012, 10:02:57 AM
to sur...@googlegroups.com
I'm working on a small piece of demo code and ran into this same problem. It looks like my understanding of BNodes *is* wrong:


"A blank node can appear in a query pattern. It behaves as a variable; a blank node in a query pattern may match any RDF term."

"Finally, BNodes emphatically do not make sense in the context of a query - since they become infinitely resolvable variables: which is not very useful. "

I don't know what the right thing to do for RDFLib and SuRF is, but to get my demo code working, I wrote a routine to skolemize my loaded data, as suggested by Cosmin:

import rdflib
import surf

def skolemize(store, session):
    # Maps each BNode to the generated resource that replaces it.
    rep = {}
    Dummy = session.get_class(surf.ns.SURF.Dummy)
    # Copy the triples into a list first, so that the graph is not
    # modified while it is being iterated over.
    for s, p, o in list(store.reader.graph.triples((None, None, None))):
        ss = s
        oo = o
        need_fixup = False
        if isinstance(s, rdflib.BNode):
            ss = rep.setdefault(s, Dummy().subject)
            need_fixup = True
        if isinstance(o, rdflib.BNode):
            oo = rep.setdefault(o, Dummy().subject)
            need_fixup = True

        if need_fixup:
            store.writer.add_triple(ss, p, oo)
            store.writer.remove_triple(s, p, o)

    print "Did %d fixups" % len(rep)
            
It replaces BNodes with generated resources in the surf.ns.SURF namespace. These then work as I expected BNodes to work :-)
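
To make the difference concrete, here is a library-free sketch of the two behaviours. It is only an illustration under simplifying assumptions: triples are plain tuples, the "_:b1" label and the ":step1" URI are made up, and match() and the tuple-based skolemize() below are hypothetical helpers rather than rdflib or SuRF calls. It also handles a single pattern only, so it ignores the rule that a blank-node label repeated within one query must bind consistently.

```python
# Library-free sketch; "_:b1" and ":step1" are made up for illustration.
TRIPLES = [
    (":Top", "rdfs:label", "Top"),
    (":Top", "proc:step", "_:b1"),
    ("_:b1", "proc:definition", ":Sub1"),
    (":Sub1", "rdfs:label", "Sub1"),
    (":Sub2", "rdfs:label", "Sub2"),
]


def is_blank(term):
    """Treat any string starting with "_:" as a blank-node label."""
    return isinstance(term, str) and term.startswith("_:")


def match(pattern, triples):
    """Query-style matching: None and blank-node labels both act as wildcards."""
    def fits(want, got):
        return want is None or is_blank(want) or want == got
    return [t for t in triples if all(fits(w, g) for w, g in zip(pattern, t))]


def skolemize(triples, mapping):
    """Replace each blank-node label with the URI assigned to it in mapping."""
    return [tuple(mapping.get(term, term) for term in t) for t in triples]


# A blank node as the subject behaves like a variable: every label comes back.
print(match(("_:b1", "rdfs:label", None), TRIPLES))

# After skolemization the subject is an ordinary term and must match exactly.
skolemized = skolemize(TRIPLES, {"_:b1": ":step1"})
print(match((":step1", "rdfs:label", None), skolemized))       # []
print(match((":step1", "proc:definition", None), skolemized))  # one triple
```

The first print returns the labels of :Top, :Sub1 and :Sub2, which mirrors what was reported at the start of this thread; the skolemized lookups return only what is actually attached to the step.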

Peteris