sorting rdfMultiple?

Bruce D'Arcus

Apr 6, 2008, 1:21:01 PM
to rdfalchemy-dev
Bear with me, newbie question:

I want to define a model with an attribute that returns its contents
sorted on a particular attribute. How best to do that?

Thanks,
Bruce

Philip Cooper

Apr 6, 2008, 4:11:43 PM
to rdfalch...@googlegroups.com
Not sure if you are talking about returning
a) a list of somethings (i.e. from SomeClass.filter_by(this="that"))
or:
b) a list of things returned in an attribute call (e.g.
someInstance.non_unique_attribute)


a) would require your SomeClass to define the filter_by classmethod (or
create a new classmethod).

I assume that b) is what you are actually asking about, since you
reference rdfMultiple.

So, how best to do that?

I don't know if you are working with your own schema or one already in
existence.
If it is of your own design, ask yourself whether the values are
necessarily in some order. If they are, perhaps they should be in either
an rdf:List or an rdf:Seq, both of which maintain order and are returned
in order.

If you decide that is not appropriate (or the predicate is one already
designed so you have no choice) then your best bet is probably to create
a custom descriptor. Descriptors are a more advanced Python concept, but
try the following.

Most of what you want is probably in rdfMultiple.
Look around:
http://www.openvest.com/trac/browser/rdfalchemy/trunk/rdfalchemy/descriptors.py?rev=108#L159

Subclass rdfMultiple and modify the __get__ method. What it does is:
1. check to see if it's already cached (obj.__dict__[self.name])
2. if not, get the values from the db. That is the line that reads:
    val=[o for o in obj.db.objects(obj.resUri, self.pred)]

val then has the values (sort of) that will be returned. If you know
you are NOT dealing with a list or container (if you are, then you
should upgrade to the descriptors designed for them), you could delete
the next couple of lines, which fetch their vals.

After that is the line which turns the return value into a friendlier
sort (e.g. a Python int or an rdfSubject descendant):
    val=[(isinstance(v, (BNode,URIRef)) and self.range_class(v) or v.toPython()) for v in val]

Right after that line, simply insert a sort:
    val.sort(key=sortkey_function_or_lambda)

Sorry for the long description. It's just one line, but that's where it
goes. You could also just subclass your descriptor from rdfMultiple,
call super(YourDescriptor, self).__get__(......) and then sort the return
value in your new __get__.
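
A minimal sketch of the subclass-and-sort variant, under loud assumptions: `StubMultiple` is a hypothetical stand-in for rdfMultiple (it reads from a plain dict instead of a triple store), and `SortedMultiple`, its `key` argument and the sample labels are made up for illustration. Only the shape of the `__get__` override is the point here, not rdfalchemy's actual internals.

```python
class StubMultiple(object):
    """Hypothetical stand-in for rdfMultiple: returns an unordered list."""

    def __init__(self, name):
        self.name = name

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        # The real descriptor would query the triple store at this point.
        return list(obj.store[self.name])


class SortedMultiple(StubMultiple):
    """Same lookup as the parent descriptor, but the result comes back sorted."""

    def __init__(self, name, key=None):
        super(SortedMultiple, self).__init__(name)
        self.key = key

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        # Name the subclass itself in super() so the parent's __get__ runs.
        val = super(SortedMultiple, self).__get__(obj, cls)
        return sorted(val, key=self.key)


class Person(object):
    # Sort labels case-insensitively (an arbitrary choice for this sketch).
    interests = SortedMultiple("interests", key=lambda label: label.lower())

    def __init__(self, interests):
        self.store = {"interests": interests}


bruce = Person(["semantic web", "Bibliography", "citation styles"])
print(bruce.interests)
```

Passing the sort key to the descriptor keeps the model class declarative: each attribute can state its own ordering without another subclass.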

Not sure about your exact application, so hope this helps.

--
Phil


Bruce D'Arcus

Apr 6, 2008, 6:03:55 PM
to rdfalchemy-dev
On Apr 6, 4:11 pm, Philip Cooper <philip.coo...@openvest.com> wrote:
> Bruce D'Arcus at about 4/6/08 11:21 AM said:
> > Bear with me, newbie question:
>
> > I want to define a model with an attribute that returns its contents
> > sorted on a particular attribute. How best to do that?
>
> Not sure if you are talking about returning
> a) a list of somethings (i.e. from SomeClass.filter_by(this="that"))
> or:
> b) a list of things returned in an attribute call (e.g.
> someInstance.non_unique_attribute)

The latter.

> a) would require your SomeClass to def the filter_by classmethod (or
> create a new classmethod)
>
> I assume that b) is what you are actually asking about since you
> reference rdfMultiple
>
> So, how best to do that?
>
> I don't know if you are working with your own schema or one already in
> existence.
> If it is of your own design, you must ask yourself if it is necessarily
> in some order, perhaps it should be in either an RDF:List or a RDF:Seq,
> both of which maintain order and are returned in order.

The use case I'm talking about is this one:

class Person(Agent):
    rdf_type = FOAF.Person
    interests = rdfMultiple(FOAF.interest)

.. where I want to be able to get a list of interest labels returned
in alpha order, where that related label is defined in SKOS.

Aside: more broadly, I'm also modeling a new bibliographic ontology
that I am helping to design, where contributors (authors, etc.) are
typically understood as ordered properties. For a variety of reasons,
though (including a common suggestion, from various RDF experts I
trust, not to use rdf:Seq or rdf:List), we're just modeling it as one
would in a relational database, where position in a list is (if
needed) explicitly encoded; e.g.:

<http://ex.net/1> a bibo:Book ;
    bibo:contribution [
        bibo:position "1" ;
        bibo:role bibo_roles:author ;
        bibo:contributor [ foaf:name "Jane Doe" ]
    ] .

Yes, I know, not ideal, and I'm not thrilled with it. But it *is*
flexible, and it seems the best among not good options :-(

So if I go further with this, I'll probably need to adapt the solution
for the FOAF case to this as well.
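
With explicit positions, putting the contributions in order is left to whoever reads the data. A rough sketch of that step, assuming the contributions have already been fetched into plain dicts (the dict layout and the names other than "Jane Doe" are invented for illustration):

```python
# Each dict is a hypothetical in-memory stand-in for one bibo:contribution node.
contributions = [
    {"position": "10", "role": "author", "name": "Pat Roe"},
    {"position": "2", "role": "author", "name": "John Smith"},
    {"position": "1", "role": "author", "name": "Jane Doe"},
]


def in_order(items):
    """Return the contributions sorted by numeric position.

    The positions are plain string literals ("1", "2", ...), so they are
    converted to int first; a plain string sort would put "10" before "2".
    """
    return sorted(items, key=lambda c: int(c["position"]))


names = [c["name"] for c in in_order(contributions)]
print(names)
```

The int() conversion is the detail that is easy to miss when the position is stored as an untyped literal.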

Thanks for your explanation. I'll see if I can work it out.

Bruce

Philip Cooper

Apr 6, 2008, 7:39:08 PM
to rdfalch...@googlegroups.com
Bruce D'Arcus at about 4/6/08 4:03 PM said:
> On Apr 6, 4:11 pm, Philip Cooper <philip.coo...@openvest.com> wrote:
>
>>
>> I don't know if you are working with your own schema or one already in
>> existence.
>> If it is of your own design, you must ask yourself if it is necessarily
>> in some order, perhaps it should be in either an RDF:List or a RDF:Seq,
>> both of which maintain order and are returned in order.
>>
>
> The use case I'm talking about is this one:
>
> class Person(Agent):
>     rdf_type = FOAF.Person
>     interests = rdfMultiple(FOAF.interest)
>
> .. where I want to be able to get a list of interest labels returned
> in alpha order, where that related label is defined in SKOS.
>

The hack I described should work. Subclass rdfMultiple and place the
one-line sort as described.


> Aside: more broadly, I'm also modeling a new bibliographic ontology
> that I am helping to design, where contributors (authors, etc.) are
> typically understood as ordered properties.

I recently did a biblio site and I was going to point you to the
schemas I used. But then I saw that the schema at
http://purl.org/net/biblio (which I used) was authored by .... well, by
you. So I guess you're up to speed on it.

I have working code of a rdfalchemy/pylons/genshi biblio site. Contact
me directly for some pointers. I can probably open-source most of the
code. I set it up to handle an import of a Zotero rdf export (flawed
rdf that took some hacks to work).

> For a variety of reasons,
> though--including a common suggestion I hear from various RDF experts
> I trust not to use rdf:Seq or rdf:List--we're just modeling it as one
> would in a relational database, where position in a list is (if
> needed) explicitly encoded; e.g.:
>
> <http://ex.net/1> a bibo:Book ;
>     bibo:contribution [
>         bibo:position "1" ;
>         bibo:role bibo_roles:author ;
>         bibo:contributor [ foaf:name "Jane Doe" ]
>     ] .
>

Wow. Not sure I agree with those experts. Seriously, if it's a listing
of authors, why not use rdf:Seq?
If you were warned against it, it was probably because it's a real pain
to put all of those "extra" bnodes and triples into the db. I agree you
would NOT want to do it with rdflib (three triples per list item added).

Rdfalchemy exists to abstract away those headaches. You can say things
like:

    book.authors = [Person(first="Billy", last="Williams"),
                    Person(first="Robby", last="Robinson")]

Please please do not use your model above. They are not returned in
order as they would be for a list or seq. You would have to sort them
(or use a custom accessor as you originally asked about).

It doesn't matter if you never use rdfalchemy again. If you "roll your
own" to get an ordered list, no one else will be able to handle the info
without extra effort or coding. SPARQL will not be able to use common
logic or extensions to search.

Users who rely on standard search, logic and presentation algorithms
will all be confounded. The Semantic web community will weep (and then
go out and create yet another biblio ontology).

.......OOPS.....sorry about the rant. Just my $.02 worth.

--
Phil

FYI
Some of the base code I started with::


import datetime

from rdflib import Namespace
from rdfalchemy import rdfSubject, rdfSingle, rdfMultiple

BIB = Namespace('http://purl.org/net/biblio#')
DC = Namespace('http://purl.org/dc/elements/1.1/')
DCTERMS = Namespace('http://purl.org/dc/terms/')
FOAF = Namespace('http://xmlns.com/foaf/0.1/')
PRISM = Namespace('http://prismstandard.org/namespaces/1.2/basic/')
VCARD = Namespace('http://nwalsh.com/rdf/vCard#')
Z = Namespace('http://www.zotero.org/namespaces/export#')


class BibAuthor(rdfSubject):
    rdf_type = FOAF.Agent
    firstName = rdfSingle(FOAF.givenname)
    lastName = rdfSingle(FOAF.surname)
    fn = rdfSingle(FOAF.name)

    @property
    def namefl(self):
        """Name as "First Last", falling back to foaf:name."""
        if self.lastName and self.firstName:
            return self.firstName + " " + self.lastName
        else:
            return self.fn

    @property
    def namelf(self):
        """Name as "Last, First", falling back to foaf:name."""
        if self.lastName and self.firstName:
            return self.lastName + ", " + self.firstName
        else:
            return self.fn

    name = namelf

    @classmethod
    def ClassInstances(cls):
        # yield every subject whose Zotero item type is not 'attachment'
        for s, o in cls.db.subject_objects(Z.itemType):
            if o != 'attachment':
                yield cls(s)


class BibItem(rdfSubject):
    # NOTE: KA is a site-specific namespace; its definition is not part of this excerpt.
    rdf_type = BIB.BibItem
    title = rdfSingle(DC['title'])
    abstract = rdfSingle(DCTERMS['abstract'])
    annotation = rdfSingle(BIB['note'])
    attachment = rdfSingle(Z.attachment)
    collectedBy = rdfSingle(KA['collectedBy'])
    dcdate = rdfSingle(DC['date'])
    grouping = rdfSingle(KA.grouping, range_type=KA.Grouping)
    identifier = rdfSingle(DC['identifier'])
    pages = rdfSingle(BIB['pages'])
    publisher = rdfSingle(DC['publisher'], range_type=FOAF.Agent)
    ztype = rdfSingle(Z.itemType)
    authors = rdfMultiple(BIB.authors, range_type=FOAF.Agent)
    editors = rdfMultiple(BIB.editors, range_type=FOAF.Agent)
    contributors = rdfMultiple(BIB.contributors, range_type=FOAF.Agent)
    subs = rdfMultiple(DC['subject'])

    @property
    def isbn(self):
        """get the isbn if it exists"""
        if self.identifier and str(self.identifier).startswith('ISBN '):
            return self.identifier.split()[-1]
        elif self.resUri.startswith('urn:isbn:'):
            return self.resUri.split(':')[-1]

    @property
    def date(self):
        if self.dcdate:
            return self.dcdate
        try:
            # take the part before the first space and split it as Y-M-D
            y, m, d = map(int, self[DCTERMS.dateSubmitted].split(' ')[0].split('-'))
            return datetime.datetime(y, m, d).strftime("%B %e, %Y")
        except Exception:
            # missing or malformed date: fall through and return None
            pass

    @classmethod
    def fetch_by(cls, **kwargs):
        """drop-in replacement for get_by"""
        try:
            if 'isbn' in kwargs:
                return cls.get_by(identifier="ISBN %s" % kwargs['isbn'])
            return cls.get_by(**kwargs)
        except LookupError:
            raise LookupError("%s Not Found\nI even tried ISBN" % (kwargs,))

    @classmethod
    def ClassInstances(cls):
        #if not cls.__dict__.get('type2class'):
        #    mapBase(cls)
        for s, o in cls.db.subject_objects(Z.itemType):
            if o != 'attachment':
                #realclass = cls.type2class.get(str(cls.db.value(s,rdf.type)),cls)
                #yield realclass(s)
                yield cls(s)

class Journal(rdfSubject):
    rdf_type = BIB.Journal
    title = rdfSingle(DC['title'])
    issue = rdfSingle(PRISM.number)
    volume = rdfSingle(PRISM.volume)

class Conference(rdfSubject):
    rdf_type = BIB.Conference
    title = rdfSingle(DC['title'])

class BookBibItem(BibItem):
    rdf_type = BIB.Book
    isbn = rdfSingle(DC.identifier)
    edition = rdfSingle(PRISM.edition)

class ReportBibItem(BibItem):
    rdf_type = BIB.Report

class DocumentBibItem(BibItem):
    "generic sounding class but used here for web page documents"
    rdf_type = BIB.Document

class ProceedingsBibItem(BibItem):
    rdf_type = BIB.Proceedings
    presentedAt = rdfSingle(BIB['presentedAt'], range_type=BIB.Conference)

class ArticleBibItem(BibItem):
    rdf_type = BIB.Article
    isPartOf = rdfSingle(DCTERMS['isPartOf'], range_type=BIB.Journal)

Bruce D'Arcus

Apr 6, 2008, 9:15:37 PM
to rdfalchemy-dev


On Apr 6, 7:39 pm, Philip Cooper <philip.coo...@openvest.com> wrote:

...

> I recently did a biblio site and I was going to point you to the
> schema's I used. But then saw that the schema at
> http://purl.org/net/biblio (which I used) was authored by .... well by
> you. So I guess you're up to speed on it.
>
> I have working code of a rdfalchemy/pylons/genshi biblio site. Contact
> me directly for some pointers. I can probably opensource most of the
> code. I set it up to handle an import of a Zotero rdf export (flawed
> rdf that took some hacks to work)

Convenient, as this is all tied together (including Zotero, which will
be upgrading its RDF support to use the new model). I'd love it if you
could indeed open up that code. In fact, as a general rule, I find far
too little useful code in the RDF world for me to look at (to learn
from), borrow, etc.

Except ... one of the reasons people cautioned against using rdf:Seq
and rdf:List is that neither is supported in SPARQL per se. So how do you
answer that? If I can't query it with SPARQL, then what good is it?

> Users who rely on standard search, logic and presentation algorithms
> will all be confounded. The Semantic web community will weep (and then
> go out and create yet another biblio ontology).
>
> .......OOPS.....sorry about the rant. Just my $.02 worth.

No problem; I expected someone would feel this way, and good to see
the pull-no-punches argument.

I suppose this is a little off-topic for this list, but the short
background version is this:

We need to be able to support a wider range of contributors than just
authors: translators, editors, directors, etc., etc. It just becomes
difficult to juggle these demands.

But this isn't (yet) set in stone. If you have the time, it'd be great
if you might help me move this discussion to the ontology dev list [1]
so we could finalize this. As it happens, we're working towards
releasing a formal draft "real soon" (in part b/c Zotero needs it).

Bruce

[1] <http://groups.google.com/group/bibliographic-ontology-specification-group>

Rajeev J Sebastian

Apr 7, 2008, 9:17:58 AM
to rdfalch...@googlegroups.com
On Mon, Apr 7, 2008 at 6:45 AM, Bruce D'Arcus <bdarcu...@gmail.com> wrote:
> Except ... one of the reasons people cautioned against using rdf:Seq
> and rdf:List is neither are supported in SPARQL per se. So how do you
> answer that? If I can't query it with SPARQL, then what good is it?

I absolutely agree with this. I do something similar in our ontology
for maintaining sequences. Seq and List are definitely useful
concepts, but there are no practical tools to query them or to extract
them from the triple store other than issuing one query per node in
the List.
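
As a rough illustration of that "one query per node" cost, here is a sketch that walks an rdf:List by following rdf:first and rdf:rest until rdf:nil. The dict-based `triples` store, the blank node labels and the sample names are assumptions made up for this example; a real store would answer each lookup with a query.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

# Toy triple store: {(subject, predicate): object}. Node labels are made up.
triples = {
    ("_:n1", FIRST): "Jane Doe", ("_:n1", REST): "_:n2",
    ("_:n2", FIRST): "John Smith", ("_:n2", REST): "_:n3",
    ("_:n3", FIRST): "Pat Roe", ("_:n3", REST): NIL,
}


def walk_list(store, head):
    """Collect rdf:List members by following rdf:rest until rdf:nil."""
    members, visits = [], 0
    node = head
    while node != NIL:
        members.append(store[(node, FIRST)])  # the item held by this node
        node = store[(node, REST)]            # hop to the next list node
        visits += 1                           # one round trip per list node
    return members, visits


members, visits = walk_list(triples, "_:n1")
print(members, visits)
```

The number of visits grows with the length of the list, which is the cost being described.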

Regards
Rajeev J Sebastian

Philip Cooper

Apr 7, 2008, 12:00:03 PM
to rdfalch...@googlegroups.com
Bruce D'Arcus at about 4/6/08 7:15 PM said:
>
> Convenient, as this is all tied together (including Zotero, which will
> be upgrading its RDF support to use the new model). I'd love if you
> could indeed open up that code. In fact, as a general rule, I find far
> too little useful code in the RDF world for me to look it (to learn
> from), borrow, etc.
>
>
If there were more paying gigs out there the code would come faster.
Uptake is still slow :-(
I'll get some code together and contact you offline.

> Except ... one of the reasons people cautioned against using rdf:Seq
> and rdf:List is neither are supported in SPARQL per se. So how do you
> answer that? If I can't query it with SPARQL, then what good is it?
>
>

To answer the last question first, perhaps it's partially my fault for
not giving a decent tutorial or sample code to show how rdfalchemy makes
this issue go away. The dot notation like `bibItem.authors` returns a
list whether the predicate points to multiple values, to a bnode which is
a container (rdf:Seq), or to a bnode which is a collection (rdf:List). How
the list is retrieved or saved is abstracted away and is easy to use,
without having to sort it yourself or do other intermediate processing.

That makes it easy to use, but that only amounts to saying it's good
because rdfalchemy does the heavy lifting. As for SPARQL:

If the model is in Jena there is an ARQ extension that allows you to use
a `list:member` predicate. see:
http://jena.sourceforge.net/ARQ/extension.html#propertyFunctions . For
that framework, you can use SPARQL "naturally".

Beyond Jena there is Fresnel, a presentation vocabulary which similarly
defines a `fresnel:member` property. I actually have a basic Fresnel
engine that I use as part of the biblio site I referenced before which
can very efficiently use a fresnel.n3 config file for presentation (I
should release it as part of rdfalchemy but the code needs more love first).

That covers Jena and Fresnel, and I think it supports the view that a
future release of SPARQL will probably see a standard emerge. You can
see a bit of the discussion here:
http://thefigtrees.net/lee/sw/sparql-faq#transitiv8

That discussion of transitive closure applies to lists (again, already
handled by my rdfList descriptor). You should be able to see that for
containers (like rdf:Seq) you CAN get the members from a SPARQL
query, something like:

    PREFIX bib: <http://purl.org/net/biblio#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    select ?auth ?last ?first
    where {
        <urn:isbn:1928892019> bib:authors ?aseq.
        ?aseq ?rdfpred ?auth.
        ?auth foaf:surname ?last.
        ?auth foaf:givenname ?first.
    }

...The "extra" triple of "bnode rdf:type rdf:Seq" falls off since there
is no foaf:surname.
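
A query like this returns the members but not necessarily in sequence order. One way to recover the order, sketched here on the assumption that ?rdfpred is also added to the select clause, is to sort the rows on the numeric suffix of the container membership property (rdf:_1, rdf:_2, ...). The `rows` data and the helper name `member_index` are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical (?rdfpred, ?last) rows in arbitrary order. The rdf:type row
# is included only to show that non-membership predicates are skipped.
rows = [
    (RDF + "_10", "Young"),
    (RDF + "_2", "Robinson"),
    (RDF + "type", None),
    (RDF + "_1", "Williams"),
]


def member_index(pred):
    """Return n for a container membership property rdf:_n, else None."""
    suffix = pred[len(RDF):] if pred.startswith(RDF) else ""
    if suffix.startswith("_") and suffix[1:].isdigit():
        return int(suffix[1:])
    return None


# Keep membership rows only, then sort numerically (not lexically, which
# would place rdf:_10 before rdf:_2).
ordered = sorted(
    (row for row in rows if member_index(row[0]) is not None),
    key=lambda row: member_index(row[0]),
)
print([last for _, last in ordered])
```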

> If you have the time, it'd be great
> if you might help me move this discussion to the ontology dev list

I'll check it out.

BTW, looks like I used bib:authors as a Seq, which is not part of your
schema. I did that because it came from Zotero that way, and it performed
the important task of keeping the authors/contributors/editors in order.

--
Phil
