I've run into an issue when migrating large data sets from an old
Fedora 2.x repository that used an early version of Islandora where
the RDF <hasModel xmlns="info:fedora/fedora-system:def/model#"
rdf:resource="info:fedora/islandora:collectionCModel"></hasModel>
statement was not used to build collections in the Islandora
repository views. Without this RDF statement in the RELS-EXT
datastreams of my collection objects (which are not named in a way
that makes them easy to identify), I cannot see these collection
objects in Drupal, nor any of their child (read: member) objects.
Adding the missing RDF statement to the collection objects will fix my
problem. What I am looking for is a way to use the Fedora REST APIs to
pull out the information I need to build the appropriate RDF
statements and update the RELS-EXT datastreams on the affected
objects. I'm working with some Ruby libraries that will let me do this
via the Fedora REST APIs once I have a list of the unidentified
collection objects.
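To make the intended fix concrete, here is a minimal sketch of patching a RELS-EXT document so that it carries the collection hasModel statement. It is written in Python with only the standard library, not with the Ruby libraries mentioned above, and the function name add_collection_model is hypothetical. It assumes the RELS-EXT has a single rdf:Description element, and it leaves fetching and writing the datastream through the REST API out entirely.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MODEL_NS = "info:fedora/fedora-system:def/model#"
COLLECTION_CMODEL = "info:fedora/islandora:collectionCModel"

# Keep readable prefixes in the serialized output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("fedora-model", MODEL_NS)


def add_collection_model(rels_ext_xml):
    """Return RELS-EXT XML with the collection hasModel statement present.

    Hypothetical helper: assumes one rdf:Description per RELS-EXT document.
    """
    root = ET.fromstring(rels_ext_xml)
    description = root.find("{%s}Description" % RDF_NS)
    if description is None:
        raise ValueError("no rdf:Description element found")
    tag = "{%s}hasModel" % MODEL_NS
    resource_attr = "{%s}resource" % RDF_NS
    # Do nothing if the statement is already there, so re-runs are safe.
    for existing in description.findall(tag):
        if existing.get(resource_attr) == COLLECTION_CMODEL:
            return ET.tostring(root, encoding="unicode")
    statement = ET.SubElement(description, tag)
    statement.set(resource_attr, COLLECTION_CMODEL)
    return ET.tostring(root, encoding="unicode")
```

The check for an existing statement means the same script can be run twice over the list of collection PIDs without duplicating the triple.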
The only way I can think of to identify these collection objects
easily is by searching the Resource Index (Mulgara) for triples whose
subject is an object in a specific namespace, whose predicate is
info:fedora/fedora-system:def/relations-external#isMemberOfCollection,
and whose object matches one of my
info:fedora/problematic:collections. I can then reduce that list to
the unique instances of info:fedora/problematic:collections to
generate a reasonably accurate list of the collection objects that
need to be updated.
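The local filtering step described above could be sketched roughly as follows. This is only an illustration in Python: it assumes the membership triples have already been exported as N-Triples text with one triple per line, and the function name collection_pids and the namespace "myns" are hypothetical.

```python
import re

MEMBER_OF = "info:fedora/fedora-system:def/relations-external#isMemberOfCollection"
# Matches a triple whose subject, predicate, and object are all URIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")
FEDORA_PREFIX = "info:fedora/"


def collection_pids(ntriples, member_namespace):
    """Return sorted unique collection PIDs pointed to by members of a namespace."""
    subject_prefix = "%s%s:" % (FEDORA_PREFIX, member_namespace)
    found = set()
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # blank line, or a triple with a literal object
        subject, predicate, obj = match.groups()
        if predicate != MEMBER_OF or not subject.startswith(subject_prefix):
            continue
        if obj.startswith(FEDORA_PREFIX):
            found.add(obj[len(FEDORA_PREFIX):])  # keep just the PID
    return sorted(found)
```

Calling collection_pids(text, "myns") on the exported triples would give the de-duplicated list of parent PIDs to patch.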
Finally, my question: is there a more efficient way to pull down this
list of collection object PIDs than getting a flat list of all triples
across all namespaces and filtering or matching locally by my
parameters? That could be some sort of pattern-matching query language
on the Resource Index, or some other mechanism that would let me
determine which objects should be collections without relying on the
hasModel predicate. Ideally, pattern matching or namespace filtering
in the queries to the Resource Index would let me get the job done
without locally parsing every RDF statement in my repository that
asserts membership in a collection.
Thanks,
Jonathan M. Lane
Information Technology Coordinator
-----
AIRS: Advancing Interdisciplinary Research in Singing (SSHRC MCRI)
Department of Psychology, University of Prince Edward Island
Charlottetown, Prince Edward Island, Canada, C1A 4P3
-----
+1 (902) 566-6023—Lab
+1 (902) 940-2320—Mobile
http://www.airsplace.ca/
Indeed you are right, Alexander. The root of the problem is that the
collection objects' PIDs are unknown, whereas the member objects' PIDs
are easy to access (a long list of PIDs in a specific namespace),
effectively the inverse of your scenario. This is because the "member"
objects all point to the PID of their parent (collection), but those
parents lack the proper RDF statement to assert their collection
status directly (which is what causes the problem in Islandora).
I'd like to avoid pulling down a list of all RDF triples across all
namespaces, by doing something like this in risearch, represented by
pseudo-SPO: "<info:fedora/myns:*>
<info:fedora/fedora-system:def/relations-external#isMemberOfCollection>
*".
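For comparison, here is a rough sketch of the nearest query I would expect risearch to accept: an SPO query that fixes only the predicate and leaves subject and object as whole-term wildcards. It only builds the request URL and sends nothing. The base URL is a placeholder, the function name is hypothetical, and the parameter names (type, lang, format, query, limit) are as I understand the risearch interface, so they are worth checking against the installed Fedora version. The namespace restriction on the subject would still have to happen locally.

```python
from urllib.parse import urlencode

MEMBER_OF = "info:fedora/fedora-system:def/relations-external#isMemberOfCollection"


def membership_query_url(base_url, limit=None):
    """Build a risearch URL asking for every isMemberOfCollection triple.

    base_url is the Fedora web application root, an assumption here,
    e.g. "http://localhost:8080/fedora".
    """
    params = {
        "type": "triples",
        "lang": "spo",
        "format": "N-Triples",
        # Wildcards stand for a whole subject or object, not part of a URI.
        "query": "* <%s> *" % MEMBER_OF,
    }
    if limit is not None:
        params["limit"] = str(limit)
    return "%s/risearch?%s" % (base_url.rstrip("/"), urlencode(params))
```

This at least narrows the download from all triples to membership triples only, which is usually a small fraction of the index.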
I am starting to suspect that filtering the Resource Index the way I
wish to do is only possible with some custom code.
Thanks for your response.
I am doing something similar (iTQL query with a template to give me
triple results). Thanks for the advice.
I think some of the matching functionality I'd like to have in
RISearch is available in SPARQL's regex function
<http://www.w3.org/TR/rdf-sparql-query/#funcex-regex>, though I am
not sure whether SPARQL support in the Fedora-bundled Mulgara is
complete enough to expect this functionality to be present.
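As a sketch of what such a query might look like, the Python helper below only assembles the SPARQL text; whether the bundled Mulgara will actually evaluate FILTER regex is exactly the open question, so treat this as untested against a real repository. The function name and the restriction of namespaces to letters, digits, and hyphens are my own assumptions, made to keep the regex pattern free of characters that would need escaping.

```python
import re

MEMBER_OF = "info:fedora/fedora-system:def/relations-external#isMemberOfCollection"


def sparql_collections_query(member_namespace):
    """Return SPARQL text selecting distinct parents of members in a namespace."""
    # Assumption: only plain namespaces, so nothing needs regex escaping.
    if not re.fullmatch(r"[A-Za-z0-9-]+", member_namespace):
        raise ValueError("unexpected characters in namespace")
    return (
        "SELECT DISTINCT ?collection WHERE {\n"
        "  ?member <%s> ?collection .\n"
        '  FILTER regex(str(?member), "^info:fedora/%s:")\n'
        "}" % (MEMBER_OF, member_namespace)
    )
```

If the regex filter turns out to be unsupported, dropping the FILTER line still leaves a valid query for all distinct collection objects, and the namespace check can fall back to local filtering.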
From what I can tell, SPARQL seems to be the future of RDF query
languages, so perhaps I shouldn't rely on iTQL or SPO queries going
forward.