Reconciling application IDs and Fedora PIDs


onelson

May 6, 2010, 10:28:27 AM
to Fedora Commons Create
I'm wondering about strategies for relating Fedora PIDs to the models
in an application.

It seems to me that maintaining a table of "Fedora objects" in the
application would be beneficial from a number of viewpoints, but it
also looks like duplicated effort. I've not yet peeked at what Fedora
itself stores in its database, so perhaps that's the place to start.
I've also noted the altId element in the DC record and wonder if
that's an appropriate place to stash an application-specific id
(something along the lines of appname_modelname_int). If I were to go
that route, I'm not sure there would be an efficient way to "get" an
object based on altId, and since I'm not even sure Fedora enforces
unique values for them, it could become a data integrity mess.
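As a rough illustration of the appname_modelname_int idea, here is a minimal Python sketch. The function names, the lowercase-alphanumeric restriction on app and model names, and the collision check are all assumptions made for the example; nothing here is a Fedora API. Because uniqueness may not be enforced by the repository, the sketch checks for collisions on the application side.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class AppId(NamedTuple):
    app: str
    model: str
    pk: int


# Assumption: app and model names contain no underscores, so the
# three parts can always be split apart unambiguously.
_APP_ID = re.compile(r"(?P<app>[a-z0-9]+)_(?P<model>[a-z0-9]+)_(?P<pk>\d+)")


def format_app_id(app: str, model: str, pk: int) -> str:
    """Build an id such as 'gallery_photo_42', rejecting malformed parts."""
    candidate = f"{app}_{model}_{pk}"
    if _APP_ID.fullmatch(candidate) is None:
        raise ValueError(f"cannot build a well-formed id from {candidate!r}")
    return candidate


def parse_app_id(value: str) -> AppId:
    """Split an id back into (app, model, pk)."""
    match = _APP_ID.fullmatch(value)
    if match is None:
        raise ValueError(f"not an application id: {value!r}")
    return AppId(match["app"], match["model"], int(match["pk"]))


def find_collisions(pairs):
    """Given (pid, app_id) pairs, return the app ids claimed by more than one PID."""
    owners = defaultdict(set)
    for pid, app_id in pairs:
        owners[app_id].add(pid)
    return {app_id: sorted(pids) for app_id, pids in owners.items() if len(pids) > 1}
```

Running find_collisions over whatever (pid, id) pairs the application can list would surface the duplicate-id problem before it turns into bad lookups.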

What strategies have you guys employed, and how were they effective?


onelson

May 6, 2010, 12:30:36 PM
to Fedora Commons Create
I suppose another option to explore is not to reconcile anything at
all, and simply outsource *all* of the object management to Fedora.
The main advantage in this case is a completely client-agnostic
repository (which is ideal). A secondary gain is that the application
would have zero "syncing" responsibility; it would simply (and
continually) defer to the authority of the repo itself.
I'm a little concerned about going this route, since it could
overwork the repository with relationship-probing queries (for
example, looking up content contributed by application user FOO, or
belonging to collection BAR). Perhaps there's a balance to be had by
caching this kind of information on the application side?
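One way to picture the application-side caching mentioned above is a small time-to-live cache in front of whatever function actually asks the repository. This is only a sketch: the TTLCache class and the fetch callable are hypothetical names for the example, and a real application might prefer an existing cache library or explicit invalidation on ingest and purge.

```python
import time


class TTLCache:
    """Cache lookup results for a fixed number of seconds.

    `fetch` is any callable that performs the expensive lookup, for
    example "list the PIDs in collection BAR". It is a placeholder for
    whatever the application uses to talk to the repository.
    """

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry time, cached value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no repository call
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a cached entry, e.g. after ingesting into or purging from it."""
        self._entries.pop(key, None)
```

The trade-off is staleness: the repository stays the authority, but the application may show results up to ttl_seconds old unless it calls invalidate when it changes something itself.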

Matthew Zumwalt

May 6, 2010, 1:06:47 PM
to fedora-com...@googlegroups.com
IMO this is the right approach. Use the Fedora PIDs as your unique identifiers across all client applications. If you need broader assurances of uniqueness, you could look at Ben O'Steen's work on UUIDs & Fedora <http://oxfordrepo.blogspot.com/2008/01/conclusions-on-uuids-and-local-ids-in.html>.

Most people index their metadata in Solr and run searches against that. This solves your problem by ensuring that queries are handled by software that's optimized for searching. Of course, you will then have to get the metadata into Solr and ensure that Solr is kept up to date. This blog post <http://yourmediashelf.com/blog/2010/03/01/blacklight-activefedora-and-shelver-interplay-between-searching-managing-and-indexing-in-a-repository-solution/> might provide some helpful info in that area.

You could also put the relationship info into the Resource Index (Fedora's bundled triplestore) and use SPARQL or iTQL to query against that. If the Resource Index is turned on, Fedora will by default index any triples that you assert in the RELS-EXT datastream.

Matt Zumwalt
MediaShelf, LLC
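
To make the Resource Index option concrete, here is a hedged Python sketch that builds a SPARQL query for "members of collection BAR" and the URL that would send it to the repository. It assumes a Fedora 3.x style install with the Resource Index enabled and its search service at <base>/risearch, and it assumes the objects assert isMemberOfCollection in RELS-EXT. The helper names, the base URL, and the conservative PID check are illustrative only. Nothing is sent over the network here.

```python
import re
from urllib.parse import urlencode

# Namespace of Fedora's RELS-EXT relationship ontology.
REL_NS = "info:fedora/fedora-system:def/relations-external#"

# Deliberately conservative check (not the full PID grammar) so that a
# PID cannot smuggle extra SPARQL into the query string.
_SAFE_PID = re.compile(r"[A-Za-z0-9.\-]+:[A-Za-z0-9._\-]+")


def members_query(collection_pid: str) -> str:
    """SPARQL selecting every object that claims membership in the collection."""
    if _SAFE_PID.fullmatch(collection_pid) is None:
        raise ValueError(f"unexpected PID format: {collection_pid!r}")
    return (
        "SELECT ?member WHERE { "
        f"?member <{REL_NS}isMemberOfCollection> <info:fedora/{collection_pid}> "
        "}"
    )


def risearch_url(base_url: str, sparql: str) -> str:
    """Build a tuple-query URL; the parameter names assume Fedora 3.x risearch."""
    params = {"type": "tuples", "lang": "sparql", "format": "CSV", "query": sparql}
    return f"{base_url.rstrip('/')}/risearch?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical base URL and collection PID.
    print(risearch_url("http://localhost:8080/fedora", members_query("demo:BAR")))
```

Fetching the printed URL with any HTTP client would return one row per member object, which the application could then cache or hand to its indexer.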



Owen Nelson

May 6, 2010, 1:33:48 PM
to fedora-com...@googlegroups.com
Matt, that's an excellent article you've written. It really gets
to the heart of the problem. Since Solr's query language is so
comprehensive, you're right -- as long as the index is looked after, it
should be able to feed my application all the info it needs.

With this strategy, the only times I should have to interact with the
repo directly are when I'm:
* requesting a specific asset
* ingesting and purging

On a related matter: what about object permissions? Do you recommend
handling that entirely with Fedora policy? That would seem a bit of a
burden, although I admit I was impressed by the way Islandora deals with
it (adding a filter that lets Fedora look up user roles from the
client application db).

Matthew Zumwalt wrote:
> IMO this is the right approach. Use the Fedora PIDs as your unique
> identifiers across all client applications. If you need broader
> assurances of uniqueness, you could look at Ben O'Steen's work
> on UUIDs & Fedora
> <http://oxfordrepo.blogspot.com/2008/01/conclusions-on-uuids-and-local-ids-in.html>.
>
>
> Most people index their metadata in Solr and run searches against
> that. This solves your problem of ensuring that queries are handled
> by software that's optimized for searching. Of course, then you will
> have to get the metadata into Solr and ensure that Solr is kept up to
> date. This blog post
> <http://yourmediashelf.com/blog/2010/03/01/blacklight-activefedora-and-shelver-interplay-between-searching-managing-and-indexing-in-a-repository-solution/> might
> provide some helpful info in that area.
>
> You could also put the relationship info into the Resource Index
> (Fedora's bundled triplestore) and use SPARQL or iTQL to query against
> that. By default, Fedora will index any triples that you assert in
> the RELS-EXT datastream, putting them into the Resource Index (if the
> Resource Index is turned on).
>
> Matt Zumwalt
> MediaShelf, LLC
> http://www.yourmediashelf.com
>

Matthew Zumwalt

May 6, 2010, 1:41:16 PM
to fedora-com...@googlegroups.com
I'm glad you like the article.

For permissions, we are using the Hydra Rights Metadata schema <http://www.fedora-commons.org/confluence/display/hydra/Hydra+rights+metadata>, putting that into a datastream called rightsMetadata and indexing that information into Solr along with all of the other metadata. We then enforce discover, read, and edit permissions based on that (i.e., we include the user's id and group memberships as constraints on queries, etc.).

You could also look at using ODRL to do the same thing. Chris Beer at WGBH reports having success with that <http://code4lib.org/files/c4l10-cbeer-media-blacklight-and-viewers-like-you.pdf>.

Matt Zumwalt
MediaShelf, LLC
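
The query-constraint approach to permissions described above could look roughly like the following Python sketch, which adds a Solr filter query (fq) restricting results to documents the current user or one of their groups may access. The field names (read_access_person, read_access_group, and so on) are invented for the example; the actual names depend on how rightsMetadata is indexed, which the thread does not specify.

```python
LEVELS = ("discover", "read", "edit")


def quote_term(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def access_filter(level: str, user_id: str, groups) -> str:
    """Build a filter matching documents the user or any of their groups may access.

    Field names are hypothetical: <level>_access_person / <level>_access_group.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown access level: {level!r}")
    clauses = [f"{level}_access_person:{quote_term(user_id)}"]
    clauses += [f"{level}_access_group:{quote_term(group)}" for group in groups]
    return " OR ".join(clauses)


def search_params(user_query: str, user_id: str, groups, level: str = "discover") -> dict:
    """Solr request parameters: the user's own query plus the permission filter."""
    return {"q": user_query, "fq": access_filter(level, user_id, groups)}
```

Keeping the constraint in fq rather than appending it to q means the user's search text cannot override the permission clause, and Solr can cache the filter separately from the query.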



