[Fedora-users] Triplestore info


Michael Della Bitta

Sep 13, 2006, 3:49:24 PM
to fedora...@comm.nsdl.org
Hello, all,

I'm seeking a little information about the Kowari triplestore in
Fedora. Does Fedora expose the services Kowari normally hosts, such as
the SOAP services or the CLI?

We want to be able to dump the triples from Fedora into Oracle 10g's
triplestore for non-OLTP purposes. While we know we can get at a full
list of all the triples, this might become unwieldy as the number of
objects in our repository grows. Currently with a small subset of the
data we're intending to ingest, dumping all triples takes about 3
minutes. Ideally we'd like to get a list of only the new triples, but
I'm not sure Kowari tracks modification/creation dates, in which case
we'd have to resort to getting only the triples for the objects which
have changed, which we already know how to do.

Thanks for any input you might be able to provide,

Michael Della Bitta

Chris Wilper

Sep 14, 2006, 10:07:35 AM
to Michael Della Bitta, fedora...@comm.nsdl.org

Hi Michael,

Kowari runs inside Fedora in "embedded" mode, so these Kowari-specific features are not exposed.  I think your best bet would be to get the triples for the objects that have changed since the last time you checked.  This is a similar strategy to what the OAI provider service does and it works fine.

By the way, have you used 10g's RDF functionality yet?  I've given it a whirl.  The folks at Case Western have dug deeper, and actually wrote a Trippi connector for it.  So far it has been pretty disappointing in terms of update performance, but I'm no Oracle wiz...maybe my discombobulator was configured wrong.  Regardless, it may be quite practical for certain non-OLTP applications.  I'm just curious about your experience with it.

Regards,
Chris

_______________________________________________
Fedora-users mailing list
Fedora...@comm.nsdl.org
http://comm.nsdl.org/mailman/listinfo/fedora-users


Nick Fischio

Sep 14, 2006, 11:27:59 AM
to fedora...@comm.nsdl.org
Chris,

Regarding the performance of the Oracle Trippi Connector - we took that as
far as we could with Oracle Support. I had a few good dialogues with
various support personnel from Oracle. The result was that our
implementation was as good as one could achieve with Oracle's current RDF
implementation (10g Release 2). Essentially, Oracle never designed their
RDF implementation to support OLTP. My guess is they felt most RDF data is
static and the real value comes from discovery around related or seemingly
unrelated connections. Thus, as of now our Oracle Trippi connector is as
good as it gets if someone wants to substitute Oracle for Kowari.

Nick

Nicholas Fischio
Development Manager
Kelvin Smith Library - Case Western Reserve University
216.368.3509
 
 


Michael Della Bitta

Sep 15, 2006, 1:07:07 PM
to Chris Wilper, fedora...@comm.nsdl.org
Chris,

Thanks for your reply! No, we haven't had a chance to play with 10g's
RDF functionality. We very recently migrated from 9i, and haven't
dumped any RDF data in yet. Updating what we eventually do put into
Oracle at a Fedora object-level granularity makes a lot of sense to
me, because it'll be easier to manage deletions from RELS-EXT should
they occur.

Am I right in thinking that if we do a query to get all the PIDs that
have been updated since the last dump, then remove all the triples in
Oracle that have those PIDs as a subject, and then dump in all the
triples from Fedora that have those PIDs as a subject, we'd accomplish
an incremental update of sorts?

Thanks for your help,

Michael Della Bitta

Chris Wilper

Sep 15, 2006, 3:50:40 PM
to Michael Della Bitta, fedora...@comm.nsdl.org

Hi Michael,

If I understand your situation, you've got the RI disabled
and the only triplestore you're interested in populating
is the external one (Oracle)?  If that's right, the logic
you gave below should do the trick.  To be clear:

1) query the /fedora/search interface (aka "findObjects")
   for PIDs of objects that have changed
2) for each PID:
   a) call /fedora/get/$pid/RELS-EXT to get the outgoing triples
   b) delete all triples from your target triplestore with
      that pid (actually, the URI "info:fedora/$pid") as the
      subject.
   c) add the triples you just got from RELS-EXT
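
Steps 2b and 2c above can be sketched as follows. This is only an illustration: the target triplestore is stood in for by an in-memory Python set, and the names `Triple`, `subject_uri` and `replace_subject_triples` are hypothetical, not part of Fedora or Oracle. A real version would issue the equivalent deletes and inserts against Oracle.

```python
from typing import Iterable, Set, Tuple

# A triple is modelled here as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


def subject_uri(pid: str) -> str:
    """Return the URI that identifies a Fedora object as a triple subject."""
    return "info:fedora/" + pid


def replace_subject_triples(store: Set[Triple], pid: str,
                            fresh: Iterable[Triple]) -> None:
    """Delete every triple whose subject is `pid`, then add the fresh ones.

    `store` is a stand-in for the external triplestore and is modified
    in place.
    """
    subject = subject_uri(pid)
    stale = {t for t in store if t[0] == subject}
    store.difference_update(stale)  # step 2b: remove old triples
    store.update(fresh)             # step 2c: add triples from RELS-EXT
```

Triples about other objects are left alone, so the function can be called once per changed PID.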

Depending on the cost of modifications to Oracle, it may be
economical to avoid more changes than absolutely necessary.
That is, compare the triples in RELS-EXT to the triples in
Oracle for a given Fedora object, and only do deletes and/or
adds of the specific triples that changed.
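
A minimal sketch of that comparison, assuming both sides have already been loaded as sets of (subject, predicate, object) tuples for one Fedora object. The function name `diff_triples` is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def diff_triples(in_oracle: Set[Triple],
                 in_rels_ext: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (to_delete, to_add) for one Fedora object.

    to_delete: triples present in Oracle but no longer in RELS-EXT.
    to_add:    triples present in RELS-EXT but not yet in Oracle.
    Triples found on both sides need no write at all.
    """
    to_delete = in_oracle - in_rels_ext
    to_add = in_rels_ext - in_oracle
    return to_delete, to_add
```

Whether this pays off depends on how expensive a read from Oracle is compared with the writes it saves.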

- Chris

Michael Della Bitta

Sep 15, 2006, 3:55:33 PM
to Chris Wilper, fedora...@comm.nsdl.org
Hello, Chris, and thanks for your reply,

Hmm, our resourceIndex is level 1. How will this impact things?

Michael

Chris Wilper

Sep 18, 2006, 4:17:29 PM
to Michael Della Bitta, fedora...@comm.nsdl.org

Hi Michael,

If you have the RI on, it opens up the possibility for more efficient
incremental updates (but see caveats below).  For example:

Use iTQL via /fedora/risearch to get all the info from
RELS-EXT for all objects that have changed since some date.
Something like the query below:

select $fedoraObject $rel $val from <#ri>
where $fedoraObject <fedora-view:lastModifiedDate> $modDate
and $modDate <tucana:after> "2006-01-01T10:22:33.001Z" in <#xsd>
and $fedoraObject $rel $val
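
To run this query repeatedly with a different cutoff date, the date can be substituted into the query text and the result URL-encoded. The sketch below is an assumption-laden illustration: the request parameters (`type`, `lang`, `format`, `query`) and the `CSV` format value are assumed from the risearch service documentation and should be checked against your Fedora version, and `changed_triples_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# The query from this message, with the cutoff date left as a placeholder.
ITQL_TEMPLATE = (
    'select $fedoraObject $rel $val from <#ri> '
    'where $fedoraObject <fedora-view:lastModifiedDate> $modDate '
    'and $modDate <tucana:after> "{since}" in <#xsd> '
    'and $fedoraObject $rel $val'
)


def changed_triples_url(base_url: str, since: str) -> str:
    """Build a risearch URL asking for triples of objects changed after `since`.

    `since` is an xsd:dateTime string such as "2006-01-01T10:22:33.001Z".
    The parameter names below are assumptions; verify them locally.
    """
    if '"' in since:
        raise ValueError("cutoff date must not contain a double quote")
    query = ITQL_TEMPLATE.format(since=since)
    params = {"type": "tuples", "lang": "itql", "format": "CSV", "query": query}
    return base_url.rstrip("/") + "/fedora/risearch?" + urlencode(params)
```

For example, `changed_triples_url("http://localhost:8080", last_dump_time)` gives a URL that can be fetched with any HTTP client; the host and port here are placeholders.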

Caveats:
1) It's dependent on the RI running and Kowari being the underlying RI
triplestore, since it uses iTQL.
2) You'll get back more triples than just what was in RELS-EXT.
See http://www.fedora.info/download/2.1.1/userdocs/server/resourceIndex/triples.html
for the full list.  So your app would have to filter those out if they're not desired.
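
The filtering in caveat 2 could look something like the sketch below. The two namespace prefixes are an assumption about which predicates count as system-generated; the page linked above is the authoritative list, and at some RI levels other predicates (for example Dublin Core ones) may also need to be excluded. `drop_system_triples` is a hypothetical name.

```python
from typing import Iterable, List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Assumed predicate namespaces for triples the RI adds on its own.
# Adjust this list after checking the RI triples documentation.
SYSTEM_PREFIXES = (
    "info:fedora/fedora-system:def/model#",
    "info:fedora/fedora-system:def/view#",
)


def drop_system_triples(triples: Iterable[Triple],
                        excluded_prefixes: Sequence[str] = SYSTEM_PREFIXES
                        ) -> List[Triple]:
    """Keep only triples whose predicate is outside the excluded namespaces."""
    prefixes = tuple(excluded_prefixes)
    return [t for t in triples if not t[1].startswith(prefixes)]
```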

In the future, we want Fedora to be able to proactively supply apps with
change notification for objects, via JMS.  So the query-for-changes
step won't always be necessary to support this sort of thing.

Jeffrey Barnett

Feb 9, 2007, 4:47:22 PM
to Chris Wilper, fedora...@comm.nsdl.org
Chris,
I'm just starting to try out the new fedora-2.2-quick version, and I
wanted to look at the triple store (as Michael did), but since I'm using
the quick version, I don't know whether RI is on or off. When I use
/fedora/risearch there is no visible response in the browser, and the
following exception is logged in /fedora/logs/catalina.out:

2007-02-09 15:23:22 StandardWrapperValve[RISearchServlet]: Allocate
exception for servlet RISearchServlet
javax.servlet.ServletException: Error initting RISearchServlet.
at fedora.server.access.RISearchServlet.getWriter(RISearchServlet.java:36)
at org.trippi.server.http.TrippiServlet.init(TrippiServlet.java:223)
at javax.servlet.GenericServlet.init(GenericServlet.java:211)
...
----- Root Cause -----
java.lang.NullPointerException
at java.io.File.<init>(File.java:194)
at fedora.server.access.RISearchServlet.getWriter(RISearchServlet.java:33)
at org.trippi.server.http.TrippiServlet.init(TrippiServlet.java:223)

1) Should risearch work on fedora-2.2-quick "out of the box"?
2) What is the likely cause of the null pointer exception (and could a
more helpful response be provided)?

Edwin Shin

Feb 10, 2007, 3:46:59 AM
to fedora...@comm.nsdl.org
In Fedora 2.2, the RI is off by default. You can change this in your
fedora.fcfg.

If you've already ingested objects with the RI off, you should run the
rebuilder for the existing objects to be indexed by the RI.