Re: FW: Update to NokogiriDatastream and Solrizer

Rick Johnson

Oct 13, 2010, 4:51:01 PM
to Rick Johnson, matt.z...@yourmediashelf.com, hydra...@googlegroups.com, active...@googlegroups.com
Hi Matt,

   Resending now to hydra-tech and active-fedora...

   To answer your questions, the primary motivation of load_instance_from_solr is to avoid fetching datastreams directly from Fedora, and it should only be used in read-only views.  It was most useful for us in generating browse views based on relationships and metadata values (we built the browse views on ActiveFedora, not Blacklight).  By touching only solr, we still had access to all the active-fedora helper methods instead of dealing directly with the solr symbols that active-fedora generated for us anyway, which made our code a lot cleaner.  This was also pretty efficient when working only with the metadata datastream fields defined within model classes.

   Until now, I have not tried to add support for any NokogiriDatastreams, but I think I have a decent grasp of how it works after looking at the examples Banu brought back from Hydra Boot Camp last week.  Ideally, a NokogiriDatastream loaded this way would not actually parse or generate any xml (again, what is retrieved and stored is never meant to be saved back).  Instead, it would just store values in memory for use in the UI.

   I have not tested any of the code yet, and because of the new wrinkle of dealing directly with hierarchical structures, it may not increase performance.  I have the code mostly written and will be testing soon.  I am essentially doing the same thing as the solrize_term and solrize_node methods, except that the final step writes to the datastream (again, in memory only) instead of writing to the solr_doc.

   If all goes well, I should have some code to review soon.

Thanks,
Rick

On Wed, Oct 13, 2010 at 4:37 PM, Rick Johnson <rick.j...@nd.edu> wrote:

________________________________________
From: Matthew Zumwalt [matt.z...@yourmediashelf.com]
Sent: Wednesday, October 13, 2010 4:21 PM
To: Rick Johnson
Subject: Re: Update to NokogiriDatastream and Solrizer

Hi Rick,

Could you re-send this to either the active-fedora list or the hydra-tech list?  We need this type of conversation out in the open so people will know what's going on.  I will resend the info below in response:

Wasn't the motivation for load_instance_from_solr to expedite loading content into the application?  The process you described sounds substantially slower and more prone to bugs than just loading the XML with nokogiri and accessing the values using OM.

I think of Solrizer as a tool to provide to_solr behaviors so that you can transform content into solr documents.  Until now I haven't thought of it as a library that would provide from_solr behaviors.

Are you sure that it's even possible to roundtrip data between hierarchical xml and a solr document?  That's a difficult thing to navigate and might be even more difficult to support over the long term.

Matt Zumwalt
MediaShelf, LLC
http://www.yourmediashelf.com




On Oct 13, 2010, at 2:14 PM, Rick Johnson wrote:

Hi Matt,

 I am working through adding support to load_instance_from_solr for Nokogiri datastreams.  I have figured out most of the ins and outs and am ready to start coding.  I am going to mimic the behaviour of solrize_term and solrize_node in order to populate a Nokogiri datastream object from solr.  The idea is to pass in a solr doc that contains the object's data, iterate through all mappings defined in the terminology, and check whether the appropriate solr name exists in the doc.  Then, instead of updating the solr doc (as the to_solr-related methods do), it calls update_indexed_attributes with the right term_pointer and the value from the solr_doc.
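
The loop described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the eventual pull request: the `InMemoryDatastream` class, the `TERMINOLOGY` hash, the `from_solr` helper, and the solr field names are all hypothetical stand-ins, and the argument shape given to `update_indexed_attributes` here is assumed rather than taken from ActiveFedora.

```ruby
# Hypothetical stand-in for an in-memory datastream; NOT the ActiveFedora class.
class InMemoryDatastream
  attr_reader :values

  def initialize
    @values = {}
  end

  # Stores values for each term pointer in memory only; nothing is parsed or saved.
  # Assumed argument shape: { term_pointer => { "0" => value, "1" => value } }
  def update_indexed_attributes(params)
    params.each do |term_pointer, indexed_values|
      ordered = indexed_values.sort_by { |index, _| index.to_i }
      @values[term_pointer] = ordered.map { |_, value| value }
    end
  end
end

# Hypothetical terminology: term pointer => solr field name.
TERMINOLOGY = {
  [:title]               => "title_t",
  [:person, :first_name] => "person_first_name_t",
  [:person, :last_name]  => "person_last_name_t"
}.freeze

# Populates the datastream from a solr document (a plain Hash in this sketch).
def from_solr(solr_doc, datastream, terminology = TERMINOLOGY)
  terminology.each do |term_pointer, solr_name|
    # Skip mappings that have no matching field in the solr doc.
    next unless solr_doc.key?(solr_name)

    indexed = {}
    Array(solr_doc[solr_name]).each_with_index do |value, index|
      indexed[index.to_s] = value
    end
    datastream.update_indexed_attributes({ term_pointer => indexed })
  end
  datastream
end

solr_doc = {
  "id" => "demo:1",
  "title_t" => ["Sample article"],
  "person_first_name_t" => ["Ada", "Grace"]
}
ds = from_solr(solr_doc, InMemoryDatastream.new)
puts ds.values.inspect
```

Terms whose solr field is absent (here `person_last_name_t`) are simply left unset, so the datastream only ever holds what the solr doc actually contained.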

 So, my question is where the code should live.  I am leaning towards putting from_solr and the methods it uses in Solrizer::XML::TerminologyBasedSolrizer (especially since the solrize_node method in NokogiriDatastream does not appear to be used anymore).  Then, I would remove any implementation of from_solr from the NokogiriDatastream class itself.  Make sense?

Thanks,
Rick
--
----------------------------------------------------------
Rick Johnson
Unit Manager, Digital Library Applications and Local Programming Unit
Library Information Systems
University of Notre Dame
Michiana Academic Library Consortium
Notre Dame, IN USA 46556
http://www.library.nd.edu
574-631-1086
------------------------------------------------------------


Rick Johnson

Oct 14, 2010, 11:06:32 AM
to Rick Johnson, matt.z...@yourmediashelf.com, hydra...@googlegroups.com, active...@googlegroups.com
More thoughts...

Looking at the code more, this methodology depends on quite a few xpath calls, which could be less efficient than just doing a Nokogiri::XML::Document.parse call once (which is what it does when loading from Fedora).  As a result, I am also contemplating a hybrid solution that grabs everything from Solr that makes sense but loads Nokogiri datastreams from Fedora.  Other digital object info, such as relationships, would be loaded from Solr instead.  Our real slowdowns came from the cumulative effect of loading many objects at once and from dependencies between objects.
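
One way to picture the hybrid idea is the sketch below. It is an illustration only: `HybridObject`, `FakeFedora`, the `datastream_xml` method, and the assumption that relationship fields end in `_s` are all hypothetical and not part of ActiveFedora. Relationships are read from the solr doc up front, while the XML datastream is fetched from the repository only when first asked for, and only once.

```ruby
# Hypothetical stand-in for a Fedora repository; counts how often XML is fetched.
class FakeFedora
  attr_reader :fetch_count

  def initialize
    @fetch_count = 0
  end

  def datastream_xml(pid)
    @fetch_count += 1
    "<mods><title>Record #{pid}</title></mods>"
  end
end

# Hypothetical hybrid object: relationships come from solr,
# the XML datastream comes from the repository on demand.
class HybridObject
  attr_reader :pid, :relationships

  def initialize(solr_doc, repository)
    @pid = solr_doc.fetch("id")
    # Assumed naming convention: relationship fields end in "_s".
    @relationships = solr_doc.select { |name, _| name.end_with?("_s") }
    @repository = repository
    @xml = nil
  end

  # Fetches the datastream content on first use and caches it afterwards.
  def descriptive_xml
    @xml ||= @repository.datastream_xml(@pid)
  end
end

repository = FakeFedora.new
solr_doc = { "id" => "demo:1", "is_member_of_s" => ["info:fedora/demo:collection"] }
object = HybridObject.new(solr_doc, repository)

puts object.relationships.inspect # no repository call needed
puts object.descriptive_xml       # first call fetches from the repository
puts repository.fetch_count
```

A browse view that only needs relationships would then never touch the repository, which is where the cumulative cost of loading many objects would be avoided.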

Rick Johnson

Oct 14, 2010, 11:29:21 AM
to Rick Johnson, matt.z...@yourmediashelf.com, hydra...@googlegroups.com, active...@googlegroups.com
On a side note, for those interested, here is a link to the site we brought up in September.  It uses the loading from solr in ActiveFedora that I talk about below, as well as Blacklight for searching and Solrizer.  Adding the Hydra inline editing is next on our list...

Rick Johnson

Oct 14, 2010, 11:29:46 AM
to Rick Johnson, matt.z...@yourmediashelf.com, hydra...@googlegroups.com, active...@googlegroups.com
Would be nice to include the url :)

http://inquisition.library.nd.edu

Rick Johnson

Oct 28, 2010, 2:53:38 PM
to Rick Johnson, matt.z...@yourmediashelf.com, hydra...@googlegroups.com, active...@googlegroups.com
I am close to done with this change and am trying to finalize an integration test spec.  I am attempting to use the fixture in the activefedora gem (hydrangea_fixture_mods_article1.foxml.xml) and noticed that the copy in ActiveFedora's space is out of sync with the one in hydrangea.  Should I update ActiveFedora's copy to match hydrangea's, or try to use the older copy?

I stumbled onto this when I used hydrangea to load the fixture for ActiveFedora testing.

  One other note is that I am getting some "stack level too deep" errors when running the nokogiri_datastream_spec.  Is this something anyone has encountered and possibly fixed? (I am in sync with AF 1.2.4 and need to update to 1.2.6 before submitting back).

Thanks,
Rick

Matt Zumwalt

Oct 28, 2010, 4:30:44 PM
to Rick Johnson, Rick Johnson, hydra...@googlegroups.com, active...@googlegroups.com
Rick,

Yes, please update the fixture in AF.

I haven't seen any stack level too deep errors in a while. Try pulling a fresh copy of active_fedora master and see if you get the same errors.

Rick Johnson

Oct 29, 2010, 8:32:39 AM
to Matt Zumwalt, Rick Johnson, hydra...@googlegroups.com, active...@googlegroups.com
Sounds like a plan.  I'll let you know how it goes.

Thanks,
Rick

Rick Johnson

Oct 29, 2010, 8:54:43 AM
to Matt Zumwalt, Rick Johnson, hydra...@googlegroups.com, active...@googlegroups.com
OK, that worked fine without errors, which rules out anything in Solr or Fedora being incorrect.  I think I have pinpointed the problem; it is related to something I am doing, and I will fix that...

Rick Johnson

Oct 29, 2010, 9:56:07 AM
to Matt Zumwalt, Rick Johnson, hydra...@googlegroups.com, active...@googlegroups.com
Got it working with my changes.  I will bring it up to 1.2.6 and should have it ready sometime today...

Rick Johnson

Oct 29, 2010, 1:52:54 PM
to Matt Zumwalt, Rick Johnson, hydra...@googlegroups.com, active...@googlegroups.com
Completed the changes and submitted a pull request...