I am curious what approaches beyond SolrMarc folks are using to get MARC
data into Solr. I'm wondering if we can leverage each
other's knowledge and work.
Could folks please reply to solrma...@googlegroups.com with
non-SolrMarc efforts they are aware of, or with "non-standard" uses of
SolrMarc (anything beyond writing local customizations that don't
affect the core code)?
I know of:
UWisc - Stephen Meyer - very locally modified, stripped-down SolrMarc.
Much zippier. Raw MARC stored in a relational DB and retrieved from same at
display time.
UMich - Bill Dueber - working on multithreaded processing, and using
JRuby and some Ruby wrappers.
I'm also interested in figuring out how to do reasonable performance
testing of indexing as painlessly as possible.
- Naomi
Jonathan
--
You received this message because you are subscribed to the Google Groups "Blacklight Development" group.
To post to this group, send email to blacklight-...@googlegroups.com.
To unsubscribe from this group, send email to blacklight-develo...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/blacklight-development?hl=en.
* https://issues.apache.org/jira/browse/SOLR-1543
* https://issues.apache.org/jira/browse/SOLR-1711
I have not tried recently with a nightly version of this SolrJ client. I
have yet to resolve the problem myself, but plan on returning to it in
the future.
-Steve
Bill Dueber wrote:
> The latest update is that...well, I'm having threading problems. Running
> with a single thread the jruby code is roughly 90% as fast as Solrmarc
> (which some of us would accept just to be able to work in Ruby). I'm
> going to tackle trying to get a better understanding of Ruby threads and
> how they interact with / are implemented by Java native threads later
> this week.
>
> On Mon, Mar 1, 2010 at 12:47 PM, Jonathan Rochkind <roch...@jhu.edu
> <mailto:roch...@jhu.edu>> wrote:
>
> I am pretty optimistic about Bill's approach, and at some point hope
> to find the time to flesh it out into something as flexible and
> easy-to-setup as SolrMarc is, while still remaining simple and
> elegant code.
>
> Jonathan
>
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library
--
Stephen Meyer
Library Application Developer
UW-Madison Libraries
312F Memorial Library
728 State St.
Madison, WI 53706
sme...@library.wisc.edu
608-265-2844 (ph)
"Just don't let the human factor fail to be a factor at all."
- Andrew Bird, "Tables and Chairs"
http://sdg.library.wisc.edu/blog/2010/03/03/solr-marc-indexing-based-on-diffs/
Check out the diagram, since it is intended to provide a quick overview
of how we get away with doing one of the most expensive
things (MARC parsing, as Bill points out) as few times as possible. We
only do MARC indexing/processing 50k times, for the adds and updates,
rather than 8M times for our whole MARC record set.
-sm
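A rough sketch of the diff idea, and not the UW-Madison implementation (the blog post above is the place to look for that): assuming you keep a fingerprint of each raw record from the previous run, keyed by record ID, only records whose fingerprint is new or changed need the expensive MARC parsing. The names `fingerprint`, `plan_reindex` and `previous_hashes` are invented for illustration.

```python
import hashlib


def fingerprint(raw_record: bytes) -> str:
    """Cheap change detector: hash the raw record bytes without parsing them."""
    return hashlib.sha1(raw_record).hexdigest()


def plan_reindex(current, previous_hashes):
    """Work out which record IDs need attention.

    current         -- dict of record id -> raw record bytes in the new export
    previous_hashes -- dict of record id -> fingerprint saved after the last run
    Returns (adds, updates, deletes, new_hashes).
    """
    adds, updates = [], []
    new_hashes = {}
    for rec_id, raw in current.items():
        digest = fingerprint(raw)
        new_hashes[rec_id] = digest
        if rec_id not in previous_hashes:
            adds.append(rec_id)
        elif previous_hashes[rec_id] != digest:
            updates.append(rec_id)
    # Anything we saw last time but not this time has been withdrawn.
    deletes = [rec_id for rec_id in previous_hashes if rec_id not in current]
    return adds, updates, deletes, new_hashes


if __name__ == "__main__":
    yesterday = {"1": b"record one", "2": b"record two", "3": b"record three"}
    previous = {rec_id: fingerprint(raw) for rec_id, raw in yesterday.items()}
    today = {"1": b"record one", "2": b"record two, edited", "4": b"record four"}
    adds, updates, deletes, _ = plan_reindex(today, previous)
    print("adds:", adds, "updates:", updates, "deletes:", deletes)
```

Only the IDs in `adds` and `updates` would then go through MARC parsing and field mapping; unchanged records are never parsed at all.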
--
Could be that you have faster hardware. Could be that my logic rules are even more complicated (I do use a lot of .bsh scripts with SolrMarc, which shouldn't in itself be a problem, assuming SolrMarc compiles each BeanShell script once rather than recompiling it for every record!).
We're both storing full marc in the record, right? So it wasn't that.
Odd. Maybe just your hardware is a lot faster.
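To make the "compile once, not on every record" point concrete, here is an analogy in Python rather than BeanShell. It says nothing about how SolrMarc actually handles .bsh scripts; it only shows the general difference between parsing a rule's source text once and re-parsing it for each record. The function names and the sample rule are made up.

```python
import timeit


def make_rule_compile_once(source):
    """Parse and compile the rule text a single time, then reuse the code object."""
    code = compile(source, "<rule>", "eval")

    def rule(record):
        return eval(code, {}, {"record": record})

    return rule


def make_rule_recompile(source):
    """Hand the raw text to eval on every call, so it is re-parsed each time."""

    def rule(record):
        return eval(source, {}, {"record": record})

    return rule


if __name__ == "__main__":
    source = 'record["title"].strip().lower()'
    record = {"title": "  Some Title "}
    fast = make_rule_compile_once(source)
    slow = make_rule_recompile(source)
    # Both variants compute the same value; only the per-record overhead differs.
    assert fast(record) == slow(record)
    for label, rule in (("compile once", fast), ("recompile each time", slow)):
        seconds = timeit.timeit(lambda: rule(record), number=20_000)
        print(f"{label}: {seconds:.3f}s for 20,000 calls")
```

Whatever the numbers on a given machine, the pre-compiled variant still runs in an interpreter, so compiling once removes only the parsing cost.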
________________________________________
From: blacklight-...@googlegroups.com [blacklight-...@googlegroups.com] On Behalf Of Bill Dueber [bi...@dueber.com]
Sent: Wednesday, March 03, 2010 6:15 PM
To: blacklight-...@googlegroups.com
Subject: Re: [Blacklight-development] Re: marc --> solr, and performance testing
Thanks for running these, Bob! I'm a little out-of-date -- what mechanism are you using to push them to solr these days? I'll do a similar check with jruby/StreamingUpdateSolrServer and post results here.
On Wed, Mar 3, 2010 at 4:47 PM, Robert Haschart <rh...@virginia.edu<mailto:rh...@virginia.edu>> wrote:
Some initial benchmarking of SolrMarc on a 25,000-record MARC file produced the following timings:
simply reading the MARC records, translating them from MARC-8 to UTF-8 encoding, and writing them out in text format to /dev/null: 12 secs
reading the records, translating them, processing them via UVA's rather complex indexing specification, NOT sending them to Solr: 46 secs
reading the records, translating them, creating the index records, and sending them to Solr with an empty index: 1 min 44 secs
reading the records, translating them, creating the index records, and sending them to Solr with an index of ~4M records: 2 min 4 secs
So pulling all the crap we want out of MARC is expensive, but the push to solr is not just noise.
-Bob Haschart
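A quick way to read those numbers is to subtract each run from the next, assuming (and this is an assumption, since the runs were separate) that each run does all the work of the previous one plus one more stage. The stage labels below are shorthand and the helper name `stage_costs` is invented.

```python
RECORDS = 25_000

# Cumulative wall-clock seconds reported above, for the empty-index run
# and for the run against an index of roughly 4M records.
EMPTY_INDEX = [("read + translate", 12), ("index mapping", 46), ("send to Solr", 104)]
LARGE_INDEX = [("read + translate", 12), ("index mapping", 46), ("send to Solr", 124)]


def stage_costs(cumulative):
    """Turn cumulative timings into per-stage seconds by differencing."""
    costs = []
    previous_total = 0
    for name, total in cumulative:
        costs.append((name, total - previous_total))
        previous_total = total
    return costs


if __name__ == "__main__":
    for label, run in (("empty index", EMPTY_INDEX), ("~4M-record index", LARGE_INDEX)):
        total = run[-1][1]
        print(f"{label}: {total}s total, {RECORDS / total:.0f} records/sec")
        for name, seconds in stage_costs(run):
            print(f"  {name}: {seconds}s ({100 * seconds / total:.0f}% of run)")
```

Under that additive assumption the empty-index run splits into about 12 s of reading and translating, 34 s of index mapping and 58 s of sending to Solr.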
Bill Dueber wrote:
I'd encourage everyone to not focus too too too much on the speed of sending stuff to solr until you do some benchmarking. Pulling all the crap we want out of MARC is expensive, and the push to solr may be just noise depending on how you're doing it.
In general, of course, we want fast pushes to Solr, but for doing what SolrMarc does, I don't think that's the bottleneck anymore.
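One low-effort way to do that benchmarking is to time each stage separately within a single run. The sketch below is only a harness: `parse`, `map_fields` and `send` are placeholders you would supply for your own pipeline, and none of the names come from SolrMarc or any Solr client.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start


def run_benchmark(raw_records, parse, map_fields, send, batch_size=1000):
    """Push records through parse -> map -> send and return seconds per stage."""
    timer = StageTimer()
    batch = []
    for raw in raw_records:
        with timer.stage("parse"):
            record = parse(raw)
        with timer.stage("map"):
            doc = map_fields(record)
        batch.append(doc)
        if len(batch) >= batch_size:
            with timer.stage("send"):
                send(batch)
            batch = []
    if batch:  # flush the final partial batch
        with timer.stage("send"):
            send(batch)
    return dict(timer.totals)


if __name__ == "__main__":
    sent = []
    totals = run_benchmark(
        raw_records=(f"id{i}|Title {i}" for i in range(5000)),
        parse=lambda raw: raw.split("|"),                       # stand-in for MARC parsing
        map_fields=lambda rec: {"id": rec[0], "title": rec[1]},  # stand-in for index mapping
        send=sent.extend,                                        # stand-in for the push to Solr
    )
    for name, seconds in totals.items():
        print(f"{name}: {seconds:.4f}s")
```

Swapping the stand-ins for the real parser, mapping rules and Solr client shows where the time goes before any one stage gets optimized.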
On Wed, Mar 3, 2010 at 1:46 PM, Jonathan Rochkind <roch...@jhu.edu<mailto:roch...@jhu.edu>> wrote:
I don't particularly want to make the jump to ruby 1.9 for my Blacklight rails app yet myself (too many useful gems not 1.9 yet, 1.9 in general still being 'beta'; in some cases it can be hard to write code that works both under 1.8 and 1.9) -- but running just a separate indexer process under ruby 1.9 might be more feasible, if there's a good reason to, like Mike Perham's code.
Seems from talking to Erik though, that using JSolr (under java or jruby) is going to be better performance than HTTP calls no matter how you do the HTTP calls. Plus there are escaping issues with trying to put marc binary in solr via an XML HTTP call, that are taken care of with JSolr.
But, so many options! I'm starting to get choice-fatigue in my indexing attempts.
Jonathan
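The escaping issue can be seen without Solr at all. Binary MARC uses the control characters 0x1F, 0x1E and 0x1D as subfield, field and record delimiters, and XML 1.0 does not allow those characters, so an XML update message carrying raw MARC is not well-formed. The sketch below shows that with Python's standard XML parser, plus one possible workaround (base64), which is an illustration and not necessarily what anyone in this thread does. The field name `marc_display` and the helper names are assumptions.

```python
import base64
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, marc_text):
    """Build a Solr-style <add><doc> message holding the record in one field."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    # "marc_display" is a hypothetical field name used for illustration.
    ET.SubElement(doc, "field", name="marc_display").text = marc_text
    # ElementTree escapes &, < and > but does not reject control characters.
    return ET.tostring(add, encoding="unicode")


def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # A fragment with MARC's subfield delimiter, field terminator and record terminator.
    raw_marc = "\x1faSome title\x1e\x1d"
    print("raw MARC in XML well-formed?", is_well_formed(build_add_xml("1", raw_marc)))

    # One workaround: store an ASCII-safe encoding of the bytes instead.
    encoded = base64.b64encode(raw_marc.encode("utf-8")).decode("ascii")
    print("base64 MARC in XML well-formed?", is_well_formed(build_add_xml("1", encoded)))
```

Storing MARC-XML instead of binary MARC is another way around the same problem; a binary client protocol sidesteps it because the record never passes through an XML parser.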
matt mitchell wrote:
I didn't realize this was on the blacklight list -- on solrmarc too.
Just gonna copy what I put over there...
--------
Thought I'd mention the great work that Mike Perham is doing with
rsolr-async... This is a Ruby 1.9 connection driver that uses
EventMachine (and Fibers) to concurrently send updates to Solr via
http:
http://github.com/mwmitchell/rsolr-async
Combine this with the Ruby marc/enhanced gem and I'd imagine you'd
have a nice little combo happening.
-------
Here's an example of mapping with ruby marc and rsolr-direct:
http://github.com/mwmitchell/sifter/tree/master/example/
On Feb 26, 2:05 pm, Naomi Dushay <ndus...@stanford.edu<mailto:ndus...@stanford.edu>> wrote:
Hi folks,
I am curious what approaches beyond SolrMarc folks are using to get MARC data into Solr. I'm wondering if we can leverage each other's knowledge and work.
Could folks please reply to solrma...@googlegroups.com<mailto:solrma...@googlegroups.com> with non-SolrMarc efforts they are aware of, or with "non-standard" uses of SolrMarc (anything beyond writing local customizations that don't affect the core code)?
I know of:
UWisc - Stephen Meyer - very locally modified, stripped-down SolrMarc. Much zippier. Raw MARC stored in a relational DB and retrieved from same at display time.
UMich - Bill Dueber - working on multithreaded processing, and using JRuby and some Ruby wrappers.
I'm also interested in figuring out how to do reasonable performance testing of indexing as painlessly as possible.
- Naomi
--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Beanshell in Java is not Java -- it will still only be as fast as the
Beanshell interpreter (just like JRuby and all of the other
dynamically typed JVM languages):
http://ikayzo.org/confluence/pages/viewpage.action?pageId=16
So it's quite likely that at least *some* of your performance
differential is .bsh-related (which needs to be weighed against the
development time of redoing in Java what you're doing in Beanshell).
-Ross.