MarcCombiningReader and multiple holdings in Voyager

Michael Levy

Feb 10, 2012, 12:07:44 PM
to solrmarc-tech
I'm thinking of trying to get the call number from a Voyager export.

MarcCombiningReader looks like a great resource.

In the thread at
http://groups.google.com/group/solrmarc-tech/browse_thread/thread/cbcbda15381b31a3
Demian writes "One test case that I haven't tried yet, but which we
should probably look at, is what happens if one bib record is followed
by multiple holdings records...."

We will have such cases. Has anyone tested this?

Thanks much,
Michael

Demian Katz

Feb 10, 2012, 12:17:49 PM
to solrma...@googlegroups.com
I haven't explicitly tested it... but if you can export a set of records that demonstrate the case, it should be very easy to figure out if it works correctly or not!

- Demian
--
You received this message because you are subscribed to the Google Groups "solrmarc-tech" group.
To post to this group, send email to solrma...@googlegroups.com.
To unsubscribe from this group, send email to solrmarc-tec...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/solrmarc-tech?hl=en.

Naomi Dushay

Feb 10, 2012, 1:21:51 PM
to solrma...@googlegroups.com
I ended up writing a class to handle this case as well as the simpler ones:

https://github.com/solrmarc/stanford-solr-marc/blob/master/core/src/org/solrmarc/marc/CombineMultBibsMhldsReader.java

Note that this is in my own *fork* of the solrmarc code, which is freely available from github. (I desperately wanted to switch to git and to have a simpler build structure). I also had to create a way to indicate this class should be used -- let me know if you want more info.

- Naomi

Naomi Dushay

Feb 10, 2012, 1:28:22 PM
to solrma...@googlegroups.com
oh - I should have said -- I do have tests that "prove" it works for all the variants I could think of (one bib, mult bibs, one mhld, mult mhlds, records out of order, mhld without matching bib, etc.)

https://github.com/solrmarc/stanford-solr-marc/blob/master/stanford-sw/test/src/edu/stanford/CombineMultBibsMhldsReaderTest.java

I just realized the coverage reports on Hudson** aren't showing this because of how I originally had to put the reader in core, but kept the tests in my site code.

**http://hudson.projectblacklight.org/hudson/job/stanford-solr-marc%20CORE%20code/
http://hudson.projectblacklight.org/hudson/job/stanford-solr-marc%20SITE%20code/

- Naomi

Michael Levy

Feb 10, 2012, 4:14:24 PM
to solrmarc-tech
Great, thanks! When I get to do some testing, I'll share how it goes.


Michael Levy

Feb 29, 2012, 9:26:05 AM
to solrmarc-tech
I could use help with MarcCombiningReader.

Below is part of a sample Voyager bib record, followed immediately by
a holdings record that relates to it. I am not sure how to configure
marc.combine_records in this case. I saw an example suggesting .*, but
that resulted in an error. I have not been able to get the records to
combine; instead the holdings record is indexed as a new record (id
1177 in the example below).

marc.combine_records = (see above)
marc.combine_records.left_field = 001
marc.combine_records.right_field = 004


=LDR  00917cam a2200277 a 4500
=001  1159
=005  20120120171027.0
=008  900503s1989\\\\gw\\\\\\\\\\\\000\0ager\c
 .... etc

=LDR  00205cx  a22000973  4500
=001  1177
=004  1159
=005  20080708155856.0
=008  9707210p\\\\8\\\4001aueng0000000
=014  1\$aUSHOM   3688
=852  0\$bstacks$hPT67.C86$iA3 1989

How should I set marc.combine_records? I'd like to get the 852 field
from the holdings record into the bib record. Thanks!

Demian Katz

Feb 29, 2012, 9:29:49 AM
to solrma...@googlegroups.com
If you only need the 852 field, this is the easiest configuration:

marc.combine_records = 852


marc.combine_records.left_field = 001
marc.combine_records.right_field = 004
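
To make the three settings concrete, here is a minimal Python sketch of the matching idea, not SolrMarc's actual implementation: a record is folded into the preceding bib record when its right_field (004) equals that bib's left_field (001), and only the tags named in marc.combine_records are copied. The record representation (lists of tag/value pairs), the helper names, and the second holdings record (id 1178, "annex") are all hypothetical and exist only for illustration.

```python
def first_value(record, tag):
    """Return the first value stored under `tag`, or None if absent."""
    for field_tag, value in record:
        if field_tag == tag:
            return value
    return None


def combine_records(records, combine_tags, left_field="001", right_field="004"):
    """Fold each trailing record into the bib record it points back to.

    A record is merged when its `right_field` equals the `left_field` of the
    most recent stand-alone record; only fields in `combine_tags` are copied.
    Anything that does not match is kept as a record of its own.
    """
    combined = []
    for record in records:
        link = first_value(record, right_field)
        if combined and link is not None and link == first_value(combined[-1], left_field):
            combined[-1].extend(f for f in record if f[0] in combine_tags)
        else:
            combined.append(list(record))
    return combined


# One bib followed by two holdings records (the second one is made up).
bib = [("001", "1159"), ("050", "$aPT67.C86$bA3 1989")]
holdings_1 = [("001", "1177"), ("004", "1159"), ("852", "$bstacks$hPT67.C86$iA3 1989")]
holdings_2 = [("001", "1178"), ("004", "1159"), ("852", "$bannex$hPT67.C86$iA3 1989")]

merged = combine_records([bib, holdings_1, holdings_2], combine_tags={"852"})
for tag, value in merged[0]:
    print(tag, value)
```

With this rule, several holdings records that follow one bib all land in the same combined record, and a holdings record whose 004 matches nothing stays separate.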

- Demian


Michael Levy

Feb 29, 2012, 10:09:15 AM
to solrmarc-tech
Demian,
Thanks.  With

marc.combine_records = 852
marc.combine_records.left_field  = 001
marc.combine_records.right_field = 004

I get:

rake solr:marc:index MARC_FILE=/var/www/html/dc/misc/Interleaved_Bib-Holdings_records_example_2-8-12.mrc
(in /var/www/html/dc/blacklight6)
DEPRECATION WARNING: Rake tasks in vendor/plugins/blacklight/tasks and vendor/plugins/blacklight_advanced_search/tasks are deprecated. Use lib/tasks instead. (called from /usr/local/lib/ruby/gems/1.8/gems/rails-2.3.11/lib/tasks/rails.rb:10)
java -Xmx512m  -Dsolr.hosturl=http://typhon.ushmm.org:8986/solr  -jar /var/www/html/dc/blacklight6/vendor/plugins/blacklight/solr_marc/SolrMarc.jar /var/www/html/dc/blacklight6/config/SolrMarc/config.properties /var/www/html/dc/misc/Interleaved_Bib-Holdings_records_example_2-8-12.mrc

 INFO [main] (MarcImporter.java:816) - Starting SolrMarc indexing.
 INFO [main] (Utils.java:191) - Opening file: /var/www/html/dc/blacklight6/config/SolrMarc/config.properties
 INFO [main] (MarcImporter.java:749) -  Connecting to remote Solr server at URL http://typhon.ushmm.org:8986/solr/update
Solrversion 1.4.1 using v1 binary response writer
 INFO [main] (MarcHandler.java:357) - Attempting to open data file: /var/www/html/dc/misc/Interleaved_Bib-Holdings_records_example_2-8-12.mrc
 INFO [main] (MarcImporter.java:308) - Added record 1 read from file: 1159
ERROR [main] (MarcImporter.java:363) - Unable to index record 1177 (record count 2) -- String index out of range: 0
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
        at java.lang.String.charAt(String.java:694)
        at org.solrmarc.index.SolrIndexer.getSubfieldDataAsSet(SolrIndexer.java:1884)
        at org.solrmarc.index.SolrIndexer.getFieldList(SolrIndexer.java:1338)
        at org.solrmarc.index.SolrIndexer.map(SolrIndexer.java:679)
        at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:382)
        at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:304)
        at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:572)
        at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:832)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at com.simontuffs.onejar.Boot.run(Boot.java:334)
        at com.simontuffs.onejar.Boot.main(Boot.java:170)
ERROR [main] (MarcImporter.java:366) - ******** Halting indexing! ********

Demian Katz

Feb 29, 2012, 10:11:47 AM
to solrma...@googlegroups.com
Have you looked at record 1177 and corresponding holdings to see if there is something weird about the 852 field(s) involved?

If you choose a different field, other than 852, from the holdings record (just as a test), do you get different results?

- Demian


Levy, Michael

Feb 29, 2012, 12:18:44 PM
to solrma...@googlegroups.com
I've got a sample .mrc file I'm using for testing. The first record's bib and holdings are below. I don't think there is anything funny about the 852. The error message shows up on all of the holdings.  If I change marc.combine_records to 001 or 004 I don't get an error, but I do still get new records in Solr whose id is the holdings id; they are not merged.
I tried setting combine_records to 014; I get the error for holdings that have a 014 but no error where there is no field 014 in the holdings record.

=LDR  00917cam a2200277 a 4500
=001  1159
=005  20120120171027.0
=008  900503s1989\\\\gw\\\\\\\\\\\\000\0ager\c
=020  \\$a3351015046
=035  \\$a90151817 //r90
=035  \\$aUSHOM   3688
=040  \\$aCU$cCU$dDLC
=043  \\$ae-ge---$ae-gx---
=050  00$aPT67.C86$bA3 1989
=100  1\$aCwojdrak, Günther.
=245  10$aKontrapunkt :$bTagebuch 1943-1944, neu betrachtet 1986 /$cGünther Cwojdrak.
=250  \\$a1. Aufl.
=260  \\$aBerlin :$bAufbau-Verlag,$c1989.
=300  \\$a127 p. ;$c20 cm.
=591  \\$aRecord updated by Marcive processing 20 January 2012.
=600  10$aCwojdrak, Günther$vDiaries.
=650  \0$aCritics$zGermany (East)$vDiaries.
=650  \0$aLiterature$xHistory and criticism.
=651  \0$aGermany$xHistory$y1933-1945.
=650  \0$aSoldiers$zGermany$vDiaries.
=650  \0$aWorld War, 1939-1945$vPersonal narratives, German.

Naomi Dushay

Feb 29, 2012, 1:32:16 PM
to solrma...@googlegroups.com
Michael,

You could try the CombineMultBibsMhldsReader.   Here's what you need in the config:

# - use org.solrmarc.marc.CombineMultBibsMhldsReader as the combining
#  reader wrapper for split marc records followed by mhld records
#  e.g. :    bib1 bib2 mhld2 mhld2 bib3 bib4 bib4 mhld4 mhld4 bib5 ...
stanford.combining.reader = true

# - marc.combine_records - if a bib record with all of its holdings stuffed into
#  a bib field is too big to be exported from Sirsi as a single record, the
#  Sirsi system writes the record out multiple times with each subsequent record
#  containing a subset of the total holdings information.  When this property
#  is set, the "MarcCombiningReader" class is used to create a single marc
#  record with all of the occurrences of the given field.  That is, if the
#  Sirsi export created 3 bib records due to lots of items, and the item info
#  is in a 999 bib field, then setting marc.combine_records = 999 will
#  create a single marc record with ALL the 999 fields for importing into Solr.
marc.combine_records = 999

It is hardcoded to:

- look for ids to match in 001
- if it finds MHLDS,  it takes the following fields from them:   852|853|863|866|867|868
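
For readers who want a feel for the behaviour described above, here is a rough Python sketch. It is not the CombineMultBibsMhldsReader code; it only illustrates grouping adjacent records that share an 001 and copying the listed fields out of the MHLDs. Telling bibs from MHLDs by Leader/06 (u, v, x, y in the MARC 21 holdings format) is an assumption, since the thread does not say how the class does it. The dict-based record layout, the function names, and the ids a1/a2 are hypothetical.

```python
from itertools import groupby

MHLD_TAGS = {"852", "853", "863", "866", "867", "868"}
MHLD_LEADER_TYPES = set("uvxy")  # MARC 21 holdings values for Leader/06


def record_id(record):
    """Return the record's 001 value."""
    return next(value for tag, value in record["fields"] if tag == "001")


def is_mhld(record):
    """True when Leader/06 marks the record as a holdings record (assumed test)."""
    return record["leader"][6] in MHLD_LEADER_TYPES


def combine_stream(records, bib_combine_tag="999"):
    """Yield one merged record per run of adjacent records sharing an 001.

    The first record of a run is kept whole. Later bibs in the run contribute
    only `bib_combine_tag`; MHLDs contribute only the MHLD_TAGS fields.
    """
    for _, run in groupby(records, key=record_id):
        run = list(run)
        merged = {"leader": run[0]["leader"], "fields": list(run[0]["fields"])}
        for extra in run[1:]:
            wanted = MHLD_TAGS if is_mhld(extra) else {bib_combine_tag}
            merged["fields"].extend(f for f in extra["fields"] if f[0] in wanted)
        yield merged


BIB_LEADER = "00917cam a2200277 a 4500"
MHLD_LEADER = "00205cx  a22000973  4500"

stream = [
    {"leader": BIB_LEADER, "fields": [("001", "a1"), ("245", "$aFirst title")]},
    {"leader": BIB_LEADER, "fields": [("001", "a2"), ("245", "$aSecond title")]},
    {"leader": MHLD_LEADER, "fields": [("001", "a2"), ("852", "$bstacks")]},
    {"leader": MHLD_LEADER, "fields": [("001", "a2"), ("866", "$av.1-v.3")]},
]

results = list(combine_stream(stream))
for record in results:
    print(record_id(record), [tag for tag, _ in record["fields"]])
```

Because groupby only looks at adjacent records, this sketch does not cover the out-of-order case that the real class is tested against, and an MHLD with no matching bib simply comes through as its own record.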


- Naomi



Demian Katz

Feb 29, 2012, 2:24:52 PM
to solrma...@googlegroups.com

If you’re still having trouble after trying Naomi’s suggestion, perhaps you could share a binary MARC file of a couple records so we can try to reproduce the problem.

 

thanks,
Demian

 


Levy, Michael

Feb 29, 2012, 4:42:16 PM
to solrma...@googlegroups.com
I'm attaching a small mrc file to this email that includes 2 bib and 2 holdings records that are exhibiting the problem.

Naomi, I'd love to try your CombineMultBibsMhldsReader. Is there a compiled SolrMarc.jar (or a replacement for it), or do I need to compile your project from source at https://github.com/solrmarc/stanford-solr-marc?
--

Michael R. Levy   Director, Digital Collections    202.488.6132
United States Holocaust Memorial Museum
www.ushmm.org



TwoInterleavedVoyagerRecords.mrc

Naomi Dushay

Feb 29, 2012, 5:41:46 PM
to solrma...@googlegroups.com
Hi Michael,

Wow. Browsing SVN at code.google.com/p/solrmarc, it appears that nearly all the Stanford code is missing, and I also don't see that class.

But, you should be able to do this:

cd stanford-solr-marc
ant dist_site

The dist directory will have everything you need (not in a single jar). 

To run, take a look at stanford-sw/scripts/build-test-index.sh. The script is hardcoded, but it shows how to set up the classpath and the command line.

- Naomi






Naomi Dushay

Feb 29, 2012, 5:44:38 PM
to solrma...@googlegroups.com
whoops - michael -- that will index with the stanford specific code … but maybe that's ok for the test?   I was pretty sure I pushed that code up to googlecode, and it worries me that a bunch of the stanfordBlacklight stuff is missing.


Demian Katz

Mar 1, 2012, 9:50:44 AM
to solrma...@googlegroups.com

I imported this file into my VuFind instance using the settings we discussed earlier (and the most recent official SolrMarc release):

marc.combine_records = 852
marc.combine_records.left_field = 001
marc.combine_records.right_field = 004

Everything worked correctly.

Perhaps you need to look at your index.properties file. I’m beginning to suspect that the record merge is working correctly, but some data extraction (perhaps in your 852-related definition) is causing the error. If you comment out the index.properties lines related to 852 and reindex, does the error go away?

 

- Demian

Robert Haschart

Mar 1, 2012, 11:11:44 AM
to solrma...@googlegroups.com
I did a test similar to Demian's and got the same result. Using a config with the marc.combine_records settings discussed in this thread, I read in the file you attached to a previous message and had the program simply print out each combined record after reading it in. The following results were printed:

(I also tried marc.combine_records = 852|014, which included both the 852 field and the 014 field from the holdings record in the combined record.)

LEADER 00917cam a2200277 a 4500
001 1159
005 20120120171027.0
008 900503s1989    gw            000 0ager c
020   $a3351015046
035   $a90151817 //r90
035   $aUSHOM   3688
040   $aCU$cCU$dDLC
043   $ae-ge---$ae-gx---
050 00$aPT67.C86$bA3 1989
100 1 $aCwojdrak, Günther.
245 10$aKontrapunkt :$bTagebuch 1943-1944, neu betrachtet 1986 /$cGünther Cwojdrak.
250   $a1. Aufl.
260   $aBerlin :$bAufbau-Verlag,$c1989.
300   $a127 p. ;$c20 cm.
591   $aRecord updated by Marcive processing 20 January 2012.
600 10$aCwojdrak, Günther$vDiaries.
650  0$aCritics$zGermany (East)$vDiaries.
650  0$aLiterature$xHistory and criticism.
651  0$aGermany$xHistory$y1933-1945.
650  0$aSoldiers$zGermany$vDiaries.
650  0$aWorld War, 1939-1945$vPersonal narratives, German.
852 0 $bstacks$hPT67.C86$iA3 1989

LEADER 00958cam a2200265 i 4500
001 2082
005 20120120171124.0
008 760916s1976    nyua     b    001 0 eng 
010   $a75030498 //r964
020   $a0805511628
040   $aDLC$cDLC$dDLC
043   $ae-un---
050 00$aDS135.R93$bU334 1976
090   $aDS135.U4$bF75 1976
100 1 $aFriedman, Saul S.,$d1937-
245 10$aPogromchik :$bthe assassination of Simon Petlura /$cSaul S. Friedman.
260   $aNew York :$bHart Pub. Co.,$cc1976.
300   $axiv, 414 p. :$bill. ;$c21 cm.
504   $aIncludes bibliographical references and index.
591   $aRecord updated by Marcive processing 20 January 2012.
650  0$aJews$xPersecutions$zUkraine.
600 10$aPetli︠u︡ra, Symon Vasylʹovych,$d1879-1926$xAssassination.
600 10$aShṿartsbard, Shalom,$d1886-1938.
600 10$aShṿartsbard, Shalom,$d1886-1938$xTrials, litigation, etc.
650  0$aAntisemitism$zUkraine.
852 0 $bstacks$hDS135.U4$iF75 1976

Levy, Michael

Mar 1, 2012, 11:32:31 AM
to solrma...@googlegroups.com
Thank you both so much for doing these tests; I'm very grateful. My issue may have to do with Solr or SolrMarc version or some other configuration issue. I'll use the latest everything and get back to the list.


Demian Katz

Mar 1, 2012, 11:35:15 AM
to solrma...@googlegroups.com

It’s definitely worth inspecting your import mappings in index.properties before going to too much trouble. Sometimes a missing subfield designator leads to problems that look similar to yours, i.e.

holdings = 852

vs.

holdings = 852abcdefg

 

(the first will fail, the second will work)
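
To show why a bare tag can produce a "String index out of range: 0" style failure, here is a small Python sketch of the failure mode. It does not reproduce SolrMarc's parser: the function names and the rule that a spec is a three-digit tag followed by subfield codes are assumptions made for illustration. The second function is a hypothetical pre-flight check you could run over your own mappings.

```python
def first_subfield_code(spec):
    """Naive lookup that assumes at least one subfield code follows the tag."""
    subfields = spec[3:]
    return subfields[0]  # raises IndexError when the spec is just "852"


def check_mapping(name, spec):
    """Return a warning for a data-field spec with no subfield codes, else None.

    Control fields (tags below 010) carry no subfields, so a bare tag is fine there.
    """
    if len(spec) == 3 and spec.isdigit() and int(spec) >= 10:
        return f"{name} = {spec}: no subfield codes after the tag"
    return None


print(first_subfield_code("852abcdefg"))
print(check_mapping("holdings", "852"))
print(check_mapping("holdings", "852abcdefg"))
```

The naive lookup is the Python analogue of calling charAt(0) on an empty string, which is consistent with the stack trace earlier in the thread, though that is an inference rather than something confirmed from the SolrMarc source.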

 

- Demian

 
