Cool features! I hope they're documented! :-D
And I hope we're encouraging interested parties to subscribe to this
list to hear about the latest solrmarc features!
- Naomi
On Jun 22, 2009, at 2:40 PM, Robert Haschart wrote:
> Jonathan,
>
> Rather than changing the schema to not specify multiValued, you also
> have a number of other options:
>
> change the spec for getting the title value from:
>
> title_t = 245a
>
> to :
>
> title_t = 245a, first
>
> which will get only the first occurrence of a given field/subfield.
>
> or to:
>
> title_t = 245aa
>
> which will concatenate all 'a' subfields for a given 245 field and
> return them as a single entry.
>
> or even:
>
> title_t = 245aa, first
>
> which will handle instances with multiple 'a' subfields in a single
> 245 field as above, but still not die if multiple 245 fields are
> present.
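
For anyone trying to picture the difference between these specs, here is a rough Python sketch of the behavior as Bob describes it. It is only a toy model, not SolrMarc's actual code: the extract() helper, the tuple-based record layout, and the choice of a single space as the separator when concatenating are all assumptions made for illustration.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]          # (subfield code, value)
Field = Tuple[str, List[Subfield]]  # (tag, subfields)


def extract(record: List[Field], tag: str, code: str,
            join: bool = False, first: bool = False) -> List[str]:
    """Toy model of the field specs discussed above (hypothetical helper).

    join=False  roughly "245a":   one value per matching subfield
    join=True   roughly "245aa":  one value per field, subfields concatenated
    first=True  roughly ", first": keep only the first value produced
    """
    values: List[str] = []
    for field_tag, subfields in record:
        if field_tag != tag:
            continue
        matches = [value for sub_code, value in subfields if sub_code == code]
        if join:
            if matches:
                # Assumption: a single space joins the subfields.
                values.append(" ".join(matches))
        else:
            values.extend(matches)
    return values[:1] if first else values


# Hypothetical record: one 245 with two 'a' subfields, plus a second 245.
record: List[Field] = [
    ("245", [("a", "Honan's Handbook to medical Europe,"),
             ("a", "A ready reference book")]),
    ("245", [("a", "Another title")]),
]

print(extract(record, "245", "a"))                         # like 245a
print(extract(record, "245", "a", first=True))             # like 245a, first
print(extract(record, "245", "a", join=True))              # like 245aa
print(extract(record, "245", "a", join=True, first=True))  # like 245aa, first
```

Only the two ", first" variants are guaranteed to hand a single value to a non-multiValued field; the plain 245aa form still yields two values when a record carries two 245 fields.
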
>
> Lastly, two features that were added to SolrMarc very recently (as in,
> the demo code does not yet have them) could help your situation. One
> is that errors such as a missing required field, duplicate values in a
> non-multiValued field, or a field with a name unknown to Solr will
> generate an error message for that record but allow the indexing to
> continue. Second, there is a new standard "custom" index function
> called getSingleIndexEntry, which could be used like this:
>
> title_t = custom, getSingleIndexEntry(245aa, true)
>
> It would process the 245aa field spec as above, and then, if multiple
> entries result, it would select the longest one to use for the
> title_t index entry. If the second parameter to the function
> is true (and marc.include_errors is enabled), it will also generate a
> marc_error index entry containing the additional erroneous 245
> field data.
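
The selection rule described here can be sketched in a few lines of Python. This is an illustration of the description only, not the real getSingleIndexEntry: the function name, its parameters, the dictionary-shaped result, and the way the leftover values are stored under marc_error are all assumptions.

```python
from typing import Dict, List


def single_index_entry(values: List[str], report_errors: bool,
                       include_errors: bool = True) -> Dict[str, List[str]]:
    """Pick the longest candidate value for title_t (hypothetical sketch).

    values          results of evaluating a spec such as 245aa
    report_errors   stands in for the function's second parameter
    include_errors  stands in for the marc.include_errors setting
    """
    doc: Dict[str, List[str]] = {}
    if not values:
        return doc

    # max() returns the first of several equally long values.
    longest = max(values, key=len)
    doc["title_t"] = [longest]

    if report_errors and include_errors and len(values) > 1:
        # Assumption: the values that were not chosen are recorded as-is.
        leftovers = list(values)
        leftovers.remove(longest)
        doc["marc_error"] = leftovers
    return doc


candidates = ["Another title",
              "Honan's Handbook to medical Europe, A ready reference book"]
print(single_index_entry(candidates, report_errors=True))
```

Either way title_t ends up with exactly one value, which is what a non-multiValued Solr field needs.
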
>
> -Bob Haschart
>
>
> Jonathan Rochkind wrote:
>
>> Cool. I'm not sure multiple 245$a's even technically _is_ "bad
>> data". It's not necessarily AACR2, but I found out the hard way
>> recently on another project that we have LOTS of non-AACR2 data in
>> our catalog, including AACR1 data, pre-AACR data, and data
>> cataloged to rare books and manuscripts standards, none of which
>> our catalogers actually consider 'bad'.
>> So Ross suggests one answer, changing it to:
>>
>> title_t = custom, removeTrailingPunct(245a), first
>>
>> So it'll ignore the second 245. Another option might be figuring
>> out how to set the solrmarc setup to concatenate two 245$a's,
>> rather than ignore the second one, which would seem to me to be the
>> actually appropriate thing to do in this case, barring letting
>> title_t take multiple values. Is it possible to do that somehow?
>>
>> Anyone have an opinion on if either of these two things should be
>> done in the standard out-of-the-box demo setup, to accommodate this
>> kind of data?
>>
>> Jonathan
>>
>>
>>
>> Naomi Dushay wrote:
>>
>>> Jonathan,
>>>
>>> We have a TON of bad data. Lots of records with multiple 245a; lots
>>> of records with other similar problems.
>>>
>>> We use solrmarc, and it nicely steps around these problems. There are
>>> solrmarc lists:
>>>
>>> solrmarc...@googlegroups.com
>>> solrmarc-...@googlegroups.com
>>>
>>> - Naomi
>>>
>>>
>>> On Jun 22, 2009, at 1:44 PM, Jonathan Rochkind wrote:
>>>
>>>
>>>> Trying to take one baby step at a time in getting a little demo of
>>>> Blacklight running with our data, I'm trying to index a sample of our
>>>> own local MARC records.
>>>>
>>>> I get an error from "rake app:index:marc", not sure exactly why,
>>>> error below. Maybe errors in my MARC? There definitely _will_ be
>>>> illegalities in my marc, would rather the indexer recovered from
>>>> them somehow (or just skipped that record, if nothing else is
>>>> possible), rather than punted the entire input.
>>>>
>>>> Should I switch to SolrMARC instead, is it a bit more forgiving? At
>>>> one point I know I saw a pointer in the documentation to SolrMarc
>>>> docs, to help you get started with solrmarc instead of the bundled
>>>> ruby indexer... but now I can't seem to find it.
>>>>
>>>> Here's what "rake app:index:marc" is telling me (giving me an html
>>>> error message even though it's a command-line rake task, which is a
>>>> bit odd).
>>>> rake aborted!
>>>> <html>
>>>> <head>
>>>> <meta http-equiv="Content-Type" content="text/html;
>>>> charset=ISO-8859-1"/>
>>>> <title>Error 400 </title>
>>>> </head>
>>>> <body><h2>HTTP ERROR: 400</h2><pre>ERROR: [53567] multiple values
>>>> encountered for non multiValued field title_t: [Honan's Handbook to
>>>> medical Europe, A ready reference book to the universities,
>>>> hospitals, clinics, laboratories and general medical work of the
>>>> principal cities of Europe]</pre>
>>>> <p>RequestURI=/solr/update</p><p><i><small><a href="http://jetty.mortbay.org/
>>>> ">Powered by Jetty://</a></small></i></p><br/>
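
The "skip the bad record and keep going" behavior asked for in this message can be sketched as a per-record check before anything is sent to Solr. This is a minimal Python illustration, not what the Blacklight rake task or SolrMarc actually does: the schema dictionary, the field names other than title_t, and both helper functions are hypothetical.

```python
from typing import Dict, List, Tuple

Doc = Dict[str, List[str]]

# Hypothetical stand-in for the Solr schema: field name -> multiValued?
SCHEMA_MULTIVALUED: Dict[str, bool] = {
    "id": False,
    "title_t": False,
    "subject_t": True,
}


def check_document(doc: Doc) -> List[str]:
    """Return a list of problems for one document; empty means it is fine."""
    problems: List[str] = []
    for name, values in doc.items():
        if name not in SCHEMA_MULTIVALUED:
            problems.append(f"unknown field {name}")
        elif not SCHEMA_MULTIVALUED[name] and len(values) > 1:
            problems.append(
                f"multiple values encountered for non multiValued field {name}")
    return problems


def index_all(docs: List[Doc]) -> Tuple[List[Doc], List[Tuple[str, List[str]]]]:
    """Collect the good documents and report bad ones instead of aborting."""
    indexed: List[Doc] = []
    errors: List[Tuple[str, List[str]]] = []
    for doc in docs:
        problems = check_document(doc)
        if problems:
            doc_id = doc.get("id", ["?"])[0]
            errors.append((doc_id, problems))
            continue  # skip this record, carry on with the rest
        indexed.append(doc)
    return indexed, errors


docs = [
    {"id": ["1"], "title_t": ["A fine title"]},
    {"id": ["53567"], "title_t": ["Honan's Handbook to medical Europe,",
                                  "A ready reference book"]},
    {"id": ["3"], "title_t": ["Another fine title"]},
]
indexed, errors = index_all(docs)
print(len(indexed), "indexed;", len(errors), "skipped")
for doc_id, problems in errors:
    print(doc_id, problems)
```

Checking each record on its own means one malformed record costs only that record rather than the whole batch.
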
>>>> _______________________________________________
>>>> Blacklight-development mailing list
>>>> Blacklight-...@rubyforge.org
>>>> http://rubyforge.org/mailman/listinfo/blacklight-development
>>>> Blacklightopac Blog http://blacklightopac.org/