More on SolrMarc and uppercase subfields


Demian Katz

Sep 28, 2012, 7:56:38 AM
to Hannah Ullrich, vufin...@lists.sourceforge.net, solrma...@googlegroups.com
Another message I'm copying to solrmarc-tech...

This definitely sounds like a case of SolrMarc making assumptions that it really shouldn't; I don't think it's SolrMarc's job to validate the subfields, especially given that the MARC standard is constantly evolving.

If I had to guess, though, I'd speculate that this is actually not a problem specific to SolrMarc, but rather something related to the underlying Marc4j library.  Can somebody with more Marc4j experience confirm or deny that?

Assuming this is related to Marc4j, is there a way to loosen the restrictions, perhaps through a new SolrMarc configuration option?

- Demian

From: Hannah Ullrich [hannah....@ub.uni-freiburg.de]
Sent: Friday, September 28, 2012 7:46 AM
To: vufin...@lists.sourceforge.net
Subject: Re: [VuFind-Tech] SolrMarc 2.4 upgrade

Hello Stefan,

we have the same problem with the 689 fields.

I get the following info in our marc_error field:

<arr name="marc_error">
<str>Minor Error  : Subfield tag is an invalid uppercase character, changing it to lower case. --- [ 689 : D ]</str>
<str>Major Error  : Subfield tag is an invalid character, using first character of field as subfield tag. --- [ 689 : A ]</str>
</arr>

using vufind 1.3

Hannah

On 28.09.2012 at 12:26, Winkler, Stefan wrote:

Hi Demian,

 

I didn't try your upgrade in the 1.3 trunk but used the solrmarc 2.4 vufind binary directly from http://code.google.com/p/solrmarc/downloads/list

 

After indexing, it seems that the case sensitivity when reading the subfield tags was lost.

 

MarcEdit Marc21:

=689  01$Af$2gnd$aKongress

 

1. Vufind FullRecord (v2.3.1):

689 01 |A f  |2 gnd  |a Kongress 

 

2. Vufind FullRecord (v2.4):

689 01 |a f  |2 gnd  |a Kongress 

 

marc.properties parses this field using "topic_facet = 689a".

 

The result is that in the second case I get the $A subfield with the value "f" as a topic_facet.
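A minimal sketch of the effect described above, using a toy model of subfield selection. This is not SolrMarc's actual code; the function names `select_subfields` and `lowercase_codes` are made up for illustration, and the data is the 689 field from the example record.

```python
# Illustrative only: a toy model of subfield selection, not SolrMarc's code.

def select_subfields(subfields, code):
    """Return the values of all subfields whose code matches exactly."""
    return [value for c, value in subfields if c == code]


def lowercase_codes(subfields):
    """Mimic the v2.4 behaviour reported above: fold codes to lower case."""
    return [(c.lower(), value) for c, value in subfields]


# Field 689 from the example record: $A f $2 gnd $a Kongress
field_689 = [("A", "f"), ("2", "gnd"), ("a", "Kongress")]

# "topic_facet = 689a" against the record as delivered (v2.3.1 behaviour)
before = select_subfields(field_689, "a")
# The same spec after the subfield codes have been lowercased (v2.4 behaviour)
after = select_subfields(lowercase_codes(field_689), "a")

print(before)  # ['Kongress']
print(after)   # ['f', 'Kongress']
```

Once $A has been rewritten to $a, the spec `689a` can no longer tell the two subfields apart, which is why "f" shows up as a topic facet.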

 

Can anybody confirm this for their index? I suppose uppercase subfields like $A or $D are used mainly in Germany.

 

Best wishes

Stefan

 

 

 

--

Stefan Winkler

Bibliotheksservice-Zentrum Baden-Württemberg (BSZ)

78457 Konstanz / Germany

Phone: +49 7531 88 2364

E-Mail: stefan....@bsz-bw.de

http://www.bsz-bw.de

 

From: Demian Katz [mailto:demia...@villanova.edu]
Sent: Wednesday, September 26, 2012 19:11
To: vufin...@lists.sourceforge.net
Subject: [VuFind-Tech] SolrMarc 2.4 upgrade

 

I have just upgraded both the VuFind 1.x trunk and the VuFind 2.x master branch to use SolrMarc 2.4, the latest release.  (VuFind 1.x was previously on 2.3.1, while 2.x was previously using a custom-built version very close to 2.4).  Hopefully this will make Monday's 2.0beta release just a little bit easier to manage thanks to the inclusion of a known SolrMarc version.  It also allows me to close a couple of JIRA tickets related to fixes/improvements in 2.4.

I don't anticipate any problems related to this upgrade, but please let me know if you run into any indexing troubles using the latest-and-greatest code.

thanks,
Demian



_______________________________________________
Vufind-tech mailing list
Vufin...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/vufind-tech


-- 
Hannah Ullrich
Fachinformatikerin

Universitaetsbibliothek Freiburg
EDV Dezernat
Rempartstr. 10-16
79098 Freiburg
Tel: +49-761 / 203-3877

Simon Spero

Sep 29, 2012, 11:47:45 AM
to solrma...@googlegroups.com
Upper case letters are not allowed in MARC21 records (though they *are* allowed in generic MARC).

I thought that the DNB had completed the conversion to MARC21?

Simon


Demian Katz

Oct 1, 2012, 9:28:14 AM
to solrma...@googlegroups.com, hannah....@ub.uni-freiburg.de, Winkler, Stefan

I am copying this message back to Hannah and Stefan in case they did not see your reply on the solrmarc-tech list.

 

Is SolrMarc meant to enforce the rules of MARC21?  If so, should there be an option to relax them?  It seems that there is value in being able to work with non-standard records as long as they are structurally correct.

 

- Demian

Jonathan Rochkind

Oct 1, 2012, 10:42:42 AM
to solrma...@googlegroups.com, Demian Katz, Hannah Ullrich, vufin...@lists.sourceforge.net
Marc4J has always been very Marc agnostic -- it doesn't even assume
Marc21, but works fine with UNIMARC and other European MARC variants.
Meaning it makes absolutely no assumptions about field/subfield
semantics, or what subfields are legal, or anything like that.

With one important principled exception: Marc4J _does_ follow the actual
MARC specification (not Marc21, the generic MARC spec they are all based
on, I forget the number), to the letter. If the MARC spec says that
subfields are not case sensitive, then it would not surprise me to see
Marc4J implementing that as written in the spec.




Robert Haschart

Oct 1, 2012, 3:39:30 PM
to solrma...@googlegroups.com
It is something I added.  I had come across several records in our collection that had erroneous upper case subfield tags, and then found documentation that seemed to state unequivocally that subfield tags must be numeric 0-9 or lowercase alphabetic a-z.   I thought that this restriction was true for MARC in general, rather than a specific implementation of MARC. 

I'm pretty sure that this change is in the MarcPermissiveStreamReader portion of Marc4j. Since it appears to be causing a problem, I think it should be possible to enable or disable it, perhaps via a property. I'll try to put together a minor release of either marc4j or solrmarc with this fix as soon as possible. My current thinking is that it should be disabled by default and enabled with a property. Perhaps the same is true for some other pieces of the error detecting/correcting code in the Permissive reader.
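A rough sketch of the "off by default, enabled via a property" idea, written in Python rather than as Marc4j code. The property name `example.permissive.fold_uppercase_subfields` and both function names are hypothetical, invented only for this illustration; the message text is the one from the marc_error output quoted earlier in the thread.

```python
# Hypothetical property name, invented for this sketch; it is not a real
# SolrMarc or Marc4j setting.
FOLD_PROPERTY = "example.permissive.fold_uppercase_subfields"

UPPERCASE_MESSAGE = (
    "Subfield tag is an invalid uppercase character, "
    "changing it to lower case."
)


def folding_enabled(properties):
    """Return True only if the property is explicitly set to 'true'."""
    return properties.get(FOLD_PROPERTY, "false").strip().lower() == "true"


def read_subfield_code(code, properties):
    """Return (code, errors); fold upper case only when the option is on."""
    errors = []
    if folding_enabled(properties) and code.isascii() and code.isupper():
        errors.append(UPPERCASE_MESSAGE)
        code = code.lower()
    return code, errors


# Default: the option is off, so $A is passed through untouched.
print(read_subfield_code("A", {}))
# Opt in: $A is folded to $a and the correction is reported.
print(read_subfield_code("A", {FOLD_PROPERTY: "true"}))
```

With this shape, sites that rely on uppercase subfield codes get their records indexed unchanged unless they opt in, while sites that want the correction can switch it on.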

-Bob Haschart

Demian Katz

Oct 1, 2012, 3:59:38 PM
to solrma...@googlegroups.com

That makes sense to me.  Thanks for the update!
