On Aug 16, 2024, at 9:31 AM, Uwe Steinmann <vuf...@steinmann.cx> wrote:
On Fri, Aug 16, 2024 at 01:51:24PM +0000, Demian Katz wrote:
Uwe,
Have you tried converting your MARCXML to binary MARC using a tool
like yaz-marcdump
(https://software.indexdata.com/yaz/doc/yaz-marcdump.html) to see if
that solves the problem? If so, you might be able to use format
conversion as a workaround until we can find a better solution in
SolrMarc; either way, it would help to prove or disprove my theory.
Your theory appears to be right. Importing the binary MARC works!
So I converted 100,000 records to .mrc and tried those, and that also
works. It even works without setting
org.marc4j.MarcPermissiveStreamReader.upperCaseSubfields = true
in imports.properties.
From looking at the code, I thought that setting would be needed in any case.
Thanks for the help so far. That at least lets me keep going and get some
data into my test installation.
Uwe
_______________________________________________
VuFind-General mailing list
VuFind-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/vufind-general
It certainly seems to me that the encoding of the input data should not change the behavior of the software… but I don’t know if there’s a technical reason why the MarcPermissiveStreamReader can only work with binary MARC data. I’d definitely favor greater consistency, but I’m not sure how hard that would be to achieve without digging deeper into the Marc4j code/documentation.
- Demian
I did a little more digging and found that there is a technical reason for all this: the permissive behavior is a constructor switch on Marc4j’s MarcPermissiveStreamReader, which can only read binary MARC. The MarcUnprettyXmlReader class used for reading XML does not appear to have an equivalent switch. This seems to be a shortcoming/inconsistency in Marc4j that in turn affects the upstream SolrMarc code. Again, I only did a cursory investigation, so maybe I’m missing something, but this may not be simple to solve.
- Demian
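Since only the binary reader has the permissive switch, the practical workaround in this thread is to convert MARCXML to binary MARC before importing. With yaz-marcdump that is a single command along the lines of `yaz-marcdump -i marcxml -o marc records.xml > records.mrc` (the file names are placeholders). For anyone curious what that conversion actually produces, here is a minimal Python sketch of the MARCXML-to-ISO 2709 mapping. It is not SolrMarc or Marc4j code, and `marcxml_to_iso2709` / `record_to_iso2709` are hypothetical names. It assumes well-formed MARC21slim input, always writes UTF-8, and skips the validation, character-set conversion and overflow checks (e.g. fields over 9999 bytes) that a real tool like yaz-marcdump handles.

```python
import xml.etree.ElementTree as ET

# MARC21slim namespace used by MARCXML documents.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"
SUBFIELD_DELIMITER = b"\x1f"
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def record_to_iso2709(record: ET.Element) -> bytes:
    """Serialize one MARCXML <record> element as a binary (ISO 2709) record."""
    fields = []  # (tag, field bytes) in document order
    for child in record:
        tag = child.get("tag", "")
        if child.tag == MARC_NS + "controlfield":
            data = (child.text or "").encode("utf-8")
        elif child.tag == MARC_NS + "datafield":
            # Two indicator characters, then delimiter + code + value per subfield.
            data = (child.get("ind1", " ") + child.get("ind2", " ")).encode("utf-8")
            for sub in child.findall(MARC_NS + "subfield"):
                data += SUBFIELD_DELIMITER + sub.get("code", "").encode("utf-8")
                data += (sub.text or "").encode("utf-8")
        else:
            continue  # the leader is handled separately below
        fields.append((tag, data + FIELD_TERMINATOR))

    # Directory: 3-char tag, 4-digit byte length, 5-digit offset into the data area.
    directory = b""
    body = b""
    for tag, data in fields:
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FIELD_TERMINATOR

    base_address = 24 + len(directory)
    total_length = base_address + len(body) + len(RECORD_TERMINATOR)

    # Keep the descriptive leader positions from the XML, recompute the structural ones.
    leader_el = record.find(MARC_NS + "leader")
    old = ((leader_el.text or "") if leader_el is not None else "").ljust(24)
    leader = (
        f"{total_length:05d}"      # 00-04 record length in bytes
        + old[5:9]                 # 05-08 status, type, bib level, type of control
        + "a"                      # 09 'a' = UCS/Unicode (this sketch writes UTF-8)
        + "22"                     # 10-11 indicator count, subfield code count
        + f"{base_address:05d}"    # 12-16 base address of data
        + old[17:20]               # 17-19 encoding level, cataloging form, multipart
        + "4500"                   # 20-23 entry map
    )
    return leader.encode("ascii") + directory + body + RECORD_TERMINATOR


def marcxml_to_iso2709(xml_text: str) -> bytes:
    """Convert a MARCXML <collection> (or a single <record>) to concatenated binary MARC."""
    root = ET.fromstring(xml_text)
    if root.tag == MARC_NS + "record":
        records = [root]
    else:
        records = root.findall(MARC_NS + "record")
    return b"".join(record_to_iso2709(r) for r in records)
```

The sketch deliberately recomputes the record length, base address and directory instead of trusting the leader in the XML, since those are the structural values a binary MARC reader uses to locate each field. All lengths are counted in UTF-8 bytes, not characters.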