Hello,
I'm currently trying to use SolrMarc to index some MARC-XML authority records.
I keep running into the error:
Exception occurred while Indexing: Unable to parse input
As a sanity check, I've also been playing around with a sample record from LOC (http://www.loc.gov/standards/marcxml/Sandburg/sandburg.xml). This indexed correctly at first, but after I ran it through a pretty-printer (the jEdit XML module's Indent XML feature), it started producing the same error message! It seems that a broad range of problems can trigger this one message.
I have attached a sample authority record that appears valid to my (admittedly somewhat inexperienced) eye in case anyone else wants to try to reproduce the problem.
Has anybody else seen this issue? Any suggestions on how to get more information to help with MARC-XML troubleshooting? I've fiddled with the log4j settings but haven't managed to get any more details out.
thanks,
Demian
Good eye -- that was one of the problems. It also appears that the leader doesn't pass validation against the MARC21slim.xsd file. If I remove the leader and fix the null indicators, I can get SolrMarc to index the data.
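For anyone who wants to catch the indicator problem before indexing, here is a minimal sketch in Python (standard library only; it is not part of SolrMarc). It assumes that a "null" indicator means an ind1 or ind2 attribute that is missing or empty, and that a usable indicator is exactly one character. The function name find_bad_indicators is made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_bad_indicators(xml_text):
    """Return (tag, attribute, value) for every indicator that is not one character.

    Assumption: a missing or empty ind1/ind2 attribute is what breaks parsing.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        # Strip any namespace so the check works with or without the MARC21 slim namespace.
        if element.tag.split("}")[-1] != "datafield":
            continue
        for attr in ("ind1", "ind2"):
            value = element.get(attr)
            if value is None or len(value) != 1:
                problems.append((element.get("tag"), attr, value))
    return problems
```

Running this over a file and printing the result lists each offending field by tag, which is at least more specific than "Unable to parse input".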
Of course, now that I've pinned down the issues, I'm still not sure of the best solution -- is there a way to make SolrMarc more forgiving of slightly invalid XML?
thanks,
Demian
yaz-marcdump actually had even more problems with these files than SolrMarc did. I couldn't get anything out of it (v3.0.46, if that matters) other than "yaz_marc_read_xml failed." Adding the "-v" verbose flag provided no extra details.
A bit more digging revealed that the problem was a missing leader byte 09 (character coding scheme). When I manually added this byte back into the leader, both yaz-marcdump and SolrMarc behaved correctly. It would definitely be nice to have better feedback to help pinpoint problems like this, plus a means of ignoring corrupted leaders if desired.
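Since a MARC leader is always 24 characters long, a dropped byte like this shows up as a 23-character leader. A rough Python sketch of that check follows (standard library only; the function name find_bad_leaders is hypothetical, and it only tests the length, not whether each position holds a legal value):

```python
import xml.etree.ElementTree as ET

LEADER_LENGTH = 24  # a MARC leader is always 24 characters


def find_bad_leaders(xml_text):
    """Return the text of every <leader> whose length is not 24 characters."""
    root = ET.fromstring(xml_text)
    bad = []
    for element in root.iter():
        # Strip any namespace prefix before comparing the element name.
        if element.tag.split("}")[-1] == "leader":
            leader = element.text or ""
            if len(leader) != LEADER_LENGTH:
                bad.append(leader)
    return bad
```

An empty list means every leader has the right length; anything else is worth inspecting by hand before blaming the indexer.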
I have attached a more raw version of my previous sample in case anybody is interested in playing with this. Would it be worthwhile for me to open a ticket in JIRA with this sample record for help with future development?
- Demian
From: solrma...@googlegroups.com [mailto:solrma...@googlegroups.com] On Behalf Of Bill Dueber
Sent: Thursday, May 13, 2010 10:13 AM
To: solrma...@googlegroups.com
Subject: Re: [solrmarc-tech] Troubleshooting MARC-XML
Just the usual warning: not all valid MARC-XML is valid binary MARC due to length issues, so keep an eye out.
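The limits behind this warning come from the fixed-width length fields in binary MARC: the leader stores the record length in five digits and each directory entry stores a field length in four, so a record tops out at 99,999 bytes and a single field at 9,999. The sketch below estimates those sizes from a MARC-XML record. It assumes UTF-8 encoding and the usual ISO 2709 layout (24-byte leader, 12-byte directory entries, one-byte terminators); the function name estimate_binary_size is made up for illustration.

```python
import xml.etree.ElementTree as ET

MAX_RECORD_BYTES = 99999  # record length is stored in 5 digits
MAX_FIELD_BYTES = 9999    # field length in a directory entry is stored in 4 digits


def local_name(element):
    """Element name without any namespace prefix."""
    return element.tag.split("}")[-1]


def estimate_binary_size(record):
    """Estimate the binary size of one <record> element.

    Returns (total_bytes, oversized), where oversized lists (tag, size)
    for every field larger than MAX_FIELD_BYTES.
    """
    total = 24 + 1 + 1  # leader, directory terminator, record terminator
    oversized = []
    for field in record:
        name = local_name(field)
        if name == "controlfield":
            size = len((field.text or "").encode("utf-8")) + 1  # data + field terminator
        elif name == "datafield":
            size = 2 + 1  # two indicators + field terminator
            for sub in field:
                if local_name(sub) == "subfield":
                    # subfield delimiter + code + data
                    size += 2 + len((sub.text or "").encode("utf-8"))
        else:
            continue  # leader and anything unexpected
        total += 12 + size  # each field also costs one directory entry
        if size > MAX_FIELD_BYTES:
            oversized.append((field.get("tag"), size))
    return total, oversized
```

A record is at risk when the total exceeds MAX_RECORD_BYTES or the oversized list is non-empty.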