[solrmarc-tech] Troubleshooting MARC-XML

Demian Katz

May 11, 2010, 3:34:39 PM
to solrma...@googlegroups.com

Hello,

I'm currently trying to use SolrMarc to index some MARC-XML authority records.  I keep running into the error:

    Exception occurred while Indexing: Unable to parse input

As a sanity check, I've also been playing around with a sample record from LOC (http://www.loc.gov/standards/marcxml/Sandburg/sandburg.xml); this indexed correctly at first, but after I ran it through a pretty-printer (the jEdit XML module's Indent XML feature), it started displaying the same error message!  Seems that there may be a broad range of problems that trigger this same message.

I have attached a sample authority record that appears valid to my (admittedly somewhat inexperienced) eye in case anyone else wants to try to reproduce the problem.

Has anybody else seen this issue?  Any suggestions on how to get more information to help with MARC-XML troubleshooting?  I've fiddled with the log4j settings but haven't managed to get any more details out.

thanks,

Demian

sample.xml

Bill Dueber

May 11, 2010, 5:07:02 PM
to solrma...@googlegroups.com
I'm noticing that some indicators are null strings ("") as opposed to spaces (" ") -- is that it?
--
Bill Dueber
Library Systems Programmer
University of Michigan Library

Demian Katz

May 12, 2010, 4:10:39 PM
to solrma...@googlegroups.com

Good eye -- that was one of the problems.  It also appears that the leader doesn't pass validation against the MARC21slim.xsd file.  If I remove the leader and fix the null indicators, I can get SolrMarc to index the data.

Of course, now that I've pinned down the issues, I'm still not sure of the best solution -- is there a way to make SolrMarc more forgiving of slightly invalid XML?
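
Until SolrMarc itself is more forgiving, one possible stopgap is to clean the file before indexing. The following is a minimal Python sketch and not part of SolrMarc: the function name fix_empty_indicators is hypothetical, and it assumes the records use the standard MARC21 slim namespace. It only rewrites empty (or missing) ind1/ind2 attributes to a single space and leaves the leader alone.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC21 slim MARC-XML documents.
MARC_NS = "http://www.loc.gov/MARC21/slim"

# Serialize with a default namespace instead of generated "ns0:" prefixes.
ET.register_namespace("", MARC_NS)


def fix_empty_indicators(xml_text):
    """Replace empty or missing ind1/ind2 attributes with a single space.

    Hypothetical helper. Returns a tuple of the rewritten XML text and a
    list of (field tag, attribute name) pairs that were changed.
    """
    root = ET.fromstring(xml_text)
    changed = []
    for field in root.iter("{%s}datafield" % MARC_NS):
        for attr in ("ind1", "ind2"):
            # A missing attribute is treated the same as an empty one.
            if field.get(attr, "") == "":
                field.set(attr, " ")
                changed.append((field.get("tag"), attr))
    return ET.tostring(root, encoding="unicode"), changed
```

The list of changed fields is returned so the affected records can be reviewed rather than silently altered. Whether the cleaned output then indexes correctly still has to be tested against SolrMarc.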

 

thanks,

Demian

Till Kinstler

May 12, 2010, 4:32:56 PM
to solrma...@googlegroups.com
On 12.05.2010 22:10, Demian Katz wrote:
> Good eye -- that was one of the problems. It also appears that the
> leader doesn't pass validation against the MARC21slim.xsd file. If I
> remove the leader and fix the null indicators, I can get SolrMarc to
> index the data.

Just out of curiosity: Have you tried converting the records from
MARCXML to binary MARC, e.g. with the yaz-marcdump tool (part of the open
source YAZ package by Index Data: http://www.indexdata.com/yaz)?
What does yaz-marcdump say? And if it converts the records without
complaints, does SolrMarc accept them then?

Till

Bill Dueber

May 13, 2010, 10:13:00 AM
to solrma...@googlegroups.com
Just the usual warning: not all valid MARC-XML is valid binary MARC due to length issues, so keep an eye out.
--
Bill Dueber
Library Systems Programmer
University of Michigan Library

Demian Katz

May 13, 2010, 11:49:48 AM
to solrma...@googlegroups.com

yaz-marcdump actually had even more problems with these files than SolrMarc did.  I couldn't get anything out of it (v3.0.46, if that matters) other than "yaz_marc_read_xml failed."  Adding the "-v" verbose flag provided no extra details.

A bit more digging revealed that the problem was a missing leader byte 09 (character coding scheme).  When I manually added this byte back into the leader, both yaz-marcdump and SolrMarc behaved correctly.  It would definitely be nice to have better feedback to help pinpoint problems like this, plus a means of ignoring corrupted leaders if desired.
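
A dropped leader byte like this can be spotted before indexing by checking the leader length, since a MARC 21 leader is always 24 characters. Below is a minimal Python sketch of such a check; it is not a feature of SolrMarc or yaz-marcdump, the function name check_leaders is hypothetical, and it assumes the standard MARC21 slim namespace. It only reports the length, so it cannot tell which position went missing.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC21 slim MARC-XML documents.
MARC_NS = "http://www.loc.gov/MARC21/slim"

# A MARC 21 leader is a fixed-length field of 24 characters (positions 00-23).
LEADER_LENGTH = 24


def check_leaders(xml_text):
    """Report records whose leader is absent or not 24 characters long.

    Hypothetical helper. Returns a list of (record number, message) pairs,
    with records numbered from 1 in document order.
    """
    root = ET.fromstring(xml_text)
    problems = []
    records = root.iter("{%s}record" % MARC_NS)
    for number, record in enumerate(records, start=1):
        leader = record.findtext("{%s}leader" % MARC_NS)
        if leader is None:
            problems.append((number, "no leader element"))
        elif len(leader) != LEADER_LENGTH:
            message = "leader is %d characters, expected %d" % (
                len(leader), LEADER_LENGTH)
            problems.append((number, message))
    return problems
```

An empty result only means every leader has the right length; it does not validate the individual positions against MARC21slim.xsd.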

 

I have attached a more raw version of my previous sample in case anybody is interested in playing with this.  Would it be worthwhile for me to open a ticket in JIRA with this sample record for help with future development?

- Demian

sample.xml