Ruby script to check for missing MARC


Ross Singer

Jan 6, 2010, 9:58:14 PM
to blacklight-...@googlegroups.com, solrmarc...@googlegroups.com
Hi all. Over on the VuFind list, Jeffrey Barnett ran into a problem:
400K records (out of 8+ million) were missing from his SolrMarc load
into VuFind, and he was wondering how to figure out which ones. I took
a stab at a simple script using ruby-marc and rsolr that finds the MARC
001s with no corresponding Solr doc id and appends each missing MARC
record to a batch file, so you can run SolrMarc over just those records
again.

I figure something like this might be generally useful to anybody
using Solr, MARC and/or SolrMarc, so here it is:

http://gist.github.com/270920
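
For anyone who just wants the shape of the approach without opening the
gist, here is a minimal sketch. It is not the gist itself: the method
names and the file/URL arguments are made up for illustration, and it
assumes that the Solr unique key field is called "id" and holds the
MARC 001 value. It also assumes a reasonably current ruby-marc and
rsolr.

```ruby
# Yields each record whose 001 control field has no matching Solr doc.
# `records` is anything with #each whose items respond to ['001'] the way
# MARC::Record does; `in_index` is a callable taking an id string and
# returning true or false. (Both names are hypothetical, for illustration.)
def each_missing_record(records, in_index)
  records.each do |record|
    field = record['001']
    id = field && field.value.to_s.strip
    # Records without a usable 001 can't be looked up at all, so this
    # sketch skips them rather than guessing.
    next if id.nil? || id.empty?
    yield record unless in_index.call(id)
  end
end

# Wires the check to ruby-marc and rsolr and writes the missing records
# to out_path. Assumes the Solr unique key field is named "id".
def write_missing_marc(marc_path, solr_url, out_path)
  require 'marc'
  require 'rsolr'

  solr = RSolr.connect(url: solr_url)
  in_index = lambda do |id|
    # Escape quotes and backslashes so the id is safe inside a phrase query.
    escaped = id.gsub(/(["\\])/) { "\\#{$1}" }
    response = solr.get('select', params: { q: %(id:"#{escaped}"), rows: 0 })
    response['response']['numFound'] > 0
  end

  writer = MARC::Writer.new(out_path)
  count = 0
  each_missing_record(MARC::Reader.new(marc_path), in_index) do |record|
    writer.write(record)
    count += 1
  end
  writer.close
  count
end
```

Calling write_missing_marc('all.mrc', 'http://localhost:8983/solr',
'missing.mrc') (placeholder values) would return the number of records
written. Note that this makes one Solr query per MARC record, which is
simple but slow for millions of records.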

I figure this script could also be a jumping-off point for a script to
clean up bad MARC records that MARC4J rejects -- ruby-marc has a much
more lenient parser.

I don't actually have a setup where I can confirm that this script
works (!!), but if somebody wants to try it out, there's nothing here
that could break anything (well, depending on how many records are
missing, your system could slow down, I suppose).

Thanks,
-Ross.
