indexing marcxml files

PG

Aug 29, 2011, 7:51:42 PM
to Blacklight Development
Can I index MARCXML files instead of .mrc files with the following command?
rake solr:marc:index MARC_FILE=/some/file.xml

Chris Beer

Aug 30, 2011, 10:06:44 AM
to blacklight-...@googlegroups.com
Someone else who actually uses MARC will perhaps confirm or deny this:

The Blacklight rake task is a very light wrapper around SolrMarc [1] (with its own mailing lists, linked from that page, should you run into trouble with it). From the SolrMarc documentation [2]:

> The SolrMarc program reads in MARC records stored in standard binary (ISO 2709) or XML format and uses a configurable and customizable script for extracting values from the fields and sub-fields of the MARC record to build an index entry for adding to Solr.

So, I imagine it should "just work".


Chris


[1] http://code.google.com/p/solrmarc
[2] http://code.google.com/p/solrmarc/wiki/ConfiguringSolrMarc#About_SolrMarc

Adam Wead

Aug 30, 2011, 11:51:43 AM
to blacklight-...@googlegroups.com
Chris is right. If you run rake solr:marc:index:info, it prints what it is actually passing to the solrMarc.jar file, including the full command, which should be the last line of the output.

To PG's question: I haven't tried it with XML. I've used .mrc files containing single and multiple records, with some tweaking using yaz. I think yaz can convert XML to .mrc, so if SolrMarc doesn't take XML you could convert first...

...adam

Jonathan Rochkind

Aug 30, 2011, 12:03:38 PM
to blacklight-...@googlegroups.com

SolrMarc should take MarcXML fine. I think it expects the input file name to end in .xml in order to recognize it as XML.

Although, come to think of it, I send MARC to SolrMarc on stdin too; I'm not sure how you'd tell it to expect XML in that case. The SolrMarc list would be the place to ask. In general, SolrMarc is intended to handle MarcXML, and the solr:marc:index wrapper rake task should as well. Several Blacklight and SolrMarc users are using SolrMarc with MarcXML.

Robert Haschart

Aug 30, 2011, 12:20:56 PM
to blacklight-...@googlegroups.com
SolrMarc does take MarcXML fine. As Jonathan mentions, if the filename is passed as a command-line argument, SolrMarc expects the file to have a suffix of .xml. If the file is piped into the program through stdin, the program will peek ahead at the data to determine whether it is MARC XML or binary MARC.

-Bob Haschart
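
For anyone curious what "suffix for named files, peek ahead for stdin" might look like, here is a rough Python sketch. It is not SolrMarc's actual code (SolrMarc is Java, and its real detection logic may differ). The function names sniff_marc_format and format_for are made up for illustration, and treating any non-.xml filename as binary is an assumption. The sketch relies only on two facts: an XML document starts with "<" (possibly after a byte-order mark or whitespace), and a binary ISO 2709 record starts with a five-digit record length.

```python
import io


def sniff_marc_format(stream):
    """Guess 'xml' or 'binary' from the first bytes of a buffered binary
    stream, without consuming them. Hypothetical helper, not SolrMarc's API."""
    head = stream.peek(64)[:64]          # peek() does not advance the stream
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte-order mark
        head = head[3:]
    head = head.lstrip()
    if head.startswith(b"<"):
        return "xml"
    # An ISO 2709 leader begins with the record length as five ASCII digits.
    if len(head) >= 5 and head[:5].isdigit():
        return "binary"
    return "unknown"


def format_for(path=None, stream=None):
    """Named file: decide by suffix. No name (e.g. stdin): sniff the data.
    Assumption: any suffix other than .xml is treated as binary MARC."""
    if path is not None:
        return "xml" if path.lower().endswith(".xml") else "binary"
    return sniff_marc_format(stream)
```

To apply it to piped input you would pass sys.stdin.buffer as the stream, since that object supports peek().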