
New perl module MARC::File::MiJ -- marc-in-json for perl


Bill Dueber

Jul 15, 2013, 11:00:35 AM
to Code for Libraries, perl...@perl.org
The marc-in-json format is, as you might expect, a JSON serialization for MARC. It is potentially useful in the same places where MARC-XML is useful (long records, human-readable records, etc.), without what many perceive to be the relative pain of working with XML rather than JSON.

It's currently supported across several implementations:
  • ruby's marc gem
  • php's File_MARC
  • java's marc4j
  • python's pymarc
There wasn't one for perl, so I wrote one :-)

MARC::File::MiJ is a perl module that allows MARC::Record to encode/decode marc-in-json. It also supplies a handler to MARC::File/MARC::Batch that will read marc-in-json records from a newline-delimited-json (ndj) file (where each line is a JSON object without unescaped newlines, ending with a newline). 
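To make the file layout concrete, here is a minimal sketch of the newline-delimited format itself, written in Python with only the standard library. It does not use or imitate the MARC::File::MiJ API. The sample record, its values, and the helper names (write_ndj, read_ndj, subfield_values) are made up for illustration; the record follows the usual marc-in-json layout of a leader string plus a list of single-key field objects.

```python
import io
import json

# A hypothetical, minimal record in the marc-in-json layout: a leader string
# plus an ordered list of single-key field objects. Control fields map the tag
# to a string; data fields map the tag to indicators and a subfield list.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "ocm12345"},
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "An example title /"},
                               {"c": "A. Author."}]}},
    ],
}


def write_ndj(records, handle):
    """Write one JSON object per line; json.dumps escapes embedded newlines."""
    for rec in records:
        handle.write(json.dumps(rec) + "\n")


def read_ndj(handle):
    """Yield one decoded record per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def subfield_values(rec, tag, code):
    """Collect the values of subfield `code` from every `tag` data field."""
    values = []
    for field in rec["fields"]:
        for field_tag, content in field.items():
            if field_tag == tag and isinstance(content, dict):
                for sub in content["subfields"]:
                    if code in sub:
                        values.append(sub[code])
    return values


# Round-trip two copies of the record through an in-memory "file".
buffer = io.StringIO()
write_ndj([record, record], buffer)
buffer.seek(0)
records = list(read_ndj(buffer))
titles = subfield_values(records[0], "245", "a")
```

Because every record sits on its own line, a reader can stream the file one record at a time without holding the whole thing in memory.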

marc-in-json encoding/decoding tends to be pretty fast, since json parsers tend to be pretty fast, and uncompressed file sizes occupy a middle ground between binary marc and marc-xml. The sizes for a sample file of about 18k marc records look like this:

  31M topics.mrc
  56M topics.ndj (newline-delimited JSON)
  93M topics.xml

 8.9M topics.mrc.gz
 7.9M topics.ndj.gz
 8.7M topics.xml.gz

...so obviously it compresses pretty well, too.
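For anyone who wants the ratios spelled out, here is a quick bit of arithmetic on the sizes listed above, as a Python sketch. The numbers are copied straight from the listing; the variable names are just for illustration.

```python
# (uncompressed MB, gzipped MB) for each sample file, copied from the listing.
sizes_mb = {
    "topics.mrc": (31.0, 8.9),
    "topics.ndj": (56.0, 7.9),
    "topics.xml": (93.0, 8.7),
}

# Compression ratio = uncompressed size / gzipped size, to one decimal place.
ratios = {name: round(raw / gz, 1) for name, (raw, gz) in sizes_mb.items()}

# The file with the smallest gzipped size.
smallest_gz = min(sizes_mb, key=lambda name: sizes_mb[name][1])

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x smaller after gzip")
```

That works out to roughly 3.5x for binary marc, 7.1x for the ndj file, and 10.7x for marc-xml, with the gzipped ndj file being the smallest of the three.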

I can take generic questions; bugs should go to https://rt.cpan.org/Public/Bug/Report.html?Queue=MARC-File-MiJ

[Note that there are many other possible JSON serializations for MARC, including the (incompatible) one implemented in the MARC::File::JSON module.]




--
Bill Dueber
Library Systems Programmer
University of Michigan Library