marcxml --> mrc


Heidi Frank

Apr 29, 2014, 6:22:06 PM
to pym...@googlegroups.com
hi everyone,
I'm *sure* there must be a function that converts MARCXML records to MARC21/mrc, but after some digging, I'm not finding it.  I did find the reverse, converting mrc to MARCXML with the function pymarc.record_to_xml(record) - but how could I reverse that?

basically, I have a set of individual marcxml files, each containing 1 record, and I need to convert them to MARC21 and join them into a single .mrc file.  I'm currently using MarcEdit for this, which is great, but I want a script to do it automatically without manual intervention, so I would like to use PyMARC.

please let me know what I'm missing… (I'm positive it must be obvious)

thanks in advance for any leads.
best,
heidi

Dan Scott

Apr 29, 2014, 7:24:59 PM
to pym...@googlegroups.com
Hi Heidi:

On Tue, Apr 29, 2014 at 6:22 PM, Heidi Frank <hf...@nyu.edu> wrote:
> hi everyone,
> I'm *sure* there must be a function that converts MARCXML records to
> MARC21/mrc, but after some digging, I'm not finding it. I did find the
> reverse, converting mrc to MARCXML with the function
> pymarc.record_to_xml(record) - but how could I reverse that?

There is! Check out test/writer.py in the source repository for an example.

The pertinent chunk of code is:

import pymarc

# write a record off to a file
writer = pymarc.MARCWriter(open('test/writer-test.dat', 'wb'))
record = pymarc.Record()
field = pymarc.Field('245', ['0', '0'], ['a', 'foo'])
record.add_field(field)
writer.write(record)
writer.close()

Hope this helps.
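For a sense of what MARCWriter actually puts on disk, here is a rough standard-library sketch of the ISO 2709 layout that MARC21 binary uses: a 24-character leader (record length in positions 0-4, base address of data in 12-16), a directory of 12-byte entries (tag, field length, starting offset), then the field data with 0x1E field terminators, 0x1F subfield delimiters and a final 0x1D record terminator. The function name serialize_record and its tuple-based field format are hypothetical, for illustration only; this is not pymarc's code, and it skips the validation and encoding checks a real writer would do. Note that lengths are counted in bytes after UTF-8 encoding, not in characters.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def serialize_record(leader, fields):
    """Serialize one record to ISO 2709 bytes (illustrative sketch).

    leader: 24-character template; positions 0-4 and 12-16 are recomputed.
    fields: (tag, data) for control fields, or
            (tag, ind1, ind2, [(code, value), ...]) for data fields.
    """
    directory = b""
    body = b""
    for field in fields:
        tag = field[0]
        if len(field) == 2:
            # Control field: raw data followed by a field terminator.
            data = field[1].encode("utf-8") + FIELD_TERMINATOR
        else:
            # Data field: two indicators, then delimiter + code + value per subfield.
            _, ind1, ind2, subfields = field
            data = (ind1 + ind2).encode("utf-8")
            for code, value in subfields:
                data += SUBFIELD_DELIMITER + code.encode("utf-8") + value.encode("utf-8")
            data += FIELD_TERMINATOR
        # Directory entry: 3-byte tag, 4-digit length, 5-digit offset into the body.
        directory += tag.encode("ascii") + b"%04d%05d" % (len(data), len(body))
        body += data
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(body) + len(RECORD_TERMINATOR)
    leader_bytes = (
        b"%05d" % record_length
        + leader[5:12].encode("ascii")
        + b"%05d" % base_address
        + leader[17:24].encode("ascii")
    )
    return leader_bytes + directory + body + RECORD_TERMINATOR


if __name__ == "__main__":
    rec = serialize_record(
        "00000nam a2200000 a 4500",
        [("001", "123"), ("245", "0", "0", [("a", "foo")])],
    )
    print(rec)
```

The demo record has one control field and the same 245 $a foo field as Dan's snippet, so the leader ends up with a record length of 62 and a base address of 49.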

Heidi Frank

Apr 30, 2014, 2:40:53 PM
to pym...@googlegroups.com
Hi Dan,
Thanks for your reply!   I looked at that code, and that's what I use to add/modify fields for MARC records in an mrc file, but I'm not seeing the connection to converting marcxml into "regular" marc fields…

I had found the "record_to_xml" function in the pymarc/marcxml.py file which seems to take a record in an mrc file and create a marcxml record out of it, but I need to do the opposite of that.  I have marcxml records and want to generate mrc records - hoping there's already a function to do this, something like "xml_to_mrc" or such…

am I missing something in the code you've pointed me to?

Godmar Back

Apr 30, 2014, 3:23:43 PM
to pym...@googlegroups.com



Dan Scott

Apr 30, 2014, 4:26:20 PM
to pym...@googlegroups.com
On Wed, Apr 30, 2014 at 3:23 PM, Godmar Back <god...@gmail.com> wrote:
>
> http://takhteyev.org/courses/11W/inf1005/pymarcdoc/pymarc.marcxml-module.html#map_xml

I think what Godmar is trying to say (by pointing to a very old
version of the pymarc source and not really saying anything) is that
the map_xml function (which calls the first argument you pass to it
for each record) can be your friend. For example:

import pymarc

def map_xml_to_binary():
    writer = pymarc.MARCWriter(open('marcbinary.mrc', 'wb'))
    # map_xml calls writer.write once for each record as it is parsed
    pymarc.map_xml(writer.write, 'marcasxml.xml')
    writer.close()

A little clearer, perhaps, but possibly problematic if your XML file
contains a lot of records (because it parses all of the records into
memory first):

import pymarc

def xml_to_binary():
    # builds a list of every record in the file before writing any of them
    records = pymarc.parse_xml_to_array('marcasxml.xml')
    writer = pymarc.MARCWriter(open('marcbinary.mrc', 'wb'))
    for r in records:
        writer.write(r)
    writer.close()
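Dan's memory caveat is the usual trade-off between building the whole list first (parse_xml_to_array) and handing each record to a callback as soon as it has been parsed (map_xml). The sketch below shows that callback pattern using only the standard library's xml.etree.ElementTree.iterparse. The names map_marcxml and local_name, and the dict/tuple record layout, are hypothetical; this is not pymarc's implementation and it yields plain Python data rather than pymarc Record objects.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    # "{namespace}leader" -> "leader"; un-namespaced tags pass through unchanged.
    return tag.rsplit("}", 1)[-1]


def map_marcxml(callback, source):
    """Call callback(record) once per <record> element; return how many were seen.

    source may be a filename or a binary file object. Each record is a dict with
    a "leader" string and a "fields" list of (tag, data) control fields or
    (tag, ind1, ind2, [(code, value), ...]) data fields.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "record":
            continue
        record = {"leader": "", "fields": []}
        for child in elem:
            name = local_name(child.tag)
            if name == "leader":
                record["leader"] = child.text or ""
            elif name == "controlfield":
                record["fields"].append((child.get("tag"), child.text or ""))
            elif name == "datafield":
                subfields = [
                    (sf.get("code"), sf.text or "")
                    for sf in child
                    if local_name(sf.tag) == "subfield"
                ]
                record["fields"].append(
                    (child.get("tag"), child.get("ind1", " "), child.get("ind2", " "), subfields)
                )
        callback(record)
        count += 1
        # Drop the finished record's children so memory use stays roughly flat.
        elem.clear()
    return count


if __name__ == "__main__":
    sample = (
        b'<collection xmlns="http://www.loc.gov/MARC21/slim">'
        b"<record><leader>00000nam a2200000 a 4500</leader>"
        b'<controlfield tag="001">1</controlfield>'
        b'<datafield tag="245" ind1="0" ind2="0"><subfield code="a">foo</subfield></datafield>'
        b"</record></collection>"
    )
    map_marcxml(print, io.BytesIO(sample))
```

Passing a writer's write method as the callback (as Dan does with map_xml) is what keeps only one record in flight at a time.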

In short, pymarc.MARCWriter writes MARC21 binary records, so as long
as you have records to begin with (whether manually constructed or
parsed from XML or MARC21 binary source), you're good to go.

Dan
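Dan's point that MARCWriter simply emits binary records also explains why joining works: each ISO 2709 record carries its own length in the first five bytes of the leader and ends with 0x1D, so a combined .mrc file is just records placed back to back. The hypothetical helper below walks such a byte string by those length fields, which is one quick way to check that a joined file holds the number of records you expect. It is only a sanity-check sketch; pymarc's MARCReader is the tool for actually reading the records, and files with stray bytes between records (e.g. trailing newlines) would be rejected here.

```python
RECORD_TERMINATOR = b"\x1d"


def split_marc_stream(data):
    """Split concatenated MARC21 (ISO 2709) records using each leader's length."""
    records = []
    pos = 0
    while pos < len(data):
        header = data[pos:pos + 5]
        if not header.isdigit():
            raise ValueError("no record length at byte %d" % pos)
        length = int(header)
        record = data[pos:pos + length]
        # A well-formed record is exactly `length` bytes and ends with 0x1D.
        if len(record) != length or not record.endswith(RECORD_TERMINATOR):
            raise ValueError("truncated or malformed record at byte %d" % pos)
        records.append(record)
        pos += length
    return records


if __name__ == "__main__":
    # Smallest possible record: leader, empty directory terminator, record terminator.
    tiny = b"00026nam a2200025 a 4500\x1e\x1d"
    print(len(split_marc_stream(tiny + tiny)))
```

Usage on a real file would be along the lines of split_marc_stream(open(path, 'rb').read()), then comparing len() of the result with the number of input XML records.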

Heidi Frank

Apr 30, 2014, 9:06:43 PM
to pym...@googlegroups.com
Hi Dan,
Thanks for explaining this!!  Before I saw your explanation, I went ahead and wrote a quick script that creates a single .mrc file from my individual marcxml files by traversing the XML DOM of each one…
--------------------------------------------------------------
import os
import pymarc
from pymarc import Record, Field
from xml.dom.minidom import parseString

inst_code = raw_input('Enter the 3-letter institutional code: ')
batch_date = raw_input('Enter the batch date (YYYYMMDD): ')
base_dir = 'work/'+inst_code+'/'+inst_code+'_'+batch_date
# MARC21 is a binary format, so open the output file in 'wb' mode
marcRecsOut = pymarc.MARCWriter(open(base_dir+'/'+inst_code+'_'+batch_date+'_1_orig_recs.mrc', 'wb'))

marcxml_dir = base_dir+'/marcxml'
for filename in os.listdir(marcxml_dir):
    file_path = os.path.join(marcxml_dir,filename)
    if os.path.isfile(file_path):
        if file_path[-3:]=='xml':
            marcxml_file = open(file_path, 'r')
            marcxml_str = marcxml_file.read()
            marcxml_file.close()
            xmlDOM = parseString(marcxml_str)
            xml_recs = xmlDOM.getElementsByTagName('record')
            for xml_rec in xml_recs:
                # build a fresh Record for each <record> element
                mrc_rec = Record()
                ldrs = xml_rec.getElementsByTagName('leader')
                for ldr in ldrs:
                    # the leader is a Record attribute, not a field tagged '000'
                    mrc_rec.leader = ldr.firstChild.nodeValue
                cntrls = xml_rec.getElementsByTagName('controlfield')
                for cntrl in cntrls:
                    cntrl_tag = cntrl.getAttribute('tag')
                    cntrl_data = cntrl.firstChild.nodeValue
                    cntrl_field = Field(tag=cntrl_tag, data=cntrl_data)
                    mrc_rec.add_field(cntrl_field)
                datafields = xml_rec.getElementsByTagName('datafield')
                for datafield in datafields:
                    datafield_tag = datafield.getAttribute('tag')
                    ind1 = datafield.getAttribute('ind1')
                    ind2 = datafield.getAttribute('ind2')
                    mrc_field = Field(tag=datafield_tag, indicators=[ind1,ind2], subfields=[])
                    subfields = datafield.getElementsByTagName('subfield')
                    for subfield in subfields:
                        subfield_code = subfield.getAttribute('code')
                        subfield_data = subfield.firstChild.nodeValue
                        # note: this silently drops non-ASCII characters such as diacritics
                        subfield_data = subfield_data.encode('ascii', 'ignore')
                        mrc_field.add_subfield(subfield_code,subfield_data)
                    mrc_rec.add_field(mrc_field)
                marcRecsOut.write(mrc_rec)

marcRecsOut.close()
--------------------------------------------------------------

BUT, I definitely plan to look into the pymarc functions you mention since I'd really like to understand that method.

thanks again for your help!
cheers,
heidi
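Two details in Heidi's DOM-walking script can bite with MARCXML from other sources. minidom's getElementsByTagName matches the qualified tag name, so a file that writes its elements as marc:record (with a namespace prefix) yields no records at all; getElementsByTagNameNS with the MARC21 slim namespace URI avoids that. And firstChild.nodeValue raises AttributeError on an empty element such as an empty subfield, because firstChild is None. The helpers below (marcxml_records and text_of are hypothetical names) sketch both workarounds with the standard library's xml.dom.minidom; they are not part of pymarc.

```python
from xml.dom.minidom import parseString

MARC_NS = "http://www.loc.gov/MARC21/slim"


def marcxml_records(xml_text):
    """Return the <record> elements, with or without a namespace prefix."""
    dom = parseString(xml_text)
    records = dom.getElementsByTagNameNS(MARC_NS, "record")
    if not records:
        # Fall back for MARCXML that does not declare the MARC21 slim namespace.
        records = dom.getElementsByTagName("record")
    return list(records)


def text_of(elem):
    """Concatenate an element's text nodes; returns "" for an empty element."""
    return "".join(
        node.data
        for node in elem.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )


if __name__ == "__main__":
    prefixed = (
        '<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim">'
        "<marc:record><marc:leader>00000nam a2200000 a 4500</marc:leader></marc:record>"
        "</marc:collection>"
    )
    for rec in marcxml_records(prefixed):
        leader = rec.getElementsByTagNameNS(MARC_NS, "leader")[0]
        print(repr(text_of(leader)))
```

In the script itself, the same idea would mean swapping getElementsByTagName('record') for marcxml_records(marcxml_str) and each firstChild.nodeValue for text_of(...).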

Heidi Frank

May 1, 2014, 9:47:25 AM
to pym...@googlegroups.com
Hi again, Dan,
Just rewrote my code to use the functions as you describe, and it looks like that resolves my issues.  With my other script I was having problems with the LDR field, but your method handles it perfectly - plus the code is much shorter :)
------------------------------------------------------------------------------------------------
#!/usr/bin/python

import os
import pymarc

inst_code = raw_input('Enter the 3-letter institutional code: ')
batch_date = raw_input('Enter the batch date (YYYYMMDD): ')
base_dir = 'work/'+inst_code+'/'+inst_code+'_'+batch_date
marcRecsOut_bin = pymarc.MARCWriter(open(base_dir+'/'+inst_code+'_'+batch_date+'_1_orig_recs_bin.mrc', 'wb'))

marcxml_dir = base_dir+'/marcxml'
for filename in os.listdir(marcxml_dir):
    file_path = os.path.join(marcxml_dir,filename)
    if os.path.isfile(file_path):
        if file_path[-3:]=='xml':
            # parse every record in this MARCXML file and append it to the .mrc output
            marc_xml_array = pymarc.parse_xml_to_array(file_path)
            for rec in marc_xml_array:
                marcRecsOut_bin.write(rec)

marcRecsOut_bin.close()
------------------------------------------------------------------------------------------------

THANKS for the detailed explanation about those xml pymarc functions.  much appreciated!
heidi
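One small caveat with the final script: os.listdir returns names in arbitrary order, so the records in the combined .mrc are not guaranteed to come out in a predictable sequence, and the file_path[-3:]=='xml' test is case-sensitive (it skips FOO.XML). The sketch below uses pathlib to build the same paths and a sorted, extension-based list of inputs. batch_paths is a hypothetical helper that assumes the work/inst/inst_date/marcxml layout from the script; the demo block just builds a throwaway directory to show the ordering.

```python
import tempfile
from pathlib import Path


def batch_paths(root, inst_code, batch_date):
    """Return (output .mrc path, sorted list of input .xml paths) for one batch.

    Assumes the layout used in the script above:
    <root>/<inst>/<inst>_<date>/marcxml/*.xml
    """
    base_dir = Path(root) / inst_code / ("%s_%s" % (inst_code, batch_date))
    output = base_dir / ("%s_%s_1_orig_recs_bin.mrc" % (inst_code, batch_date))
    inputs = sorted(
        path
        for path in (base_dir / "marcxml").iterdir()
        # match the extension case-insensitively and skip subdirectories
        if path.is_file() and path.suffix.lower() == ".xml"
    )
    return output, inputs


if __name__ == "__main__":
    # Demo on a throwaway directory tree.
    with tempfile.TemporaryDirectory() as root:
        xml_dir = Path(root) / "abc" / "abc_20140501" / "marcxml"
        xml_dir.mkdir(parents=True)
        for name in ("b.xml", "a.XML", "readme.txt"):
            (xml_dir / name).write_text("")
        output, inputs = batch_paths(root, "abc", "20140501")
        print(output.name, [p.name for p in inputs])
```

Each path in inputs could then be passed to pymarc.parse_xml_to_array exactly as in the loop above (str(path) if an older library version expects a plain string).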