latin-1 encoding error when writing out MARC file downloaded via OAI-PMH


rutht...@gmail.com

Oct 7, 2021, 3:07:42 PM
to pymarc Discussion
Hi all,

We recently turned on the OAI extracts for our Symphony ILS, and I've been testing different methods of working with the data. I've got it mostly working with pymarc, but I'm running into an encoding error when writing out the record for https://catalog.libraries.psu.edu/catalog/2844997, which contains the coordinates W 78⁰22ʹ30ʺ--W 77⁰07ʹ35ʺ/N 41⁰10ʹ00ʺ--N 40⁰41ʹ00ʺ. Those characters are what's causing the issue.

My code, written as one step-by-step script rather than broken into functions, is here: https://gitlab.com/-/snippets/2186472. Anyone should be able to harvest that record from our OAI endpoint, but I can provide the OAI response as well.

I'm getting: UnicodeEncodeError: 'latin-1' codec can't encode character '\u2070' in position 27: ordinal not in range(256)

Basically, I request the record, use fromstring to turn it into an etree root and then a tree, select the MARC file (represented by dangMARC), and use marcxml_to_array(params)[0] to turn that into a pymarc record object. I then write the record object to a .mrc file. It's not elegant, but it works fine: it round-trips nicely in both pymarc and MarcEdit, and I thought I was done until I hit this encoding issue.
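A minimal sketch of the extraction step just described, using only the standard library's xml.etree.ElementTree. The OAI-PMH and MARCXML namespace URIs are the standard ones; the sample response, the field the coordinates are placed in, and the helper names extract_marcxml and to_pymarc_record are illustrative assumptions, not the code in the linked snippet. The final conversion assumes pymarc's parse_xml_to_array, which accepts a file-like object.

```python
import io
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}
# Keep the conventional "marc:" prefix when re-serializing (ElementTree would otherwise write "ns0:").
ET.register_namespace("marc", MARC_NS)

# Hypothetical, heavily trimmed OAI-PMH GetRecord response for illustration only.
SAMPLE_OAI = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <metadata>
        <marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
          <marc:leader>00000cem  2200000   4500</marc:leader>
          <marc:datafield tag="255" ind1=" " ind2=" ">
            <marc:subfield code="c">(W 78⁰22ʹ30ʺ--W 77⁰07ʹ35ʺ/N 41⁰10ʹ00ʺ--N 40⁰41ʹ00ʺ)</marc:subfield>
          </marc:datafield>
        </marc:record>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
""".encode("utf-8")


def extract_marcxml(oai_response: bytes) -> bytes:
    """Return the first MARCXML <record> in an OAI-PMH response, serialized as UTF-8 bytes."""
    root = ET.fromstring(oai_response)
    # ".//marc:record" only matches the MARC21/slim record, not the OAI-PMH <record> wrapper.
    record = root.find(".//marc:record", NS)
    if record is None:
        raise ValueError("no MARCXML record found in the OAI-PMH response")
    return ET.tostring(record, encoding="utf-8")


def to_pymarc_record(marcxml: bytes):
    """Convert one MARCXML record to a pymarc Record (requires pymarc to be installed)."""
    from pymarc import parse_xml_to_array  # imported here so the extraction step works without pymarc
    return parse_xml_to_array(io.BytesIO(marcxml))[0]
```

Up to this point everything is text-safe XML; the trouble only starts when the resulting record is serialized to binary MARC for the .mrc file.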

Because it's bytes, not text, I got stumped. Would appreciate any help.
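On the bytes-versus-text question: pymarc holds the field data as ordinary Python str while the record is in memory, and the UnicodeEncodeError is raised at the moment that text is encoded to bytes for the .mrc file. A minimal reproduction in plain Python, independent of pymarc; encode_or_report is a hypothetical helper and the string is the coordinate text from the record.

```python
from typing import Optional

# The coordinate text as a Python str; U+2070 (SUPERSCRIPT ZERO) has no Latin-1 equivalent.
COORDS = "W 78⁰22ʹ30ʺ--W 77⁰07ʹ35ʺ/N 41⁰10ʹ00ʺ--N 40⁰41ʹ00ʺ"


def encode_or_report(text: str, codec: str) -> Optional[bytes]:
    """Encode text with codec; print the reason and return None if it cannot be represented."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as err:
        print(f"{codec}: {err.reason} for {text[err.start]!r} at position {err.start}")
        return None


latin1 = encode_or_report(COORDS, "latin-1")  # fails: Latin-1 covers only code points 0-255
utf8 = encode_or_report(COORDS, "utf-8")      # succeeds: UTF-8 can encode every code point
```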

Thanks,
Ruth

Geoffrey Spear

Oct 9, 2021, 5:39:24 PM
to pym...@googlegroups.com
Ruth,

You can force pymarc to write the record as Unicode (UTF-8) by adding:

record.leader = record.leader[:9] + "a" + record.leader[10:]

after line 13 of your code.

By default, if leader position 9 is blank instead of "a", pymarc will try to encode all of the text in your record as latin-1 for umm reasons of practicality beating purity (and us not having a MARC-8 encoder).
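A simplified, pure-Python model of the behaviour described above, not pymarc's actual code: the output codec is picked from leader position 9 ("a" means UCS/Unicode in MARC 21), and the one-line leader fix flips that choice. The helper names mark_unicode and choose_codec, and the sample leader, are hypothetical.

```python
LEADER_LENGTH = 24  # a MARC 21 leader is always 24 characters


def mark_unicode(leader: str) -> str:
    """Return a copy of the leader with position 9 (character coding scheme) set to 'a'."""
    if len(leader) != LEADER_LENGTH:
        raise ValueError(f"expected a {LEADER_LENGTH}-character leader, got {len(leader)}")
    # Strings are immutable, so rebuild it around index 9, as in the one-liner above.
    return leader[:9] + "a" + leader[10:]


def choose_codec(leader: str) -> str:
    """Simplified model: 'a' at position 9 -> UTF-8, anything else -> Latin-1 (no MARC-8 encoder)."""
    return "utf-8" if leader[9] == "a" else "latin-1"


leader = "00000cem  2200000   4500"  # position 9 is blank, so Latin-1 is chosen
text = "W 78⁰22ʹ30ʺ"

try:
    text.encode(choose_codec(leader))
except UnicodeEncodeError as err:
    print("before fix:", err)

fixed = mark_unicode(leader)
data = text.encode(choose_codec(fixed))  # now encoded as UTF-8 without error
print("after fix:", choose_codec(fixed))
```

The length check is there because an off-by-one slice would silently shift every later leader position, which is much harder to spot than an exception.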
