Attempts to parse records result in a garbled table.


Mykle Law

Dec 30, 2024, 5:29:04 PM
to pymarc Discussion
I'm attempting to use pymarc to flatten my data and generate tabular data for a spreadsheet.

I'm running into problems where oddities in 9XX field data cause trouble. My current suspicion is that line breaks (and/or tabs?) are being interpreted by pymarc's reader as the end of a MARC record.

I know that we use line breaks in comments in the 5XX fields without any trouble, so I'm trying to track down where the problem lies.

Anyone have any suggestions?

Thanks,
-Myke

Mykle Law

Dec 30, 2024, 6:01:22 PM
to pymarc Discussion
So I was wrong. It looks like a line break in any field may cause this problem.
So far, fields that I've found with the issue include: 500, 530, 563, 591, 905, 910.

I'm guessing this is universal across all fields, so what we likely need to do is remove the line breaks from our data.

Andy Kohler

Dec 30, 2024, 7:16:39 PM
to pym...@googlegroups.com
Hi Myke -

I believe control characters like tab, carriage return, and line feed are not supposed to be used in MARC21.  The only ASCII code points in 00-1F which are expected are ESCAPE (0x1B), RECORD TERMINATOR (0x1D), FIELD TERMINATOR (0x1E), and SUBFIELD DELIMITER (0x1F), per [1].  However, my memory may be wrong - I didn't see anything in a quick skim through [2] which prohibited formatting characters like TAB and CR/LF.

That said, PyMARC and other MARC tools generally don't enforce this.  The problems you're seeing are more generic: you're working with data which is expected to be textual, and formatting characters like TAB and CR/LF shouldn't be part of that textual data.
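Since the tools don't enforce this, a check has to be done by hand. A minimal sketch of one, using only the standard library: it assumes the field data has already been pulled out as (tag, value) string pairs, and the names `EXPECTED_CONTROLS`, `unexpected_controls`, and `affected_tags` are made up for illustration, not part of pymarc.

```python
# Control characters below 0x20 that are expected in MARC21, per the list above:
# ESCAPE, RECORD TERMINATOR, FIELD TERMINATOR, SUBFIELD DELIMITER.
EXPECTED_CONTROLS = {0x1B, 0x1D, 0x1E, 0x1F}


def unexpected_controls(text):
    """Return the sorted code points below 0x20 that are not in the expected set."""
    found = {ord(ch) for ch in text if ord(ch) < 0x20}
    return sorted(found - EXPECTED_CONTROLS)


def affected_tags(tag_value_pairs):
    """Return the sorted tags whose values contain unexpected control characters."""
    return sorted({tag for tag, value in tag_value_pairs if unexpected_controls(value)})


# Hypothetical sample values standing in for real field data.
sample = [("245", "Clean title"), ("500", "Note\r\ncontinued"), ("910", "a\tb")]
print(affected_tags(sample))
```

Running something like this over a whole file should give the list of tags that need cleaning before the data goes into a spreadsheet.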

I generally replace TAB and CR/LF with space (0x20) character(s) when I find them in MARC data.
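Roughly, that replacement could look like the sketch below. It assumes the values are already plain Python strings (however they were extracted from the records) and that a run of TAB/CR/LF should collapse to a single space; `clean_value` and `rows_to_tsv` are hypothetical helper names, and the tab-separated output is just one possible spreadsheet format.

```python
import csv
import io
import re

# Any run of TAB, CR, or LF characters.
_FORMATTING_CHARS = re.compile(r"[\t\r\n]+")


def clean_value(text):
    """Replace each run of TAB/CR/LF with a single space (0x20)."""
    return _FORMATTING_CHARS.sub(" ", text)


def rows_to_tsv(rows):
    """Clean every cell, then write the rows as tab-separated text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([clean_value(value) for value in row])
    return buffer.getvalue()


# Hypothetical sample rows standing in for flattened MARC field data.
sample_rows = [["500", "First line\r\nsecond line"], ["910", "a\tb"]]
print(rows_to_tsv(sample_rows))
```

Cleaning each value before it reaches the writer keeps every record on one row, since no cell can contain the delimiter or a line break any more.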


Thanks --Andy
