Creation of MARC Validator using Lex/Yacc/Antlr

Tino Dai

Nov 21, 2023, 2:01:22 PM

to pymarc Discussion

Hi All,

I have been tossing around the idea of a Python MARC validator, both because we are rewriting some key components at the LC and to scratch an itch that I have had for many years.

A) Has this been done before with a parser/tokenizer? I know that MarcEdit has some validation capabilities. Also, I'm sure that a lot of organizations have their own homegrown utilities.

B) Would it have uses outside of internal LC workflows?

Any and all feedback would be welcomed!

Thanks in advance,

Tino

Tomasz Kalata

Nov 22, 2023, 9:42:06 PM

to pym...@googlegroups.com

This is an interesting idea, Tino.

Yes, I think this type of validator would have uses outside of the LC. I'm not aware of any Python package that validates the structure and certain content of the MARC format (for example, valid values for indicators) the way MarcEdit does. To get around this, some of our applications call MarcEdit's command-line tool. The experience was mixed, though, and we never went beyond just flagging files that have validation problems. MarcEdit reports are also not meant for machine consumption out of the box. We are interested in certain types of invalid coding, while other problems can be ignored or at least are not critical, so being able to focus on specific issues would be nice. It would also be practical: legacy data (older records that do not conform to the current standard) can contain a potentially overwhelming number of problems, especially in larger sets.
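
To make the indicator example concrete, here is a minimal sketch of what a check with machine-readable output might look like. It is only an illustration under stated assumptions: the `Issue` class, the `check_indicators` function, and the tiny `ALLOWED_INDICATORS` table are hypothetical names, not part of pymarc or MarcEdit, and fields are passed in as plain `(tag, ind1, ind2)` tuples to keep the example self-contained.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical, heavily abridged table of allowed indicator values.
# A real validator would load the full set from the MARC 21 documentation.
ALLOWED_INDICATORS = {
    "100": ("013", " "),          # ind1: type of name; ind2: undefined (blank)
    "245": ("01", "0123456789"),  # ind1: title added entry; ind2: nonfiling characters
}


@dataclass
class Issue:
    tag: str
    code: str      # machine-readable issue type, usable for filtering
    message: str


def check_indicators(fields):
    """Check (tag, ind1, ind2) tuples against ALLOWED_INDICATORS."""
    issues = []
    for tag, ind1, ind2 in fields:
        allowed = ALLOWED_INDICATORS.get(tag)
        if allowed is None:
            continue  # this sketch has no rule for the tag
        for position, (value, valid) in enumerate(zip((ind1, ind2), allowed), start=1):
            # An indicator must be exactly one character from the allowed set.
            if len(value) != 1 or value not in valid:
                issues.append(
                    Issue(tag, "bad-indicator",
                          f"indicator {position} is {value!r}, expected one of {valid!r}")
                )
    return issues


sample_fields = [("245", "1", "4"), ("245", "2", "0"), ("100", "1", "#")]
issues = check_indicators(sample_fields)

# Emit JSON so that another program can consume the report.
print(json.dumps([asdict(issue) for issue in issues], indent=2))
```

Because every issue carries a `code`, a caller could keep only the issue types it cares about and skip the rest, which matters for large legacy sets.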

It would be nice to extend validation to local rules. Being able to define custom rules for 9xx fields would make such a validator pretty powerful, I think.
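
As a rough sketch of how local rules could plug in, each rule below is a small function registered under a 9xx tag. Everything here is an assumption made for illustration: the rule names, the `LOCAL_RULES` registry, `validate_local`, and the specific 910/949 conventions are invented, and subfields are simplified to a dict (so repeated subfields are not handled).

```python
import re


# Hypothetical local rules: each takes a {subfield_code: value} dict and
# returns an error message, or None when the field is acceptable.
def rule_949_needs_subfield_a(subfields):
    if "a" not in subfields:
        return "949 is missing required subfield $a"
    return None


def rule_910_code_format(subfields):
    value = subfields.get("a", "")
    if not re.fullmatch(r"[A-Z]{2,4}", value):
        return f"910 $a {value!r} is not a 2-4 letter uppercase code"
    return None


# Registry mapping a tag to the rules that apply to it.
LOCAL_RULES = {
    "910": [rule_910_code_format],
    "949": [rule_949_needs_subfield_a],
}


def validate_local(fields, rules=LOCAL_RULES):
    """Run every registered rule against (tag, subfields) pairs."""
    problems = []
    for tag, subfields in fields:
        for rule in rules.get(tag, []):
            message = rule(subfields)
            if message:
                problems.append((tag, message))
    return problems


sample = [
    ("949", {"b": "x"}),     # fails: no $a
    ("910", {"a": "BPL"}),   # passes
    ("910", {"a": "bpl1"}),  # fails: wrong format
]
problems = validate_local(sample)
for tag, message in problems:
    print(tag, message)
```

Keeping rules as plain functions in a registry means an institution could add or swap its own checks without touching the core validator.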

If I remember correctly, someone on the Code4Lib Slack channel floated the idea of such a validator a while ago, but as a web service. I'm not really convinced that would be as useful as a Python package.

Cheers,

Tomasz

--

Tomasz Kalata

Assistant Director, Cataloging

BookOps

The New York Public Library & Brooklyn Public Library

bookops.org | 917-229-9559

Ben W.

Dec 20, 2023, 9:56:46 AM

to pymarc Discussion

I'm doing much the same as you, Tomasz, in using MarcEdit calls for validation, but I'm certainly interested in Tino's idea. I'm not familiar with Lex or any of the other software mentioned, however. I'm curious to hear where you go with it, Tino; I think it's a good idea.