Major version release


Andrew Hankinson

Mar 18, 2023, 10:39:29 AM
to pymarc Discussion
Hi all,

I have recently filed a merge request on the pymarc GitLab repo:


This contains several improvements and consistency updates, but with the caveat that it contains a number of breaking changes. The details are contained in the Merge Request if you are interested.

It was suggested that we open a discussion here to see if there are any other backwards-incompatible changes that are needed, since these changes would already constitute a breaking release, so it's better to do them now.

Are there any thoughts or concerns?

Thank you very much,
-Andrew Hankinson

RISM Digital Center,
Bern Switzerland

Tomasz Kalata

Mar 19, 2023, 11:40:31 PM
to pymarc Discussion
Hi Andrew,

I really like this change in modeling subfields - I think catalogers have this type of mental model for subfields, as code-value pairs. I see you still provide a way to set subfields via a list of strings. This feature was simple (I still like its simplicity!), but I tripped myself up many times by accidentally missing a value. I will experiment with your code to get a better feel, but RIGHT ON!

Cheers,
Tomasz

Ed Summers

Mar 23, 2023, 2:14:29 PM
to pym...@googlegroups.com
Thanks Andrew and Tomasz!

I agree, this seems like a significant usability improvement for pymarc and worthy of a new version release. Are there other backwards-incompatible changes that people have been sitting on? I don’t want to over-complicate things by making too many changes, but perhaps it’s worth considering as we move to a v5?

I don’t use pymarc much these days, so I feel a bit out of touch with how people are using it.

//Ed
> --
> You received this message because you are subscribed to the Google Groups "pymarc Discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pymarc+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pymarc/950e3cb2-197d-4aba-bc6a-b80e24ef5ce8n%40googlegroups.com.


Andy Kohler

Apr 5, 2023, 8:00:44 PM
to pym...@googlegroups.com
I am a big fan of this proposed change.  Our main use case for pymarc is data manipulation, which often involves changing subfield data. The old method of iterating through code-value pairs is really awkward; moving to this CodedSubfield approach is a significant improvement.
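
To make the contrast concrete, here is a minimal sketch of the two styles. The `CodedSubfield` class below is a stand-in defined only for this illustration (it is not imported from pymarc), and the data is made up.

```python
from typing import NamedTuple


class CodedSubfield(NamedTuple):
    """Illustrative stand-in for the code/value pair discussed in this thread."""
    code: str
    value: str


# Old style: one flat list alternating code, value, code, value, ...
flat = ["a", "Foo", "b", "Bar"]

# Changing subfield "b" means stepping through the list two items at a time
# and remembering that the value lives at index + 1.
for i in range(0, len(flat), 2):
    if flat[i] == "b":
        flat[i + 1] = flat[i + 1].upper()

# Pair style: each subfield is a single object with named parts.
coded = [CodedSubfield("a", "Foo"), CodedSubfield("b", "Bar")]
updated = [
    sf._replace(value=sf.value.upper()) if sf.code == "b" else sf
    for sf in coded
]

print(flat)     # ['a', 'Foo', 'b', 'BAR']
print(updated)
```

With pairs there is no index arithmetic, so a code can never be mistaken for a value.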

Thanks --Andy / UCLA Library IT

Andrew Hankinson

Apr 6, 2023, 4:48:44 AM
to pymarc Discussion
One thing that the current proposal keeps, but I would be interested in feedback on, is that it keeps the current "get" behaviour of the `.subfields` property more-or-less unchanged; that is, asking for the `subfields` will return `['a', 'Foo', 'b', 'Bar']`, instead of `[CodedSubfield(code='a', value='Foo'), CodedSubfield(code='b', value='Bar')]`. Is this legacy behaviour still desired?
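
For readers who want to see the two shapes side by side, the following is a rough sketch of converting between them. `CodedSubfield`, `to_flat` and `from_flat` are names invented for this example; they are assumptions, not the code in the merge request.

```python
from typing import List, NamedTuple


class CodedSubfield(NamedTuple):
    """Illustrative stand-in, not the pymarc class."""
    code: str
    value: str


def to_flat(subfields: List[CodedSubfield]) -> List[str]:
    """Return the legacy shape: ['a', 'Foo', 'b', 'Bar']."""
    flat: List[str] = []
    for sf in subfields:
        flat.extend([sf.code, sf.value])
    return flat


def from_flat(flat: List[str]) -> List[CodedSubfield]:
    """Pair up a legacy list; an odd length means a code or value is missing."""
    if len(flat) % 2 != 0:
        raise ValueError("flat subfield list needs an even number of items")
    return [CodedSubfield(c, v) for c, v in zip(flat[0::2], flat[1::2])]


pairs = from_flat(["a", "Foo", "b", "Bar"])
print(pairs)
print(to_flat(pairs))  # ['a', 'Foo', 'b', 'Bar']
```

The length check in `from_flat` is the kind of guard the flat list cannot offer on its own: a missing value only shows up later as shifted data.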

Andy Kohler

Apr 6, 2023, 11:36:24 PM
to pym...@googlegroups.com
I think keeping the legacy default behavior isn't desired... but may be necessary.  Otherwise this would be a truly breaking change.

But if I understand correctly, the changes are largely internal, with the option to access the CodedSubfield structures via field._subfields.  So new development can use this new approach, without requiring rewrites of older code.

I personally would prefer doing away with the legacy behavior, as in my opinion it offers no advantages as-is, and lots of unpleasantness when doing non-trivial work with MARC records.  But given that this is the de-facto standard python package for MARC manipulation, it's probably best to be conservative.

Thanks --Andy K.

Andrew Hankinson

Apr 7, 2023, 3:11:16 AM
to pymarc Discussion
I personally wouldn't advocate for using "._subfields" in external code, since the leading underscore convention tends to indicate a "private" attribute. The new behaviour is internal, except that both the constructor and the `set_subfields` method will accept a `CodedSubfield` list -- it will just never return one, unless you bypass everything and go through the `._subfields` attribute.

In earlier iterations of this I had an internal set of the subfield codes so that the `contains` operation was an O(1) lookup, but I ultimately removed it because I felt it was probably unnecessary for the amount of data that was in any given field. But I think that's the danger in relying on things marked internal for external code -- that there might be some other operation in the class that manages the data in the property that doesn't happen if you use the property directly.
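
A small sketch of that risk, assuming a hypothetical `CachedField` class that keeps a set of codes next to the subfield list (this is not the removed pymarc code, only an illustration of the idea):

```python
from typing import NamedTuple


class CodedSubfield(NamedTuple):
    """Illustrative stand-in, not the pymarc class."""
    code: str
    value: str


class CachedField:
    """Hypothetical field that caches its subfield codes in a set."""

    def __init__(self, subfields):
        self._subfields = list(subfields)
        self._codes = {sf.code for sf in self._subfields}

    def add_subfield(self, code, value):
        self._subfields.append(CodedSubfield(code, value))
        self._codes.add(code)  # keep the cache in step with the list

    def __contains__(self, code):
        return code in self._codes  # set lookup, O(1) on average


field = CachedField([CodedSubfield("a", "Foo")])
field.add_subfield("b", "Bar")                      # public method: cache updated
field._subfields.append(CodedSubfield("c", "Baz"))  # bypasses the class

print("b" in field)  # True
print("c" in field)  # False: the list has "c", the cache does not
```

External code that appends to `_subfields` directly leaves the cache stale, which is exactly the kind of hidden bookkeeping a leading underscore warns about.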

Looking at the code there *shouldn't* be anything that falls into that category right now, so maybe it's not a big deal? As a possible alternate solution I could rename `._subfields` to `.coded_subfields` to mark that as an explicit public property, and change everything to work with that?

Andy Kohler

Apr 10, 2023, 6:43:52 PM
to pym...@googlegroups.com
For me - and, I suspect, anyone who uses pymarc to do significant work with MARC records - the pain is working with subfields, especially iterating through them.

Renaming "._subfields" to ".coded_subfields", to show support for public use of this data, would be helpful.

Thanks --Andy K.

Ed Summers

Apr 13, 2023, 8:45:28 AM
to pym...@googlegroups.com
It looks like there is unanimous support for these changes, Andrew! Thank you for taking the time to improve pymarc.

I'll admit to not having looked closely but from what I have seen it seems like you have attempted to preserve backwards compatibility?

Assuming this to be the case, do you (or others) have an opinion about whether we should aim to remove the old (error prone) way of working with subfields by:

1. releasing this work as v4.3.0, with deprecation warnings for the old access methods
2. designating some date when we will release a v5.0.0 with the deprecated subfield behavior removed

Or should we just live with the multiple interfaces to subfields? The way you’ve implemented it, it seems pretty maintainable the way it is?

//Ed

Andrew Hankinson

Apr 13, 2023, 9:17:45 AM
to pym...@googlegroups.com
Personally, I don't see much benefit to releasing 4.3.0 with these changes. The advances over 4.2.2 are marginal at best, in terms of performance and aesthetics for already written code. (IOW, the 'difficult' bit is already done for existing projects that use pymarc, and the changes I'm proposing mostly bring quality-of-life improvements to people writing new code).

I attempted to preserve backwards compatibility because I wasn't sure how the new tuple structure would be received, but given that it seems to be preferred over the old way (lots of people seem to have the same difficulties I did!) I would probably be a bit less concerned about this now.

The only reason I could see to release 4.3.0 is to insert some deprecation warnings, but I wouldn't necessarily change anything in the implementation. If projects are using dependency tag management (a big "if") then their tools shouldn't upgrade themselves to a major release.

If people are not familiar with the update and continue to work with the "old" way through the "old" properties, there may be some subtle bugs that pop up... like trying to insert new values in the list directly. The getters / setters work OK for emulating the old interface in broad strokes, but they're not a 1-to-1 functionality match. In that case maybe it's better to be noisy about breaking things rather than give the illusion that they're still working as they were before?
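
One way such a subtle bug could arise is sketched below. `EmulatedField` is a hypothetical class written for this illustration, assuming the old flat list is rebuilt from tuples on every access; it is not the implementation in the merge request.

```python
class EmulatedField:
    """Hypothetical field that stores tuples but emulates the flat-list API."""

    def __init__(self, subfields):
        self._subfields = list(subfields)  # list of (code, value) tuples

    @property
    def subfields(self):
        # Rebuilt on every access, so the caller receives a throwaway copy.
        flat = []
        for code, value in self._subfields:
            flat.extend([code, value])
        return flat

    @subfields.setter
    def subfields(self, flat):
        self._subfields = list(zip(flat[0::2], flat[1::2]))


field = EmulatedField([("a", "Foo")])

# Old habit: mutate the returned list in place. No error, but nothing is stored.
field.subfields.extend(["b", "Bar"])
print(field.subfields)  # ['a', 'Foo']

# Going through the setter does take effect.
field.subfields = field.subfields + ["b", "Bar"]
print(field.subfields)  # ['a', 'Foo', 'b', 'Bar']
```

The first call fails silently, which is the "illusion that things are still working" case.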

So my personal preference, now, would be to:

1. Remove the "old" behaviour from the `.subfields` properties and make `.subfields` operate with `CodedSubfield` tuples entirely. Call it 5.0.0.
2. Optionally, if it's necessary, create a 4.3.0 with deprecation warnings for those who might be able to update to a minor change, but with no other functional changes in the code.
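
For reference, the deprecation-warning option could look roughly like the sketch below. `LegacyField` and the message text are assumptions made up for this example; only the standard-library `warnings` calls are real.

```python
import warnings


class LegacyField:
    """Hypothetical field whose old flat-list getter warns when used."""

    def __init__(self, subfields):
        self._subfields = list(subfields)  # list of (code, value) tuples

    @property
    def subfields(self):
        warnings.warn(
            "flat subfield lists are deprecated and will change in 5.0.0",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        flat = []
        for code, value in self._subfields:
            flat.extend([code, value])
        return flat


# Capture the warning so the example is self-checking.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = LegacyField([("a", "Foo")]).subfields

print(value)                # ['a', 'Foo']
print(caught[0].category)   # <class 'DeprecationWarning'>
```

Behaviour is unchanged; callers just get a nudge, which is why this would only make sense as a minor release.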

-Andrew

Geoffrey Spear

Apr 13, 2023, 12:05:53 PM
to pym...@googlegroups.com
I don't see much utility in a separate 4.x branch/release that's going to warn you that something is deprecated if you can't get rid of the warning by changing your code without upgrading to a 5.x release.

A 5.0.0 with documentation mentioning the change and pointing users back at 4.2 for their legacy code that uses the old behavior would be my inclination.

Dan Scott

Apr 13, 2023, 2:09:08 PM
to pymarc Discussion
On Thursday, April 13, 2023 at 12:05:53 p.m. UTC-4 geoff...@gmail.com wrote:
> I don't see much utility in a separate 4.x branch/release that's going to warn you that something is deprecated if you can't get rid of the warning by changing your code without upgrading to a 5.x release.
>
> A 5.0.0 with documentation mentioning the change and pointing users back at 4.2 for their legacy code that uses the old behavior would be my inclination.

+1

Ed Summers

Apr 13, 2023, 3:09:25 PM
to pym...@googlegroups.com


> On Apr 13, 2023, at 2:09 PM, Dan Scott <den...@gmail.com> wrote:
>
> A 5.0.0 with documentation mentioning the change and pointing users back at 4.2 for their legacy code that uses the old behavior would be my inclination.
>
> +1

It definitely seems simplest/easiest to go this way.

So let’s aim for a v5.0.0 release that has CodedSubfield functionality and the old subfield list soup fully removed.

Is this some additional work you are willing to take on, Andrew? I'm willing to rewrite some tests that break, and update documentation if it is helpful.

//Ed

Andrew Hankinson

Apr 14, 2023, 4:43:41 AM
to pym...@googlegroups.com
OK! I have some time today so I can make these changes and push them to the MR.

I'll also see if I can figure out a few sentences for documentation, but as always a second (or third or fourth) set of eyes on it would be appreciated!

-Andrew

Svetlana Koroteeva

Apr 14, 2023, 7:59:02 PM
to pymarc Discussion
Hi,

I have been using pymarc for a while; it is stable and works well, and I hope it stays that way. I just remember that I once missed having an "add_ordered_subfield" method.

Regards,
Svetlana

Ed Summers

Apr 16, 2023, 6:33:30 AM
to pym...@googlegroups.com

> On Apr 12, 2023, at 3:46 AM, Svetlana Koroteeva <svetlana....@gmail.com> wrote:
> I have been using pymarc for a while; it is stable and works well, and I hope it stays that way. I just remember that I once missed having an "add_ordered_subfield" method.

Thanks Svetlana! If there were an add_ordered_subfield method how would you imagine using it?

You may have noticed it already but the Field.add_subfield() method does have an optional pos parameter which you can use to insert at a particular position among the field's subfields.
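
As a rough illustration of position-based insertion, the sketch below uses a plain list of (code, value) tuples and a stand-in `add_subfield` function. It mirrors the idea of an optional `pos` argument but is not pymarc's implementation.

```python
def add_subfield(subfields, code, value, pos=None):
    """Append the pair, or insert it at index `pos` when one is given."""
    pair = (code, value)
    if pos is None:
        subfields.append(pair)
    else:
        subfields.insert(pos, pair)


subfields = [("a", "Foo"), ("c", "Baz")]
add_subfield(subfields, "b", "Bar", pos=1)  # goes between "a" and "c"
add_subfield(subfields, "d", "Qux")         # no pos: goes at the end

print(subfields)
# [('a', 'Foo'), ('b', 'Bar'), ('c', 'Baz'), ('d', 'Qux')]
```

The caller still has to work out the right index, which is the part an ordered insert would take over.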

//Ed

Ed Summers

Apr 17, 2023, 5:47:32 AM
to pym...@googlegroups.com
The current plan [1] is to release the new subfield functionality that Andrew contributed as v5.0.0 on May 1st. A note [2] has been added to the README.md to indicate that this is a breaking change (hence the major version release).

If there are other things that should be considered for v5.0.0 please chime in on issue 190 [2].

At the moment the only other thing that is being considered is David Dowling’s proposed change to the way that the record leader [3] is updated when records are automatically converted from MARC-8 to UTF-8 (the default behavior). If you have opinions about that please chime in here or on #193 [4].
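
For context, position 09 of the MARC 21 leader records the character coding scheme: blank for MARC-8 and `a` for UCS/Unicode. The sketch below only shows what flipping that one position looks like on a 24-character leader string; the function name and sample leader are made up, and it does not describe the details of the proposed change.

```python
def mark_leader_unicode(leader: str) -> str:
    """Return a copy of the leader with position 09 set to 'a' (UCS/Unicode)."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is 24 characters long")
    return leader[:9] + "a" + leader[10:]


marc8_leader = "00714cam  2200205 a 4500"  # position 09 is blank: MARC-8
utf8_leader = mark_leader_unicode(marc8_leader)

print(repr(marc8_leader[9]))  # ' '
print(repr(utf8_leader[9]))   # 'a'
```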

[1] https://gitlab.com/pymarc/pymarc/-/issues/190
[2] https://gitlab.com/pymarc/pymarc/-/tree/v5#writing
[3] https://www.loc.gov/marc/bibliographic/bdleader.html
[4] https://gitlab.com/pymarc/pymarc/-/merge_requests/193

Ed Summers

Apr 17, 2023, 5:50:54 AM
to pym...@googlegroups.com

> On Apr 17, 2023, at 5:46 AM, Ed Summers <e...@pobox.com> wrote:
>
> If there are other things that should be considered for v5.0.0 please chime in on issue 190 [2].

My apologies, I meant to footnote https://gitlab.com/pymarc/pymarc/-/issues/190 there.

Svetlana Koroteeva

Apr 18, 2023, 4:59:24 PM
to pymarc Discussion
Kia ora,

It is all working really well. There was just one case where we needed to insert a subfield into one of the fields in a number of existing records. As I remember, it was "e"; however, its position was not always the same, as a record might or might not contain a "c" subfield and others. We sorted the subfields via another iteration. I just remember that add_ordered_field puts fields in the correct order, and thought it would be good to have the same function for subfields.
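
Something along these lines is what I have in mind. This is only a sketch: `add_ordered_subfield` is a hypothetical helper working on a plain list of (code, value) tuples, and it assumes simple string ordering of the codes is what is wanted.

```python
def add_ordered_subfield(subfields, code, value):
    """Insert (code, value) before the first subfield whose code sorts after it."""
    for i, (existing_code, _) in enumerate(subfields):
        if existing_code > code:
            subfields.insert(i, (code, value))
            return
    subfields.append((code, value))  # nothing sorts after it: goes at the end


with_c = [("a", "Foo"), ("c", "Bar"), ("u", "Baz")]
without_c = [("a", "Foo"), ("u", "Baz")]

add_ordered_subfield(with_c, "e", "New")
add_ordered_subfield(without_c, "e", "New")

print(with_c)     # [('a', 'Foo'), ('c', 'Bar'), ('e', 'New'), ('u', 'Baz')]
print(without_c)  # [('a', 'Foo'), ('e', 'New'), ('u', 'Baz')]
```

Note that with plain string comparison numeric codes such as "0" sort before letters, which may or may not match local cataloguing practice.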

Regards,
Svetlana
