Deprecation thoughts


Jason Moore

Oct 25, 2018, 10:35:41 AM
to sy...@googlegroups.com
Howdy,

I'm writing because I've been a bit frustrated by two recent backwards-incompatible SymPy API changes that I've had to deal with downstream. I now maintain multiple libraries that depend on SymPy. At least one library, PyDy, is reasonably widely used, and others have at least some users (mainly because I use them in my teaching, where 30-60 students interact with them each year). I've been maintaining PyDy for almost a decade now.

I personally believe that SymPy does not take backwards compatibility seriously enough and that this has a detrimental effect on growing our user base, not to mention my frustration as a user who needs SymPy to act like a library. I think that this is a broad issue in OSS in general, but it is particularly obvious with SymPy. Having an unchanging API becomes very important when projects are big. We have 100+ citations to the new paper, 5000+ stars on GitHub, and something like 300k downloads on PyPI.

I would like to see us be much more strict about five things:

1. We should not change existing tests unless the change in the outputs doesn't affect user code, e.g. if an output expression is simpler that's ok.
2. We should not change existing public API unless absolutely necessary. Changing API just because it isn't the optimal design is not a valid reason to change it.
3. Deprecation warnings must be used if changes are made, and these should be in place for at least a year.
4. Never give warnings in documentation or code that exclaim something like "this api is experimental and may change in future versions". If you make it public, then you are stuck with it.
5. Never merge your own code. This "I'm merging in 24 hours if no response" stuff is bad practice. We should at least require one review of the code.
*

I find that many scientific Python packages have this same issue. My experience is that almost every time I come back to some code from a prior project that depends on lots of scientific Python packages, I have to spend hours and hours getting my code running with new versions of the dependencies. It is a royal pain in the ass, and new users will simply stop using Python. I clearly see this with my students. I don't generally say many good things about Matlab, but I can almost always run code written even a decade ago on new versions of Matlab with no editing needed. That is the most phenomenal feature of that language. We don't just see this in SymPy. IPython changing its name, breaking into a ton of packages, and updating major versions every month broke all kinds of user code that depended on the project. It was a mess. Luckily IPython wasn't used as a library that widely.

In PyDy, for example, we try to have it function with dependency versions that are 2 years old, typically maintaining compatibility with the package versions available in the Ubuntu LTS release. If you look at the PyDy code base you now find loads of `if sympy_version >= x: do this; if sympy_version >=x and sympy_version <=y: do that; ...`. It is a mess! And that's only for 2 years of API changes.
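A minimal sketch of what that kind of version gating tends to look like. The version thresholds, function names, and branch labels below are hypothetical and are not taken from the PyDy code base; the sketch also assumes purely numeric version strings (a development tag such as a `.dev` suffix would need extra handling).

```python
# Hypothetical sketch: thresholds and labels are invented for illustration
# and do not come from PyDy or SymPy.


def parse_version(version):
    """Turn a numeric version string such as '1.1.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def choose_code_path(sympy_version):
    """Pick a code path based on the installed dependency version."""
    version = parse_version(sympy_version)
    if version >= (1, 3):
        return "new api"
    elif (1, 1) <= version < (1, 3):
        return "intermediate api"
    else:
        return "old api"


for installed in ("1.0", "1.1.1", "1.3"):
    print(installed, "->", choose_code_path(installed))
```

Comparing integer tuples rather than raw strings avoids ordering surprises such as "1.10" sorting before "1.9", but every added branch is still one more code path that has to be tested against each supported release.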

I understand that the desire for new features, better design, etc. is at odds with 1-5 above, but once a project is popular enough and it is a library, we unfortunately have to take 1-5 more seriously. New ideas can be matured in side projects and then brought to SymPy.

I'm bringing this up because it has been a thorn in my side since we started PyDy and it frustrates me so much. For example, I spent 2 hours the other morning diagnosing a test failure in PyDy that appeared without any changes to the PyDy code, working out what happened in SymPy, and submitting a PR to revert the change. Now I'm having to extensively argue my point to a first-time contributor who made the change, which has taken another hour of my time. My hours are way more precious in my life now, and I can't count how many I've had to waste because we here at SymPy are not strict about 1-5. I rely on SymPy to work for my students and downstream users. I can't do my job if SymPy breaks my students' code and we can't move forward. This is a big deal for me. Note that I don't believe the solution is pinning to specific library version dependencies for every last script or program I write, as this boils up as dependency hell and packaging nightmares for users, which is also one of Python's weaknesses.

I'd like some discussion about this of course, but I would like to concretely propose adding some language similar to 1-5 to our deprecation policy, have other leaders in the group on board with this, and work to train core developers in following it.

Sincerely a rant,

Jason

* I'd also like to see us follow semantic versioning, but can hold that for a future conversation, as it would require much more to implement.

Aaron Meurer

Oct 25, 2018, 4:19:52 PM
to sy...@googlegroups.com
Thanks for writing this Jason. I agree we need to be considerate about
API compatibility.

For reference, here is our deprecation policy
https://github.com/sympy/sympy/wiki/Deprecating-policy.

I think there are a few challenges with respect to this. One is that it
isn't always obvious that something will break API. I think in one of
the issues you are referring to, a return type changed simply because
some code was moved around and it caused sympify() to be called on the
result. To be fair, though, this was caught by the tests and the tests
shouldn't have been changed without some more consideration.

A second is that sometimes newer contributors are not as familiar
with the dangers of changing APIs.

A third issue is that sometimes it isn't clear what is and isn't API.
For example, the exact output format of a function like simplify() is
not considered API, and can change between releases without any
deprecations. We also have several functions that are not included in
__init__.py, but it isn't clear if they are "internal" or not (for
instance https://github.com/sympy/sympy/issues/15384).

My comments on your points:

> 1. We should not change existing tests unless the change in the outputs doesn't affect user code, e.g. if an output expression is simpler that's ok.

I agree. This is on the reviewers, unless you can think of some way to
test this automatically with the bot.

Also, we should try to write down and clarify better what is and isn't
considered API for the purposes of breakage. Like I said, changing the
output format of an expression in a way that is still mathematically
equal is generally not considered breakage. Fixing something that was
wrong should not be considered breakage either. And I'm not sure how
we can clarify which internal functions are API. There have also been
some misconceptions. For instance, a lot of the subclassing APIs use
methods that start with underscore, but these are still public APIs
because we expect public subclasses to be able to define them (so as a
random example, adding a required argument to an _eval_* method would
be a no go because it would break existing subclasses that override
it).
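A rough sketch of why that kind of change breaks downstream code. Every class and method name here is invented for illustration; none of them is an actual SymPy class or hook, and the "V1"/"V2" bases only stand in for two releases of some library.

```python
# Hypothetical sketch: all names are invented and are not SymPy APIs.


class LibraryBaseV1:
    """Base class as shipped in an earlier library release."""

    def value(self):
        # Public method that delegates to an underscore-prefixed hook.
        return self._eval_value()

    def _eval_value(self):
        return "library default"


class LibraryBaseV2:
    """The same base class after the hook gained a required argument."""

    def value(self):
        return self._eval_value("hint")

    def _eval_value(self, hint):
        return "library default"


def make_user_class(base):
    """Build the same downstream subclass against a given library base."""

    class UserClass(base):
        # Downstream override written against the V1 hook signature.
        def _eval_value(self):
            return "user override"

    return UserClass


def call_value(user_class):
    """Return the result of value(), or the TypeError message on failure."""
    try:
        return user_class().value()
    except TypeError as error:
        return f"TypeError: {error}"


print(call_value(make_user_class(LibraryBaseV1)))
print(call_value(make_user_class(LibraryBaseV2)))
```

The downstream subclass is identical in both cases. Only the library changed, yet the second call fails with a TypeError because the base class now passes an argument the override does not accept.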

> 2. We should not change existing public API unless absolutely necessary. Changing API just because it isn't the optimal design is not a valid reason to change it.

I would qualify this. I think it's sometimes necessary to change an
API. For example, an existing API may be too restrictive to allow
something to be done. Also, a very important consideration is
deprecability. If the old API can be kept intact for a deprecation
period (or sometimes, even without deprecating), that makes it much
more reasonable to do a break, vs. situations where this is not
technically possible.

> 3. Deprecation warnings must be used if changes are made, and these should be in place for at least a year.

Absolutely agree. Our deprecation policy outlines this pretty clearly
(although the length of the deprecation is still not specified, but a
year sounds fine to me).
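
As a minimal sketch of keeping an old API intact during a deprecation period, here is the general pattern using only the standard library `warnings` module. The function names `old_name` and `new_name` are hypothetical, and this is not SymPy's own deprecation machinery, just the generic Python idiom.

```python
import warnings


def new_name(x):
    """The replacement API."""
    return 2 * x


def old_name(x):
    """Deprecated alias that keeps working during the deprecation period."""
    warnings.warn(
        "old_name() is deprecated; use new_name() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_name(x)


# Record warnings so the behavior can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_name(3)

print(result)
print(caught[0].category.__name__, "-", caught[0].message)
```

Existing callers of `old_name` keep getting the same result, and the warning gives them the deprecation period to migrate before the alias is removed.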

> 4. Never give warnings in documentation or code that exclaim something like "this api is experimental and may change in future versions". If you make it public, then you are stuck with it.

I'm not sure I agree with this. The problem with API breakage is that
it is often just not possible, as humans, to design an API right the
first time. When this is known, we still often want to get things
merged in so that people can start to play around with them. We
actually have a sympy.sandbox module where stuff like this can go in.

I agree a major problem here is that people won't read the
documentation. Maybe one option would be to make such things private
APIs (everything starts with an underscore). This would make it
clearer that it could change, and it would actually force an API
change of at least the name when it is made public. We could also use
UserWarning for this, though non-deprecation warnings should be used
sparingly.

> 5. Never merge your own code. This "I'm merging in 24 hours if no response" stuff is bad practice. We should at least require one review of the code.

I agree here too, though it's understandable given the low availability of reviewers.

> In PyDy, for example, we try to have it function with dependency versions that are 2 years old, typically maintaining compatibility with the package versions available in the Ubuntu LTS release. If you look at the PyDy code base you now find loads of `if sympy_version >= x: do this; if sympy_version >=x and sympy_version <=y: do that; ...`. It is a mess! And that's only for 2 years of API changes.

Is there any reason you don't just require the latest version of SymPy
for each PyDy release? Trying to support old SymPy releases seems like
a lot of work for you as a single developer, and I'm curious what the
benefits are.

> * I'd also like to see us follow semantic versioning, but can hold that for a future conversation, as it would require much more to implement.

As the person who sets the release numbers, I'd say we are doing some variant
of semantic versioning now, in that every major release of SymPy bumps
the second number (1.2, 1.3, 1.4). I'm only using the third number for
small bugfix releases after a major release (for instance, 1.1.1 fixed
a few minor issues that popped up when 1.1 was released
https://github.com/sympy/sympy/wiki/release-notes-for-1.1.1).

I'm not really sure how we should treat the first number. I'll
probably just bump it after 1.9, but I'm open to other suggestions. By
strict semantic versioning rules, every release should bump the first
number, because every release has major changes. Our release cycle
doesn't really coincide with the design of semantic versioning. It was
designed for smaller projects that release after every pull request.
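
For reference, a minimal sketch of the strict rule as the semantic versioning specification states it: incompatible API changes bump the first number, backwards-compatible features bump the second, and backwards-compatible bug fixes bump the third. The `bump` function and the change labels are hypothetical names for illustration.

```python
def bump(version, change):
    """Return the next (major, minor, patch) tuple for a kind of change.

    `change` is one of 'breaking', 'feature', or 'fix' (labels invented
    for this sketch).
    """
    major, minor, patch = version
    if change == "breaking":
        return (major + 1, 0, 0)
    if change == "feature":
        return (major, minor + 1, 0)
    if change == "fix":
        return (major, minor, patch + 1)
    raise ValueError(f"unknown change type: {change!r}")


print(bump((1, 3, 0), "fix"))
print(bump((1, 3, 0), "feature"))
print(bump((1, 3, 0), "breaking"))
```

Under this rule, a release that contains even one breaking change takes the "breaking" branch, which is why strict adherence would bump the first number on every release that has one.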

Aaron Meurer