API Stability (Java and other)

Tom Morris

Jul 23, 2020, 4:22:41 PM
to openref...@googlegroups.com
I had actually drafted most of this before today's "let's document a bunch of APIs" note, although it's principally focused on the Java APIs.

Historically we've followed general industry practice for maintaining Java APIs: marking APIs that are being retired as deprecated, then removing them later, after at least one version of notice. Since 2015 or so, we've been less disciplined about this and generally haven't made any attempt to provide version-to-version compatibility.
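
As a reminder of what that deprecate-then-remove cycle looks like in practice, here is a minimal sketch. The class and method names (CellFormatter, format) are hypothetical and are not taken from the OpenRefine codebase; the point is only that the old entry point stays, is marked @Deprecated, and delegates to its replacement.

```java
import java.util.Locale;

public class DeprecationSketch {

    /** Hypothetical extension-facing helper; the names are illustrative only. */
    public static class CellFormatter {

        /**
         * Old entry point, kept for at least one release after its replacement ships.
         *
         * @deprecated use {@link #format(Object, Locale)} instead
         */
        @Deprecated
        public String format(Object value) {
            // Delegate to the new method so the two code paths cannot drift apart.
            return format(value, Locale.ROOT);
        }

        /** Replacement API that makes the locale explicit. */
        public String format(Object value, Locale locale) {
            return String.format(locale, "%s", value);
        }
    }

    public static void main(String[] args) {
        CellFormatter formatter = new CellFormatter();
        // Both calls produce the same text; only the first triggers a
        // deprecation warning when compiled from another class.
        System.out.println(formatter.format(42));
        System.out.println(formatter.format(42, Locale.ROOT));
    }
}
```

Extensions compiled against the old method keep working during the notice period, and the compiler warning tells their maintainers what to move to.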

I'd like to see us return to providing greater API stability because I think it's necessary to encourage extension writers and protect their investment as well as to decouple extension updates from OpenRefine versions.

There is, however, a cost to maintaining compatibility. It requires a bigger investment in careful engineering and review of APIs up front, and then extra work to support both the deprecated API and the new API for at least some period. It would also benefit from tooling that checks we haven't broken any of our commitments.
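
To give a feel for what such tooling checks, here is a rough sketch that lists the public and protected method signatures of a class via reflection and reports which ones disappeared between two versions. It is not how any particular compatibility tool works, and RowV1 and RowV2 are hypothetical stand-ins for an "old" and a "new" version of the same class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

public class ApiSurfaceCheck {

    /** Lists the public and protected methods a class declares, as signature strings. */
    public static Set<String> signatures(Class<?> type) {
        Set<String> result = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            int mods = m.getModifiers();
            // Skip compiler-generated methods and anything extensions cannot see.
            if (m.isSynthetic() || !(Modifier.isPublic(mods) || Modifier.isProtected(mods))) {
                continue;
            }
            StringBuilder sig = new StringBuilder(m.getReturnType().getName())
                    .append(' ').append(m.getName()).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sig.append(',');
                }
                sig.append(params[i].getName());
            }
            result.add(sig.append(')').toString());
        }
        return result;
    }

    /** Signatures present in the baseline but missing from the current class. */
    public static Set<String> removed(Set<String> baseline, Class<?> current) {
        Set<String> missing = new TreeSet<>(baseline);
        missing.removeAll(signatures(current));
        return missing;
    }

    /** Hypothetical "old" version of a class. */
    public static class RowV1 {
        public int size() { return 0; }
        public String name() { return ""; }
        void internal() { } // package-private, so not part of the checked surface
    }

    /** Hypothetical "new" version of the same class. */
    public static class RowV2 {
        public int size() { return 0; }
        public CharSequence name() { return ""; } // return type changed
    }

    public static void main(String[] args) {
        // Reports the old name() signature, because a return-type change
        // breaks callers that were compiled against the old version.
        System.out.println(removed(signatures(RowV1.class), RowV2.class));
    }
}
```

A real tool compares whole jars and also covers fields, constructors, class hierarchies and so on, but a check of this kind could run in CI against a recorded baseline.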

After wasting a couple of days restoring API compatibility for 3.5, I used one of those tools to review how we've been doing on compatibility of the Java API since 2.6. The results are summarized in the table below along with some brief comments. The full reports are available in the attached zip file for those who want to dig into the detail.

Version      | Binary Compatibility | Source Compatibility | Comments
2.6-rc1      |                      |                      |
2.7          | 99.4%                | 99.4%                | Half of removed methods had been marked @deprecated
2.8          | 99.7%                | 97.3%                |
3.0          | 87.1%                | 85.8%                | Major version bump, so no compatibility guarantees
3.1          | 93.1%                | 93.1%                | Incompatible Engine/Operation refactoring
3.2          | 44.3%                | 44.3%                | JSON parser change
3.3          | 100.0%               | 98.9%                | But entire REST API changed for CSRF
3.4-beta2    | 98.5%                | 96.6%                | Mostly project model join, which probably should have been internal API
3.5-SNAPSHOT | 98.0%                | 98.0%                | Mostly archive filename changes - I think I fixed these, but haven't verified

Some random items that I noticed while doing this survey:
  • There was apparently never a 2.6 release that I could find, just 2.6-RC2?
  • Packaging has changed so that there is no longer an openrefine-x.y.jar but instead a bunch of .class files (at least for Linux, which I used for this survey)
  • The Linux kit has tripled in size from 40MB to 130MB. The other distributions have changed proportionally less. The Mac kit got up to 193MB in 3.4-beta2 and then dropped to 145MB in the current snapshot release; I'm not sure why.

If we're going to continue to attempt to maintain a stable Java API, there are things that we can do to help ourselves here, including:
  • being more conservative about visibility of things so that developers can use the public/protected/private visibility to understand what they can rely on and what they can't
  • don't make internal third-party classes/interfaces part of the API. We got burned by this severely with the json.org objects, so we shouldn't repeat the mistake with Jackson.
  • audit the public APIs for additional trouble spots
  • document our intent for how long we'll support interfaces, what developers can expect, etc
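
To make the first two points concrete, here is a small sketch of an extension-facing interface that exposes only JDK types and keeps its implementation private. All names (OperationSettings, settingsOf, MapBackedSettings) are hypothetical. The implementation here is backed by a plain Map so the example is self-contained; in a real codebase it is where a third-party JSON type would live, out of sight of extensions.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class FacadeSketch {

    /** Stable, extension-facing view of an operation's settings: JDK types only. */
    public interface OperationSettings {
        Optional<String> getString(String key);

        int getInt(String key, int defaultValue);
    }

    /** Public entry point; it returns the interface, never the implementation. */
    public static OperationSettings settingsOf(Map<String, Object> raw) {
        return new MapBackedSettings(raw);
    }

    // Private on purpose: extensions cannot name this class, so it can be
    // replaced (for example when the underlying JSON library changes)
    // without touching the published API.
    private static final class MapBackedSettings implements OperationSettings {
        private final Map<String, Object> values;

        MapBackedSettings(Map<String, Object> raw) {
            // Defensive copy so later changes to the caller's map do not leak in.
            this.values = Collections.unmodifiableMap(new LinkedHashMap<>(raw));
        }

        @Override
        public Optional<String> getString(String key) {
            Object v = values.get(key);
            return v instanceof String ? Optional.of((String) v) : Optional.empty();
        }

        @Override
        public int getInt(String key, int defaultValue) {
            Object v = values.get(key);
            return v instanceof Number ? ((Number) v).intValue() : defaultValue;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("columnName", "city");
        raw.put("limit", 10);
        OperationSettings settings = settingsOf(raw);
        System.out.println(settings.getString("columnName").orElse("?")
                + " " + settings.getInt("limit", 0));
    }
}
```

The trade-off is that every capability extensions need has to be added to the interface deliberately, which is extra work but is also exactly the review step that keeps the API surface small.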

In addition to the Java APIs, we've got other extension points that we've encouraged developers to write to, including those for:
  • importers along with their associated file types, MIME types, and format guessers
  • exporters
  • commands & operations
  • UI menu items
  • extension modules (Butterfly) bundling some of the above

There are also various miscellaneous internal structures like:
  • operation history format (JSON)
  • preferences
  • templating exporter templates

So, which, if any, of these interfaces do we want to publish as stable for developers to use? What guarantees do we want to make? How much engineering effort are we willing to invest to make this supportable?

Tom

p.s. There isn't a "right" answer, so don't waste time trying to guess what it might be.

openrefine-compat.zip

Tom Morris

Aug 10, 2020, 9:07:39 PM
to openref...@googlegroups.com
Although there's no "right answer," I am interested in feedback. Someone must have some opinions.

This is related to the recent Date datatype changes which were made in OpenRefine 3.0, but apparently not announced. I've added that to the migration documentation https://github.com/OpenRefine/OpenRefine/wiki/Migration-guide-for-extension-and-fork-maintainers. I also made a couple of other changes including clarifying what's the minimum needed for extension maintainers to upgrade as well as reversing the order of entries so that the most recent changes will be at the top of the page. Please review.

One of the things that I think we can do to help ourselves on this front is to provide more higher level methods for extension writers to use as well as being careful not to leak 3rd party datatypes in our APIs (e.g. Jackson).

Tom

Antonin Delpeuch (lists)

Aug 13, 2020, 5:38:13 AM
to openref...@googlegroups.com
Hi Tom,

Sorry for the delay in replying to this. I found your percentages very
interesting; I did not know these tools existed.

Replying to your points inline.

On 11/08/2020 03:07, Tom Morris wrote:
> * There was apparently never a 2.6 release that I could find, just
> 2.6-RC2?

I think that's correct.

> * Packaging has changed so that there is no longer an
> openrefine-x.y.jar but instead a bunch of .class files (at least
> for Linux which I used for this survey)

We could consider reintroducing jars, perhaps as separate artifacts that
people can use to build extensions. But I think it would be better to
upload these to Maven Central instead (as discussed elsewhere already,
https://github.com/OpenRefine/OpenRefine/issues/2254)

> * The Linux kit has tripled in size from 40MB to 130MB. The other
> distributions have changed proportionally less, but the Mac kit
> got up to 193MB in 3.4-beta2, but then dropped to 145MB in the
> current snapshot release, but I'm not sure why.

Perhaps the removal of test dependencies from the packaged artifacts.

>
> If we're going to continue to attempt to maintain a stable Java API
> there are things that we can do to help ourselves here including:
>
> * being more conservative about visibility of things so that
> developers can use the public/protected/private visibility to
> understand what they can rely on and what they can't

Yes, I would do this by listing the intended extension points and making
sure that everything those extension points do not depend on is
hidden.

> * don't make internal third-party classes/interfaces part of the
> API. We got burned by this severely with the json.org
> <http://json.org> objects, so we shouldn't repeat the mistake
> with Jackson.

I'm open to proposals to isolate that, but I think this is going to be
fairly involved, duplicating a lot of the logic that Jackson provides
especially around deserialization of polymorphic types. It seems
difficult to do it without a performance loss too (switching back and
forth between String and JSON classes multiple times over the course of
a serialization).

That being said there are libraries which make their underlying JSON
library pluggable, I think (although I cannot remember which one exactly
right now, perhaps a Google library for Drive or Sheets).


> * audit the public APIs for additional trouble spots
> * document our intent for how long we'll support interfaces, what
> developers can expect, etc
>
> In addition to the Java APIs we've got other extension points that
> we've encouraged developers to write to including those for:
>
> * importers along with their associated file types, MIME types,
> and format guessers
> * exporters
> * commands & operations
> * UI menu items
> * extension modules (Butterfly) bundling some of the above
>
> There are also various miscellaneous internal structures like:
>
> * operation history format (JSON)
> * preferences
> * templating exporter templates
>
> So, which, if any, of these interfaces do we want to publish as
> stable for developers to use? What guarantees do we want to make?
> How much engineering effort are we willing to invest to make this
> supportable?

I would say all these extension points you listed above should be stable
within a given major version.

That has not been the case for 3.x, as our hand was forced (by license
and security issues). Perhaps we should have incremented the major
version for these changes? I think the main downside would have been
publishing a major version without big user-facing changes, which might
have confused users a bit.

Antonin

Thad Guidry

Aug 13, 2020, 8:45:56 AM
to openref...@googlegroups.com
Run
  mvn dependency:analyze


There's quite a bit of cleanup that needs to be done.
Shall I help with that today?


Antonin Delpeuch (lists)

Aug 13, 2020, 8:56:25 AM
to openref...@googlegroups.com
On 13/08/2020 14:45, Thad Guidry wrote:
> Run
>   mvn dependency:analyze
>
>
> There's quite a bit of cleanup that needs to be done.
> Shall I help with that today?

There might be useful things to fix but there are also spurious
warnings. Tread carefully!

Antonin

Thad Guidry

Aug 13, 2020, 9:02:00 AM
to openref...@googlegroups.com
Yes, I see some of those spurious warnings. Maven just doesn't know about certain things there. BTW, Gradle is better in this regard, from what I have seen.
I'll tread carefully with a small PR or two.

Thad

Thad Guidry

Aug 13, 2020, 9:28:43 AM
to openref...@googlegroups.com
Antonin,

Do extensions always need this, so that clients can use the extensions via the API? Is that the reason I see it included with scope "provided"?

    <dependency>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
      <version>2.5</version>
      <scope>provided</scope>
    </dependency>

Thad Guidry

Aug 13, 2020, 9:58:19 AM