Killer feature: Package/API Metadata


Evan Chan

Mar 24, 2013, 4:14:06 AM
to adep...@googlegroups.com
I have a crazy idea that I wanted to share with you guys, and it could be a killer feature of adept.

What if we stored more than the hash of the binary jar?  What if we stored a hash of the jar's API as well?
For example, for each jar, a list of the packages in the jar, and a hash of all the classes (and possibly method signatures) in each package.

    akka.actors     299adc50
    akka.fsm          923495
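
A minimal sketch of how such a per-package listing might be produced, assuming only class names are hashed (method signatures would require parsing the class files, which is omitted here). The function name `package_api_hashes`, the choice of SHA-1, and the 8-character truncation are illustrative assumptions, not part of adept.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def package_api_hashes(jar_bytes: bytes) -> dict[str, str]:
    """Map each package in a jar to a short hash of its class names.

    Illustrative only: hashes class names, not method signatures.
    """
    classes_by_package = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for entry in jar.namelist():
            if not entry.endswith(".class"):
                continue  # skip manifests, resources, directories
            path = entry[: -len(".class")]
            package, _, class_name = path.rpartition("/")
            classes_by_package[package.replace("/", ".")].append(class_name)

    hashes = {}
    for package, names in classes_by_package.items():
        # Sort so the hash does not depend on entry order inside the jar.
        digest = hashlib.sha1("\n".join(sorted(names)).encode("utf-8"))
        hashes[package] = digest.hexdigest()[:8]
    return hashes
```

Calling `package_api_hashes(open("some.jar", "rb").read())` (hypothetical file name) would return a dict keyed by package name, one short hash per package.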

This would give developers some crazy powerful tools:
- Which newer versions of my current dependencies are likely to break my current code?  Find out with adept, well before you try out your code!  No more wasting time trying different versions of Guava!
- Easily search all the jars out there by package (or even class name), and see how different their APIs are.
- Predict classpath conflicts ahead of time.  Compute the package/API hash of all the library jars that come with Hadoop, and see if my Hadoop job will conflict.  This would be absolutely huge for the Hadoop community.  There are ample such opportunities with other frameworks.
- For library authors: warn or enforce semantic versioning.  If some % of your API hash differs, then you need to bump your minor version #.
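
The classpath-conflict case above can be sketched as a comparison of two package-to-hash maps. This is only an illustration: `predict_conflicts` is a hypothetical name, and the package names and hash values in the usage example are made up.

```python
def predict_conflicts(provided: dict[str, str], bundled: dict[str, str]) -> list[str]:
    """Return packages present on both sides whose API hashes differ.

    `provided` stands for the jars shipped with a framework (e.g. Hadoop),
    `bundled` for the jars a job brings along. Both map package -> API hash.
    """
    return sorted(
        package
        for package in provided.keys() & bundled.keys()
        if provided[package] != bundled[package]
    )


# Hypothetical data, for illustration only.
framework_libs = {"com.google.common.base": "aa11", "org.apache.commons.io": "bb22"}
job_libs = {"com.google.common.base": "cc33", "org.apache.commons.io": "bb22"}
conflicts = predict_conflicts(framework_libs, job_libs)
```

A package that appears on both sides with the same hash is not reported, since both jars would expose the same API for it.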

All of this information could be generated (albeit slowly) from existing Maven repos.

Down the line, this metadata may get us closer to the ideal of having a true module system, or having a build system that can bring in the right dependencies for you just by looking at your import statements.  

Would love your comments.  :)

-Evan

Fredrik Ekholdt

Mar 24, 2013, 6:24:37 AM
to adep...@googlegroups.com
I think this is an awesome idea! Even just a "did the API just change?" warning would be great.


Paul Phillips

Mar 24, 2013, 2:24:37 PM
to adep...@googlegroups.com

On Sun, Mar 24, 2013 at 1:14 AM, Evan Chan <vel...@gmail.com> wrote:
> What if we stored more than the hash of the binary jar?  What if we stored a hash of the jar's API as well?
> For example, for each jar, a list of the packages in the jar, and a hash of all the classes (and possibly method signatures) in each package.

We've batted this around for a while - more in the context of having scalac automatically generating such hashes so binary incompatibilities could easily be diagnosed. But it fits better with a language-agnostic tool.

Evan Chan

Mar 24, 2013, 6:58:09 PM
to adep...@googlegroups.com
I realized the devil is in the details.  

The most useful thing for people is, are the APIs they are using right now going to change?  However, this requires storing all of the API details with each package, because the details will get hashed away.

Here is a reasonable proposal, I think:
- Compute the hash of a package's API in such a way that you can deduce how far the API has changed between any two versions/hashes, just by subtraction.
  (for example, hash code = 10 * (sum of class name hashcodes) + (sum of method signature hashcodes).  A new class causes the overall hashcode to increase much more than a change in a method signature).
- For a given package, adept would be able to estimate how much the API has changed in future versions.   For example, for Akka:

package akka.actor
    v2.0.3:      +3%
    v2.0.4:      +5%
    v2.1.0:      +20%
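
For comparison, a sketch of how such percentages could be computed when the full signature sets are at hand (for instance while generating the metadata), rather than derived from hashes. The measure used here, added plus removed signatures relative to the size of the old API, is an assumption; the proposal does not define one.

```python
def percent_changed(old: set[str], new: set[str]) -> float:
    """Percentage of API change between two sets of signatures.

    Counts signatures that were added or removed (symmetric difference)
    relative to the size of the old API. A modified signature shows up
    as one removal plus one addition.
    """
    if not old:
        raise ValueError("old API must contain at least one signature")
    return 100.0 * len(old ^ new) / len(old)


# Hypothetical signatures, for illustration only.
v1 = {"Actor.receive", "Actor.preStart", "Props.apply", "Props.withRouter"}
v2 = {"Actor.receive", "Actor.preStart", "Props.apply", "Props.withDispatcher"}
change = percent_changed(v1, v2)
```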

If someone wanted the details of the API changes, then they (or Adept) could download the jars and actually examine the API differences, perhaps using a tool such as Clirr.

I can see this being used by companies to enforce certain compatibility standards.

-Evan

Mark Harrah

Mar 25, 2013, 9:38:35 AM
to adep...@googlegroups.com
Overall, I would also like to see better support for compatibility automation...

On Sun, 24 Mar 2013 15:58:09 -0700 (PDT)
Evan Chan <vel...@gmail.com> wrote:

> I realized the devil is in the details.
>
> The most useful thing for people is, are the APIs they are using right now
> going to change? However, this requires storing all of the API details
> with each package, because the details will get hashed away.
>
> Here is a reasonable proposal, I think:
> - Compute the hash of a package's API in such a way that you can deduce how
> far the API has changed between any two versions/hashes, just by
> subtraction.
> (for example, hash code = 10 * (sum of class name hashcodes) + (sum of
> method signature hashcodes). A new class causes the overall hashcode to
> increase much more than a change in a method signature).

A reasonable hash code, such as of the class name String, will be reasonably evenly distributed among ints, so there isn't a meaningful sense of 10*<hash-code> being more or less than <hash-code>. There would have to be hash codes for method signatures and class names that would give that meaning and it would probably still overflow often enough to ruin it.
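
A small sketch of this point, assuming the hash codes are Java's `String.hashCode` values (reimplemented here in Python; the class and method names are made up). With that hash, renaming a single method can move the weighted sum far more than adding a whole class, so the size of the difference says little about the size of the change.

```python
def java_string_hash(s: str) -> int:
    """Replicate java.lang.String.hashCode() as a signed 32-bit value.

    Matches Java for strings of Basic Multilingual Plane characters.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


def weighted_api_hash(class_names, method_signatures) -> int:
    """The proposed formula: 10 * sum(class hashes) + sum(method hashes).

    Python ints do not overflow, so this behaves like an unbounded sum.
    """
    return 10 * sum(map(java_string_hash, class_names)) + sum(
        map(java_string_hash, method_signatures)
    )


base = weighted_api_hash(["a"], ["foo()"])
with_new_class = weighted_api_hash(["a", "b"], ["foo()"])
with_renamed_method = weighted_api_hash(["a"], ["zoo()"])

class_delta = abs(with_new_class - base)        # effect of adding class "b"
method_delta = abs(with_renamed_method - base)  # effect of renaming foo() to zoo()
```

Here `method_delta` comes out larger than `class_delta`, the opposite of what the weighting was meant to achieve.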

> - For a given package, adept would be able to estimate how much the API has
> changed in future versions. For example, for Akka:
>
> package akka.actor
> v2.0.3: +3%
> v2.0.4: +5%
> v2.1.0: +20%

This would make for a nice demo and that alone might be worth it. In practice, as you say, we ultimately care whether it is drop-in binary and/or source compatible. Hashes will give an approximation to that, but there will be situations where it is compatible for a particular client but the hashes don't have enough information to prove it. How common are those situations and is this useful despite that?

If you could say "the current binary API of the package or class is 4a3b83 + 9e223f", then you'd know that something that runs against 4a3b83 will work with the current API. Doing that seems plausible, but it only handles additions and not deprecations/deletions. It seems like deletions make it more complicated, but maybe there's a nice solution waiting here.
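
One way to read the "4a3b83 + 9e223f" idea is to treat a binary API as a set of chunk hashes and check compatibility by set inclusion. This is a sketch under that assumption only; `is_compatible` is a hypothetical name, and the deletion case is shown simply as a chunk that is no longer provided.

```python
def is_compatible(required: set[str], provided: set[str]) -> bool:
    """True if every API chunk the client was built against is still provided."""
    return required <= provided


client_needs = {"4a3b83"}          # what the client was compiled against
additive_api = {"4a3b83", "9e223f"}  # old API plus a chunk of additions
after_deletion = {"9e223f"}        # hypothetical: the original chunk is gone

runs_on_additive = is_compatible(client_needs, additive_api)
runs_after_deletion = is_compatible(client_needs, after_deletion)
```

Pure additions keep the subset relation intact. A deletion inside an existing chunk would presumably have to change that chunk's hash, which is where the scheme gets more complicated.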

As far as the dependency manager is concerned, if this isn't used for dependency resolution, it is just additional metadata. As long as there is a way to encode custom structured information in the metadata, adept is fine. It is only when the information is needed for resolution that adept itself has to understand it. Something like "I need binary version 4a3b83 of akka" would be involved in resolution, but "akka v2.0.3 is +3% different from akka v2.0.4" would not and could be provided by code/a plugin on top of adept.

It is overall a good thing to continue to discuss and it could still fall within the scope of adept. I'm just trying to establish the core feature set that would support these additional (but possibly killer) features.

-Mark


Evan Chan

Mar 29, 2013, 3:44:56 AM
to adep...@googlegroups.com
Mark,

On Mon, Mar 25, 2013 at 6:38 AM, Mark Harrah <dmha...@gmail.com> wrote:
> Overall, I would also like to see better support for compatibility automation...

Not just that, but richer metadata.  I was bitten again by a problem where one jar includes all of another jar's dependencies, way more than are needed.  Most build tools make it too painful for people to figure out how to mark a dep as non-transitive, etc.
 
>> (for example, hash code = 10 * (sum of class name hashcodes) + (sum of
>> method signature hashcodes).  A new class causes the overall hashcode to
>> increase much more than a change in a method signature).

> A reasonable hash code, such as of the class name String, will be reasonably evenly distributed among ints, so there isn't a meaningful sense of 10*<hash-code> being more or less than <hash-code>.  There would have to be hash codes for method signatures and class names that would give that meaning and it would probably still overflow often enough to ruin it.

The overflow problem can be easily solved.  Just use a LONG to sum up a bunch of INT hashcodes. 
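
A minimal sketch of the difference, simulating 32-bit wraparound in Python (whose integers are unbounded). The helper names are illustrative. Note that this addresses only the overflow concern, not the point about hash codes being evenly distributed.

```python
INT_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reduce a value to the signed 32-bit range, as Java int arithmetic would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def sum_as_int(hashcodes) -> int:
    """Sum with 32-bit wraparound after every addition."""
    total = 0
    for h in hashcodes:
        total = wrap_int32(total + h)
    return total


def sum_as_long(hashcodes) -> int:
    """Plain sum; fits in a signed 64-bit long for any realistic number of int terms."""
    return sum(hashcodes)


hashcodes = [INT_MAX, INT_MAX, 1]
int_total = sum_as_int(hashcodes)    # wraps around to a negative value
long_total = sum_as_long(hashcodes)  # keeps the true total
```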

>> - For a given package, adept would be able to estimate how much the API has
>> changed in future versions.   For example, for Akka:
>>
>> package akka.actor
>>     v2.0.3:      +3%
>>     v2.0.4:      +5%
>>     v2.1.0:      +20%

> This would make for a nice demo and that alone might be worth it.  In practice, as you say, we ultimately care whether it is drop-in binary and/or source compatible.  Hashes will give an approximation to that, but there will be situations where it is compatible for a particular client but the hashes don't have enough information to prove it.  How common are those situations and is this useful despite that?

Yes, this would be good to do a POC on.
 

> If you could say "the current binary API of the package or class is 4a3b83 + 9e223f", then you'd know that something that runs against 4a3b83 will work with the current API.  Doing that seems plausible, but it only handles additions and not deprecations/deletions.  It seems like deletions make it more complicated, but maybe there's a nice solution waiting here.

I was thinking an additive solution would be great too, but it seems difficult to do in practice.  Maybe one based on diff'ing the API changes; groups of additions get a unique hashcode, and changes to existing APIs cause existing hashcodes to be changed too.  Tracking all the chunks of changes efficiently would not be fun though, and I'm not sure if it's worth our spare time.
 

> As far as the dependency manager is concerned, if this isn't used for dependency resolution, it is just additional metadata.  As long as there is a way to encode custom structured information in the metadata, adept is fine.  It is only when the information is needed for resolution that adept itself has to understand it.  Something like "I need binary version 4a3b83 of akka" would be involved in resolution, but "akka v2.0.3 is +3% different from akka v2.0.4" would not and could be provided by code/a plugin on top of adept.

> It is overall a good thing to continue to discuss and it could still fall within the scope of adept.  I'm just trying to establish the core feature set that would support these additional (but possibly killer) features.

Yes, that's true.
What I could use right now is to repackage a jar using adept (Netflix Astyanax) and have it not transitively pull in all of Cassandra's deps.
Another one is to promote "provided" dependencies correctly (currently provided deps aren't included in the pom, and you have to include them in the build file of the app / library user).
It seems this information cannot be encapsulated in the current POM.

-Evan
  


--
Because the people who are crazy enough to think they can change the world,
are the ones who do.     -- Steve Jobs