maven_jar with checksums?


Shawn Pearce

Mar 25, 2015, 2:14:57 AM
to bazel-...@googlegroups.com
I was looking at maven_jar and I am a little disappointed that it lacks a checksum attribute like the one http_jar has.

When we put Gerrit Code Review onto Buck we wrote our own maven_jar() rule that accepts SHA-1s and verifies on download. This has been very valuable to us as a project and we would hate to lose that level of verification if we moved to Bazel.

I know SHA-256 is the new thing, but Maven Central publishes SHA-1 hashes for the files it serves up. Being able to at least include that hash in our build rules, so we can verify that a file is likely to be what we think it is before using it, has caught a number of broken caches.
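
A minimal sketch of the verify-on-download idea described above, written in Python rather than as a Buck or Bazel rule. The function names `sha1_of` and `verify_sha1` are made up for illustration and are not part of either build tool; the expected digest would be the one Maven Central publishes next to the artifact.

```python
import hashlib


def sha1_of(path, chunk_size=64 * 1024):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha1(path, expected_sha1):
    """Raise ValueError if the file's SHA-1 differs from the expected hex digest.

    Hypothetical helper: a real rule would run this right after the download
    and refuse to use the JAR on a mismatch.
    """
    actual = sha1_of(path)
    if actual != expected_sha1.strip().lower():
        raise ValueError(
            "%s: expected sha1 %s, got %s" % (path, expected_sha1, actual)
        )
    return actual
```

Failing loudly on a mismatch is what catches a stale or corrupted cache entry before it reaches the compile step.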


On another note, maven_jar() in Bazel takes the group_id, artifact_id and version as 3 separate parameters, but then uses "group:artifact" in the exclude. It is... awkward to have two different syntaxes for similar things.

Gerrit chose to use "group:artifact:version[:classifier]" for its maven_jar() rule because this is the format used by Apache Buildr; you can copy and paste it directly from search.maven.org.
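
As a rough illustration of the single-string form, here is a Python sketch that splits a "group:artifact:version[:classifier]" coordinate into its parts. `MavenCoordinate` and `parse_coordinate` are hypothetical names, and the field order simply follows the format quoted above; this is not the code of any existing rule.

```python
from collections import namedtuple

# Hypothetical container for the pieces of a coordinate string.
MavenCoordinate = namedtuple(
    "MavenCoordinate", "group artifact version classifier"
)


def parse_coordinate(coord):
    """Parse 'group:artifact:version[:classifier]' into a MavenCoordinate."""
    parts = coord.split(":")
    if len(parts) not in (3, 4) or not all(parts):
        raise ValueError("bad Maven coordinate: %r" % coord)
    group, artifact, version = parts[:3]
    # The classifier is optional; None means the main artifact.
    classifier = parts[3] if len(parts) == 4 else None
    return MavenCoordinate(group, artifact, version, classifier)
```

With this shape, one rule attribute carries everything the three separate parameters carry, plus the optional classifier.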


Finally... maven_jar() needs a classifier option. :)


Lukács T. Berki

Mar 25, 2015, 6:21:42 AM
to Shawn Pearce, Kristina Chodorow, bazel-...@googlegroups.com


Kristina Chodorow

Mar 25, 2015, 9:25:59 AM
to Lukács T. Berki, Shawn Pearce, bazel-...@googlegroups.com
On Wed, Mar 25, 2015 at 6:21 AM, Lukács T. Berki <lbe...@google.com> wrote:


On Wed, Mar 25, 2015 at 7:14 AM, 'Shawn Pearce' via bazel-discuss <bazel-...@googlegroups.com> wrote:
I was looking at maven_jar and I am a little disappointed to have it missing a checksum like http_jar has.

When we put Gerrit Code Review onto Buck we wrote our own maven_jar() rule that accepts SHA-1s and verifies on download. This has been very valuable to us as a project and we would hate to lose that level of verification if we moved to Bazel.

Agreed that'd be valuable, I've added an issue for this: https://github.com/google/bazel/issues/57


I know SHA-256 is the new thing, but Maven Central publishes SHA-1 hashes for the files it serves up. Being able to at least include that in our build rules to verify the file is likely to be what we think it is before using it has caught a number of broken caches.


On another note, maven_jar() in Bazel takes the group_id, artifact_id and version as 3 separate parameters. But then uses "group:artifact" in the exclude. This is... awkward to have two different syntaxes for similar things.

Gerrit chose to use "group:artifact:version[:classifier]" for its maven_jar() rule because this is the format used by Apache Buildr; you can copy and paste it directly from search.maven.org. 

Sounds like it would be better to combine the three (optionally four) fields into one string in the maven_jar rule?


Finally... maven_jar() needs a classifier option. :)

Members of bazel-discuss should have access to this design doc (unfortunately I can't figure out a way to just make it public): https://docs.google.com/document/d/1RmEan3cnUldQiUBT0IySuKoisQdeuMX79D2cMluuezo/edit.  If you can't see it, here's the eventual plan:

maven_jar(
    name = "appengine",
    group_id = "com.google.appengine",
    artifact_id = "appengine-api-1.0-sdk",
    version = "1.9.17",
    type_and_classifier = [
        {type = "jar", classifier = "debug"},
        {type = "war"}, {classifier = "src"}
    ], # Optional
    repositories = [
        "repo.maven.apache.org", 
        "uk.maven.org",
    ]
)

Comments welcome.

Han-Wen Nienhuys

Mar 25, 2015, 10:10:31 AM
to Kristina Chodorow, Lukács T. Berki, Shawn Pearce, bazel-...@googlegroups.com
On Wed, Mar 25, 2015 at 2:25 PM, 'Kristina Chodorow' via bazel-discuss
<bazel-...@googlegroups.com> wrote:
>>> On another note, maven_jar() in Bazel takes the group_id, artifact_id and
>>> version as 3 separate parameters. But then uses "group:artifact" in the
>>> exclude. This is... awkward to have two different syntaxes for similar
>>> things.
>>>
>>>
>>> Gerrit chose to use "group:artifact:version[:classifier]" for its
>>> maven_jar() rule because this is the format used by Apache Buildr; you can
>>> copy and paste it directly from search.maven.org.
>
>
> Sounds like it would be better to combine the three (optionally four) fields
> into one string in the maven_jar rule?
>
>>>
>>> Finally... maven_jar() needs a classifier option. :)
>
>
> Members of bazel-discuss should have access to this design doc
> (unfortunately I can't figure out a way to just make it public):
> https://docs.google.com/document/d/1RmEan3cnUldQiUBT0IySuKoisQdeuMX79D2cMluuezo/edit.

Can you copy the text into a new doc that is external to google? We
should start discussing these designs outside of the google.com silo.

--
Han-Wen Nienhuys
Google Munich
han...@google.com

Shawn Pearce

Mar 25, 2015, 1:15:40 PM
to Kristina Chodorow, Lukács T. Berki, bazel-...@googlegroups.com
On Wed, Mar 25, 2015 at 6:25 AM, Kristina Chodorow <kcho...@google.com> wrote:
On Wed, Mar 25, 2015 at 6:21 AM, Lukács T. Berki <lbe...@google.com> wrote:
On Wed, Mar 25, 2015 at 7:14 AM, 'Shawn Pearce' via bazel-discuss <bazel-...@googlegroups.com> wrote:

I know SHA-256 is the new thing, but Maven Central publishes SHA-1 hashes for the files it serves up. Being able to at least include that in our build rules to verify the file is likely to be what we think it is before using it has caught a number of broken caches.

Bah. I forgot that we actually have sha1, bin_sha1, and src_sha1.

There are some JARs we pull where we use the Java sources in our build, and we want to verify we downloaded the correct JAR for those. It might be possible to just write a different maven_jar() rule with classifier = "sources" instead of lumping that into a single rule.

On another note, maven_jar() in Bazel takes the group_id, artifact_id and version as 3 separate parameters. But then uses "group:artifact" in the exclude. This is... awkward to have two different syntaxes for similar things.

Gerrit chose to use "group:artifact:version[:classifier]" for its maven_jar() rule because this is the format used by Apache Buildr; you can copy and paste it directly from search.maven.org. 

Sounds like it would be better to combine the three (optionally four) fields into one string in the maven_jar rule?

We found it easier, especially with a large number of maven_jar() rules for our project. Each rule was slightly shorter with an id field taking 1 line vs. group/artifact/version taking 3 lines.

Unfortunately we do a fair amount of string concat, e.g.:

  VERS = '3.7.0.201502260915-r.58-g65c379e'

  maven_jar(
    name = 'jgit',
    id = 'org.eclipse.jgit:org.eclipse.jgit:' + VERS,
  )

  maven_jar(
    name = 'jgit-servlet',
    id = 'org.eclipse.jgit:org.eclipse.jgit.http.server:' + VERS,
  )

 
Finally... maven_jar() needs a classifier option. :)

Members of bazel-discuss should have access to this design doc (unfortunately I can't figure out a way to just make it public): https://docs.google.com/document/d/1RmEan3cnUldQiUBT0IySuKoisQdeuMX79D2cMluuezo/edit.  If you can't see it, here's the eventual plan:

maven_jar(
    name = "appengine",
    group_id = "com.google.appengine",
    artifact_id = "appengine-api-1.0-sdk",
    version = "1.9.17",
    type_and_classifier = [
        {type = "jar", classifier = "debug"},
        {type = "war"}, {classifier = "src"}

I... don't understand what this would do. I can't even guess given my basic knowledge of Maven.

Thomas Broyer

Mar 26, 2015, 4:35:35 AM
to bazel-...@googlegroups.com, kcho...@google.com, lbe...@google.com
IIUC, a maven_jar creates a "repository" with several artifacts in there (see the bind#actual value at http://bazel.io/docs/build-encyclopedia.html#maven_jar) so my understanding would be that this creates @appengine//jar_debug (or something like that), @appengine//war and @appengine//src. So you could possibly add a sha1 to those type_and_classifier entries: { type = "jar", classifier = "debug", sha1 = "…" }
But contrary to the maven_jar in Bucklets, maven_jar in Bazel reads the POM (so it would need a sha1 too) and downloads transitive dependencies (how'd you provide the sha1 for those then?)

1½ years ago, in the context of Buck, I proposed an additional, preliminary step (e.g. using Ivy or Aether) generating discrete rules for each artifact.
Put differently, you define all your external (Maven) dependencies, along with rules (à la dependencyManagement in Maven, or more complex things like in Gradle, see http://gradle.org/docs/current/userguide/dependency_management.html#sub:client_module_dependencies and http://gradle.org/docs/current/userguide/dependency_management.html#N1542D), and you get rules you can depend on for each declared dependency, with complete control over the transitive dependencies. Use cases (these are real use cases I deal with in a project currently built with Gradle):
  • you use several dependencies that all depend on Guava or Guice in different versions, and you possibly want to use yet another version, but at least align the version;
  • you depend on something that depends on Guice with some extensions, you don't use those extensions yourself, but you need to align all versions on the version of Guice you're using;
  • you need to replace some dependency with another, due to licensing issues for example (e.g. replace javax.activation:activation or javax.servlet:javax.servlet-api which are CDDL with their Geronimo equivalent under Apache v2; or jcip-annotations which are under Creative Commons Attribution, with an Apache v2 equivalent), or bad practices of depending on "bundles" (e.g. mockito-all instead of mockito-core, or the old junit instead of junit-dep – this has now been fixed, so you need to replace junit-dep transitive deps with junit)
  • you need to tell the build tool that two dependencies are actually the same (e.g. javassist:javassist or javax.servlet:servlet-api are the old coordinates for org.javassist:javassist and javax.servlet-api:javax.servlet-api respectively)
  • you need to exclude some transitive dependencies (e.g. I use google-oauth-client only for parsing and serializing objects, I don't need the httpclient dependency)
With the "repository" concept in Bazel, you can probably come up with a DSL for that in BUILD (or WORKSPACE?) files that would do everything needed during a "bazel fetch" (instead of a first step to generate BUILD files followed by a fetch using those files). If you don't, then I think it is better to leave all that to an external tool (like I proposed for Buck at the time) and not resolve transitive dependencies at all. Bucklets' maven_jar is actually just a thin wrapper around an equivalent to Bazel's http_jar, just generating the URL from the coordinates. (That said, one could still use an external tool generating http_jar rules with my approach…)
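
To make "generating the URL from the coordinates" concrete, here is a small Python sketch of the standard Maven repository layout. The function name `maven_url` and the `packaging` parameter are assumptions for illustration; this is not the Bucklets code. The published SHA-1 for a file lives at the same URL with ".sha1" appended.

```python
def maven_url(repository, group, artifact, version,
              classifier=None, packaging="jar"):
    """Build the download URL for an artifact in a Maven-layout repository.

    Layout: <repo>/<group with dots as slashes>/<artifact>/<version>/
            <artifact>-<version>[-<classifier>].<packaging>
    """
    group_path = group.replace(".", "/")
    file_name = "%s-%s" % (artifact, version)
    if classifier:
        file_name += "-" + classifier
    file_name += "." + packaging
    return "/".join(
        [repository.rstrip("/"), group_path, artifact, version, file_name]
    )


def maven_sha1_url(*args, **kwargs):
    """URL of the SHA-1 file that Maven repositories publish next to the artifact."""
    return maven_url(*args, **kwargs) + ".sha1"
```

A thin wrapper of this kind does no POM parsing and no transitive resolution, which is why each artifact can carry its own checksum.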

Disclaimer: I haven't even tried Bazel yet and am just starting to read the docs, and I never actually experimented with the Ivy+Buck idea.

Kristina Chodorow

Mar 28, 2015, 12:44:00 PM
to Thomas Broyer, bazel-...@googlegroups.com, Lukács T. Berki

I updated the design doc and copied it to a document I could make public (which unfortunately lost all of the comments that were on it): https://docs.google.com/document/d/1LIF_CXwamK6MAkLDekS83TlqL50sS7GqL9ocfwcmAu8/edit#.  Please feel free to comment.
 


I added a bit to https://docs.google.com/document/d/1LIF_CXwamK6MAkLDekS83TlqL50sS7GqL9ocfwcmAu8/edit#heading=h.twhwz2g1yedf about dependencies... I think it'll be a lot easier for Bazel if people have to specify the transitive dependencies, but a lot harder for the users.  I suggested adding a command to generate all the dependencies in WORKSPACE-file format so people can copy-paste, then customize as needed (which I think would take care of most of the issues you mention above?).
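
A sketch of what the output side of such a generator command might look like, in Python. It only renders rule text from an already-resolved list of dependencies and does no resolution itself. The attribute names name, group_id, artifact_id and version follow the design snippet earlier in the thread; the sha1 attribute is hypothetical (it is the feature being requested), and `render_maven_jar` and `render_workspace` are made-up names.

```python
def render_maven_jar(name, group_id, artifact_id, version, sha1=None):
    """Render one maven_jar(...) rule as WORKSPACE-style text."""
    attrs = [
        ("name", name),
        ("group_id", group_id),
        ("artifact_id", artifact_id),
        ("version", version),
    ]
    if sha1 is not None:
        # Hypothetical attribute: maven_jar had no checksum at the time.
        attrs.append(("sha1", sha1))
    body = "".join('    %s = "%s",\n' % (key, value) for key, value in attrs)
    return "maven_jar(\n%s)\n" % body


def render_workspace(deps):
    """Render a list of dependency dicts, one rule per dependency."""
    return "\n".join(render_maven_jar(**dep) for dep in deps)
```

Because the result is plain text, it can be pasted into a WORKSPACE file and then edited by hand to drop or retarget individual dependencies.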
 

Thanks for all the input! 

Thomas Broyer

Mar 28, 2015, 8:38:36 PM
to bazel-...@googlegroups.com, t.br...@gmail.com, lbe...@google.com


On Saturday, March 28, 2015 at 5:44:00 PM UTC+1, Kristina Chodorow wrote:

I updated the design doc and copied it to a document I could make public (which unfortunately lost all of the comments that were on it): https://docs.google.com/document/d/1LIF_CXwamK6MAkLDekS83TlqL50sS7GqL9ocfwcmAu8/edit#.  Please feel free to comment.

I'll add some of the comments below to the doc too.
 
 


I added a bit to https://docs.google.com/document/d/1LIF_CXwamK6MAkLDekS83TlqL50sS7GqL9ocfwcmAu8/edit#heading=h.twhwz2g1yedf about dependencies... I think it'll be a lot easier for Bazel if people have to specify the transitive dependencies, but a lot harder for the users.

IIUC, you basically made maven_jar an equivalent of http_jar with the URL generated from repositories and artifact, and with a sha1 instead of sha256 checksum.
This is basically what Bucklets' maven_jar macro does too: https://gerrit.googlesource.com/bucklets/+/d2936a48fc559e90b66e83de7ff163202e75486b/maven_jar.bucklet (Buck now has an undocumented remote_file rule and "buck fetch" command, but Bucklets doesn't (yet?) use it).

It's not clear how you'd declare the transitive dependencies: there's no 'deps' or 'exports' attribute on the maven_jar rule, so does that mean you'd have to define a filegroup, or list all dependencies as deps of your java_library?
Buck's prebuilt_jar has a 'deps' attribute, and Bucklets' maven_jar macro has such an attribute too, that it passes to the generated prebuilt_jar (as a side note, Buck's prebuilt_jar does not have an 'exported_deps' attribute, so maven_jar fakes it with an intermediate java_library that depends on the prebuilt_jar).

Not sure why you'd need 'repositories' to be multi-valued too, except maybe if you want to use a variable there for all your maven_jar rules without caring which one comes from which repo. But when you write your maven_jar rule you know which repo it's available in.

Note: you'll probably want to add some configuration property to allow overriding the repositories, so that people building, say, an open-source project at their company could use the company's repository manager that mirrors/caches all the repositories used in the build. It's common practice to do that, at least on CI servers. Or would Bazel respect the ~/.m2/settings.xml configuration for mirrors?
 
I suggested adding a command to generate all the dependencies in WORKSPACE-file format so people can copy-paste, then customize as needed (which I think would take care of most of the issues you mention above?).

+1, good idea.
When I wrote my blog post, Buck didn't have a remote_file rule, so I was thinking of an external tool that would download the JARs and generate prebuilt_jar rules. Buck now has a remote_file rule and "buck fetch" command, so you could have your external tool only generate the prebuilt_jar and remote_file rules, and let "buck fetch" do the downloading.
With built-in first-class support for Maven JARs though (similar to the prebuilt_jar+remote_file rules in Buck), having a command that generates the transitive dependencies from one maven_jar would probably solve 60% of use cases.
Let's be honest though: without "version conflict mediation" (at a minimum), this is going to be painful. For example, I use closure-templates (Soy), which depends on Guice 3 and Guava 14, but I use Guice 4.0-beta5 and Guava 18 in my project. The command would then likely generate the Guice 3 and Guava 14 rules, and I'd have to remove (Guava 14) or tweak them (Guice 3: Soy uses Guice AssistedInject and Multibindings, which I don't use, so I'd have to change their version to 4.0-beta5 and look up their sha1s).
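
A deliberately simple Python sketch of the kind of version alignment meant here, not Maven's or Gradle's actual mediation algorithm. The policy is an assumption chosen for illustration: an explicitly pinned version always wins, otherwise the first version seen is kept. `align_versions` is a made-up name and the version strings in the usage example are illustrative.

```python
def align_versions(resolved, pinned):
    """Collapse duplicate artifacts to one version each.

    resolved: iterable of (group, artifact, version) tuples, e.g. the raw
              transitive closure a generator command might print.
    pinned:   dict mapping (group, artifact) to the version the project wants.

    Returns sorted 'group:artifact:version' strings, one per artifact.
    """
    aligned = {}
    for group, artifact, version in resolved:
        key = (group, artifact)
        if key in pinned:
            aligned[key] = pinned[key]   # the project's choice wins
        elif key not in aligned:
            aligned[key] = version       # otherwise first seen wins
    return ["%s:%s:%s" % (g, a, v) for (g, a), v in sorted(aligned.items())]


if __name__ == "__main__":
    transitive = [
        ("com.google.inject", "guice", "3.0"),
        ("com.google.guava", "guava", "14.0"),
        ("args4j", "args4j", "2.0.26"),
    ]
    wanted = {
        ("com.google.inject", "guice"): "4.0-beta5",
        ("com.google.guava", "guava"): "18.0",
        ("args4j", "args4j"): "2.0.31",
    }
    for line in align_versions(transitive, wanted):
        print(line)
```

Each aligned artifact would still need its own checksum looked up, which is the manual step complained about above.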

An alternative could be to have some "maven_import" rule referencing a pom.xml where you list all your dependencies (and could use excludes and dependencyManagement, and maybe even declare repositories), and have it download all of them transitively (respecting dependencyManagement and excludes, and do conflict resolution – e.g. I use Args4j 2.0.31, Soy depends on 2.0.26) and expose them as files within the 'maven_import' "remote repository" (e.g. "//external:maven-deps/com.google.template/soy/2015-03-27/soy-2015-03-27.jar"). The downside of that approach is that you cannot tweak it further, but you'd have the same capabilities as any Maven user (i.e. not perfect, but usable).
For more complex use cases, one could still use a third-party tool that generates http_jar+bind rules, as I proposed 1½ years ago (caveat: how'd you declare transitive dependencies?)