How to always prefer the highest version of a dependency?


Christopher Kilian

Mar 28, 2022, 9:29:48 PM
to bazel-discuss
My project has several instances of targets where there are multiple versions of the same dependency (e.g. commons-io-1.3.2 and commons-io-2.5 both exist in the project and may both end up in the transitive closure of a target). When that happens, I want the newest version to always come first on the classpath. Is there a way I can do this?

I tried playing around with the JavaInfo provider to see if I could create a rule to sort the order of the "transitive_compile_time_jars" and "transitive_runtime_jars" fields, but JavaInfo objects are immutable and there doesn't seem to be an easy way to create a new JavaInfo object from an existing JavaInfo object.

Alex Humesky

Mar 28, 2022, 11:03:35 PM
to Christopher Kilian, bazel-discuss
To the extent possible, it's usually most straightforward to have just 1 version of every library:

Since it sounds like you're aiming to choose e.g. commons-io-2.5 for the final binary anyway, can you just have everything depend on commons-io-2.5 (and delete commons-io-1.3.2)? Trying to swap out different versions along the build graph sounds like you might just end up with runtime errors, or other problems that will be hard to debug.

If you're using rules_jvm_external to get these dependencies, there's some documentation on how to pick one version:
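For illustration only, a minimal WORKSPACE sketch of pinning a single version with rules_jvm_external. It assumes rules_jvm_external has already been fetched in the workspace (that part is omitted) and uses commons-io 2.5 as the example artifact. Setting `version_conflict_policy = "pinned"` asks the resolver to use the listed version even when another artifact transitively requests a different one; check the rules_jvm_external docs for the release you use before relying on the exact behavior.

```python
# WORKSPACE (sketch; assumes rules_jvm_external is already set up)
load("@rules_jvm_external//:defs.bzl", "maven_install")

maven_install(
    artifacts = [
        # The single version of commons-io every target should see.
        "commons-io:commons-io:2.5",
    ],
    repositories = ["https://repo1.maven.org/maven2"],
    # "pinned": versions listed above win over versions requested transitively.
    version_conflict_policy = "pinned",
)
```

Targets would then depend on `@maven//:commons_io_commons_io` rather than on a version-specific label.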

If you're working with external workspaces that have different versions of some other external workspace, you can override the transitive dependencies from the top-level workspace file with repo_mapping:
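A minimal sketch of that, with hypothetical repository names and path: `@lib_a` is an external workspace whose BUILD files refer to `@commons_io_1_3_2`, and the top-level WORKSPACE remaps that name so `@lib_a` gets `@commons_io_2_5` instead.

```python
# Top-level WORKSPACE (sketch; repository names and path are hypothetical)
local_repository(
    name = "lib_a",
    path = "third_party/lib_a",
    # Inside @lib_a, any reference to @commons_io_1_3_2 resolves to
    # @commons_io_2_5, so only one commons-io ends up in the build graph.
    repo_mapping = {"@commons_io_1_3_2": "@commons_io_2_5"},
)
```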

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/c70ee45e-5f41-4332-91cd-b3530e8978b3n%40googlegroups.com.


Christopher Kilian

Mar 29, 2022, 12:18:31 AM
to bazel-discuss
Hey Alex,

Thanks for the quick response. There are a couple of situations where we do need the old dependency. For example, with the following hierarchy:

       A          B
    /    \    /     \
  C       D          E
/             \
F (2.0)     F (1.0)

We want A to resolve to F 2.0, but we still need B to resolve to F 1.0. We can't easily get rid of F 1.0 at the moment.

We're currently migrating one of our repositories to Bazel, and the previous build system had this behavior enabled, so we'd like to replicate the behavior as much as we can if possible.

Thanks!

Chris

Alex Humesky

Apr 5, 2022, 9:09:53 PM
to Christopher Kilian, bazel-discuss

Thanks for the diagram. Just to clarify: does B depend directly on F 1.0, and A directly on F 2.0? With Java strict deps, B cannot use anything in F 1.0 that it only gets via D, and A cannot use anything in F 2.0 that it only gets via C.

If at least B doesn't use anything in F, then it seems you could switch D to depend on F 2.0 and B would be none the wiser (this is one of the advantages of Java strict deps).

I'll assume, though, that A or C needs classes in F 2.0 and B needs classes in F 1.0 (i.e. you have strict deps off, or there are direct dependencies not shown in the diagram), and that D can work with either F 1.0 or F 2.0.

The JavaInfo constructor is mainly meant for rules that themselves do Java compilation of some kind, so it's not very convenient for this kind of classpath manipulation. Something like the following worked with --strict_java_deps=off; getting it to work with strict deps on would need more investigation:

defs.bzl:

def _pick_highest_version_impl(ctx):
 
  empty_jar = ctx.actions.declare_file(ctx.label.name + "_empty.jar")
  ctx.actions.run_shell(
    outputs = [empty_jar],
    arguments = [empty_jar.path],
    command = "echo | zip -q > $1 && zip -dq $1 -")
 
  merged = java_common.merge([d[JavaInfo] for d in ctx.attr.deps])

  sorted_java_info = JavaInfo(
    output_jar = empty_jar,
    compile_jar = None,
    deps = [
        # use empty jar to avoid having to pass a real jar for the compile-time-only deps
        JavaInfo(output_jar = empty_jar, compile_jar = jar)
            for jar in reversed(sorted(merged.transitive_compile_time_jars.to_list()))
    ],
    runtime_deps = [
        JavaInfo(output_jar = jar, compile_jar = jar)
            for jar in reversed(sorted(merged.transitive_runtime_jars.to_list()))
    ],
  )

  return [sorted_java_info]

pick_highest_version = rule(
  implementation = _pick_highest_version_impl,
  attrs = {
    "deps": attr.label_list(),
  },
)


BUILD:

load("//:defs.bzl", "pick_highest_version")

java_binary(
  name = "Foo",
  srcs = ["Foo.java"],
  deps = [
    ":pick_highest_version",
  ],
)

pick_highest_version(
  name = "pick_highest_version",
  deps = [
    # relies on naming convention
    "//java/dep/v1:dep",
    "//java/dep/v2:dep",
  ],
)
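The `reversed(sorted(...))` calls in the rule above order jars by their path strings, which is why the example relies on a naming convention: with plain string comparison, `v10` sorts before `v2`, and `commons-io-1.10` before `commons-io-1.9`. A minimal sketch of a numeric sort key, assuming a hypothetical `<name>-<version>.jar` file naming convention, is below. It is plain Starlark (and also valid Python), so it could live in defs.bzl and be used as e.g. `sorted(jars, key = lambda f: version_key(f.basename), reverse = True)`.

```python
def version_key(path):
    """Returns the version parsed from a jar path, as a tuple of ints.

    Assumes a hypothetical "<name>-<version>.jar" naming convention,
    e.g. "commons-io-2.5.jar" -> (2, 5). Non-numeric version parts count as 0.
    """
    basename = path.rpartition("/")[2]
    if basename.endswith(".jar"):
        basename = basename[:-len(".jar")]
    # Everything after the last "-" is treated as the version string.
    version = basename.rpartition("-")[2]
    # Compare versions numerically, component by component.
    return tuple([int(p) if p.isdigit() else 0 for p in version.split(".")])
```

Tuples of ints compare component by component, so `(1, 10)` correctly ranks above `(1, 9)`.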


Even if it were easy to manipulate the JavaInfo providers, adjusting the classpath may or may not actually work in general, because javac inlines compile-time constants (static final primitives and Strings initialized with constant expressions) into the classes that use them. So constants from F 2.0 may get compiled into C, and different values of those constants from F 1.0 into D. You could then see problems at runtime, even if all the classes and methods line up between the different versions.

There are other ways to go about substituting the classes:

1) Make two versions of F 1.0: the existing version, and a neverlink version (i.e. set neverlink = True, which tells the Java rules to use the library only for compilation and to leave its classes out of the runtime classpath and the deploy jar). Have D depend on the neverlink version of F 1.0. At runtime of A, D will then see the classes from F 2.0 included via C. Then add F 1.0 to the runtime_deps of B (if it's not already in deps), so that at runtime of B, D will see the classes from F 1.0.
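A minimal BUILD sketch of option 1, with hypothetical package paths, jar name, and target names (`//third_party/F`, `F-1.0.jar`, `//java/D`, `//java/B`) standing in for the real ones:

```python
# third_party/F/BUILD (hypothetical)
java_import(
    name = "F_v1_0",
    jars = ["F-1.0.jar"],
    visibility = ["//visibility:public"],
)

# Same jar with neverlink = True: used for compilation only, so it is left
# out of the runtime classpath and deploy jar of binaries that depend on it.
java_import(
    name = "F_v1_0_neverlink",
    jars = ["F-1.0.jar"],
    neverlink = True,
    visibility = ["//visibility:public"],
)

# java/D/BUILD: D compiles against F 1.0 but doesn't carry it to runtime.
java_library(
    name = "D",
    srcs = ["D.java"],
    deps = ["//third_party/F:F_v1_0_neverlink"],
    visibility = ["//visibility:public"],
)

# java/B/BUILD: B supplies F 1.0 at runtime itself.
java_binary(
    name = "B",
    srcs = ["B.java"],
    main_class = "B.B",
    deps = ["//java/D"],
    runtime_deps = ["//third_party/F:F_v1_0"],
)
```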

2) Basically reimplement the deploy jar creation logic of java_binary (assuming you're using deploy jars). Make a rule that takes one java_binary and has an aspect that traverses the java_binary's dependencies to collect the jars, and then picks which jars to include in the final deploy jar. (singlejar, which is what java_binary uses, can combine the jars.) This is the most automatic of all the options, again assuming there's some file naming convention you can rely on to pick the jars.
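For option 2, the jar-picking step could look something like this minimal sketch, assuming a hypothetical `<artifact>-<version>.jar` naming convention. It is plain Starlark (and valid Python), so it could run inside the rule on the jar paths collected by the aspect, before the chosen jars are handed to singlejar. Keeping one jar per artifact, rather than only reordering, also keeps the losing version's classes out of the deploy jar entirely.

```python
def pick_highest_versions(paths):
    """Keeps one jar per artifact: the one with the highest version.

    Assumes a hypothetical "<dir>/<artifact>-<version>.jar" naming convention,
    e.g. "a/commons-io-1.3.2.jar" and "c/commons-io-2.5.jar" are both
    "commons-io". Jar names with no "-" at all are always kept.
    The result keeps the order in which each artifact first appeared.
    """
    order = []  # artifact names, in first-seen order
    best = {}  # artifact name -> (version tuple, path)
    for path in paths:
        stem = path.rpartition("/")[2]
        if stem.endswith(".jar"):
            stem = stem[:-len(".jar")]
        name, sep, version = stem.rpartition("-")
        if not sep:
            # No version suffix: key by the full path so the jar is never dropped.
            name, version = path, ""
        key = tuple([int(p) if p.isdigit() else 0 for p in version.split(".")])
        if name not in best:
            order.append(name)
            best[name] = (key, path)
        elif key > best[name][0]:
            best[name] = (key, path)
    return [best[name][1] for name in order]
```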

But again, in general, both of these approaches can lead to hard-to-diagnose runtime problems.

There are a few safer ways to do this:

1) Create 2 D targets, one which depends on F 1.0 for B, and another that depends on F 2.0 for A. Using "parallel targets" is in some ways the easiest, but it can get pretty unwieldy depending on how often you have to do this, and how deep your dependencies go, because now you have lots of similar targets and mistakes can happen. Macros, naming conventions, and visibility (i.e. make D that depends on F 1.0 visible only to B) can make this a little easier, though things can get tricky if you already have macros.
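A minimal sketch of the parallel-targets approach using a macro, with hypothetical file, macro, and label names, and assuming the two F targets are visible to //java/D. Each call creates its own copy of D compiled against one version of F, and visibility restricts each copy to the binary that needs it:

```python
# java/D/d_library.bzl (hypothetical)
def d_library(name, f_dep, **kwargs):
    """Defines one copy of D, compiled against the given version of F."""
    native.java_library(
        name = name,
        srcs = ["D.java"],
        deps = [f_dep],
        **kwargs
    )

# java/D/BUILD
load(":d_library.bzl", "d_library")

d_library(
    name = "D_F_v1_0",
    f_dep = "//third_party/F/v1.0:F_v1_0",
    visibility = ["//java/B:__pkg__"],  # only B may use the F 1.0 copy
)

d_library(
    name = "D_F_v2_0",
    f_dep = "//third_party/F/v2.0:F_v2_0",
    visibility = ["//java/A:__pkg__"],  # only A may use the F 2.0 copy
)
```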

2) Use a select() and flags to say which version to use in the build. Something like this:

e.g. third_party/F/BUILD

    load("@bazel_skylib//rules:common_settings.bzl", "string_flag")

    string_flag(
        name = "version",
        values = ["1.0", "2.0"],
        build_setting_default = "2.0",
    )

    config_setting(
      name = "use_F_v1_0",
      flag_values = {
        ":version": "1.0",
      }
    )

    config_setting(
      name = "use_F_v2_0",
      flag_values = {
        ":version": "2.0",
      }
    )

    alias(
      name = "F",
      actual = select({
        ":use_F_v1_0": "//third_party/F/v1.0:F_v1_0",
        ":use_F_v2_0": "//third_party/F/v2.0:F_v2_0",
      }),
      visibility = ["//visibility:public"],
    )

e.g. third_party/F/v1.0/BUILD

    java_library(
      name = "F_v1_0",
      ....
      visibility = ["//third_party/F:__pkg__"],
    )

e.g. third_party/F/v2.0/BUILD

    java_library(
      name = "F_v2_0",
      ....
      visibility = ["//third_party/F:__pkg__"],
    )


Then everything depends on the alias //third_party/F:F, and the version to use is determined by the default value "2.0", or the flag --//third_party/F:version=1.0

The problems with this are that you have to remember to set that flag on the command line when building B, and that you can't build A and B at the same time (i.e. in the same invocation of bazel), because this selection applies to the entire build.

3) #2 can be extended to use configuration transitions to avoid having to use a top-level flag.


Note that transitions can have performance and memory costs, because they can make the build graph grow very large (targets under each distinct configuration are analyzed and built separately).

This is somewhat tricky to get right, so I included a more complete example. There are a few considerations in addition to the scaling considerations:
- The top-level target still has to care about its transitive dependencies
- It's tricky to get all the details right. In order to transition the java_binary to a configuration that sets the version of F to use, there has to be some rule (_java_binary_multi_version_deps below) on top of the java_binary to do the transition (transitions are only attached to attributes or to rules themselves, and we can't modify java_binary to do that from Starlark). This means that you no longer have the java_binary to work with, you have the transition rule. "Forwarding" rules or "wrapper" rules are not very well supported in Starlark, e.g. _java_binary_multi_version_deps below has to explicitly forward java_binary's providers, the runfiles, the deploy jar (and some providers are not available in Starlark).

But with this, it's possible to build both binaries at the same time, and intermediate libraries (C here) get the correct inlined constants:

$ bazel build java/A:A_deploy.jar java/B:B_deploy.jar
INFO: Analyzed 2 targets (51 packages loaded, 1093 targets configured).
INFO: Found 2 targets...
INFO: Elapsed time: 8.737s, Critical Path: 7.25s
INFO: 27 processes: 11 internal, 10 linux-sandbox, 6 worker.
INFO: Build completed successfully, 27 total actions

$ java -jar bazel-bin/java/A/A_deploy.jar
Version of F in A is 2

$ java -jar bazel-bin/java/B/B_deploy.jar
Version of F in B is 1

$ unzip -d /tmp/A-C bazel-bin/java/A/A_deploy.jar C/C.class

$ javap -c /tmp/A-C/C/C.class
Compiled from "C.java"
public class C.C {
  public C.C();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static int getFVersion();
    Code:
       0: iconst_2
       1: ireturn
}

$ unzip -d /tmp/B-C bazel-bin/java/B/B_deploy.jar C/C.class
Archive:  bazel-bin/java/B/B_deploy.jar
  inflating: /tmp/B-C/C/C.class      

$ javap -c /tmp/B-C/C/C.class
Compiled from "C.java"
public class C.C {
  public C.C();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static int getFVersion();
    Code:
       0: iconst_1
       1: ireturn
}

Example code:

WORKSPACE:
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "bazel_skylib",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/1.0.2/bazel-skylib-1.0.2.tar.gz",
        "https://github.com/bazelbuild/bazel-skylib/releases/download/1.0.2/bazel-skylib-1.0.2.tar.gz",
    ],
    sha256 = "97e70364e9249702246c0e9444bccdc4b847bed1eb03c5a3ece4f83dfe6abc44",
)
load("@bazel_skylib//:workspace.bzl", "bazel_skylib_workspace")

bazel_skylib_workspace()

defs.bzl:
def _java_binary_multi_version_deps_transition_impl(settings, attr):
    # Update each flag according to the values given in the target's
    # dep_versions attribute.
    new_settings = dict(settings)
    new_settings.update(attr.dep_versions)
    return new_settings

_java_binary_multi_version_deps_transition = transition(
    implementation = _java_binary_multi_version_deps_transition_impl,
    inputs = ["//java/F:version"],
    outputs = ["//java/F:version"],
)

def _java_binary_multi_version_deps_impl(ctx):
  binary = ctx.attr.binary[0]
  # forward deploy jar
  ctx.actions.symlink(
      output=ctx.outputs.deploy_jar,
      target_file=ctx.attr.binary_deploy_jar[0].files.to_list()[0])
  # forward providers
  return [
      binary[InstrumentedFilesInfo],
      binary[JavaInfo],
      binary[OutputGroupInfo],
      DefaultInfo(
          files = binary.files,
          data_runfiles = binary.data_runfiles,
          default_runfiles = binary.default_runfiles),
  ]

_java_binary_multi_version_deps = rule(
  implementation = _java_binary_multi_version_deps_impl,
  attrs = {
    "binary": attr.label(
        mandatory = True,
        cfg = _java_binary_multi_version_deps_transition),
    "binary_deploy_jar": attr.label(
        mandatory = True,
        allow_single_file = True,
        cfg = _java_binary_multi_version_deps_transition),
    "dep_versions": attr.string_dict(),
    "_allowlist_function_transition": attr.label(
      default = "@bazel_tools//tools/allowlists/function_transition_allowlist"),
  },
  # This is deprecated, but java_binary's deploy jar is an implicit output.
  outputs = {
    "deploy_jar": "%{name}_deploy.jar",
  },
)

def java_binary_multi_version_deps(name, **attrs):
  dep_versions = attrs.pop("dep_versions", {})
  visibility = attrs.pop("visibility", [])

  java_binary_name = "_" + name
  native.java_binary(
      name = java_binary_name,
      visibility = ["//visibility:private"],
      **attrs)

  _java_binary_multi_version_deps(
      name = name,
      binary = java_binary_name,
      # The deploy jar is not put in an output group, so it must be forwarded
      # explicitly.
      binary_deploy_jar = java_binary_name + "_deploy.jar",
      dep_versions = dep_versions,
      visibility = visibility,
  )

java/A/A.java:
package A;

import C.C;

public class A {
  
  public static void main(String[] args) {
    System.out.println("Version of F in A is " + C.getFVersion());
  }
}

java/A/BUILD:
load("//:defs.bzl", "java_binary_multi_version_deps")

java_binary_multi_version_deps(
  name = "A",
  srcs = ["A.java"],
  main_class = "A.A",
  deps = ["//java/C:C"],
  dep_versions = {
    "//java/F:version": "2.0",
  },
)

java/B/B.java:
package B;

import C.C;

public class B {
  
  public static void main(String[] args) {
    System.out.println("Version of F in B is " + C.getFVersion());
  }
}

java/B/BUILD:
load("//:defs.bzl", "java_binary_multi_version_deps")

java_binary_multi_version_deps(
  name = "B",
  srcs = ["B.java"],
  main_class = "B.B",
  deps = ["//java/C:C"],
  dep_versions = {
    "//java/F:version": "1.0",
  },
)

java/C/BUILD:
java_library(
  name = "C",
  srcs = ["C.java"],
  deps = ["//java/F:F"],
  visibility = ["//visibility:public"]
)

java/C/C.java:
package C;

import F.F;

public class C {
  
  public static int getFVersion() {
    return F.version;
  }
}

java/F/BUILD:
load("@bazel_skylib//rules:common_settings.bzl", "string_flag")

string_flag(
    name = "version",
    values = ["1.0", "2.0"],
    build_setting_default = "2.0",
)

config_setting(
  name = "use_F_v1_0",
  flag_values = {
    ":version": "1.0",
  }
)

config_setting(
  name = "use_F_v2_0",
  flag_values = {
    ":version": "2.0",
  }
)

alias(
  name = "F",
  actual = select({
    ":use_F_v1_0": "//java/F/v1.0:F_v1_0",
    ":use_F_v2_0": "//java/F/v2.0:F_v2_0",
  }),
  visibility = ["//visibility:public"],
)

java/F/v1.0/BUILD:
java_library(
  name = "F_v1_0",
  srcs = ["F.java"],
  visibility = ["//java/F:__pkg__"],
)

java/F/v1.0/F.java:
package F;

public class F {
  public static final int version = 1;
}

java/F/v2.0/BUILD:
java_library(
  name = "F_v2_0",
  srcs = ["F.java"],
  visibility = ["//java/F:__pkg__"],
)

java/F/v2.0/F.java:
package F;

public class F {
  public static final int version = 2;
}


Christopher Kilian

Apr 6, 2022, 8:33:09 PM
to bazel-discuss
Wow, thanks for the detailed response!

I was experimenting with a version of pick_highest_version, but I was using the jars in the transitive_*_jars fields to create a single jar and passing that into the JavaInfo constructor, and it was too slow to be viable. The version with the empty jar is working much better!


Right now we're mid-migration for one of our repositories to Bazel, so using the rule will help us have parity between the build systems without changing the project structure in a way that may impact the other build system. Once the migration is complete, we'll want to look into ways to make this more robust, and your other examples will be extremely helpful to have.

Thanks again for the super detailed explanation!

Chris