Strategies for building against prebuilt binary deb and docker artifacts


psi...@gmail.com

May 1, 2020, 10:20:23 AM
to bazel-discuss
I have a use case where I am trying to compile `cc_library` targets that depend on libraries that are not easy to compile from source (Drake, Gazebo, etc.).  I have a well-curated list of debs and docker images that contain pre-built versions of these libraries from trusted sources.

What is the best strategy to build a cc_library() against prebuilt libraries in either a large collection of debs (~200) or a docker image with these debs installed?  Note that I do not need to run a docker environment, I just need to build against the contents of the image.  I can assume I have a matching architecture/platform.  I can later use rules_docker to create a runnable image with only runtime dependencies.

Some things I have been looking at:
  • rules_pkg seems to include a deb_packages() directive, but I don't see any examples of how to use it as a deps source for a cc_library().  I assume I would need to do something like export the contents as a filegroup(), then define a bunch of cc_library() targets against the libraries included in the debs/docker image.  Is this possible?
  • rules_docker seems to include a dpkg_list(), but it isn't clear whether there is any way to use its contents directly as a dep for a cc_library().
  • It's fairly easy to expose local system libraries to Bazel via cc_library() definitions with pre-built shared libraries in the srcs field (see the sketch after this list), but this requires messing up my machine with all sorts of build dependencies that cannot be tracked.
  • dazel can be used to create a development docker image which does include the necessary build information, but this requires running docker at build time, and the development image size is the union of all binary dependencies in any part of the project, which seems unnecessary.  This is probably the closest solution I have found so far though.
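
For concreteness, the local-system-library approach from the third bullet looks roughly like this; a minimal sketch assuming libeigen3-dev is already installed on the host, which is exactly the kind of untracked machine state I want to avoid:

new_local_repository(
    name = "system_eigen",
    path = "/usr",
    build_file_content = """
cc_library(
    name = "eigen",
    hdrs = glob(["include/eigen3/**/*"]),
    strip_include_prefix = "include/eigen3",
    # For packages that are not header-only, the pre-built .so files would also go in srcs.
    visibility = ["//visibility:public"],
)
""",
)
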
I guess my ideal situation would be something like this for a docker image:

load("@io_bazel_rules_docker//container:container.bzl", "container_pull")

container_pull(
    name = "debian_custom_base",
    digest = "sha256:94ef41840973ca90f803d267fa2e87833e62d6ee48aa6765f9be94cfe6564a87",
    registry = "my.docker.registry",
    repository = "base_image_with_deps/debian10",
    tag = "debug",
)

container_repository(
    name = "eigen",
    image = "debian_custom_base",
    build_file_content = """
cc_library(
    name = "eigen",
    hdrs = glob(["/usr/include/eigen3/**/*"]),
    visibility = ["//visibility:public"],
    strip_include_prefix = "/usr/include/eigen3",
)
""",
)

Or something like this for a Debian package set:

load("@distroless//package_manager:dpkg.bzl", "dpkg_list", "dpkg_src")

dpkg_src(
    name = "debian_buster",
    arch = "amd64",
    distro = "buster",
    sha256 = "889681a6f709a3872833643a2ab28aa5bf4839ec5a8994cd4382f179a6521c63",
    snapshot = "20200501T025542Z",
    url = "http://snapshot.debian.org/archive",
)

dpkg_list(
    name = "packages_buster",
    packages = [
        "libeigen-dev",
        ...
    ],
    sources = ["@debian_buster//file:Packages.json"],
)

dpkg_repository(
    name = "eigen",
    build_file_content = """
load("@packages_buster//file:packages.bzl", "packages")

cc_library(
    name = "eigen",
    hdrs = glob(["/usr/include/eigen3/**/*"]),
    visibility = ["//visibility:public"],
    strip_include_prefix = "/usr/include/eigen3",
    deps = [
        packages["libeigen-dev"],  # Include transitive debs that eigen requires from the checksummed Packages.gz list.
    ],
)
""",
)


Brian Silverman

May 1, 2020, 2:47:54 PM
to psi...@gmail.com, bazel-discuss
I see two options, which should both work for either docker images or .debs: converting to a tarball beforehand and using http_archive, or merging the tarballs with a custom repository rule. I typically do the former, so I'll talk mostly about that.

.deb files, docker images, and most other ways of packaging code are fundamentally just tarballs with metadata. For most things I build against, the metadata doesn't matter (no hook scripts that matter, etc.). This means you can just use the tarballs and ignore the rest.

`dpkg-deb --fsys-tarfile foo.deb` will extract the tarball from a .deb. From here, you can merge them together however you want (typically none of the files overlap, so you can just stick them together in any order). For example, if you look at //debian:packages.bzl in http://robotics.mvla.net/spartanrobotics/releases/src/2019_frc971_software_20200103_final.tar.xz, there is a set of steps for automatically bundling up a package with all of its dependencies, and some rules you could start from.
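
A rough genrule-based sketch of that flow could look like the following. The @libeigen3_dev_deb name is hypothetical (it would be an http_file pointing at the downloaded .deb), and it relies on dpkg-deb being installed on the host, so it's not fully hermetic:

genrule(
    name = "eigen_sysroot_tar",
    srcs = ["@libeigen3_dev_deb//file"],
    outs = ["eigen_sysroot.tar"],
    cmd = " && ".join([
        # Unpack each .deb's filesystem tarball into a scratch directory.
        "SYSROOT=$$(mktemp -d)",
        "for deb in $(SRCS); do dpkg-deb --fsys-tarfile $$deb | tar -x -C $$SYSROOT; done",
        # Re-pack the merged tree into a single tarball to build against.
        "tar -c -f $@ -C $$SYSROOT .",
    ]),
)
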

I usually end up just using the dpkg command-line tools, because they're pretty stable and widespread. If you want to avoid that, I see a couple of Python packages which can extract that tarball directly.

Each layer in a docker image is a tarball too. A `docker save` output is a tarball of tarballs, plus some JSON to tell which layer is which. There are some special filename conventions for deleting files ("whiteout" files), but you can probably ignore those for this use case.

If you set up something like the first deb_packages example (docker_build is the old name for container_image), you'll get everything you care about in the final layer. This means you could just grab the -layer.tar output from the container_image directly. Then, you can take this tarball and use it with http_archive to build against.
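
Concretely, that could look something like the following; the URL and repository name are placeholders for wherever you upload the layer tarball, and you'd add a sha256 after the first fetch to pin it:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "eigen_sysroot",
    # Placeholder: point this at wherever the -layer.tar output gets uploaded.
    urls = ["https://my.artifact.server/eigen_sysroot-layer.tar"],
    build_file_content = """
cc_library(
    name = "eigen",
    # Layer tarballs store paths without a leading slash, e.g. usr/include/eigen3/...
    hdrs = glob(["usr/include/eigen3/**/*"]),
    strip_include_prefix = "usr/include/eigen3",
    visibility = ["//visibility:public"],
)
""",
)
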

If you want to merge all the layers in a container, it's fairly straightforward too. I have some Python code that does it in ~150 lines of pure Python, but unfortunately that's not open source. https://github.com/jwilder/docker-squash is an example of doing it in Go. `docker export` would probably also work. I figured out most of that stuff by just extracting a docker save tarball and looking around.

As I mentioned at the beginning, I've had a lot of success converting various forms of packages to tarballs as a pre-build step. I typically use bazel targets to do that, which makes it nicely reproducible and reasonably hermetic. That reduces the number of files to download when actually building the code using these dependencies, which generally performs better. However, it does increase the number of steps to add/update dependencies (edit WORKSPACE+BUILD, build, upload, edit WORKSPACE again with the new sha256). If you want, it should be straightforward to convert any of these techniques to a repository rule which downloads the individual files and combines them while loading the repository.
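
As a starting point, a repository rule along those lines might look something like this; the rule name and attributes are made up, and it shells out to dpkg-deb on the host, so it's only as hermetic as that tool:

def _deb_sysroot_impl(repository_ctx):
    # Download each .deb and unpack its filesystem tarball into the repository root.
    for i, url in enumerate(repository_ctx.attr.urls):
        deb = "pkg_%d.deb" % i
        repository_ctx.download(url = url, output = deb, sha256 = repository_ctx.attr.sha256s[i])
        result = repository_ctx.execute(
            ["bash", "-c", "dpkg-deb --fsys-tarfile %s | tar -x" % deb],
        )
        if result.return_code != 0:
            fail("failed to extract " + url + ": " + result.stderr)
    # Write the user-supplied BUILD file (e.g. cc_library definitions like the ones earlier in the thread).
    repository_ctx.file("BUILD", repository_ctx.attr.build_file_content)

deb_sysroot = repository_rule(
    implementation = _deb_sysroot_impl,
    attrs = {
        "urls": attr.string_list(mandatory = True),
        "sha256s": attr.string_list(mandatory = True),
        "build_file_content": attr.string(),
    },
)

You'd then list the .deb URLs and checksums once per dependency set and point build_file_content at cc_library definitions for the headers and shared libraries you want to expose.
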


psi...@gmail.com

May 5, 2020, 1:18:29 AM
to bazel-discuss
Thanks! I took a look through your codebase and I think I understand the general principles.

Unfortunately, it's certainly not as trivial as I had hoped it would be.  I will try to put together a proof of concept and see how far I get: it seems like doing this properly, as a set of hermetic repository rules, is going to be a similar level of complexity to one of the `rules_*` libraries themselves.
