Bazel clean fails

Stephen Bright

Jul 20, 2019, 1:21:34 PM
to bazel-discuss
I am attempting to use bazel (0.15.0) to build a tensorflow-serving (1.12) Docker image (with Oracle Linux 7.6 as the OS) from source.

The docker build fails at the `bazel clean` step.

The container uses Oracle Linux 7.6 as its base image; the host is running macOS 10.14.5.

Also, I am running `bazel shutdown` after every bazel build command.

Inside the Dockerfile:

```
RUN bazel build --color=yes --curses=yes \
    --jobs 1 --local_resources  2048,.5,1.0 \
    ${TF_SERVING_BAZEL_OPTIONS} \
    --verbose_failures \
    --output_filter=DONT_MATCH_ANYTHING \
    ${TF_SERVING_BUILD_OPTIONS} \
    tensorflow_serving/tools/pip_package:build_pip_package && bazel shutdown


RUN bazel clean --expunge_async --color=yes
```


At the `bazel clean` command at the end of the process, I get this error:

```
Step 67/68 : RUN bazel clean --expunge_async --color=yes
 ---> Running in 73abf4be6c51
Starting local Bazel server and connecting to it...
..........
INFO: Starting clean.
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c -> /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c_tmp_9 (Invalid cross-device link)
The command '/bin/sh -c bazel clean --expunge_async --color=yes' returned a non-zero code: 36
```

Philipp Wollermann

Jul 21, 2019, 3:09:28 AM
to Stephen Bright, bazel-discuss
Hi Stephen,

I think what happens is this: you're running "bazel build" and "bazel clean" in two different RUN steps in your Dockerfile, so Docker stores their outputs in two separate layers, and those behave like two different mounts. As a result, the kernel refuses to create hardlinks between the two layers, and atomic moves between them fail.

The first RUN step will create this directory and store it in a layer:
/root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c

Then the second RUN step will create this directory and assume it's on the same device, thus atomic moves between the two should work:
/root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c_tmp_9

"bazel clean --expunge_async" requires atomic moves to work, otherwise it cannot work safely.

I would recommend:
1) Don't use "--expunge_async" in a container building context. The "async" means it will run in the background, which is nice for interactive sessions so that the developer doesn't have to wait for the clean to finish before they can continue, but when building a container, it just adds complexity (and thus causes the failure) and either doesn't speed things up (in case "docker build" waits for it to finish) or doesn't actually clean anything (in case it doesn't wait). Just a normal "bazel clean --expunge" should work fine.
2) Because the "clean" step runs in a separate RUN step, the layer created by the first RUN step still contains all the intermediate output files, so the container image ends up very big. If this is not intended, I would recommend pulling the clean into the same RUN step as the build.

What about:

```
RUN \
  bazel build [...] && \
  cp bazel-out/... somewhere && \
  bazel clean --expunge
```

(Note that "bazel clean --expunge" will also shut down the server, so you don't need the separate "bazel shutdown" step then.)
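
Filled in with the options from the original post, that suggestion might look roughly like the sketch below. The step that runs the built `build_pip_package` script and its `/tmp/pip` output directory are assumptions for illustration (neither appears in this thread); substitute whatever step saves the artifacts you need before the clean runs.

```
# Build, save the artifacts, and clean in ONE RUN step, so all three commands
# see the same filesystem and the intermediate outputs are never committed
# to an image layer.
# Assumption: the built script accepts an output directory argument;
# /tmp/pip is a hypothetical destination.
RUN bazel build --color=yes --curses=yes \
    --jobs 1 --local_resources 2048,.5,1.0 \
    ${TF_SERVING_BAZEL_OPTIONS} \
    --verbose_failures \
    --output_filter=DONT_MATCH_ANYTHING \
    ${TF_SERVING_BUILD_OPTIONS} \
    tensorflow_serving/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow_serving/tools/pip_package/build_pip_package /tmp/pip && \
    bazel clean --expunge --color=yes
```

Because the clean finishes before Docker commits the layer, the Bazel output base under `/root/.cache/bazel` should not end up in the image at all.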

Hope this helps :)

Cheers,
Philipp