too many downloads


Martin Probst

Feb 15, 2018, 4:53:04 AM
to bazel-discuss, Alex Eagle
Hi,

I'm using bazel on github.com/angular/tsickle and a number of other projects. I've recently been trying to work on these projects in a limited-connectivity environment (on a train; thanks for the great Wifi, Deutsche Bahn :-/).

This turns out to be very problematic, borderline impossible. My WORKSPACEs depend on several external tools (nodejs, go toolchain). I haven't quite understood under what exact circumstances, but after many operations (git rebase, check out a new branch?, move to a different project path), bazel re-downloads the same large distribution files over and over again.

E.g.:
ERROR: /Users/martinprobst/lsrc/tsickle/test/BUILD:73:1: every rule of type nodejs_test implicitly depends upon the target '@nodejs//:node', but this target could not be found because of: no such package '@nodejs//': java.io.IOException: Error downloading [https://mirror.bazel.build/nodejs.org/dist/v8.9.1/node-v8.9.1-darwin-x64.tar.gz, https://nodejs.org/dist/v8.9.1/node-v8.9.1-darwin-x64.tar.gz] to /private/var/tmp/_bazel_martinprobst/b50d5b890249591e47db69431e34e5fb/external/nodejs/node-v8.9.1-darwin-x64.tar.gz: All mirrors are down: [Unknown host: mirror.bazel.build, Unknown host: nodejs.org]

I totally get that this needs to be downloaded once, but what actually seems to happen is that this file downloads over and over again, and I cannot predict after what operations. What's worse, apparently (?) the downloaded file doesn't even get cached when the build fails after the download succeeds. 

These files are commonly in the hundreds of MBs, so losing the cached file for some reason and having to re-download it means you cannot work for the remainder of your trip.

It seems like bazel invalidates its downloaded file cache together with some hash key of the workspace or so, instead of just taking the sha of the downloaded file. It also seems like downloads aren't shared between repositories.

Is this a known issue?

Martin

Klaus Aehlig

Feb 15, 2018, 5:16:11 AM
to Martin Probst, bazel-discuss, Alex Eagle

Hi,

> [...] check out a new branch?, move to a different project path), bazel
> re-downloads the same large distribution files over and over again.
> [...] Is this a known issue?

Yes. There are some experimental solutions, but unfortunately the
long-term vision is still a bit unclear.

* If a hash sum is specified (which is a good idea anyway), you can use
--experimental_repository_cache; don't forget to create an empty directory
where the path points to (if it does not point to an existing directory,
this option is silently ignored).

It caches distribution files based on the specified hash sum (where sha1 and
sha256 are considered different cache keys and the file is only cached under
the key that was specified on download).

There is no pruning of the cache, so obsolete entries have to be removed
manually (somehow you're supposed to tell if a file is needed or not by
its hash; or you just buy more disk space).
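
To make the keying concrete, here is a rough Python sketch of a cache that
behaves as described above. It is only an illustration, not bazel's actual
implementation: the directory layout and the names cache_path, put and get
are made up for this example.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def cache_path(cache_root: Path, key_type: str, digest: str) -> Path:
    # Hypothetical layout: <root>/<key_type>/<digest>/file
    return cache_root / key_type / digest / "file"


def put(cache_root: Path, key_type: str, downloaded: Path) -> Optional[str]:
    """Store a freshly downloaded file under the hash type that was specified."""
    if not cache_root.is_dir():
        # Mirrors the behaviour described above: a cache path that does not
        # point to an existing directory is silently ignored.
        return None
    digest = hashlib.new(key_type, downloaded.read_bytes()).hexdigest()
    target = cache_path(cache_root, key_type, digest)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(downloaded, target)
    return digest


def get(cache_root: Path, key_type: str, digest: str) -> Optional[Path]:
    """Return the cached file for (key_type, digest), or None on a miss."""
    target = cache_path(cache_root, key_type, digest)
    return target if target.is_file() else None
```

In this sketch a file stored via put(root, "sha256", f) is not found by
get(root, "sha1", ...), which is the "different cache keys" point from above.
Nothing is ever deleted, so pruning stays manual here as well.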

* With bazel at head, there is an additional --experimental_distdir option.
It can be specified several times and before each download, all specified
directories are checked for a file with the same basename and if found,
the hash sum is used to verify it is indeed the correct file (so again,
specifying a hash sum is necessary, and again, it's a good idea anyway).

This allows you to manually download all files ahead of time and also makes
it easier to share distfiles with other build tools
(e.g., --experimental_distdir=/usr/ports/distfiles).

This option treats the directories as read-only, so fetching and cleanup
have to be done manually.
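
The lookup described above can be sketched roughly as follows. Again, this
is a simplified Python illustration under my own assumptions, not the code
bazel runs; the function name find_in_distdirs is hypothetical, and only
sha256 is handled.

```python
import hashlib
from pathlib import Path, PurePosixPath
from typing import Iterable, Optional
from urllib.parse import urlparse


def find_in_distdirs(url: str, sha256: str,
                     distdirs: Iterable[Path]) -> Optional[Path]:
    """Look for a file with the URL's basename in each directory, in order.

    A candidate is only accepted if its sha256 matches the expected value;
    the directories themselves are never modified (read-only use).
    """
    basename = PurePosixPath(urlparse(url).path).name
    for directory in distdirs:
        candidate = directory / basename
        if not candidate.is_file():
            continue
        actual = hashlib.sha256(candidate.read_bytes()).hexdigest()
        if actual == sha256:
            return candidate
    # No usable copy found; the caller would fall back to downloading.
    return None
```

A file that has the right name but the wrong content is skipped rather than
used, which is why specifying a hash sum is necessary for this option.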

I hope you can live with these experimental features for the time being.

Best,
Klaus


--
Klaus Aehlig
Google Germany GmbH, Erika-Mann-Str. 33, 80636 Muenchen
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschaeftsfuehrer: Paul Terence Manicle, Halimah DeLaine Prado

Martin Probst

Feb 15, 2018, 5:31:32 AM
to Klaus Aehlig, bazel-discuss, Alex Eagle
Thanks Klaus, that helps. We do have hash sums specified for all our deps, so this should help.

Now back to fighting with "ERROR: /private/var/tmp/_bazel_martinprobst/b50d5b890249591e47db69431e34e5fb/external/local_config_cc/BUILD:50:5: in apple_cc_toolchain rule @local_config_cc//:cc-compiler-darwin_x86_64: Xcode version must be specified to use an Apple CROSSTOOL".

Martin Probst

Feb 15, 2018, 7:49:37 AM
to Klaus Aehlig, bazel-discuss, Alex Eagle
Odd, this doesn't seem to have any effect?

$ cat ~/.bazelrc
# Globally cache downloaded artifacts.
build --experimental_repository_cache=/Users/martinprobst/.bazel_cache/
test --experimental_repository_cache=/Users/martinprobst/.bazel_cache/
run --experimental_repository_cache=/Users/martinprobst/.bazel_cache/

$ bazel test ...
[ ... ]
$ ls -la /Users/martinprobst/.bazel_cache/
total 0
drwxr-xr-x   2 martinprobst  eng    64 15 Feb 11:26 .
drwxr-xr-x+ 94 martinprobst  eng  3008 15 Feb 12:47 ..

Any pointers?

Klaus Aehlig

Feb 15, 2018, 8:35:49 AM
to Martin Probst, bazel-discuss, Alex Eagle
On Thu, Feb 15, 2018 at 01:49:13PM +0100, Martin Probst wrote:
> Odd, this doesn't seem to have any effect?
>
> $ cat ~/.bazelrc
> # Globally cache downloaded artifacts.
> build --experimental_repository_cache=/Users/martinprobst/.bazel_cache/
> test --experimental_repository_cache=/Users/martinprobst/.bazel_cache/
> run --experimental_repository_cache=/Users/martinprobst/.bazel_cache/
>
> $ bazel test ...
> [ ... ]
> $ ls -la /Users/martinprobst/.bazel_cache/
> total 0
> drwxr-xr-x 2 martinprobst eng 64 15 Feb 11:26 .
> drwxr-xr-x+ 94 martinprobst eng 3008 15 Feb 12:47 ..
>
> Any pointers?

I'll probably need more data to debug. Just one thing to keep in mind:
the cache is only filled if the artifact is actually downloaded by bazel;
a value that is already valid (e.g., because the WORKSPACE hasn't changed)
is not put into the cache. Also, which WORKSPACE rules are you using precisely?

Martin Probst

Feb 15, 2018, 9:17:25 AM
to Klaus Aehlig, bazel-discuss, Alex Eagle
I did see a download of the nodejs release, corresponding to this workspace rule:

http_archive(
    name = "build_bazel_rules_nodejs",
    strip_prefix = "rules_nodejs-0.4.1",
    sha256 = "e9bc013417272b17f302dc169ad597f05561bb277451f010043f4da493417607",
)

load("@build_bazel_rules_nodejs//:defs.bzl", "check_bazel_version", "node_repositories")

Anything I can do to reproduce/debug this more?

rodr...@google.com

Feb 15, 2018, 9:22:27 AM
to bazel-discuss
Do you have a .bazelrc in your workspace that would override your ~/.bazelrc?