Security risks of a publicly available remote HTTP cache

Nwky

Feb 14, 2023, 12:19:20 AM

to bazel-discuss

Hello!

I am wondering about the potential security risks of having the remote HTTP cache publicly available.

Regarding how the files are used and stored on the cache, is the following correct?

1. Output files are stored in the /cas directory and are named with the hash of their content.

2. ActionResults are protobuf files stored under the /ac directory and are named with the hash of their action inputs. The content of those files specifies the output files that were generated in the build process.

That's how Bazel can identify whether the result of an action is in the cache and reuse the output files, right?
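To make the layout I'm describing concrete, here is a minimal Python sketch of how the two kinds of keys could be derived. It assumes SHA-256 as the digest function; `cas_path`, `ac_path`, and the stand-in action bytes are hypothetical names for illustration only, and a real action key is the digest of a serialized Action message rather than of a plain string.

```python
import hashlib


def cas_path(content: bytes) -> str:
    """Cache path of an output file: /cas/<sha256 of the file content>."""
    return "/cas/" + hashlib.sha256(content).hexdigest()


def ac_path(serialized_action: bytes) -> str:
    """Cache path of an ActionResult: /ac/<sha256 of the action description>.

    Assumption: `serialized_action` stands in for the real serialized
    Action message, which covers the command and the hashes of its inputs.
    """
    return "/ac/" + hashlib.sha256(serialized_action).hexdigest()


# The CAS key depends only on the bytes of the output file.
output_file = b"compiled object code"
print(cas_path(output_file))

# The AC key depends only on what went INTO the action, not on what came out.
action = b"gcc -c foo.c -o foo.o | inputs: <hash of foo.c>"
print(ac_path(action))
```

The point of the sketch is the asymmetry: a /cas name can be recomputed from the file itself, while an /ac name says nothing about the content stored under it.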

If that's the case, what happens if a malicious actor is able to change the content of an output file or edit an ActionResult to reference an arbitrary output file? Is Bazel going to "reuse" it and potentially execute arbitrary code?

Thanks!

Fredrik Medley

Feb 16, 2023, 2:43:59 PM

to bazel-discuss

Correctly understood. If you have access to enumerate all the blobs in the CAS (content addressable storage), you could potentially get hold of sensitive information that users have uploaded. Otherwise, the CAS relies on the improbability of hash collisions.
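As a rough illustration of why the CAS side is self-checking, here is a small Python sketch. It assumes SHA-256 digests, and `verify_cas_blob` is a hypothetical helper, not a Bazel API: anyone who downloads a blob can re-hash it and compare the result with the name it was requested under.

```python
import hashlib


def verify_cas_blob(expected_hex: str, content: bytes) -> bool:
    """Return True if `content` really hashes to the digest it is stored under."""
    return hashlib.sha256(content).hexdigest() == expected_hex


original = b"honest output file"
digest = hashlib.sha256(original).hexdigest()

# An untouched blob passes the check.
untouched_ok = verify_cas_blob(digest, original)

# A blob whose bytes were swapped on the server no longer matches its name,
# so tampering is detectable unless the attacker finds a hash collision.
tampered_ok = verify_cas_blob(digest, b"malicious output file")

print(untouched_ok, tampered_ok)
```

No comparable check is possible for an AC entry, because its key is derived from the action's inputs and not from the content stored under it.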

For the AC (action cache), you should restrict uploads to trusted users only, e.g. your remote execution system. Even better is if your remote executors have no external network access. If you are running remote cache only, you might say that your CI servers are trusted. Yes, the whole point of the cache is to reuse the output, so Bazel will blindly trust the ActionResult messages. For arbitrary code execution, simply consider the example of a malicious actor overwriting the action result for building the protoc binary.

/Fredrik

Nwky

Feb 16, 2023, 3:45:43 PM

to bazel-discuss

By "you could potentially get hold of sensitive information", do you mean this is only a concern when building private projects? Building an open-source project shouldn't generate any sensitive information, right?

For the example of arbitrary code execution, as far as I can tell this is a problem only if the compromised output gets deployed or if a Bazel run/test is executed after the build. Or are there any other ways that I'm missing?

Thanks for helping me out!

Alexandre Rostovtsev

Feb 16, 2023, 4:15:32 PM

to bazel-discuss

For arbitrary code execution, consider a tool that bazel builds from source and then executes in order to build other things. For example, a compiler for some language, a linter, etc.

Imagine a malicious client uploads a backdoored foobar-compiler-1.0 binary into your cache. You invoke a bazel build command which somewhere in its build graph has a foobar_library target, which means invoking an action that executes foobar-compiler-1.0. Bazel happily downloads the backdoored binary from the remote cache instead of building it from source and executes it. The backdoored binary then uses a kernel vulnerability to escape the Bazel sandbox and own your development workstation.

Fredrik Medley

Feb 16, 2023, 4:29:00 PM

to bazel-discuss

Exactly, my example was protoc which is used for generating code during build.

Note that repository rules can see all your environment variables which may contain tokens and other sensitive information.

The Bazel sandbox on your host only tries to exclude files that are part of your workspace. All files outside your workspace, e.g. SSH keys, are available for all actions running on your host machine.

carpen...@gmail.com

Feb 20, 2023, 10:22:23 AM

to bazel-discuss

I would suggest the trust level of the Bazel cache needs to be equivalent to or greater than that of your production compilation environment. Bazel trusts that there are no maliciously stored records in the cache.

If you can reproduce the hash of the inputs, you can poison the remote cache with malicious contents. For example, a compiler that inserts backdoors could run on a malicious developer's machine and upload its results to the cache; other developers then run a normal compilation but get back the compromised executable. You could even just write a record that claims to have been built with the correct compiler when in fact you hand-coded the malicious contents you put in the CAS.
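The poisoning scenario can be sketched with a toy in-memory model in Python. Everything here is an assumption for illustration: `ToyCache` and its methods are hypothetical, the action key is a hash of a made-up string, and a real ActionResult is a protobuf message that lists output files with their digests rather than a single digest.

```python
import hashlib
from typing import Dict, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ToyCache:
    """Toy stand-in for a remote cache with writable /cas and /ac stores."""

    def __init__(self) -> None:
        self.cas: Dict[str, bytes] = {}  # content digest -> blob
        self.ac: Dict[str, str] = {}     # action key -> output digest

    def put_blob(self, content: bytes) -> str:
        digest = sha256_hex(content)
        self.cas[digest] = content
        return digest

    def put_action_result(self, action_key: str, output_digest: str) -> None:
        # No check that the output was really produced by this action.
        self.ac[action_key] = output_digest

    def fetch_output(self, action_key: str) -> Optional[bytes]:
        output_digest = self.ac.get(action_key)
        if output_digest is None:
            return None  # cache miss: the client would build locally
        content = self.cas[output_digest]
        # The CAS integrity check still passes for the attacker's blob.
        if sha256_hex(content) != output_digest:
            raise ValueError("CAS blob does not match its digest")
        return content


cache = ToyCache()
action_key = sha256_hex(b"build foobar-compiler-1.0 from source")

# A trusted builder uploads the honest result.
cache.put_action_result(action_key, cache.put_blob(b"honest compiler"))

# An attacker with write access uploads a valid blob and repoints the record.
cache.put_action_result(action_key, cache.put_blob(b"backdoored compiler"))

# A later client asks for the same action and receives the attacker's output.
fetched = cache.fetch_output(action_key)
print(fetched)
```

Note that the attacker never needs a hash collision: the malicious blob is stored under its own correct digest, and only the unverifiable action record is changed.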

Nwky

Feb 20, 2023, 11:17:45 AM

to bazel-discuss

Thank you all for your contributions on the subject.

I've successfully replicated the case where a modified action result is capable of executing arbitrary code by just referencing another file in the CAS.
