TTLs for CAS entries


Jakob Buchgraber

May 8, 2019, 2:56:34 PM
to Remote Execution APIs Working Group
Hi,

As some of you know, in recent months we have been working on virtualizing build outputs in Bazel, also referred to as "Builds without the Bytes" [1]. That is, Bazel will no longer routinely fetch most action outputs but will only download them on demand.

While this feature brings along many benefits, it also creates new engineering challenges. One hard problem is transient build outputs. That is, a remote execution system must ensure that for each build (think bazel build //:target) an output file created by action 1 is still available to action 1000. If it's evicted from the CAS before that, it's not possible for Bazel to recover, as it no longer has a local copy of the output file.

Technically, Bazel could rerun the generating action of an evicted action output. However, for this to be robust, all actions must be fully reproducible; otherwise a single rerun action could invalidate arbitrary and potentially large parts of the build graph. More generally, with Bazel rerunning actions whenever an output goes missing, we can no longer guarantee that a build will terminate. So while it's entirely possible to implement, I am not convinced that it would be a smart thing to implement or that we'd be able to build a robust solution around it.

In search of alternative solutions that promise to be more robust, I am turning to the API:

message GetActionResultResponse {
  ActionResult result = 1;

  // The minimum amount of time that the remote system guarantees all action outputs
  // to be available for download from the CAS. A value of '0' indicates that no such
  // guarantees can be made.
  Duration outputs_durability = 2;
}

So effectively this would ask servers to maintain TTLs for blobs stored in the CAS. outputs_durability would be the lowest TTL value of all action outputs (files + directories). If an outputs_durability value is too low for a client (e.g. Bazel), it could decide either to download the outputs or to call Execute(do_not_cache=true). We'd also add the same outputs_durability field to ExecuteResponse.
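The client-side decision described above could look roughly like the following Python sketch. Only `outputs_durability` comes from the proposal; the `Strategy` enum, the `choose_strategy` function, and the `min_required` threshold are hypothetical names introduced for illustration, and a zero duration is treated as "no guarantee".

```python
from datetime import timedelta
from enum import Enum


class Strategy(Enum):
    KEEP_REMOTE = "keep outputs remote"      # rely on the server's guarantee
    DOWNLOAD = "download outputs now"        # keep a local copy instead
    REEXECUTE = "re-execute without cache"   # ask for freshly produced outputs


def choose_strategy(outputs_durability: timedelta,
                    min_required: timedelta,
                    can_download: bool = True) -> Strategy:
    """Decide what a client might do with a cached action result.

    outputs_durability: value reported by the server (0 means no guarantee).
    min_required: hypothetical client setting, e.g. an upper bound on how
        long the build is expected to need the outputs.
    """
    guaranteed = outputs_durability > timedelta(0)
    if guaranteed and outputs_durability >= min_required:
        return Strategy.KEEP_REMOTE
    # Durability is missing or too short: fall back to one of the two
    # options named in the proposal.
    return Strategy.DOWNLOAD if can_download else Strategy.REEXECUTE
```

A client that cannot estimate its own build duration would have to pick `min_required` conservatively, which is one of the open questions with this approach.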

I do realise that this is not a fully spec'd out proposal, but I hope it brings across the idea and is enough to start a conversation (right before our meeting tomorrow). I'd be most interested to hear anyone's feedback / concerns / ideas.

Best,
Jakob


Jakob Buchgraber

Software Engineer


Google Germany GmbH

Erika-Mann-Straße 33

80636 München


Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg


Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.

    

This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.


Ola Rozenfeld

May 8, 2019, 3:34:25 PM
to Jakob Buchgraber, Remote Execution APIs Working Group
If servers need to maintain TTLs for all items in CAS anyway, then why do we need to return these TTLs in the API? You can just "touch" a blob every time it is accessed (increase its TTL), including by Action Cache. That's what RBE does. It wasn't too simple to implement, but now RBE guarantees that any items returned by Action Cache exist in the CAS. I understand that other server implementations might not be as robust by design (memory-only CAS, shards going down, etc), but that case isn't addressed by your proposal anyway, right?
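The "touch on access" behaviour described above can be sketched as a toy in-memory model in Python. This is not RBE's implementation; the class `TouchingCache`, its method names, and the injectable `clock` are assumptions made for illustration. The point it shows is that an Action Cache lookup refreshes the TTL of every output blob and is reported as a miss if any output has already expired.

```python
import time
from typing import Dict, List, Optional


class TouchingCache:
    """Toy CAS + Action Cache where every access extends blob TTLs."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.cas_expiry: Dict[str, float] = {}        # blob digest -> expiry time
        self.action_cache: Dict[str, List[str]] = {}  # action digest -> output digests

    def put_blob(self, digest: str) -> None:
        self.cas_expiry[digest] = self.clock() + self.ttl

    def put_action_result(self, action_digest: str, outputs: List[str]) -> None:
        self.action_cache[action_digest] = list(outputs)

    def _touch(self, digest: str) -> bool:
        """Extend the blob's TTL if it is still present; report presence."""
        now = self.clock()
        expiry = self.cas_expiry.get(digest)
        if expiry is None or expiry <= now:
            self.cas_expiry.pop(digest, None)  # drop the expired blob
            return False
        self.cas_expiry[digest] = now + self.ttl
        return True

    def get_action_result(self, action_digest: str) -> Optional[List[str]]:
        outputs = self.action_cache.get(action_digest)
        if outputs is None:
            return None
        # Touch every output so the surviving ones all get a fresh TTL.
        present = [self._touch(d) for d in outputs]
        if not all(present):
            # At least one output is gone: treat the entry as a cache miss.
            del self.action_cache[action_digest]
            return None
        return outputs
```

With this shape, a result returned at the start of a build has outputs that live for at least one full TTL from that moment, regardless of when they were first written.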


Drew Gassaway

May 8, 2019, 4:00:14 PM
to Ola Rozenfeld, Jakob Buchgraber, Remote Execution APIs Working Group
Doesn't the ResultsCachePolicy priority field in Execution specs already allow for flexibility here, along with exposing a server's allowable ranges via the Capabilities API?

Eric Burnett

May 8, 2019, 4:03:56 PM
to Ola Rozenfeld, Jakob Buchgraber, Remote Execution APIs Working Group
I'll ask a similar question to Ola in a different way: what problem does this solve? I'd hypothesize that one of:
  1. The blobs will all live long enough, and so it's sufficient for clients to assume it. No actual need to inform them specifically.
  2. Blobs may *not* live long enough, but because they have explicit TTLs that are shorter. In theory informing clients of this so it could keep them alive would make sense, but that's additional client-side logic needed too, and I know of no existing system this is likely to apply to.
  3. Blobs may *not* live long enough, because there's no explicit TTL mechanism (pressure-based evictions, say). Such a system wouldn't be able to implement this signal anyways.
My hypothesis is that it's sufficient for now and a while into the future for clients to simply assume blobs will live long enough. Whoever runs a client and a remote execution system will need to make sure this is true, one way or another, but it's arguably easier out-of-band than in-band.



Ola Rozenfeld

May 8, 2019, 4:21:00 PM
to Drew Gassaway, Jakob Buchgraber, Remote Execution APIs Working Group
Drew: ResultsCachePolicy priority is orthogonal to this, I think. That feature allows the client to set a field to tell the server that some particular Action Cache results should live longer than others; the current problem that Jakob is trying to address, IIUC, is the lack of explicit commitment in the API between the Action Cache results availability and its child CAS blobs availability, even for results within a particular priority.

Steven Bergsieker

May 8, 2019, 4:26:16 PM
to Ola Rozenfeld, Drew Gassaway, Jakob Buchgraber, Remote Execution APIs Working Group
This is an interesting theoretical problem, but I don't think it's worth investing in a solution until we know that it's a real-world problem.

However, I do think that Bazel needs to fail with a reasonable error message, likely pointing to the administrator of the remote cache, if it encounters a case of a cached action where the underlying blobs are not available. Note that this would still be helpful even if the cache returned a TTL--if blobs suddenly go missing despite the TTL, it's something the administrator would likely want to know about!

Ola Rozenfeld

May 8, 2019, 4:39:51 PM
to Steven Bergsieker, Drew Gassaway, Jakob Buchgraber, Remote Execution APIs Working Group
TL;DR: Buildfarm does not provide the AC-CAS cross-availability guarantees that RBE has. This was not a problem until now, because Bazel is able to rerun the action whenever it encounters a cache miss on outputs. However, after Jakob's no-intermediate-outputs change it became a bigger issue, because we only become aware of a cache miss much later, when we'd potentially need to rewind a huge sequence of actions.

Erik Mavrinac

May 8, 2019, 4:43:43 PM
to Jakob Buchgraber, Remote Execution APIs Working Group

FYI the caches for the Microsoft BuildXL and internal build engines running in our datacenters all use the idea of pinning content from the CAS on behalf of a session (typically equivalent to a build session), which combines a presence check for a known set of hashes (say, checking a build process’s predicted inputs and known previous outputs from other sessions) with an extension on any internal TTLs to ensure the content remains for the lifetime of the session. A session has an implicit garbage collection behind it that prevents pinning too long if the service equivalent of a finally{} block is not executed by the build engine.

The pin call is directed at the local CAS microservice but propagates outward to the datacenter cache metadata stores and file cleaners. Knowledge of TTLs is not needed or wanted by the clients.

There are still race conditions where a successful pinning still results in a retrieval failure if all replicas disappear between the pinning and retrieval for build. The cache/CAS engineers cannot prevent it, they just have a target of <1 in 10E8 occurrences, as this results in a full build retry.
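The session pinning described in this message can be sketched as a small Python model. This is not BuildXL's API; `PinningCas`, its methods, and the idle-timeout rule standing in for the "implicit garbage collection" are assumptions for illustration. `pin` combines the presence check with the lifetime extension, and an abandoned session stops protecting its blobs once it has been idle for longer than `session_timeout`.

```python
import time
from typing import Dict, Iterable, Set


class PinningCas:
    """Toy CAS in which build sessions pin the blobs they depend on."""

    def __init__(self, session_timeout: float, clock=time.monotonic):
        self.session_timeout = session_timeout
        self.clock = clock
        self.blobs: Set[str] = set()
        self.pins: Dict[str, Set[str]] = {}     # session id -> pinned digests
        self.last_seen: Dict[str, float] = {}   # session id -> last activity

    def put(self, digest: str) -> None:
        self.blobs.add(digest)

    def pin(self, session: str, digests: Iterable[str]) -> Set[str]:
        """Pin the blobs that are present; return the digests that are missing."""
        wanted = set(digests)
        present = wanted & self.blobs
        self.pins.setdefault(session, set()).update(present)
        self.last_seen[session] = self.clock()
        return wanted - present

    def close_session(self, session: str) -> None:
        """Explicit cleanup, the equivalent of the build engine's finally block."""
        self.pins.pop(session, None)
        self.last_seen.pop(session, None)

    def expire_sessions(self) -> None:
        """Implicit cleanup for sessions that never called close_session."""
        now = self.clock()
        stale = [s for s, t in self.last_seen.items()
                 if now - t > self.session_timeout]
        for session in stale:
            self.close_session(session)

    def evict(self, digest: str) -> bool:
        """Evict a blob unless a live session still pins it."""
        self.expire_sessions()
        if any(digest in pinned for pinned in self.pins.values()):
            return False
        self.blobs.discard(digest)
        return True
```

The client never sees a TTL in this model; it only learns which of the requested blobs could not be pinned.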



Steven Bergsieker

May 8, 2019, 4:48:10 PM
to Ola Rozenfeld, Drew Gassaway, Jakob Buchgraber, Remote Execution APIs Working Group
But in this case, the "correct" solution is likely either to increase the size of the CAS (which effectively increases the TTL for blobs when pressure-based eviction is used) or to decrease the amount of data that Bazel is storing in the CAS.

Having Bazel reexecute Actions and store the results in a CAS that is already under pressure will just lead to thrashing the CAS and Bazel not making progress. It's a cascading failure condition.

George Gensure

May 8, 2019, 5:04:44 PM
to Steven Bergsieker, Ola Rozenfeld, Drew Gassaway, Jakob Buchgraber, Remote Execution APIs Working Group
FWIW I used to have a silent and blocking findMissingBlobs for ActionCache results that accomplished the same behaviors listed - reset on the access time and guarantee of hit response predicated on likely availability of outputs - but that previous bazel behavior made it unnecessary to do so (through augmentations in the client and heuristics on the server).

We're definitely missing an SLA for the likelihood of a missing piece of content as a function of time relative to a successful AC fetch: feel free to correct me if I am wrong, but I don't believe that any documentation on the REAPI dictates that AC results should be invalidated when their blobs go away.

As Ola mentions, we have dual triggers for missing content: the TTL (and ensuing CAS thrash) as one, and the loss of shards of CAS content without warning as the other. Both of these are mitigable, as Steven and basic scaling/availability principles dictate respectively, but it's more about the lack of defined behavior in these time slices, and the propensity of the client to abort, giving me as a service implementor no recourse to improve the situation (without forking the client).

In any case, there is the possibility of evictions. Real time elapses between receiving the ActionResult and downloading the contents, and the behavior of a build to fail (which means any number of concurrent failures) due to a piece of state missing that would have been recoverable in a non-minimal download state is not desirable. Within a single build context I believe there *should* be a way to compensate for this, but this is obviously at odds with the complexity of configurable download behavior, the further complexity of action rewinds, and the growing disinterest in local fallback.

Jakob Buchgraber

May 8, 2019, 5:18:29 PM
to Eric Burnett, Ed Schouten, Ola Rozenfeld, Remote Execution APIs Working Group
  2. Blobs may *not* live long enough, but because they have explicit TTLs that are shorter. In theory informing clients of this so it could keep them alive would make sense, but that's additional client-side logic needed too, and I know of no existing system this is likely to apply to.
I have talked with +Ed Schouten about this in the past and, IIUC, BuildBarn would be such a system and Bazel would implement the client-side logic.
  3. Blobs may *not* live long enough, because there's no explicit TTL mechanism (pressure-based evictions, say). Such a system wouldn't be able to implement this signal anyways.
Such a system would signal that it doesn't support TTLs and thus the client would fall back to downloading the outputs.
 
My hypothesis is that it's sufficient for now and a while into the future for clients to simply assume blobs will live long enough.

I think at the moment this is only true for RBE because you have O(month) TTLs and unlimited cloud storage. I don't see how this could possibly be true for on-premises systems.

Jakob Buchgraber

May 8, 2019, 5:28:18 PM
to Steven Bergsieker, Ola Rozenfeld, Drew Gassaway, Remote Execution APIs Working Group
Having Bazel reexecute Actions and store the results in a CAS that is already under pressure will just lead to thrashing the CAS and Bazel not making progress. It's a cascading failure condition.

I'd like to point out that blobs being evicted from the CAS don't necessarily mean that the CAS is under pressure. It's totally reasonable to always run a CAS at full capacity and evict on new writes so that in steady state every write will evict one or more blobs.
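To illustrate the steady state described above, here is a minimal Python sketch of a fixed-capacity CAS with least-recently-used eviction. The class `LruCas` and its methods are hypothetical and not taken from any existing server. Once the store is full, every new write evicts one or more of the least recently used blobs, while touching a blob moves it to the back of the eviction queue.

```python
from collections import OrderedDict
from typing import List


class LruCas:
    """Toy CAS that always runs at capacity and evicts on write."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        # digest -> size in bytes, ordered from least to most recently used
        self.blobs: "OrderedDict[str, int]" = OrderedDict()

    def touch(self, digest: str) -> bool:
        """Mark a blob as recently used; return False if it is not stored."""
        if digest not in self.blobs:
            return False
        self.blobs.move_to_end(digest)
        return True

    def write(self, digest: str, size: int) -> List[str]:
        """Store a blob and return the digests evicted to make room for it."""
        if self.touch(digest):
            return []  # already stored, only its recency changes
        if size > self.capacity:
            raise ValueError("blob is larger than the whole CAS")
        evicted: List[str] = []
        while self.used + size > self.capacity:
            old_digest, old_size = self.blobs.popitem(last=False)
            self.used -= old_size
            evicted.append(old_digest)
        self.blobs[digest] = size
        self.used += size
        return evicted
```

In this model eviction is the normal operating mode rather than a sign of overload; how long an untouched blob survives depends on the write rate relative to the capacity.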

Eric Burnett

May 8, 2019, 5:29:06 PM
to George Gensure, Steven Bergsieker, Ola Rozenfeld, Drew Gassaway, Jakob Buchgraber, Remote Execution APIs Working Group
Ahh, I missed the wrinkle that nobody is currently *extending* TTLs when returning an action cache entry, so it's not "assume files from start of build exist at end", but "assume files from arbitrarily far into the past still exist". Thanks for pointing that out George - you're right, that's a missing detail that needs to be addressed somewhere.

I think documenting on the REAPI either "services SHOULD confirm file existence and extend TTLs if necessary before returning ActionCache entries" or "clients SHOULD call FindMissingBlobs to confirm file existence before proceeding" would be appropriate; my preference is also for it to be server-side. 
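The client-side variant of the two options above can be sketched in Python as follows. FindMissingBlobs is the existing CAS call named in the quoted wording; here it is modelled as a plain callable that returns the subset of digests the server does not have, and the function name `usable_cached_result` is a hypothetical helper rather than part of any client.

```python
from typing import Callable, Iterable, List, Set


def usable_cached_result(
    output_digests: Iterable[str],
    find_missing_blobs: Callable[[List[str]], Set[str]],
) -> bool:
    """Return True if every output of a cached action result is still in the CAS.

    find_missing_blobs: stand-in for the FindMissingBlobs call; it takes the
        digests to check and returns those that are absent.
    """
    digests = list(output_digests)
    if not digests:
        return True  # nothing to verify
    missing = find_missing_blobs(digests)
    # Any missing output means the cached result cannot be relied upon and
    # the client should treat the lookup as a cache miss.
    return not missing
```

The check still leaves a window between verification and download, so it narrows the problem rather than removing it.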

Jakob, does that also address the BuildBarn use-case? I.e. is it safe enough to assume TTLs >= build durations, or is that also too strong of an assumption?

George, I wasn't quite clear from your response - would that suffice from your perspective? 

Jakob Buchgraber

May 8, 2019, 5:35:09 PM
to Ola Rozenfeld, Remote Execution APIs Working Group
If servers need to maintain TTLs for all items in CAS anyway, then why do we need to return these TTLs in the API? You can just "touch" a blob every time it is accessed (increase its TTL), including by Action Cache. That's what RBE does. It wasn't too simple to implement, but now RBE guarantees that any items returned by Action Cache exist in the CAS. I understand that other server implementations might not be as robust by design (memory-only CAS, shards going down, etc), but that case isn't addressed by your proposal anyway, right?

RBE's implementation is one possible solution and totally reasonable. I think we can either decide to explicitly require all implementers of the API to do things similar to RBE or, as I am proposing, leave it up to the implementers.

Jakob Buchgraber

May 9, 2019, 5:18:00 AM
to Eric Burnett, Ed Schouten, Ed Baunton, George Gensure, Steven Bergsieker, Ola Rozenfeld, Drew Gassaway, Remote Execution APIs Working Group
I also should have explicitly mentioned this in my e-mail. The "assume files from arbitrarily far into the past still exist" was one of the main motivations for my initial e-mail.
I didn't want the API to dictate the storage implementation of a remote system. If all current (and future) implementers are fine with this for now then so am I.





Ed Baunton

May 10, 2019, 8:31:53 AM
to Jakob Buchgraber, Eric Burnett, Ed Schouten, George Gensure, Steven Bergsieker, Ola Rozenfeld, Drew Gassaway, Remote Execution APIs Working Group
I'm not sure TTLs help pin things in the cache: the selection of the TTL value implies you know how long your build will take (which is impossible to know if the ES is shared with many hundreds of developers). What happens if you set it to days or months but run out of space? You are then back to the original problem.

I think the selection of how long things can live in the CAS can only be made by the ES/CAS itself given the context of the actions depending on them. i.e. You might pin things longer based on who is sending the request. The same for prioritisation of actions.

On our side, we do not have infinite storage and plan to run our storage infrastructure at 99% usage and simply operate on an LRU policy. We will touch files on FMB like RBE does. TTL is not useful for this and makes matters more complex; it will also increase the chances of CAS exhaustion.

I think the BuildXL approach of having 'sessions' might be a more sound approach – that could be implemented using e.g. the correlated_invocations_id [1] feature in RE to detect sessions. Then when you see a session go away you can clear up after it.

George Gensure

May 10, 2019, 1:58:01 PM
to Eric Burnett, Steven Bergsieker, Ola Rozenfeld, Drew Gassaway, Jakob Buchgraber, Remote Execution APIs Working Group
On Wed, May 8, 2019 at 5:29 PM Eric Burnett <ericb...@google.com> wrote:
Ahh, I missed the wrinkle that nobody is currently *extending* TTLs when returning an action cache entry, so it's not "assume files from start of build exist at end", but "assume files from arbitrarily far into the past still exist". Thanks for pointing that out George - you're right, that's a missing detail that needs to be addressed somewhere.

I think documenting on the REAPI either "services SHOULD confirm file existence and extend TTLs if necessary before returning ActionCache entries" or "clients SHOULD call FindMissingBlobs to confirm file existence before proceeding" would be appropriate; my preference is also for it to be server-side. 

Jakob, does that also address the BuildBarn use-case? I.e. is it safe enough to assume TTLs >= build durations, or is that also too strong of an assumption?

George, I wasn't quite clear from your response - would that suffice from your perspective? 

Yes. My response got into some of the client ramifications that are discussed in the tracking issue, which was why it was a bit muddled.

I think the server-side SHOULD is appropriate. I'm not sure about the "if necessary" part of it, and there may be some flexibility in the "before returning ActionCache entries" part, or even in whether the server should return them *regardless* [without any indication] of the existence of blobs in the output set.

I also think the client is a MAY, rather than a SHOULD in that instance, and that closes the loop on the ambiguity of the service's behavior on the cache result's behalf.

Jakob Buchgraber

May 22, 2019, 9:31:36 AM
to George Gensure, Eric Burnett, Steven Bergsieker, Ola Rozenfeld, Drew Gassaway, Remote Execution APIs Working Group
Turns out we mostly already had this statement in the API ... who would have thought? :-) I clarified the statement a bit and added a few words about TTLs https://github.com/bazelbuild/remote-apis/pull/79



