Jakob Buchgraber
Software Engineer
Google Germany GmbH
Erika-Mann-Straße 33
80636 München
Managing Directors: Paul Manicle, Halimah DeLaine Prado
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg
This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.
--
You received this message because you are subscribed to the Google Groups "Remote Execution APIs Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to remote-execution...@googlegroups.com.
To post to this group, send email to remote-exe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/remote-execution-apis/CAGQ4vn0vM6wH5o26yaL87E9T%2BVTCzx69dLo8ijb0k%3D1_L5LGYg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
FYI, the caches for Microsoft BuildXL and the internal build engines running in our datacenters all pin content in the CAS on behalf of a session (typically equivalent to a build session). Pinning combines a presence check for a known set of hashes (say, a build process's predicted inputs and known previous outputs from other sessions) with an extension of any internal TTLs, so that the content remains for the lifetime of the session. A session has implicit garbage collection behind it, which prevents content from staying pinned too long if the build engine never executes the service equivalent of a finally{} block.
The pin call is directed at the local CAS microservice but propagates outward to the datacenter cache metadata stores and file cleaners. Knowledge of TTLs is not needed or wanted by the clients.
There are still race conditions in which a successful pin is followed by a retrieval failure, if all replicas disappear between the pinning and the retrieval for the build. The cache/CAS engineers cannot prevent this; they just have a target of <1 in 10E8 occurrences, since each one results in a full build retry.
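To make the pinning idea concrete, here is a minimal in-memory sketch in Python. It is not BuildXL's API: `ToyCas`, `Session`, `pin`, and the explicit `now` timestamps are all hypothetical names chosen for illustration. The session deadline stands in for the implicit garbage collection described above, and replication (and therefore the race condition) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical build session; pins made for it never outlive `deadline`."""
    session_id: str
    deadline: float


@dataclass
class ToyCas:
    """Toy CAS that tracks only an expiry time per digest (no blob data)."""
    default_ttl: float = 60.0
    expiry: dict = field(default_factory=dict)  # digest -> expiry timestamp

    def put(self, digest: str, now: float) -> None:
        self.expiry[digest] = now + self.default_ttl

    def contains(self, digest: str, now: float) -> bool:
        return self.expiry.get(digest, float("-inf")) > now

    def pin(self, session: Session, digests, now: float) -> set:
        """Presence check plus TTL extension in one call.

        Returns the digests that are missing. Every digest that is present
        has its expiry pushed out to at least the session deadline.
        """
        missing = set()
        for digest in digests:
            if not self.contains(digest, now):
                missing.add(digest)
            else:
                self.expiry[digest] = max(self.expiry[digest], session.deadline)
        return missing
```

The client only learns which hashes are missing; it never sees a TTL, which matches the point that knowledge of TTLs is not needed or wanted by the clients.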
- Blobs may *not* live long enough, but because they have explicit TTLs that are shorter. In theory, informing clients of this so they could keep the blobs alive would make sense, but that's additional client-side logic needed too, and I know of no existing system this is likely to apply to.
- Blobs may *not* live long enough, because there's no explicit TTL mechanism (pressure-based evictions, say). Such a system wouldn't be able to implement this signal anyways.
My hypothesis is that it's sufficient for now and a while into the future for clients to simply assume blobs will live long enough.
Having Bazel reexecute Actions and store the results in a CAS that is already under pressure will just lead to thrashing the CAS and Bazel not making progress. It's a cascading failure condition.
If servers need to maintain TTLs for all items in CAS anyway, then why do we need to return these TTLs in the API? You can just "touch" a blob every time it is accessed (increase its TTL), including by Action Cache. That's what RBE does. It wasn't too simple to implement, but now RBE guarantees that any items returned by Action Cache exist in the CAS. I understand that other server implementations might not be as robust by design (memory-only CAS, shards going down, etc), but that case isn't addressed by your proposal anyway, right?
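As a rough illustration of the "touch on access" idea, the Python sketch below refreshes the TTL of every output blob whenever an Action Cache entry is returned, and refuses to return an entry whose outputs are already gone. This is not RBE's implementation; `TouchingCache` and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TouchingCache:
    """Toy server holding a CAS (expiry times only) and an Action Cache."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.cas_expiry = {}    # blob digest -> expiry timestamp
        self.action_cache = {}  # action digest -> list of output blob digests

    def _alive(self, digest: str, now: float) -> bool:
        return self.cas_expiry.get(digest, float("-inf")) > now

    def write_blob(self, digest: str, now: float) -> None:
        self.cas_expiry[digest] = now + self.ttl

    def update_action_result(self, action: str, outputs) -> None:
        self.action_cache[action] = list(outputs)

    def get_action_result(self, action: str, now: float):
        """Return the output digests, or None on a miss.

        An entry is only returned if all of its outputs are still in the
        CAS, and returning it touches (re-extends the TTL of) each output.
        """
        outputs = self.action_cache.get(action)
        if outputs is None:
            return None
        if not all(self._alive(d, now) for d in outputs):
            # Stale entry: drop it so the client re-executes the action.
            del self.action_cache[action]
            return None
        for d in outputs:
            self.cas_expiry[d] = now + self.ttl
        return outputs
```

With this behavior, a blob stays alive as long as the actions that reference it keep being looked up, without any TTL appearing in the API.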
Ahh, I missed the wrinkle that nobody is currently *extending* TTLs when returning an action cache entry, so it's not "assume files from start of build exist at end", but "assume files from arbitrarily far into the past still exist". Thanks for pointing that out, George - you're right, that's a missing detail that needs to be addressed somewhere.

I think documenting on the REAPI either "services SHOULD confirm file existence and extend TTLs if necessary before returning ActionCache entries" or "clients SHOULD call FindMissingBlobs to confirm file existence before proceeding" would be appropriate; my preference is also for it to be server-side.

Jakob, does that also address the BuildBarn use-case? I.e., is it safe enough to assume TTLs >= build durations, or is that also too strong an assumption?

George, I wasn't quite clear from your response - would that suffice from your perspective?
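For the client-side wording, the check could look roughly like the Python sketch below. The `find_missing_blobs` parameter is a hypothetical stand-in for a call to the REAPI FindMissingBlobs RPC (it takes digests and returns the ones the CAS does not have); digests are simplified to plain strings, and `usable_cached_outputs` is an illustrative name, not part of any client.

```python
from typing import Callable, Optional, Sequence, Set


def usable_cached_outputs(
    output_digests: Sequence[str],
    find_missing_blobs: Callable[[Sequence[str]], Set[str]],
) -> Optional[list]:
    """Confirm that all outputs of a cached action still exist.

    Returns the output digests if every one is present in the CAS, or
    None if any is missing, in which case the caller should treat the
    Action Cache hit as a miss and re-execute the action.
    """
    missing = find_missing_blobs(output_digests)
    if missing:
        return None
    return list(output_digests)


def make_fake_cas(present: Set[str]) -> Callable[[Sequence[str]], Set[str]]:
    """Build an in-memory stand-in for the RPC, for demonstration only."""
    def find_missing(digests: Sequence[str]) -> Set[str]:
        return {d for d in digests if d not in present}
    return find_missing
```

A client-side check like this only confirms existence at the moment of the call; it does not by itself extend any TTL, which is one reason the server-side variant may be preferable.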