Buildfarm currently uses a variant of the Tree message that stores its flattened list of Directories as a map from digest hash to Directory. This format is useful internally for representing ready-to-run operations, which are transformed from their digest-referent versions.
The only current usage of Trees within the REAPI is in the OutputDirectory message, but a similar representation exists in the GetTreeResponse message, which provides a streamed variant of the Tree. Trees, and the Directory Merkle hierarchy they reflect, are unique in that they are the only place where an implementation is required to perform mutual digest *computation* in order to present results back to the client. This binds a presentation to the choice of hash function. In all other situations, implementations are technically (though not in practice, across client implementations) expected to provide CAS addresses that are used verbatim.
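To make the hashing point concrete, here is a minimal Python sketch of what a client has to do with a flat list of child Directories before it can follow any subdirectory reference. It assumes each child is already the serialized bytes of a Directory and models a digest as a (hash, size) tuple. `index_tree_children` is a hypothetical helper, not part of any REAPI client library.

```python
import hashlib
from typing import Any, Callable, Dict, List, Tuple

# REAPI digests are (hash, size_bytes) pairs; modeled here as a plain tuple.
DigestKey = Tuple[str, int]


def index_tree_children(
    children: List[bytes],
    hash_fn: Callable[[bytes], Any] = hashlib.sha256,
) -> Dict[DigestKey, bytes]:
    """Index a flat list of child Directories by digest.

    Assumption: each entry is already the serialized bytes of a Directory.
    The client has to hash every entry itself, with the same function the
    server used, before any subdirectory reference can be resolved.
    """
    index: Dict[DigestKey, bytes] = {}
    for blob in children:
        index[(hash_fn(blob).hexdigest(), len(blob))] = blob
    return index


# Usage: the same Directories produce different keys under different hashes,
# so the index (and any presentation built on it) depends on the hash choice.
children = [b"directory-a", b"directory-b"]
by_sha256 = index_tree_children(children)
by_sha1 = index_tree_children(children, hashlib.sha1)
```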
There was a substantial improvement in the constant-overhead performance of validation and fetching (plus indexing) when we instituted this in Buildfarm, as the average input tree for an individual action in our hermetic build contained thousands of directories.
I'd like to propose that we switch the Tree message within REAPI to use the indexed version, and that we also change the streamed response for ContentAddressableStorage::getTree to provide a possibly-partial map per-page, optionally using the Tree as a container.
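A rough sketch of the proposed indexed form, assuming (hypothetically) that each streamed page carries a digest-to-Directory map. The client only merges pages and resolves children by lookup, with no hashing. `Directory` here is a simplified stand-in that keeps only named subdirectory links, and the digests are placeholders, not real hashes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# REAPI digests are (hash, size_bytes) pairs; modeled here as a plain tuple.
DigestKey = Tuple[str, int]


@dataclass
class Directory:
    # Simplified stand-in for REAPI's Directory: only named subdirectory links.
    subdirectories: Dict[str, DigestKey] = field(default_factory=dict)


def merge_pages(pages: Iterable[Dict[DigestKey, Directory]]) -> Dict[DigestKey, Directory]:
    """Merge possibly-partial per-page maps into a single digest -> Directory index.

    Hypothetical page shape: each streamed page already carries its Directories
    keyed by digest, so the client never hashes anything.
    """
    merged: Dict[DigestKey, Directory] = {}
    for page in pages:
        merged.update(page)
    return merged


def walk(root: DigestKey, index: Dict[DigestKey, Directory], prefix: str = "") -> List[str]:
    """List every directory path reachable from `root` by direct map lookups."""
    paths = [prefix or "."]
    for name, child in sorted(index[root].subdirectories.items()):
        child_path = f"{prefix}/{name}" if prefix else name
        paths.extend(walk(child, index, child_path))
    return paths


# Usage with placeholder digests (not real hashes), split across three pages.
root, a, b, c = ("root", 40), ("a", 0), ("b", 20), ("c", 0)
pages = [
    {root: Directory({"a": a, "b": b})},
    {a: Directory(), b: Directory({"c": c})},
    {c: Directory()},
]
index = merge_pages(pages)
```

Because children are resolved by key, pages can arrive in any order; the walk only needs the merged map to be complete.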
Thoughts?
-George
--
You received this message because you are subscribed to the Google Groups "Remote Execution APIs Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to remote-execution...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/remote-execution-apis/CAB5czhcWsRwgo0No-wVRfLPPrs75haVU8LXYSk5KU5FHk47njQ%40mail.gmail.com.
Just to make sure I understand, is this because on the client you need to compute the digest of each returned Directory message to know where it belongs in the hierarchy?
> There was substantial improvement in the constant overhead performance of validation and fetching (+indexing) when we instituted this in buildfarm, as the average input tree contained thousands of directories for individual actions in our hermetic build.

Can you elaborate a bit on this part? For inputs, the change to GetTree seems to be the most applicable. Are you referring to resolving the input tree (Action.input_root_digest) for worker staging?
> I'd like to propose that we switch the Tree message within REAPI to use the indexed version, and that we also change the streamed response for ContentAddressableStorage::getTree to provide a possibly-partial map per-page, optionally using the Tree as a container.

Am I correct in assuming this is a proposal for v3? Or is my assumption that this change would break backward compatibility wrong?
> Thoughts?

Just thinking out loud here... I'm curious where the inflexion point would be, in terms of latency vs. I/O, between inlining the Directory messages and having "Tree" just be a flat list of Digests, with the client fetching from CAS only the Directory blobs it doesn't already have locally. The additional round trip would be bad for output trees that churn all over, which is likely for small trees. For trees that are fairly stable, e.g. source trees and toolchain trees, I would expect this to be a win.
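A minimal sketch of the flat-list alternative, under stated assumptions: the Tree carries only Directory digests, and `fetch_batch` is a hypothetical callback standing in for a single CAS batch read (something like BatchReadBlobs). Only digests missing from the local cache cost the extra round trip, which is why stable trees should benefit more than churning ones.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# REAPI digests are (hash, size_bytes) pairs; modeled here as a plain tuple.
DigestKey = Tuple[str, int]


def missing_digests(tree_digests: Iterable[DigestKey], local: Set[DigestKey]) -> List[DigestKey]:
    """Digests in the flat list that are not cached locally, deduplicated, in order."""
    seen: Set[DigestKey] = set()
    missing: List[DigestKey] = []
    for d in tree_digests:
        if d not in local and d not in seen:
            seen.add(d)
            missing.append(d)
    return missing


def resolve_tree(
    tree_digests: Iterable[DigestKey],
    local_cache: Dict[DigestKey, bytes],
    fetch_batch: Callable[[List[DigestKey]], Dict[DigestKey, bytes]],
) -> Dict[DigestKey, bytes]:
    """Resolve every Directory blob, paying one extra round trip only for misses."""
    digests = list(tree_digests)
    todo = missing_digests(digests, set(local_cache))
    if todo:
        local_cache.update(fetch_batch(todo))
    return {d: local_cache[d] for d in digests}


# Usage: a stable tree where only one Directory is not already cached.
calls: List[List[DigestKey]] = []


def fake_fetch(ds: List[DigestKey]) -> Dict[DigestKey, bytes]:
    # Hypothetical stand-in for a CAS batch read; records each round trip.
    calls.append(ds)
    return {d: b"fetched" for d in ds}


cache = {("a", 1): b"cached"}
resolved = resolve_tree([("a", 1), ("b", 1), ("b", 1)], cache, fake_fetch)
```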
> -George

Cheers,
Sander