Tree with map

George Gensure

unread,

Jun 10, 2020, 2:58:31 PM6/10/20

to Remote Execution APIs Working Group

Buildfarm is currently using a variant of the Tree message which retains its flattened list of Directories as a map of digest hashes to Directory. This format is useful internally for representations of ready-to-run operations, transformed from their digest referent versions.

https://github.com/bazelbuild/bazel-buildfarm/blob/5114733ddc73891fc35cc44f062255ae17315dc2/src/main/protobuf/build/buildfarm/v1test/buildfarm.proto#L470-L479

The only current usage of Trees within the REAPI is in the OutputDirectory message, but a similar representation exists in the GetTreeResult message, which effects a streamed variant of the Tree.

Trees and the Directory merkle hierarchy they reflect are unique in that they are the only place where mutual digest *computation* is required by an implementation in order to present results back to the client, binding a presentation to the choice of hash function - all other situations technically (though not practically, via all client implementations) are expected to provide CAS addresses which are used verbatim.

There was substantial improvement in the constant overhead performance of validation and fetching (+indexing) when we instituted this in buildfarm, as the average input tree contained thousands of directories for individual actions in our hermetic build.

I'd like to propose that we switch the Tree message within REAPI to use the indexed version, and that we also change the streamed response for ContentAddressableStorage::getTree to provide a possibly-partial map per-page, optionally using the Tree as a container.

Thoughts?

-George

Sander Striker

unread,

Jun 10, 2020, 5:44:40 PM6/10/20

to Remote Execution APIs Working Group

Hi George,

On Wed, Jun 10, 2020 at 8:58 PM 'George Gensure' via Remote Execution APIs Working Group <remote-exe...@googlegroups.com> wrote:

Buildfarm is currently using a variant of the Tree message which retains its flattened list of Directories as a map of digest hashes to Directory. This format is useful internally for representations of ready-to-run operations, transformed from their digest referent versions.

https://github.com/bazelbuild/bazel-buildfarm/blob/5114733ddc73891fc35cc44f062255ae17315dc2/src/main/protobuf/build/buildfarm/v1test/buildfarm.proto#L470-L479

The only current usage of Trees within the REAPI is in the OutputDirectory message, but a similar representation exists in the GetTreeResult message, which effects a streamed variant of the Tree.

Trees and the Directory merkle hierarchy they reflect are unique in that they are the only place where mutual digest *computation* is required by an implementation in order to present results back to the client, binding a presentation to the choice of hash function - all other situations technically (though not practically, via all client implementations) are expected to provide CAS addresses which are used verbatim.

Just to make sure I understand, is this because on the client you need to compute the digest of each returned Directory message to know where it belongs in the hierarchy?

There was substantial improvement in the constant overhead performance of validation and fetching (+indexing) when we instituted this in buildfarm, as the average input tree contained thousands of directories for individual actions in our hermetic build.

Can you elaborate a bit on this part? For inputs the change to GetTree seems to be most applicable - are you referring to resolving the input tree (Action.input_root_digest) for worker staging?

I'd like to propose that we switch the Tree message within REAPI to use the indexed version, and that we also change the streamed response for ContentAddressableStorage::getTree to provide a possibly-partial map per-page, optionally using the Tree as a container.

Am I correct in assuming this is a proposal for v3? Or is my assumption that this change would break backward compatibility wrong?

Thoughts?

Just thinking out loud here... I'm curious what the inflexion point would be in terms of latency vs IO of inlining the Directory messages vs having "Tree" just be a flat list of Digests and then fetching the Directory blobs from CAS that you don't already have locally. The additional round trip would be bad for output trees that churn all over the tree, which is likely for small trees. For trees that are fairly stable, e.g. source trees, toolchain trees, I would expect this to be a win.

-George

Cheers,

Sander

--
You received this message because you are subscribed to the Google Groups "Remote Execution APIs Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to remote-execution...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/remote-execution-apis/CAB5czhcWsRwgo0No-wVRfLPPrs75haVU8LXYSk5KU5FHk47njQ%40mail.gmail.com.

George Gensure

unread,

Jun 12, 2020, 9:26:43 AM6/12/20

to Sander Striker, Remote Execution APIs Working Group

Just to make sure I understand, is this because on the client you need to compute the digest of each returned Directory message to know where it belongs in the hierarchy?

Correct. From the root, child directories can only be resolved if one computes and indexes the `children` or repeated `directories` in the Tree message and GetTree calls, respectively.

There was substantial improvement in the constant overhead performance of validation and fetching (+indexing) when we instituted this in buildfarm, as the average input tree contained thousands of directories for individual actions in our hermetic build.

Can you elaborate a bit on this part? For inputs the change to GetTree seems to be most applicable - are you referring to resolving the input tree (Action.input_root_digest) for worker staging?

Yes, though GetTree was not involved here. Buildfarm's independent Tree message type was used as an input layout container (along with Action/Command) and to reconstruct this index on schedulers and workers (there are possible multiple paths here, I'll just say its a constant overhead cost), substantially cut down on Memory and CPU usage, all in the critical path of processing an operation.

I'd like to propose that we switch the Tree message within REAPI to use the indexed version, and that we also change the streamed response for ContentAddressableStorage::getTree to provide a possibly-partial map per-page, optionally using the Tree as a container.

Am I correct in assuming this is a proposal for v3? Or is my assumption that this change would break backward compatibility wrong?

I believe it can coexist with the current definition with field identifiers. ID 3 and 4 could be root digest hash and map, respectively, with similar behaviors to `output_paths` relative to 2.1 and its deprecation of `output_files` and `output_directories`

Thoughts?

Just thinking out loud here... I'm curious what the inflexion point would be in terms of latency vs IO of inlining the Directory messages vs having "Tree" just be a flat list of Digests and then fetching the Directory blobs from CAS that you don't already have locally. The additional round trip would be bad for output trees that churn all over the tree, which is likely for small trees. For trees that are fairly stable, e.g. source trees, toolchain trees, I would expect this to be a win.

Definitely going to vary by implementation: In situations where the directory trees are needed (validation, priori creation of input tree for exec filesystem), the individual lookup in the cas for each costs substantially more than the cost of constructing (where my validation happens, so 0 latency there) and single blob CAS fetching (workers fetching the ready-to-run action to construct exec). Add in a couple of separate visits to different workers (async failures, timeouts during fetch) and requeue validations, and I've more than made up for it.

-George

Cheers,

Sander

--
You received this message because you are subscribed to the Google Groups "Remote Execution APIs Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to remote-execution...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/remote-execution-apis/CAB5czhcWsRwgo0No-wVRfLPPrs75haVU8LXYSk5KU5FHk47njQ%40mail.gmail.com.

--
You received this message because you are subscribed to the Google Groups "Remote Execution APIs Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to remote-execution...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/remote-execution-apis/CAND%2B6xmDUZAsf-GXv-bUxyzMar2ucBiTN0EAoKYUFjdqXoTzgQ%40mail.gmail.com.

George Gensure

unread,

Jul 14, 2020, 10:59:09 AM7/14/20

to Sander Striker, Remote Execution APIs Working Group

reping per meeting

Sander Striker

unread,

Jul 14, 2020, 6:44:08 PM7/14/20

to George Gensure, Remote Execution APIs Working Group

Hi George,

Would you mind WIP PR'ing this? I think it would be useful to get an idea of impact on the v2 API.

Cheers,

Sander

Reply all

Reply to author

Forward