Call for help: What should REv3 look like?

Ed Schouten

12 Sep 2023, 14:54:07
to Remote Execution APIs Working Group, Erik Mavrinac
Hi there,

Thanks to everyone attending today's monthly working group meeting.
The discussion we had about the future of the Remote Execution
protocol was very useful. Below is a brief summary of what has been
discussed.

We notice that work on REv2 is stagnating. Most of the useful
proposals that end up getting discussed in our monthly meetings are
too large/intrusive to introduce without breaking backward
compatibility. These changes are often turned into tracking bugs
marked as "REv3 ideas", so that they can be picked up later. Thus far
the working group has had no concrete plans to work on REv3, meaning
that these tracking bugs just sit still.

During today's meeting we didn't necessarily conclude that right now
is the time to start specifying REv3. What we did agree on is that we
should at least start gathering requirements, and describing what
actual issues in REv2 we want to address. REv3 shouldn't just be a
bunch of minor improvements on top of REv2. It should be a significant
step up.

With that in mind I just created a document titled "What should REv3
look like?" The plan is to provide a high-level vision of what such a
new protocol should look like. You may find it here:

https://docs.google.com/document/d/1FxpdOzOhzOCTjjn2loppMlBzjqjU9WpYF4E1K6opxVI/edit?usp=sharing

As you can see, this document is still awfully empty. The reason I'm
sending it out anyway is to invite you to contribute.

Are you a heavy user of REv2 and running into the limits of what the
protocol can do? Or are you currently *not* using REv2, because the
protocol is inadequate for your organisation's needs? Can you think of
changes that would significantly improve the protocol in terms of
performance, bandwidth efficiency, etc.? If so, please reach out to me
or any of the authors listed in the document, so that we can
collaborate.

My hope is that by the end of this year we end up with something nice
that the working group can use to decide whether we should kick off
the design of REv3. To quote what was said in today's meeting: "REv2
allowed us to get by for five years. With REv3, we should try to
double that." Exciting times ahead of us.

Best regards,
--
Ed Schouten <e...@nuxi.nl>

Ed Schouten

2 Feb 2024, 9:44:04
to Remote Execution APIs Working Group
Hello everyone,

As you may have seen, a couple of months ago I sent out a document to
the remote-execution-apis@ mailing list to discuss at a high level
what kinds of features should be addressed in a potential successor of
the existing remote execution protocol that is used by Bazel and many
other tools, named REv2.

https://docs.google.com/document/d/1FxpdOzOhzOCTjjn2loppMlBzjqjU9WpYF4E1K6opxVI/edit?usp=sharing

I would like to thank all of you who contributed to this document in
any way. The feedback I received was both insightful and helpful.

With most of the feedback processed, we from the working group would
like to shift the discussion to get an answer to the following
question: How much traction is there within the community to start
working on such a protocol? A new standard is meaningless if the
number of improvements it brings is insufficient to persuade authors
of clients and servers to implement it.

On behalf of the working group, I would like to invite authors of such
tools to give the previously linked document a read and get back to us
to answer the following questions. Note that we're interested in
receiving feedback even if your implementation is proprietary or used
within a private setting.

1. Of the topics discussed in the document, which parts would you like
the working group to focus on most? Phrased differently: which of the
issues described in the document are currently most problematic for
your client/server?

2. Are there any topics that are not covered by the document that you
wish they were?

3. Assuming a new version of the remote execution protocol is released
with the aforementioned changes made, do you think that it brings
enough value that your client/server should be extended to support it?

4. Assuming implementing the parts of the new protocol take a similar
amount of effort as REv2, do you think your project has a sufficient
amount of staffing/headcount/momentum to implement the new protocol?
Note that this does not need to be a firm commitment.

If you want, you can reply to this email, making sure it goes both to
me and remote-exe...@googlegroups.com. If you feel
uncomfortable with replying in public, you may contact me privately,
and I'll make sure to anonymise/aggregate your responses before
sharing them with the rest of the working group.

As it's obviously important that this message reaches as many
client/server authors as possible, I will attempt to reach out to the
maintainers of the tools listed here after this message is sent out:

https://github.com/bazelbuild/remote-apis?tab=readme-ov-file#api-users

I would appreciate it if you could respond to this email before the
end of this month, so that we can discuss your responses during the
working group meeting in March.

Best regards,
Ed Schouten
--
Ed Schouten <e...@nuxi.nl>

Ed Schouten

2 Feb 2024, 9:56:36
to Remote Execution APIs Working Group
As the maintainer of Buildbarn, here is my response.

On Fri, 2 Feb 2024 at 15:43, Ed Schouten <e...@nuxi.nl> wrote:
> 1. Of the topics discussed in the document, which parts would you like
> the working group to focus on most? Phrased differently: which of the
> issues described in the document are currently most problematic for
> your client/server?

High importance:

- Generalized execution, for the purpose of hardware testing.
- Elimination of large CAS objects, for the purpose of speeding up
random access to files accessed via bb_clientd.
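
To illustrate why splitting large CAS objects can speed up random access, here is a minimal sketch. It assumes fixed-size chunks and a flat list of chunk digests as the manifest; the REv3 document may settle on a different scheme (for example content-defined chunking or a tree of chunks). The names `store_chunked`, `read_range` and `CHUNK_SIZE`, and the in-memory dict standing in for the CAS, are hypothetical and not part of REv2 or any proposal.

```python
import hashlib

CHUNK_SIZE = 4  # Hypothetical and unrealistically small, to keep the demo readable.


def digest(data: bytes) -> str:
    """Content address of a chunk (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def store_chunked(cas: dict, data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks, store each chunk in the CAS,
    and return the manifest: the ordered list of chunk digests."""
    manifest = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        d = digest(chunk)
        cas[d] = chunk
        manifest.append(d)
    return manifest


def read_range(cas: dict, manifest: list, offset: int, length: int,
               chunk_size: int = CHUNK_SIZE):
    """Read bytes [offset, offset + length) by fetching only the chunks
    that cover the range. Returns (data, digests_fetched)."""
    if length <= 0:
        return b"", []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    fetched = manifest[first:last + 1]
    data = b"".join(cas[d] for d in fetched)
    skip = offset - first * chunk_size
    return data[skip:skip + length], fetched


if __name__ == "__main__":
    cas = {}
    manifest = store_chunked(cas, b"abcdefghij")
    data, fetched = read_range(cas, manifest, offset=5, length=3)
    print(data, len(fetched), "of", len(manifest), "chunks fetched")
```

With a single monolithic blob, a reader that needs three bytes in the middle of a file still has to address the whole object; with a manifest, only the covering chunks are fetched.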

Medium importance:

- Cryptography, for easier project disclosure enforcement.
- CAS references as first-class citizens, as this makes it easier to
repurpose the same storage architecture for use cases outside the
scope of builds.
- Unifying directories and trees, as this currently makes it harder to
embed large external corpora into build actions, requiring mounting of
network volumes on workers.

Low importance:

- Action graphs.

> 2. Are there any topics that are not covered by the document that you
> wish they were?

No.

> 3. Assuming a new version of the remote execution protocol is released
> with the aforementioned changes made, do you think that it brings
> enough value that your client/server should be extended to support it?

Yes.

> 4. Assuming implementing the parts of the new protocol take a similar
> amount of effort as REv2, do you think your project has a sufficient
> amount of staffing/headcount/momentum to implement the new protocol?
> Note that this does not need to be a firm commitment.

Yes.

Peter Ebden

6 Feb 2024, 9:40:32
to Ed Schouten, Remote Execution APIs Working Group
Thanks for kicking this off, Ed! My thoughts on behalf of Please:


> 1. Of the topics discussed in the document, which parts would you like the working group to focus on most?

High importance:
 - Unifying directories and trees. We think we're losing some runtime performance here; it might be possible for us to do better within V2 but I'd rather spend the time on a clean and fast V3 implementation.
 - Elimination of large CAS objects (we have observed some hot-spotting here which this would help)

Medium importance:
 - Cryptography
 - Stop using ByteStream / LongRunning

Low importance:
 - Action graphs (I think this is really cool but we'd need a lot of internal change to be able to take advantage of it)
 - Generalizing execution


> 2. Are there any topics that are not covered by the document that you wish they were?

Nothing very well developed. I have a general wish that we don't create anything too complex in REAPIv3; REAPIv2 is already a very complex protocol to grok and hopefully we can create something which is at least in some ways more streamlined.


> 3. Assuming a new version of the remote execution protocol is released
> with the aforementioned changes made, do you think that it brings
> enough value that your client/server should be extended to support it?

Yes


> 4. Assuming implementing the parts of the new protocol take a similar
> amount of effort as REv2, do you think your project has a sufficient
> amount of staffing/headcount/momentum to implement the new protocol?
> Note that this does not need to be a firm commitment.

Yes

Sakeeb Sabakka

7 Feb 2024, 13:11:28
to Ed Schouten, Remote Execution APIs Working Group
Posting the collective thoughts of the BuildGrid maintainers:

> 1. Of the topics discussed in the document, which parts would you like
> the working group to focus on most? Phrased differently: which of the
> issues described in the document are currently most problematic for
> your client/server?

High importance:

CAS references as first-class citizens
Stop using the ByteStream/LongRunning protocol
Generalizing output streams

For Generalizing output streams the only aspect we find valuable is interleaving stdout/stderr. 
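
As a rough illustration of what interleaving buys: REv2's ActionResult reports stdout and stderr as two separate fields, so the relative order of the two streams is lost. The sketch below assumes a single log of tagged chunks instead. The `OutputChunk` type and the helper names are hypothetical and are not taken from REv2 or from the REv3 document.

```python
from dataclasses import dataclass

STDOUT = 1
STDERR = 2


@dataclass(frozen=True)
class OutputChunk:
    """One write by the action, tagged with the stream it went to (hypothetical type)."""
    stream: int
    data: bytes


def combined(chunks) -> bytes:
    """The output as a terminal would have shown it, in original order."""
    return b"".join(c.data for c in chunks)


def split_streams(chunks):
    """Recover separate stdout and stderr, as REv2 reports them today."""
    out = b"".join(c.data for c in chunks if c.stream == STDOUT)
    err = b"".join(c.data for c in chunks if c.stream == STDERR)
    return out, err


if __name__ == "__main__":
    log = [
        OutputChunk(STDOUT, b"compiling a.c\n"),
        OutputChunk(STDERR, b"warning: unused variable\n"),
        OutputChunk(STDOUT, b"done\n"),
    ]
    print(combined(log).decode(), end="")
```

The separate streams can always be derived from the interleaved log, but not the other way around, which is why only the interleaved form preserves the ordering.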

Medium importance:

Elimination of large CAS objects
Unifying Directories and Trees

Low importance:

Action graphs
Generalizing the Action Cache
Generalizing execution
Leveraging cryptography

> 2. Are there any topics that are not covered by the document that you
> wish they were?

No.

We think the Action graphs API does not have to be part of the core REAPI. However, it would be beneficial to have a standardized API, and for the core API to provide features that allow efficient implementation of all the standards in the API family (as Eric was suggesting in Miscellanea).

> 3. Assuming a new version of the remote execution protocol is released
> with the aforementioned changes made, do you think that it brings
> enough value that your client/server should be extended to support it?

Yes.

> 4. Assuming implementing the parts of the new protocol take a similar
> amount of effort as REv2, do you think your project has a sufficient
> amount of staffing/headcount/momentum to implement the new protocol?
> Note that this does not need to be a firm commitment.

Yes.


Mostyn Bramley-Moore

13 Feb 2024, 9:56:27
to remote-exe...@googlegroups.com
As the maintainer of bazel-remote (note: cache implementation only) and some proprietary clients:

> 1. Of the topics discussed in the document, which parts would you like the working group to focus on most?

* Stop using ByteStream
* Elimination of large CAS objects

> 2. Are there any topics that are not covered by the document that you wish they were?

I wonder if anyone else is interested in solving the "mspdbsrv" problem? In this scenario multiple cl.exe compiler processes synchronise via IPC(?) to spawn a single mspdbsrv process, and each communicate with it, and it eventually produces an output file from those inputs. In REAPIv2 it is generally assumed that actions are completely independent of each other, and only communicate via the filesystem.

> 3. Assuming a new version of the remote execution protocol is released with the aforementioned changes made, do you think that it brings enough value that your client/server should be extended to support it?

Yes.

(2) would open up REAPI to an IMO under-served platform: MSVC-in-the-default-non-Z7-mode.

> 4. Assuming implementing the parts of the new protocol take a similar amount of effort as REv2, do you think your project has a sufficient amount of staffing/headcount/momentum to implement the new protocol?

Yes.


-Mostyn.
--
Mostyn Bramley-Moore
mos...@antipode.se

Klaus T. Aehlig

16 Feb 2024, 6:11:39
to Ed Schouten, Remote Execution APIs Working Group

Hello,

my thoughts for justbuild.

On Fri, Feb 02, 2024 at 03:43:35PM +0100, Ed Schouten wrote:
> 1. Of the topics discussed in the document, which parts would you like
> the working group to focus on most? Phrased differently: which of the
> issues described in the document are currently most problematic for
> your client/server?

For us, the important topics are
- 1st-class CAS references
- unifying directories and trees
- elimination of large CAS objects

If the first two items are done in such a way to allow making
use of git tree/blob identifiers that are available anyway from
version-control, that would be particularly nice.
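
For reference, this is how git derives blob and tree identifiers in a repository using the default SHA-1 object format. The sketch only shows what a protocol would have to match in order to reuse those identifiers; the function names are hypothetical, and nothing here is taken from the REv3 document.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over '<kind> <length>\\0' followed by the object body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def git_blob_id(content: bytes) -> str:
    """Identifier git assigns to a file with the given content."""
    return git_object_id(b"blob", content)


def git_tree_id(entries) -> str:
    """Identifier of a tree. Each entry is (mode, name, hex_id), with mode
    "100644" for a regular file or "40000" for a subdirectory."""
    def sort_key(entry):
        mode, name, _ = entry
        # git orders entries by name, comparing directories as if
        # their names ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id in sorted(entries, key=sort_key)
    )
    return git_object_id(b"tree", body)


if __name__ == "__main__":
    blob = git_blob_id(b"hello world\n")
    print(blob)
    print(git_tree_id([("100644", "hello.txt", blob)]))
```

Note that the type and length header is part of the hashed data, so a git blob identifier differs from a plain SHA-1 of the file contents.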

> 2. Are there any topics that are not covered by the document that you
> wish they were?

No

> 3. Assuming a new version of the remote execution protocol is released
> with the aforementioned changes made, do you think that it brings
> enough value that your client/server should be extended to support it?

yes

> 4. Assuming implementing the parts of the new protocol take a similar
> amount of effort as REv2, do you think your project has a sufficient
> amount of staffing/headcount/momentum to implement the new protocol?
> Note that this does not need to be a firm commitment.

yes

Best regards,
Klaus

Steven Bergsieker

8 Mar 2024, 14:21:39
to Ed Schouten, Remote Execution APIs Working Group
This is a little later than promised, but still hopefully early enough to inform discussion. This is a coordinated response from the various teams at Google, including Bazel, reclient, Goma, RBE, Siso, and other, internal tooling.

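On Fri, Feb 2, 2024 at 9:44 AM Ed Schouten <e...@nuxi.nl> wrote:

1. Of the topics discussed in the document, which parts would you like
the working group to focus on most? Phrased differently: which of the
issues described in the document are currently most problematic for
your client/server?
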
  • Better handling of large files, addressed by both "Elimination of Large CAS Objects" and "Stop using the ByteStream/LongRunning protocol" (at least the ByteStream portion).
  • Faster client-side handling of large trees. We believe this likely looks like some form of multi-input root to allow for optimized handling of large, constant input trees (e.g., toolchains). It’s also possible that "Unifying Directories and Trees" and "CAS references as first-class citizens" will help here.
  • (Maybe) Separating concerns of the various API layers. In particular, Eric’s suggestion to leverage the Capabilities API in a way that allows independent versioning of the various APIs. It’s possible that “Generalizing Execution” plays a role here, but we’re mostly interested in that insofar as it separates scheduling concerns from execution concerns, allowing them to evolve independently.
 

2. Are there any topics that are not covered by the document that you
wish they were?

  • We’d like to explore whether introducing a full v3 is necessary, or whether some/all of these changes can be introduced in a backwards-compatible way in v2, perhaps also paired with an effort to get all implementations to a certain “minimum supported” version (so we can actually delete some of the deprecated fields). A series of incremental changes is significantly less risky than a big-bang v3, and also has a lower risk of creating a long-term split in the community (à la Python v2 vs. v3).
    • The biggest sticking point here is likely “Elimination of Large CAS Objects,” which is both our top priority and one of the more disruptive changes to fit into v2.
    • While we’re interested in better separation within the layers of the API, we’d happily give that up if it meant being able to avoid a v3.
  • We’re exploring some forms of two-phase/“fuzzy” caching internally. We’re not ready to discuss this publicly yet, but likely will be during the timeframe of the v3 discussions. What we’re looking at now can probably be fit into either v2 or v3 by the addition of a few extra fields and APIs.
  • As noted above, multi-input root is interesting to Google because it can help optimize handling of large trees.
 

3. Assuming a new version of the remote execution protocol is released
with the aforementioned changes made, do you think that it brings
enough value that your client/server should be extended to support it?

Yes
 

4. Assuming implementing the parts of the new protocol take a similar
amount of effort as REv2, do you think your project has a sufficient
amount of staffing/headcount/momentum to implement the new protocol?
Note that this does not need to be a firm commitment.

Yes, with one caveat: the Action Graph proposals are disruptive enough to Bazel that it’s unlikely that Bazel would implement action graphs within the foreseeable future (should that be included in v3).
