gRFC L58: Async API for gRPC Python


Lidi Zheng

Jul 22, 2019, 1:49:20 PM
to grpc.io
Hi gRPC users,

This proposal adds a new set of asyncio-native APIs to gRPC Python, intended to address its concurrency and performance issues.
The new API will be isolated from the current API, so migration can happen gradually.


Comments and suggestions are welcome!

Lidi Zheng

Kailash Sethuraman

Jul 22, 2019, 6:06:31 PM
to Lidi Zheng, grpc.io
Hooray! 

Some comments:

"Granularity per channel, per server level". Can you please clarify? Does this mean that a server cannot have 2 or more services, some async and others sync? If so, what about pre-packaged services like healthcheck? It will also be a hurdle for easy and gradual adoption of the async API into servers and organizations.

Will running functions in executors be fully supported? It is a common pattern to run sync tasks in executors on the asyncio event loop using https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor. The executors can be process or thread pools; will both be supported out of the box?
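
For concreteness, the pattern referred to here can be sketched with plain asyncio as follows. This is only an illustration: `blocking_lookup` and `handle_request` are made-up names, and nothing in it is part of the proposed gRPC API.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_lookup(key: str) -> str:
    """Hypothetical synchronous call (e.g. a DB or file lookup)."""
    time.sleep(0.05)  # simulate blocking I/O
    return f"value-for-{key}"


async def handle_request(key: str, executor: ThreadPoolExecutor) -> str:
    # Off-load the blocking call so the event loop stays responsive.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_lookup, key)


async def main() -> list:
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Both lookups run concurrently in worker threads.
        return await asyncio.gather(
            handle_request("a", executor),
            handle_request("b", executor),
        )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A `ProcessPoolExecutor` can be passed in the same way, provided the function and its arguments are picklable.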

grpc.aio.Call functions return asyncio.Future, but grpc.ServerContext.send_initial_metadata returns a Task object. Could you please elaborate on the rationale here? 

grpc.aio.channel_ready_future returns a coroutine, not a future. This can be surprising.

asyncio exceptions: will they be mapped to gRPC errors where possible? While the RFC states that the RpcError exceptions will be the same, it does not speak to how

There is a comparison between the current gRPC API and the proposed one, but it might be valuable to do a similar take from the asyncio perspective: which functions/facilities will be unavailable with the gRPC async API, and how can similar actions be performed? E.g. asyncio.* functions such as gather, shield, wait_for, wait, etc.
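
As a reference point for that comparison, here is a minimal sketch of how these asyncio helpers behave with an ordinary coroutine. `fake_rpc` is a hypothetical stand-in for an awaitable RPC call; whether the proposed gRPC calls compose the same way is exactly the open question.

```python
import asyncio


async def fake_rpc(name: str, delay: float) -> str:
    """Hypothetical stand-in for an awaitable RPC call."""
    await asyncio.sleep(delay)
    return f"reply:{name}"


async def main() -> dict:
    results = {}

    # gather: fan out several calls and collect the replies in order.
    results["gather"] = await asyncio.gather(
        fake_rpc("a", 0.01), fake_rpc("b", 0.02)
    )

    # wait_for: apply a deadline; the inner call is cancelled on timeout.
    try:
        await asyncio.wait_for(fake_rpc("slow", 1.0), timeout=0.05)
        results["wait_for"] = "completed"
    except asyncio.TimeoutError:
        results["wait_for"] = "timed out"

    # shield: the outer wait times out, but the inner task keeps running.
    inner = asyncio.ensure_future(fake_rpc("shielded", 0.1))
    try:
        await asyncio.wait_for(asyncio.shield(inner), timeout=0.02)
    except asyncio.TimeoutError:
        pass
    results["shield"] = await inner  # still completes normally

    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```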

- It would be very nice if the gRFC could include the full API, given that this is a new package, and list the functions that will be exported (i.e. added to __all__).

Thanks!




--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/92d5e57f-332a-4e48-bdb2-ccc4d515a5bf%40googlegroups.com.

meh...@dropbox.com

Jul 22, 2019, 7:32:49 PM
to grpc.io
Thanks for the gRFC. It's certainly a lot to digest. I also recommend trying to revisit some of the poorly designed APIs and not simply support/replicate everything under the async model.

Without going into the implementation, I have a few key concerns:

- The grpc.aio namespace as a user-facing thing that users would import would be terrible. Considering that, going forward, this will hopefully be the default, we should try making it work under the grpc module as much as possible. I think for the most part we can do that, and there aren't that many technical reasons they could not coexist under the same namespace. (For example, RpcMethodHandler could add a new attribute `async` that makes the machinery invoke the handler and assume it's an async implementation, or the server could have a new add_async_rpc_handlers method.) On the channel side, the async methods could easily live side by side, leaving no reason to have them under a separate namespace/object.
- I'd like to echo Kailash's comment as well, which is that we probably want to be able to host non-async servicers under an existing async server. As the testing section admits, this should not be too hard to emulate. In fact, it is probably ideal to just build everything on asyncio and simply wait on the future in the sync version by default.
- As users, we have come to learn that writing streaming RPCs with coroutine-generator-based iterators is super painful today; ideally you want to just call send and receive from the context API.
- The gRFC does not address how flow control would interact with async. Please note that proper gRPC support requires these two simultaneously: (1) you should be able to have a pending recv op and a pending send op at the same time; (2) gRPC C-core enforces the rule of at most a single pending read op and a single pending write op.
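
To make the second bullet concrete (build on asyncio and simply wait on the future in the sync version), here is a rough sketch using only the standard library. `SyncFacade` and `async_unary_call` are hypothetical names, not gRPC APIs; the point is only that a blocking entry point can be a thin wrapper over an async implementation.

```python
import asyncio
import threading


async def async_unary_call(request: str) -> str:
    """Hypothetical async implementation of a unary call."""
    await asyncio.sleep(0.01)
    return request.upper()


class SyncFacade:
    """Runs one event loop in a background thread; blocking callers wait on futures."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def unary_call(self, request: str, timeout: float = 5.0) -> str:
        # Schedule the coroutine on the loop thread and block on its result.
        future = asyncio.run_coroutine_threadsafe(
            async_unary_call(request), self._loop
        )
        return future.result(timeout)

    def close(self) -> None:
        # Stop the loop from its own thread, then release its resources.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


if __name__ == "__main__":
    facade = SyncFacade()
    try:
        print(facade.unary_call("ping"))
    finally:
        facade.close()
```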

Other things of note that are potentially unrelated but can probably be taken away in an opportunity for change:
- I don't think we should feel constrained with the current Java-in-Python-abstract-class mess status-quo. Idiomatic Python can simply return a conforming object without necessarily inheriting from abstract classes in __init__.py.
- The server-side interceptor API currently can modify which path can be called and potentially rewrite it. That feature is detrimental for high performance servers. It would be great to think about it.
- The client-side channel subscribe design should support a "subscribe only once" mechanism (instead of subscribe never ending till unsubscribe) so that we can kill the additional polling thread. This matches with the C-core API model better.

The gRFC is long, so there are likely additional things, but this should be good to get some brainstorming going. Can we separate the general async model from each API minutia? I don't think we should carry over each API element without revisiting it carefully. I hope we don't rush through this. You don't get many chances to naturally do a clean redesign like this.

Kailash Sethuraman

Jul 22, 2019, 9:03:58 PM
to meh...@dropbox.com, grpc.io


On Mon, Jul 22, 2019 at 7:32 PM mehrdad via grpc.io <grp...@googlegroups.com> wrote:
> Thanks for the gRFC. It's certainly a lot to digest. I also recommend trying to revisit some of the poorly designed APIs and not simply support/replicate everything under the async model.
>
> Without going into the implementation, I have a few key concerns:
>
> - The grpc.aio namespace as a user-facing thing that users would import would be terrible.

I am not so sure of this. I think that separating into a new namespace for a future API offers an opportunity for a clean break without too much change to the current API or behavior, and is least surprising for the user.

It avoids sprawl in the existing module.

It means no complications from the need to be compatible with Python 2.

In asyncio, a similar approach was taken with asyncio.Queue.


> Considering that, going forward, this will hopefully be the default, we should try making it work under the grpc module as much as possible. I think for the most part we can do that, and there aren't that many technical reasons they could not coexist under the same namespace. (For example, RpcMethodHandler could add a new attribute `async` that makes the machinery invoke the handler and assume it's an async implementation, or the server could have a new add_async_rpc_handlers method.) On the channel side, the async methods could easily live side by side, leaving no reason to have them under a separate namespace/object.
> - I'd like to echo Kailash's comment as well, which is that we probably want to be able to host non-async servicers under an existing async server. As the testing section admits, this should not be too hard to emulate. In fact, it is probably ideal to just build everything on asyncio and simply wait on the future in the sync version by default.
> - As users, we have come to learn that writing streaming RPCs with coroutine-generator-based iterators is super painful today; ideally you want to just call send and receive from the context API.
> - The gRFC does not address how flow control would interact with async. Please note that proper gRPC support requires these two simultaneously: (1) you should be able to have a pending recv op and a pending send op at the same time; (2) gRPC C-core enforces the rule of at most a single pending read op and a single pending write op.
>
> Other things of note that are potentially unrelated but can probably be taken away in an opportunity for change:
> - I don't think we should feel constrained with the current Java-in-Python-abstract-class mess status-quo. Idiomatic Python can simply return a conforming object without necessarily inheriting from abstract classes in __init__.py.

+1

> - The server-side interceptor API currently can modify which path can be called and potentially rewrite it. That feature is detrimental for high performance servers. It would be great to think about it.
> - The client-side channel subscribe design should support a "subscribe only once" mechanism (instead of subscribe never ending till unsubscribe) so that we can kill the additional polling thread. This matches with the C-core API model better.
>
> The gRFC is long, so there are likely additional things, but this should be good to get some brainstorming going. Can we separate the general async model from each API minutia? I don't think we should carry over each API element without revisiting it carefully. I hope we don't rush through this. You don't get many chances to naturally do a clean redesign like this.

A point we also missed is coexistence with other (asyncio) event loops, on both the client and the server side. I'm not sure this needs to be first class, but we should consider making it possible.



Lidi Zheng

Jul 23, 2019, 2:42:17 PM
to Kailash Sethuraman, meh...@dropbox.com, grpc.io
Nice to hear from you, Kailash and Mehrdad!

Let me try to answer some of your comments.
I'm open to changes, even rewrites, if I'm convinced there is a better way.

---

For Kailash's suggestions:

1. "Granularity per channel, per server"

Yes, your interpretation is correct: we don't want to promote mixing async and sync RPC method handlers.
The pre-packaged services should be updated to have two implementations. Thank you for reminding me of them...
For a simple service, it is easy to add the "async def" prefix to those handlers. For a complex service, simply wrapping normal functions into async functions is error-prone.
The changes in the threading model may cause a lot of issues that are only observable at runtime. This is my concern.
Since you all want to ease the transition, I will make it a warning instead of a hard failure.

2. "Executors"

I'm aware of this API, and considered using it by default to seamlessly emulate the current API.
But 1) if the application has fully adopted `asyncio`, there shouldn't be any need for executors.
2) There are issues with both executors. The performance of ThreadPoolExecutor is terrible; the fork operation in ProcessPoolExecutor is not supported by gRPC Python (yet).
Even if we want to support them in the future, I hope they are disabled by default. This should be considered a separate feature.

3. Future vs. Task
When accessing the result of an RPC on the client side (grpc.aio.Call), I assume users do care about the result, hence they will "await" it.
On the other hand, "send_initial_metadata" is an operation that users may fire and forget, so the Task class is more suitable.

4. "grpc.aio.channel_ready_future" naming
Agreed. I will update it in a later version.

5. Exceptions in asyncio vs. grpcio
Sadly, grpc.RpcError is a class of its own that covers all kinds of errors in the gRPC space.
In the gRFC, my point is to get rid of the existing `grpc.FutureTimeoutError` and `grpc.FutureCancelledError`.

6. Describe the differences from the viewpoint of an `asyncio` user
This is a really good idea. For now, this gRFC should be a guide for existing users to adopt `asyncio` practices.
We should probably write a migration tutorial after it is done.

7. Full API Definition
My original thought was that the full API definition is more than one thousand lines of code, and might change during implementation.
I can do an actual PR for the new interfaces, and link it in the gRFC.
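
To illustrate the distinction drawn in point 3 (awaiting a result versus fire-and-forget), here is a small sketch in plain asyncio. Both coroutines are hypothetical stand-ins and do not reflect the actual signatures in the gRFC.

```python
import asyncio
from typing import List, Tuple


async def send_initial_metadata(log: List[tuple], metadata: tuple) -> None:
    """Hypothetical stand-in for a send operation."""
    await asyncio.sleep(0.01)
    log.append(metadata)


async def unary_call(request: str) -> str:
    """Hypothetical stand-in for a client call whose result matters."""
    await asyncio.sleep(0.01)
    return f"echo:{request}"


async def main() -> Tuple[str, List[tuple]]:
    log: List[tuple] = []

    # Fire-and-forget: the Task is scheduled right away and needs no await to run.
    task = asyncio.ensure_future(send_initial_metadata(log, (("key", "value"),)))

    # The caller cares about this result, so it awaits it.
    reply = await unary_call("hello")

    # Awaiting the task before returning ensures its errors are not lost.
    await task
    return reply, log


if __name__ == "__main__":
    print(asyncio.run(main()))
```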

---

For Mehrdad's suggestions:
The overall refactoring effort is called "grpc-easy"; it should solve usability issues and integrate Python 3 features.
I was naive to defer the usability improvements to a later stage, but you are right, this gRFC is probably the place to make them.

1. Remove "grpc.aio" namespace
The answer from Kailash is better than what I have.
Mine is to minimize the impact we might have on existing users.
Also, the adoption of `asyncio` is not a mindless change; I don't want to tangle up coroutines and normal functions.
Many improvements (including the ones you suggested) can't be made without regressions if we stick to the existing interfaces.

2. Async & Sync Method Handlers
I will change it to a warning instead of a hard failure. Thanks for pointing that out.

3. Generator is hard to use
I want to change it as well, but it will be a breaking change.
Several questions need to be answered:
    1) How do we mix generator-based handlers and direct-invocation handlers?
    2) How do we deal with the existing iterator semantics? E.g. the request_iterator allows a program to iterate through requests with a single "for" loop, but with a "recv" API, Python needs a "while" loop.
    3) How do we define the end of an RPC?
I will try to propose a way of doing it in later commits. Suggestions are welcome.

4. Simultaneous read and write is prohibited in C-Core
I was not aware of that C-Core rule. I have to investigate before I can say more.

5. Concrete classes instead of Java-in-Python-abstract-class
I would love to change it if possible.

6. Server-side interceptor API hurts performance
I would say the impact on performance depends on how the interceptor is implemented.
Do you have a more concrete example of this impact?

7. Subscribe API Redesign
Can you please give me the use case of "subscribe only once"?
Thanks to `asyncio`, the `grpc.aio` version doesn't need the polling thread.

8. Separate gRFC for each API
This gRFC won't be merged until all APIs are implemented.
There is plenty of time for us to discuss the details.
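
On questions 2) and 3) under "Generator is hard to use", one possible direction is sketched below: a direct `read()` call that returns an end-of-stream marker, with iterator semantics layered on top of it. `FakeStream` and `EOF` are hypothetical names used only for this sketch, not a proposal for the final API.

```python
import asyncio
from typing import AsyncIterator, List, Optional

EOF = None  # hypothetical end-of-stream marker


class FakeStream:
    """Hypothetical stream exposing a direct read() call."""

    def __init__(self, messages: List[str]) -> None:
        self._messages = list(messages)

    async def read(self) -> Optional[str]:
        await asyncio.sleep(0)  # yield to the event loop
        return self._messages.pop(0) if self._messages else EOF

    def __aiter__(self) -> AsyncIterator[str]:
        return self._iterate()

    async def _iterate(self) -> AsyncIterator[str]:
        # Adapter: builds iterator semantics on top of read().
        while True:
            message = await self.read()
            if message is EOF:
                return
            yield message


async def consume_with_while(stream: FakeStream) -> List[str]:
    # Direct invocation: the caller drives the loop and checks for EOF.
    received = []
    while True:
        message = await stream.read()
        if message is EOF:
            break
        received.append(message)
    return received


async def consume_with_for(stream: FakeStream) -> List[str]:
    # Iterator semantics: a single "async for" hides the EOF check.
    return [message async for message in stream]


if __name__ == "__main__":
    print(asyncio.run(consume_with_while(FakeStream(["a", "b"]))))
    print(asyncio.run(consume_with_for(FakeStream(["a", "b"]))))
```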

---

I'll ping the thread once I have resolved all your comments.
Thanks again for commenting ;)

Lidi Zheng


Kailash Sethuraman

Jul 23, 2019, 6:28:30 PM
to Lidi Zheng, meh...@dropbox.com, grpc.io
On Tue, Jul 23, 2019 at 2:42 PM Lidi Zheng <li...@google.com> wrote:
> Nice to hear from you, Kailash and Mehrdad!
>
> Let me try to answer some of your comments.
> I'm open to changes, even rewrites, if I'm convinced there is a better way.
>
> ---
>
> For Kailash's suggestions:
>
> 1. "Granularity per channel, per server"
>
> Yes, your interpretation is correct: we don't want to promote mixing async and sync RPC method handlers.
> The pre-packaged services should be updated to have two implementations. Thank you for reminding me of them...
> For a simple service, it is easy to add the "async def" prefix to those handlers. For a complex service, simply wrapping normal functions into async functions is error-prone.
> The changes in the threading model may cause a lot of issues that are only observable at runtime. This is my concern.
> Since you all want to ease the transition, I will make it a warning instead of a hard failure.


Consider companies/teams that have implemented services that are 'base components' of every application service they write. The two implementations, reflection and healthcheck, could be just two of many that exist in such a context. If coexistence is supported, then the adoption of the async framework can be gradual.
If it's all or nothing, existing services would not benefit from this effort, which is a significant hurdle.
 

> 2. "Executors"
>
> I'm aware of this API, and considered using it by default to seamlessly emulate the current API.
> But 1) if the application has fully adopted `asyncio`, there shouldn't be any need for executors.
> 2) There are issues with both executors. The performance of ThreadPoolExecutor is terrible; the fork operation in ProcessPoolExecutor is not supported by gRPC Python (yet).
> Even if we want to support them in the future, I hope they are disabled by default. This should be considered a separate feature.


ThreadPoolExecutor is intended for I/O-bound functions, e.g. file operations or DB lookups, while ProcessPoolExecutor is for CPU-bound ones, so the performance of the former should not be a major consideration. However, these are familiar tools for teams to incrementally move codebases into an async model.

Combined, limitations 1 and 2 make the proposed API surprising to both gRPC and asyncio users.

 
> 3. Future vs. Task
> When accessing the result of an RPC on the client side (grpc.aio.Call), I assume users do care about the result, hence they will "await" it.
> On the other hand, "send_initial_metadata" is an operation that users may fire and forget, so the Task class is more suitable.
>
> 4. "grpc.aio.channel_ready_future" naming
> Agreed. I will update it in a later version.
>
> 5. Exceptions in asyncio vs. grpcio
> Sadly, grpc.RpcError is a class of its own that covers all kinds of errors in the gRPC space.
> In the gRFC, my point is to get rid of the existing `grpc.FutureTimeoutError` and `grpc.FutureCancelledError`.


Tasks/Futures can be cancelled and will throw exceptions that are defined in the asyncio namespace; these may look "similar but not quite" to gRPC exceptions.
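
To show which asyncio-defined exceptions a caller would see, here is a minimal sketch with a hypothetical `fake_rpc` coroutine. How (or whether) these would be mapped to gRPC errors is not specified in the gRFC, so none of that is modelled here.

```python
import asyncio


async def fake_rpc() -> str:
    """Hypothetical stand-in for a long-running RPC."""
    await asyncio.sleep(10)
    return "done"


async def main() -> list:
    observed = []

    # Cancelling a task surfaces asyncio.CancelledError to whoever awaits it.
    task = asyncio.ensure_future(fake_rpc())
    await asyncio.sleep(0)  # let the task start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        observed.append("CancelledError")

    # A deadline enforced by asyncio raises asyncio.TimeoutError.
    try:
        await asyncio.wait_for(fake_rpc(), timeout=0.01)
    except asyncio.TimeoutError:
        observed.append("TimeoutError")

    return observed


if __name__ == "__main__":
    print(asyncio.run(main()))
```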
 
> 6. Describe the differences from the viewpoint of an `asyncio` user
> This is a really good idea. For now, this gRFC should be a guide for existing users to adopt `asyncio` practices.
> We should probably write a migration tutorial after it is done.
>
> 7. Full API Definition
> My original thought was that the full API definition is more than one thousand lines of code, and might change during implementation.
> I can do an actual PR for the new interfaces, and link it in the gRFC.

This is a large endeavor - having a skeleton framework of the proposed API might help spur discussion that improves the outcome.

Mehrdad Afshari

Jul 24, 2019, 2:49:23 AM
to Lidi Zheng, Kailash Sethuraman, grpc.io
> 4. Simultaneous read and write is prohibited in C-Core
> I was not aware of that C-Core rule. I have to investigate before I can say more.

Just to clarify, the restriction is at most 1 pending write and 1 pending read. You can certainly read and write concurrently (and you should definitely allow this for correctness; otherwise the programmer can end up in a queue deadlock in bidi-streaming cases that they have no way to resolve).
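
One way to picture that constraint at the asyncio level is a pair of independent locks, so reads and writes proceed concurrently while each direction has at most one operation in flight. `StreamGuard` is a hypothetical name, and the loopback queue merely stands in for the transport; this is a sketch, not how C-core or the bindings enforce the rule.

```python
import asyncio


class StreamGuard:
    """Allows one in-flight read and one in-flight write at a time, concurrently."""

    def __init__(self) -> None:
        self._read_lock = asyncio.Lock()
        self._write_lock = asyncio.Lock()
        self._inbox: asyncio.Queue = asyncio.Queue()

    async def write(self, message: str) -> None:
        async with self._write_lock:  # at most one pending write
            await asyncio.sleep(0.01)  # simulate the transport write
            await self._inbox.put(message.upper())  # loop back for the demo

    async def read(self) -> str:
        async with self._read_lock:  # at most one pending read
            return await self._inbox.get()


async def main() -> list:
    guard = StreamGuard()
    # Reads are already pending while the writes are issued;
    # the two directions never block each other.
    reader = asyncio.ensure_future(asyncio.gather(guard.read(), guard.read()))
    await asyncio.gather(guard.write("a"), guard.write("b"))
    return await reader


if __name__ == "__main__":
    print(asyncio.run(main()))
```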

Will get back to you later on other points. 

Lidi Zheng

Aug 8, 2019, 7:36:32 PM
to Mehrdad Afshari, Kailash Sethuraman, grpc.io

Hi Mehrdad and Kailash,

I have updated the design doc based on our recent discussion. Any suggestions are welcome.

| No. | Issue                                           | Section                                                                                                                                                               | In this Doc?  |
| --- | ----------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
| 1   | Merge `__call__`, `with_call`, `futures`        | [Unified Stub Call](https://github.com/lidizheng/proposal/blob/grpc-python-async-api/L58-python-async-api.md#unified-stub-call)                                       | Yes           |
| 2   | Should we eliminate interfaces?                 | [Concrete Class Instead of Interfaces](https://github.com/lidizheng/proposal/blob/grpc-python-async-api/L58-python-async-api.md#concrete-class-instead-of-interfaces) | Yes           |
| 3   | C-Core constraint about read/write concurrency. | [Flow Control Enforcement](https://github.com/lidizheng/proposal/blob/grpc-python-async-api/L58-python-async-api.md#flow-control-enforcement)                         | Yes           |
| 4   | Should we support run_in_executor?              | [Support Thread And Process Executors](https://github.com/lidizheng/proposal/blob/grpc-python-async-api/L58-python-async-api.md#support-thread-and-process-executors) | Yes           |
| 5   | Story for grpcio-* package under async API?     | [Other Official Packages](https://github.com/lidizheng/proposal/blob/grpc-python-async-api/L58-python-async-api.md#other-official-packages)                           | Yes           |
| 6   | Should we allow mixing async & sync handlers?   |                                                                                                                                                                       | Merged with 4 |
| 7   | Streaming API without iterator.                 |                                                                                                                                                                       | Not yet       |
| 8   | Redesign connectivity API.                      |                                                                                                                                                                       | Not yet       |
| 9   | Ease usage of channel arguments.                |                                                                                                                                                                       | Not yet       |
| 10  | Exceptions between `asyncio` and `grpcio`.      |                                                                                                                                                                       | Not yet       |


Thanks,
Lidi Zheng

Mehrdad Afshari

Aug 12, 2019, 3:12:30 PM
to Lidi Zheng, Kailash Sethuraman, grpc.io
Hi Lidi,

Thanks for the update. Let's start with the core packaging strategy first. I am still perplexed about the general packaging approach and which problems it would solve. First, keeping the stub interface intact while making it return a totally different type when a different type of channel is passed in violates the Liskov Substitution Principle and breaks any library that takes a channel and does something with it, so in effect you are breaking existing code. It is a valid option as far as I am concerned, but if you are going to do that, why not just bite the bullet, call this thing grpcio v2 (like C# does), start from scratch, and not have that ugly ".aio" in the name?

If you see both of these coexisting in a program in any shape or form, I think it's unreasonable for them to operate the way they do now. Async-ness is not a property of the channel. It's more a property of the stub, and other solutions are feasible too: e.g. you could import `test_pb2_grpc_async` instead of `test_pb2_grpc`. I am not at all clear what problems this "overloading the stub based on the channel's inherent type" is going to solve. In addition, this will basically prevent you from ever eliminating the old implementation from the library, because new programs might be using it in some areas.

I feel you are straddling the old API and the new one in a way that is uncomfortable enough that it won't help any users practically, but not big enough a break to liberate yourself to innovate. My suggestion is to either emulate the old API almost completely and support side-by-side sync and async on servers and channels, or simply break away and launch this new thing as grpcio v2.x under the "grpc" namespace. You can keep maintaining grpc v1.x and freeze it at v1.y at your peril for compatibility reasons. I think going with v2 will help you completely clean up the bad stuff like bidi-streaming with generators.

Cheers,
Mehrdad

Ruslan Nigmatullin

Aug 12, 2019, 3:27:16 PM
to Mehrdad Afshari, Lidi Zheng, Kailash Sethuraman, grpc.io
Using generators for streaming leads to unidiomatic Python and
effectively prevents the service developer from controlling the
execution flow.

For example, in the case of server-side bidirectional streaming it is
impossible to clean up resources once the RPC is done, as execution can
abort somewhere outside of the application stack.
It's also very hard to use bidirectional streaming on the client side,
especially if we need to construct requests based on the responses
from the server. And it's mostly impossible if you want to coordinate
communication between multiple concurrent streams.
I have yet to see a real-world gRPC application where a
generator-based API would simplify the implementation; it's always
more verbose and error-prone compared to the direct-invocation-based
approach in grpc-go.

This makes me believe that we should take asyncio as an opportunity to
design a proper streaming API from scratch.
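
As a rough illustration of the direct-invocation style argued for here, the handler below keeps the whole control flow, including cleanup, in its own stack. `FakeBidiStream` and `chat_handler` are hypothetical names for this sketch only; the real API shape is still to be agreed on.

```python
import asyncio
from typing import List, Optional


class FakeBidiStream:
    """Hypothetical bidi stream with direct read()/write() calls."""

    def __init__(self, incoming: List[str]) -> None:
        self._incoming = list(incoming)
        self.outgoing: List[str] = []

    async def read(self) -> Optional[str]:
        await asyncio.sleep(0)
        # None marks the end of the request stream in this sketch.
        return self._incoming.pop(0) if self._incoming else None

    async def write(self, message: str) -> None:
        await asyncio.sleep(0)
        self.outgoing.append(message)


async def chat_handler(stream: FakeBidiStream, resources: List[str]) -> None:
    resources.append("session")  # acquire a per-RPC resource
    try:
        while True:
            request = await stream.read()
            if request is None:
                break
            # Each reply depends on the request just received.
            await stream.write(f"ack:{request}")
    finally:
        # Cleanup runs in the handler's own stack, even on cancellation.
        resources.remove("session")


if __name__ == "__main__":
    demo_stream = FakeBidiStream(["x", "y"])
    asyncio.run(chat_handler(demo_stream, []))
    print(demo_stream.outgoing)
```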

We can also go into technical details regarding what this API should
look like, but first we need to agree on the direction itself.


Lidi Zheng

Aug 16, 2019, 5:35:51 PM
to Mehrdad Afshari, Kailash Sethuraman, Ruslan Nigmatullin, grpc.io
Hi Mehrdad,

What I keep in mind while designing the new API is not to fall into Hype Driven Development.

Async IO is not a silver bullet that solves everything, especially in large-scale distributed systems.
Most asynchronous programming models in other languages support proper threads or an interruption mechanism.
Python does not, so the current Async IO implementation in CPython is very likely to cause starvation problems.
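
The starvation concern can be reproduced with a small experiment: one coroutine that makes a blocking call delays every other task on the loop. This is a sketch with made-up names (`heartbeat`, `blocking_handler`), not a measurement of gRPC itself.

```python
import asyncio
import time


async def heartbeat(gaps: list, stop: asyncio.Event) -> None:
    """Records how long each nominal 10 ms sleep actually took."""
    while not stop.is_set():
        start = time.monotonic()
        await asyncio.sleep(0.01)
        gaps.append(time.monotonic() - start)


async def blocking_handler() -> None:
    # A synchronous call inside a coroutine blocks the whole event loop.
    time.sleep(0.2)


async def main() -> float:
    gaps: list = []
    stop = asyncio.Event()
    beat = asyncio.ensure_future(heartbeat(gaps, stop))
    await asyncio.sleep(0.03)  # let the heartbeat tick a few times
    await blocking_handler()   # no other task can run during this call
    await asyncio.sleep(0.03)
    stop.set()
    await beat
    return max(gaps)  # the longest observed delay of the heartbeat


if __name__ == "__main__":
    print(f"longest heartbeat gap: {asyncio.run(main()):.3f}s")
```

The longest gap comes out at roughly the duration of the blocking call rather than the intended 10 ms, because the event loop cannot preempt the handler.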

Also, adopting Async IO is not something that requires zero code changes.
It is not only the entry point of the existing code that has to change.
Once you have tried to implement a complex application using Async IO, you will understand the impact of changing from multi-threading to an event loop.
For large companies, the migration will be even harder.
It's not really an effective use of our users' time to redesign their big projects just to upgrade a framework.
So, we want to help users get through this process smoothly, instead of introducing a completely incompatible version.

Async IO is a great solution for concurrency in Python, and it does solve flaws in the current design.
I might be wrong, but I doubt that Async IO will be adopted by an absolute majority of users soon.
Even if, in the future, Async IO becomes the absolute majority, we can promote the new API and mark the old API as legacy (or remove it).
Until then, there isn't a compelling motivation to go all in on Async IO and break all existing users.

---

As for changing the API, I can feel your pain about the bidi-streaming design.
I feel like it is a bigger topic that needs more thinking.

I tried to write my desired usage of the streaming API as code snippets in https://github.com/lidizheng/grpc-api-examples/blob/master/Examples.md.
And there is a slightly larger example, a gRPC web crawler, in https://github.com/lidizheng/grpc-api-examples.
If you feel strongly, please take a look, or come up with your own version of the examples.

Best,
Lidi

Lidi Zheng

Aug 16, 2019, 5:39:23 PM
to Ruslan Nigmatullin, Kailash Sethuraman, grpc.io
Hi Ruslan,

As in my previous reply to mmx@, I agree we should redesign the streaming API.
Can you also contribute some code snippets and examples to demonstrate how the new API is going to simplify application logic?

Lidi

Ruslan Nigmatullin

Aug 16, 2019, 6:59:28 PM
to Lidi Zheng, Kailash Sethuraman, grpc.io
>> For large companies, the migration will be even harder.
>> It's not really effective use of time for our users to redesign their big project just to upgrade a framework.
>> So, we want to help users get through this process smoothly, instead of introducing a completely incompatible version.

That's a fair point. In addition to that, I'd like to point out that
for companies with a large codebase it's important to be able to
intermix async and sync stubs over the same channel/server.
Imagine you have a server with tens of servicers. It will be
impossible to atomically move all of them to an asyncio-based interface,
and separating them into different grpc-core servers would lead to
operational headaches during the migration (one has to think about
capacity, every individual servicer change on the server side will
require client-side changes, and so on).

There is a similar concern on the client side: it's important to be able
to share channel objects between async and sync stubs to reduce the
overhead of the resolver, balancer, and similar components, and to not
change the state of the underlying TCP connections.

Either of these points makes me believe that it's necessary to avoid
separating channel/server objects between the async and sync
interfaces, and to limit changes to the public API to the generated stubs
(and perhaps add asyncio versions of some currently blocking methods),
e.g. generate `ServicerClient` and `AsyncServicerClient` stubs similar
to how C++ does it. Channel objects can expose an `async_unary_unary`
method in addition to `[sync_]unary_unary`, so the implementation of the
Python binding itself can be separated or reused based on the
complexity of either approach and short/long-term goals.
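
A toy sketch of that layout follows: one channel object shared by a blocking stub and an asyncio stub, with the stub type (not the channel type) deciding the call style. `FakeChannel`, `GreeterClient`, and `AsyncGreeterClient` are hypothetical and only mirror the naming suggested above; no real generated code looks like this today.

```python
import asyncio


class FakeChannel:
    """Hypothetical channel exposing both call styles over one connection."""

    def __init__(self) -> None:
        self.calls = 0  # shared state stands in for the shared connection

    def unary_unary(self, request: str) -> str:
        self.calls += 1
        return f"sync:{request}"

    async def async_unary_unary(self, request: str) -> str:
        self.calls += 1
        await asyncio.sleep(0)
        return f"async:{request}"


class GreeterClient:
    """Hypothetical generated blocking stub."""

    def __init__(self, channel: FakeChannel) -> None:
        self._channel = channel

    def say_hello(self, name: str) -> str:
        return self._channel.unary_unary(name)


class AsyncGreeterClient:
    """Hypothetical generated asyncio stub."""

    def __init__(self, channel: FakeChannel) -> None:
        self._channel = channel

    async def say_hello(self, name: str) -> str:
        return await self._channel.async_unary_unary(name)


if __name__ == "__main__":
    shared = FakeChannel()
    print(GreeterClient(shared).say_hello("a"))
    print(asyncio.run(AsyncGreeterClient(shared).say_hello("b")))
```

Because each stub has a fixed return type, code that accepts a channel keeps working regardless of which stubs are built on top of it.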

> Can you also contribute some code snippets and examples to demonstrate how the new API going to simplify application logic?

Sure, let me have a look. Based on a quick glance, your direct-invocation
design [grpc-api-examples] is a move in the right direction, and we can
iterate on the details.