Safe to modify a message object after yielding it to a request/response stream? (Python)


james.oldf...@gmail.com

Mar 29, 2018, 7:56:18 AM
to grpc.io
Consider the following snippet:

    def my_iter(some_request, extra_data_list):
        for item in extra_data_list:
            some_request.extra = item
            yield some_request
            some_request.Clear()
   
    # ... some time later ...
    response = my_stub.streaming_request(my_iter(RequestType(), data))

My question is: Is this safe? My concern is that the yielded request message object could be sent off to another thread for serialisation, but my code could mutate it (to create the next item in the stream) before it gets there.

I suspect that the yielded message is serialised to bytes before the iterator is allowed to continue, which would mean that this is actually safe. But I'd like to be sure that's what's happening, and that it's guaranteed.
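
To illustrate the concern with a toy stand-in (a plain dict instead of a protobuf message, and a consumer that deliberately holds on to references rather than serialising straight away), this is the failure mode I'm worried about:

    # Toy sketch only: a dict stands in for the request message.
    def gen():
        msg = {}
        for item in ['a', 'b']:
            msg['extra'] = item
            yield msg
            msg.clear()  # mutates the object the consumer may still hold

    snapshots = list(gen())  # consumer keeps references, doesn't serialise
    print(snapshots)         # [{}, {}] -- both yielded values were clobbered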

Many thanks!

james.oldf...@gmail.com

Mar 29, 2018, 8:04:37 AM
to grpc.io
Follow-up: Would the answer be any different if I were using the non-blocking API:

    request_iter = my_iter(RequestType(), data)
    response_future = my_stub.streaming_request.future(request_iter)
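
For completeness, the result would then be collected later with something like:

    # grpc.Future: result() blocks until the RPC completes (and raises if it failed)
    response = response_future.result()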

Nathaniel Manista

Mar 29, 2018, 9:24:38 AM
to james.oldf...@gmail.com, grpc.io
On Thu, Mar 29, 2018 at 4:56 AM, <james.oldf...@gmail.com> wrote:
Consider the following snippet:

    def my_iter(some_request, extra_data_list):
        for item in extra_data_list:
            some_request.extra = item
            yield some_request
            some_request.Clear()
   
    # ... some time later ...
    response = my_stub.streaming_request(my_iter(RequestType(), data))

My question is: Is this safe? My concern is that the yielded request message object could be sent off to another thread for serialisation, but my code could mutate it (to create the next item in the stream) before it gets there.

I suspect that the yielded message is serialised to bytes before the iterator is allowed to continue, which would mean that this is actually safe.

This happens to be the case today...

But I'd like to be sure that's what's happening, and that it's guaranteed.

... but I don't think it's something that we want to guarantee. What's going on in your use case that has you wanting to clear and reuse the same message rather than just create and yield a new one each time?
-Nathaniel 

Nathaniel Manista

Mar 29, 2018, 9:25:39 AM
to james.oldf...@gmail.com, grpc.io
The answer is currently the same and... it's very hard to imagine that we'd change it.
-N

james.oldf...@gmail.com

Mar 29, 2018, 11:08:09 AM
to grpc.io
Thanks a lot for responding, Nathaniel.

In honesty, the use case is a very slight simplification of a utility generator function. The difference is only a couple of lines, and arguably it would be clearer to create a new request explicitly each time anyway. Since you don't guarantee this behaviour, even if a change is very unlikely, I'll just go for the safe option.

Sorry to piggyback with something else: I often send binary data and associated metadata together over gRPC. The utility function in question chunks the binary data into a request stream while including the metadata in the first request of the stream. It's a pity that gRPC doesn't offer something like this built in. I appreciate that it's hard given the separation between gRPC and protobuf, but I think a slightly leaky API that takes both a protobuf and a binary payload (or a named list of binary payloads) would be a reasonable compromise. In Python this could even use the buffer protocol for near-C++ efficiency (imagine: here's my protobuf and some numpy arrays). I notice that TensorFlow solves this problem with a load of custom gRPC code (which I don't understand, to be honest!), which I think shows a gap in gRPC's API. But I'm not paying for gRPC, so who am I to complain :=)
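
For concreteness, the utility is shaped roughly like this (ChunkedRequest is a made-up message type with metadata and chunk fields; the real names and chunk size don't matter):

    def chunked_requests(metadata, payload, chunk_size=64 * 1024):
        # First request of the stream carries only the metadata.
        yield my_module_pb2.ChunkedRequest(metadata=metadata)
        # Each following request carries one slice of the binary payload.
        for offset in range(0, len(payload), chunk_size):
            yield my_module_pb2.ChunkedRequest(
                chunk=payload[offset:offset + chunk_size])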



On Thursday, March 29, 2018 at 2:24:38 PM UTC+1, Nathaniel Manista wrote:
This happens to be the case today...

Nathaniel Manista

Apr 2, 2018, 5:28:39 PM
to James Oldfield, grpc.io
On Thu, Mar 29, 2018 at 8:08 AM, <james.oldf...@gmail.com> wrote:
In honesty, the use case is a very slight simplification of a utility generator function. The difference is only a couple of lines, and arguably it would be clearer to create a new request explicitly each time anyway. Since you don't guarantee this behaviour, even if a change is very unlikely, I'll just go for the safe option.


    def my_iter(extra_data_list):
        for item in extra_data_list:
            yield my_module_pb2.MyMessageClass(extra=item)
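
That version is passed to the stub in exactly the same way, e.g.:

    response = my_stub.streaming_request(my_iter(data))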

Sorry to piggyback with something else: I often send binary data and associated metadata together over gRPC. The utility function in question chunks the binary data into a request stream while including the metadata in the first request of the stream. It's a pity that gRPC doesn't offer something like this built in.

... are you aware of the "metadata" feature built into gRPC, which has nothing to do with Protocol Buffers? Maybe you overlooked it, or maybe it's not right for you (it gets translated into HTTP/2 headers on the wire), but... its entire job is to transmit once-per-RPC, before-all-requests values from the invocation side of the RPC to the service side (well, there's also "initial metadata" and "trailing metadata" transmitted from the service side to the invocation side, but we can ignore those for now).
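
In Python it is passed at invocation time as a sequence of key/value pairs; as a sketch (the names here are made up, but the metadata keyword argument and the "-bin" suffix convention for binary values are part of the API):

    # Keys ending in '-bin' carry bytes; all other values are strings.
    call_metadata = (
        ('file-name', 'weights.npy'),
        ('file-checksum-bin', b'\x9a\x01\x42'),
    )
    response = my_stub.streaming_request(request_iterator, metadata=call_metadata)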

I appreciate that it's hard given the separation between gRPC and protobuf, but I think a slightly leaky API that takes both a protobuf and a binary payload (or a named list of binary payloads) would be a reasonable compromise. In Python this could even use the buffer protocol for near-C++ efficiency (imagine: here's my protobuf and some numpy arrays). I notice that TensorFlow solves this problem with a load of custom gRPC code (which I don't understand, to be honest!), which I think shows a gap in gRPC's API. But I'm not paying for gRPC, so who am I to complain :=)

There could be something I'm overlooking or not understanding (one abstraction layer's metadata is another abstraction layer's data, amirite?) but it really, really sounds like you're talking about the metadata feature already present in the gRPC Python API. How big is this binary data? Fifty kilobytes or less?
-Nathaniel