Guarantees on 'completeness' when streaming big data


olivier barthelemy

Sep 9, 2024, 9:23:29 AM
to grpc.io
I'm currently trying to implement a gRPC service.

Some RPCs need to send or receive large amounts of data (bigger than the maximum size of a gRPC message, potentially up to a few gigabytes).

This is not data that is 'streamed' where you only need the latest state, but really a big data chunk that needs to be sent in full.

Now, it looks to me like using streaming RPCs will not provide enough guarantees.
As far as I understand, gRPC guarantees that
"Streamed items will be received in the same order that they were sent"

But it does not guarantee that "all sent streamed 'parts' will be received".

It also looks like "the sender will not be able to tell which parts were not received"
and that "streamed parts that were not received will not be sent again".

Finally, I'm not even sure what would happen if gRPC is able to detect that a streamed part was not received.
Does it consider that a streaming RPC updates a 'state' where you only need the latest value, so it will not raise a fatal error when a part is detected as 'not received'?
Or will it fail the streaming RPC as soon as something is detected as not received, because it might be part of a whole that needs to be complete?

So basically, the question is:
"Can I expect a streaming RPC to make sure the receiver has received all parts before reporting that the operation succeeded? If not, can I at least be guaranteed that it will tell me it knows not everything was received? If so, can it tell me which part was not received? Is there something I can or must add on top of gRPC to get that behaviour (something still lighter than implementing my own chunking algorithm manually)?"

Eric Anderson

Sep 9, 2024, 11:50:38 AM
to olivier barthelemy, grpc.io
On Mon, Sep 9, 2024 at 6:23 AM olivier barthelemy <perso.olivie...@gmail.com> wrote:
But it does not guarantee that "all sent streamed 'parts' will be received".

Correct. RPCs don't have guaranteed delivery. The RPC itself can fail.

It also looks like "the sender will not be able to tell which parts were not received"
and that "streamed parts that were not received will not be sent again".

The receiver needs to send something back to the sender to inform it what happened. For example, if the sender is a client, then the server responds with a message and OK after it receives everything. Normally if the client doesn't receive the entire response (it gets an error code instead of OK) then it will retry the RPC.
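A minimal sketch of the pattern described above, simulated in plain Python rather than with the gRPC library so the control flow is visible. The names `chunks`, `server_receive`, `upload` and `flaky_send` are hypothetical, `ConnectionError` stands in for a non-OK gRPC status, and the 1 MiB chunk size is an assumption chosen to stay under the maximum message size (commonly 4 MiB by default). The size and SHA-256 digest in the server's reply are an optional application-level check on top of the OK status, not something gRPC adds for you.

```python
import hashlib
from typing import Callable, Iterable, Iterator, Optional

# Assumption: chunks stay well under the max message size (often 4 MiB by
# default); 1 MiB is an illustrative choice, not a recommendation.
CHUNK_SIZE = 1024 * 1024


def chunks(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Split a large payload into pieces that each fit in one stream message."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def server_receive(stream: Iterable[bytes]) -> dict:
    """Hypothetical handler for a client-streaming RPC.

    It consumes every message and only then replies; in real gRPC that
    reply plus an OK status is what tells the client everything arrived.
    """
    digest = hashlib.sha256()
    total = 0
    for chunk in stream:
        digest.update(chunk)
        total += len(chunk)
    return {"size": total, "sha256": digest.hexdigest()}


def upload(data: bytes, send: Callable[[Iterable[bytes]], dict],
           max_attempts: int = 3) -> dict:
    """Send the whole payload, re-sending it from scratch if the RPC fails."""
    expected = hashlib.sha256(data).hexdigest()
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            reply = send(chunks(data))
        except ConnectionError as err:  # stand-in for a non-OK gRPC status
            last_error = err
            continue
        # Optional application-level check on what the server says it got.
        if reply["size"] == len(data) and reply["sha256"] == expected:
            return reply
        last_error = ValueError("server reported different content")
    raise RuntimeError(f"upload failed after {max_attempts} attempts") from last_error


# Usage: a simulated transport whose first stream breaks after one chunk.
calls = {"n": 0}


def flaky_send(stream: Iterable[bytes]) -> dict:
    calls["n"] += 1
    if calls["n"] == 1:
        next(iter(stream))  # one chunk goes out, then the stream is reset
        raise ConnectionError("stream reset")
    return server_receive(stream)


payload = bytes(range(256)) * 20000  # 5,120,000 bytes, i.e. 5 chunks
reply = upload(payload, flaky_send)
print(reply["size"], calls["n"])  # 5120000 2
```

Re-sending everything on failure keeps the sender stateless, which matches the advice below for transfers inside a data center where failures are rare and bandwidth is cheap.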

Streams do provide "receiving a message on a stream guarantees all previous messages on the stream were received." If something is lost, then you are guaranteed to get an error on the client side. Servers sometimes will get cancellation, but there are parts of the RPC lifecycle where a server can't know about a failure (e.g., the client OS received the response+OK, and then the client crashed before processing it).

Or will it fail the streaming RPC as soon as something is detected as not received, because it might be part of a whole that needs to be complete?

gRPC will fail the RPC, but the sender won't know exactly what the receiver had received, unless the receiver tells the sender in a message. If you want a resumable transfer, then you can use a bidi stream for the receiver to send ACKs as it receives data. In your case though, assuming you are transferring that data within a data center (such that failures aren't that frequent and bandwidth is cheap), then you'd probably just re-transmit the data if an error occurs. Transfers across continents or across the Internet are the sorts of things that benefit most from resumption.
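For the resumable case, here is a rough sketch of an ACK-based design, again simulated in plain Python without the gRPC library. The offset-based protocol and the names `Receiver`, `attempt` and `resumable_send` are hypothetical illustrations, not a gRPC feature. Because messages on a stream arrive in order, whatever the receiver holds is a contiguous prefix, so a single byte offset is enough to ACK it. On each new stream the sender first asks the receiver for its offset, which also covers ACKs that were lost in flight. A real receiver would persist data before ACKing it.

```python
from typing import List, Optional


class Receiver:
    """Hypothetical receiving side of a bidi-streaming transfer."""

    def __init__(self) -> None:
        self.buffer = bytearray()  # stands in for durable storage

    def resume_offset(self) -> int:
        # First message on a new stream: tell the sender where to restart.
        return len(self.buffer)

    def on_chunk(self, offset: int, chunk: bytes) -> int:
        # In-order delivery means the stored data is always a contiguous
        # prefix, so one byte offset describes it completely.
        if offset != len(self.buffer):
            raise ValueError(f"expected offset {len(self.buffer)}, got {offset}")
        self.buffer += chunk
        return len(self.buffer)  # the ACK sent back to the sender


def attempt(data: bytes, rx: Receiver, chunk_size: int,
            fail_after: Optional[int] = None) -> int:
    """One stream attempt; `fail_after` simulates a break after N chunks."""
    offset = rx.resume_offset()
    sent = 0
    while offset < len(data):
        if fail_after is not None and sent == fail_after:
            raise ConnectionError(f"stream broke at offset {offset}")
        offset = rx.on_chunk(offset, data[offset:offset + chunk_size])
        sent += 1
    return offset


def resumable_send(data: bytes, rx: Receiver, chunk_size: int,
                   failures: List[int]) -> int:
    """Open a new stream after each failure, resuming at the receiver's offset."""
    for fail_after in failures:
        try:
            return attempt(data, rx, chunk_size, fail_after)
        except ConnectionError as err:
            print(f"{err}; resuming")
    return attempt(data, rx, chunk_size)  # last attempt, no injected failure


data = bytes(range(256)) * 40  # 10,240 bytes
rx = Receiver()
print(resumable_send(data, rx, chunk_size=1024, failures=[3, 4]))
# stream broke at offset 3072; resuming
# stream broke at offset 7168; resuming
# 10240
```

No byte is sent twice here, which is what makes resumption worth its extra state on long, failure-prone links.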