A few questions:

1) Under this design, is it possible to add load balancing constraints for retried/hedged requests? Especially during hedging, I'd like to be able to try a different server, since the original server might be garbage collecting or have otherwise accumulated a queue of requests such that a retry/hedge to this server will not be very useful. Or, perhaps the key I'm looking up lives on a specific subset of storage servers and therefore should be balanced to that specific subset. While that's the domain of an LB policy, what information will hedging/retries provide to the LB policy?

2) "Clients cannot override retry policy set by the service config." -- is this intended for inside Google? How about gRPC users outside of Google who don't use the DNS mechanism to push configuration? It seems like having a client override for retry/hedging policy is pragmatic.

3) Retry backoff time -- if I'm reading it right, it will always retry in random(0, current_backoff) milliseconds. What's your feeling on this vs. a retry w/ configurable jitter parameter (e.g. linear 1000ms increase w/ 10% jitter)? Is it OK if there's no minimum backoff?
random(0, current_backoff)
vs.
random(current_backoff*(1-jitter), current_backoff*(1+jitter))
where jitter would be 0.25, for example, to indicate 25% jitter.

Regards,
Michael
On Friday, February 10, 2017 at 5:31:01 PM UTC-7, ncte...@google.com wrote:
> I've created a gRFC describing the design and implementation plan for gRPC Retries.
> Take a look at the gRPC on Github.
CONFIDENTIALITY NOTICE: This email message, and any documents, files or previous e-mail messages attached to it is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. --
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+unsubscribe@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/62809dba-3349-4a60-9aa9-ccc044d27f53%40googlegroups.com.
Hi Michael,

Thanks for the feedback. Responses to your questions (and Josh's follow-up question on retry backoff times) are inline below.

On Sat, Feb 11, 2017 at 1:57 PM, 'Michael Rose' via grpc.io <grp...@googlegroups.com> wrote:
> A few questions:
> 1) Under this design, is it possible to add load balancing constraints for retried/hedged requests? Especially during hedging, I'd like to be able to try a different server, since the original server might be garbage collecting or have otherwise accumulated a queue of requests such that a retry/hedge to this server will not be very useful. Or, perhaps the key I'm looking up lives on a specific subset of storage servers and therefore should be balanced to that specific subset. While that's the domain of an LB policy, what information will hedging/retries provide to the LB policy?

We are not supporting explicit load balancing constraints for retries. The retry attempt or hedged RPC will be re-resolved through the load balancer, so it's up to the service owner to ensure that this has a low likelihood of issuing the request to the same backend. This is part of a decision to keep the retry design as simple as possible while satisfying the majority of use cases. If your load-balancing policy has a high likelihood of sending requests to the same server each time, hedging (and to some extent retries) will be less useful regardless. There will be metadata attached to the call indicating that it's a retry, but it won't include information about which servers the previous requests went to.

> 2) "Clients cannot override retry policy set by the service config." -- is this intended for inside Google? How about gRPC users outside of Google who don't use the DNS mechanism to push configuration? It seems like having a client override for retry/hedging policy is pragmatic.

In general, we don't want to support client specification of retry policies. Which methods are safe to retry or hedge, how much additional load is acceptable, etc., are really decisions that should be left to the service owner. The retry policy will definitely be a part of the service config. While there are still some security-related discussions about the exact delivery mechanism for the service config and retry policies, I think your concern here should be part of the service config design discussion rather than something specific to retry support.

> 3) Retry backoff time -- if I'm reading it right, it will always retry in random(0, current_backoff) milliseconds. What's your feeling on this vs. a retry w/ configurable jitter parameter (e.g. linear 1000ms increase w/ 10% jitter)? Is it OK if there's no minimum backoff?

You are reading the backoff time correctly. There are a number of ways of doing this (see https://www.awsarchitectureblog.com/2015/03/backoff.html), but choosing the delay from random(0, current_backoff) is done intentionally and should generally give the best results. We do not want a configurable "jitter" parameter. Empirically, the retries should have more varied backoff times, and we also do not want to let service owners specify very low values for jitter (e.g., 1% or even 0), as this would cluster all retries tightly together and further contribute to server overloading.

Best,
Eric Gribkoff
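
To make the two backoff options discussed above concrete, here is a minimal Python sketch. It is only an illustration, not the gRFC's implementation: the function names, the parameters `initial_ms`, `multiplier`, and `max_ms`, and the exponential growth of `current_backoff` are all assumptions made for this example.

```python
import random


def full_jitter_delay(current_backoff_ms, rng=random):
    """Delay drawn uniformly from [0, current_backoff], i.e. random(0, current_backoff)."""
    return rng.uniform(0.0, current_backoff_ms)


def proportional_jitter_delay(current_backoff_ms, jitter, rng=random):
    """Delay drawn from current_backoff*(1-jitter) .. current_backoff*(1+jitter)."""
    low = current_backoff_ms * (1.0 - jitter)
    high = current_backoff_ms * (1.0 + jitter)
    return rng.uniform(low, high)


def backoff_schedule(initial_ms, multiplier, max_ms, attempts):
    """Upper bound (current_backoff) for each retry attempt.

    Exponential growth capped at max_ms is an assumption for this sketch.
    """
    bounds = []
    current = float(initial_ms)
    for _ in range(attempts):
        bounds.append(min(current, float(max_ms)))
        current *= multiplier
    return bounds
```

With `jitter=0.25`, the proportional variant keeps every delay within 25% of `current_backoff`, so clients that failed at the same moment retry within a narrow window. The `random(0, current_backoff)` variant spreads them over the whole interval, at the cost of having no minimum delay.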
While talking with Craig on Friday, we realized that we need to make the wire protocol a bit stricter in order to implement retries.

Currently, the spec allows status to be sent either as part of initial metadata or trailing metadata. However, as per the When Retries are Valid section of the gRFC, an RPC becomes committed when "the client receives a non-error response (either an explicit OK status or any response message) from the server". This means that in a case where the server sends a retryable status, if the status is not included in the initial metadata, the client will consider the RPC committed as soon as it receives the initial metadata, even if the only thing sent after that is the trailing metadata that includes the status. Thus, we need to require that whenever the server sends status without sending any messages, the server should include the status in the initial metadata (and then close the stream without bothering to send trailing metadata) instead of sending both initial metadata and then trailing metadata.

Noah, can you please add a note about this to the gRFC?

Based on a previously encountered interop problem (see https://github.com/markdroth/grpc/pull/3, which was included in https://github.com/grpc/grpc/pull/7201), I believe that grpc-go already does the right thing here (although Saila and Menghan should confirm that). However, since that previously encountered problem did not show up with Java or C++, I suspect that those stacks do not do the right thing here.

Craig has confirmed that C-core needs to be fixed in this regard, and I've filed https://github.com/grpc/grpc/issues/9883 for that change.

Eric and Penn, can you confirm that Java will need to be changed? I'm hoping that this isn't too invasive of a change, but please let us know if you foresee any problems.

Please let me know if anyone has any questions or problems with any of this. Thanks!
Right, let me chip in and continue the discussion from https://github.com/grpc/proposal/pull/12#issuecomment-283063869 here. My comments are based on experience building a gRPC-Go interceptor for retries and using it in production at Improbable. It's important to note that we're pretty heavy users of gRPC (using it across 3 languages: Go, Java and C++), as we have quite a few people around (myself included) who are familiar with gRPC/Stubby from their prior jobs.

Now, to recap the points:

> Thank you for the comments. We are trying to keep high-level discussion on the email thread (see here) but my responses to your points (b) and (c) are below.
>
> > b) retry logic would benefit a lot from knowing whether the method is idempotent or not. I understand that this is supposed to be handled by "service configs", but realistically they're really hard to use. Few people would put their retry logic in DNS TXT entries, and even fewer people operate the gRPC LB protocol. Can we consider adding a .proto option (annotation) to Method definitions?
>
> There are two concerns here. One is that saying a method is idempotent really just means "retry on status codes x, y and z". If we pick a canonical set of idempotent status codes, we are forcing every service owner to obey these semantics if they want to use retries. However, the gRPC status code mechanism is very flexible (even allowing services to return UNAVAILABLE arbitrarily), so we'd prefer to force service owners to consider the semantics of their application and pick a concrete set of status codes rather than just flipping an "idempotent" switch. The second concern is around the ease of use of the service config. The intent is for the service config to be a universally useful mechanism, and we want to avoid just baking everything into the proto. Concerns about the delivery mechanism for service config shouldn't invalidate its use for encoding retry policy, and may be something we have to tackle separately.

Given my past SRE experience, I totally appreciate the push for the service config. However, in the open source world simple solutions seem to get the most traction. I would be hesitant to tie a very important feature (such as retries) to service config adoption.

I understand your concerns around the flexibility of service codes, and I wasn't advocating being prescriptive about it. However, I do think that having an option inside the .proto is a very valid approach. As a user of gRPC for internal and external purposes, there are three ways I use .protos as interfaces:

* internal teams use each other's services, in which case they code against code-generated interfaces that come out of .proto files
* we provide our end users with a set of "published" .proto files and guides on how to generate and use them in the language of their choice through gRPC code generation (the true power of gRPC)
* we provide our end users with rich client APIs for any language, in which case all bets are off and anything can be implemented

As such, for both external and internal services, the .proto is the canonical contract, controlled by the team building the service. Thus, having something like the following satisfies the most common use case:

    rpc RemoveTag(TagRemoveRequest) returns (google.protobuf.Empty) {
      option (grpc.extensions) = {
        retriable_codes: ["UNAVAILABLE", "RESOURCE_EXHAUSTED"]
      };
    };

> > c) One thing I found useful "in the wild" is the ability to limit the Deadline of the retriable call. For example, the "parent" RPC call (user invoked) has a deadline of 5s, but each retriable call only 1s. This allows you to skip a "deadlining" server and retry against one that works.
>
> This is covered by our hedging policy. There doesn't seem to be any reason to cancel the first RPC in your scenario, as it may be just about to complete on the server and cancellation implies the work done so far is wasted. Instead, hedging allows you to send the first request, wait one second, send a second request, and accept the results of whichever completes first.

Ok, this makes sense. However, can we make sure the spec requires that, if the second hedged request completes before the first one, the first call is CANCELLED, so we can potentially free up resources? One unfortunate thing about working outside an environment where everything is Stubby is that, a lot of the time, request handling holds up resources. For example, you establish another HTTP/1.1 connection to a backend as part of serving your RPC. I'll augment my gRPC interceptors for Go to use this.
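
The hedging behaviour discussed above (send the first request, wait, send a second, accept whichever completes first, then cancel the loser to free resources) can be sketched with Python's asyncio. This is an illustrative model only, not gRPC code: `hedged_call`, `make_attempt`, `hedging_delay_s`, and `max_attempts` are hypothetical names, and the sketch simplifies by treating the first attempt to finish as the final outcome, even if it finished with an error.

```python
import asyncio


async def hedged_call(make_attempt, hedging_delay_s, max_attempts):
    """Start attempts hedging_delay_s apart and return the first result.

    make_attempt(i) must return a coroutine for attempt number i.
    Attempts still in flight when a result arrives are cancelled.
    """
    tasks = []
    try:
        for i in range(max_attempts):
            tasks.append(asyncio.create_task(make_attempt(i)))
            # Wait up to the hedging delay for any outstanding attempt.
            done, _ = await asyncio.wait(
                tasks, timeout=hedging_delay_s,
                return_when=asyncio.FIRST_COMPLETED)
            if done:
                return next(iter(done)).result()
        # All attempts have been sent; wait for whichever finishes first.
        done, _ = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Cancel the losers so they stop holding resources.
        for task in tasks:
            if not task.done():
                task.cancel()


async def demo():
    cancelled = []

    async def attempt(i):
        try:
            # Attempt 0 models a slow ("deadlining") server.
            await asyncio.sleep(0.2 if i == 0 else 0.01)
            return f"attempt-{i}"
        except asyncio.CancelledError:
            cancelled.append(i)
            raise

    result = await hedged_call(attempt, hedging_delay_s=0.05, max_attempts=2)
    await asyncio.sleep(0.01)  # give the cancellation time to be delivered
    return result, cancelled
```

Running `asyncio.run(demo())` exercises the scenario from the thread: the slow first attempt is not cancelled when the hedge is sent, only once the second attempt has produced a result.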
On Mon, Feb 27, 2017 at 8:53 AM, 'Mark D. Roth' via grpc.io <grp...@googlegroups.com> wrote:
> While talking with Craig on Friday, we realized that we need to make the wire protocol a bit stricter in order to implement retries.
>
> Currently, the spec allows status to be sent either as part of initial metadata or trailing metadata.

Currently the spec doesn't say when it is appropriate. This is because the spec is only on the HTTP/2 level and doesn't actually define gRPC semantics.

I think you mean HTTP headers and trailers instead of using the term "metadata." gRPC always has trailing metadata, but may not have initial metadata. Status must come on the trailing metadata. In HTTP parlance, it may come in the initial headers only when those initial headers are the end of the response.

> However, as per the When Retries are Valid section of the gRFC, an RPC becomes committed when "the client receives a non-error response (either an explicit OK status or any response message) from the server".

Just to be clear, the only time "an explicit OK status" would matter is with a streaming call. In a unary call the OK status will always be after the response message.

> This means that in a case where the server sends a retryable status, if the status is not included in the initial metadata, the client will consider the RPC committed as soon as it receives the initial metadata, even if the only thing sent after that is the trailing metadata that includes the status.

What? That does not seem to be a proper understanding of the text, or the text is wrongly worded. Why would the RPC be "committed as soon as it receives the initial metadata"? That isn't in the text... In your example it seems it would be committed at "the trailing metadata that includes a status" as long as that status was OK, as per the "an explicit OK status" in the text.

> Thus, we need to require that whenever the server sends status without sending any messages, the server should include the status in the initial metadata (and then close the stream without bothering to send trailing metadata) instead of sending both initial metadata and then trailing metadata.

This is generally good practice, assuming you mean "headers" instead of "metadata". But I don't see any argument here for requiring it and I don't see any impact to retry.

Since an application can force initial headers to be sent (at least in Java), this can't really be a strong requirement. Java does do this generally though, as was required for our Auth support and similar conversion of gRPC status codes to HTTP status codes.

> Based on a previously encountered interop problem (see https://github.com/markdroth/grpc/pull/3, which was included in https://github.com/grpc/grpc/pull/7201), I believe that grpc-go already does the right thing here (although Saila and Menghan should confirm that). However, since that previously encountered problem did not show up with Java or C++, I suspect that those stacks do not do the right thing here.

If my correction of the nomenclature is correct, then Java already does this for the most part. This isn't something that can be enforced in Java. But the normal stub delays sending the initial metadata until the first response message. If the call is completed without any message, then only trailing metadata is sent.
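
The stub behaviour described in the last paragraph (hold back the initial headers until the first response message, and send only trailers if the call ends without one) can be modelled in a few lines. The following Python is a toy model for illustration, not grpc-java code: the class name, the frame tuples, and the `kind` labels are assumptions invented for this sketch.

```python
class LazyHeaderResponseStream:
    """Toy model of a server response stream that delays initial headers."""

    def __init__(self):
        self.frames = []            # frames that would go on the wire, in order
        self._headers_sent = False

    def send_message(self, payload):
        # Initial headers go out only when the first message is sent.
        if not self._headers_sent:
            self.frames.append(("HEADERS", {"kind": "initial"}))
            self._headers_sent = True
        self.frames.append(("DATA", payload))

    def close(self, status):
        # If nothing was sent before, the closing HEADERS frame is the
        # whole response: a single frame carrying the status.
        kind = "trailers" if self._headers_sent else "trailers-only"
        self.frames.append(("HEADERS", {"kind": kind, "grpc-status": status}))
```

A call that fails before producing any message, for example `s = LazyHeaderResponseStream(); s.close("UNAVAILABLE")`, then puts exactly one HEADERS frame on the wire, so the client sees the status before anything else.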
On Wed, Mar 1, 2017 at 10:20 AM, 'Eric Anderson' via grpc.io <grp...@googlegroups.com> wrote:
> What? That does not seem to be a proper understanding of the text, or the text is wrongly worded. Why would the RPC be "committed as soon as it receives the initial metadata"? That isn't in the text... In your example it seems it would be committed at "the trailing metadata that includes a status" as long as that status was OK, as per the "an explicit OK status" in the text.

The language in the above quote is probably not as specific as it should be, at least with respect to the wire protocol. The intent here is that the RPC should be considered committed when it receives either initial metadata or a payload message.

It is necessary that receiving initial metadata commits the RPC, because we need to report the initial metadata to the caller when it arrives.

> If my correction of the nomenclature is correct, then Java already does this for the most part. This isn't something that can be enforced in Java. But the normal stub delays sending the initial metadata until the first response message. If the call is completed without any message, then only trailing metadata is sent.

Interesting. If that's the case, then why did that interop test only fail with Go, not with Java?
I think the terminology here gets confusing between initial/trailing metadata, gRPC rule names, and HTTP/2 frame types. Our retry design doc was indeed underspecified in regards to dealing with initial metadata, and will be updated. I go over all of the considerations in detail below.

For clarity, I will use all caps for the names of HTTP/2 frame types, e.g., HEADERS frame, and use the capitalized gRPC rule names from the specification.

The gRPC specification ensures that a status (containing a gRPC status code) is only sent in Trailers, which is contained in an HTTP/2 HEADERS frame. The only way that the gRPC status code can be contained in the first HTTP/2 frame received is if the server sends a Trailers-Only response.

Otherwise, the gRPC spec mandates that the first frame sent be the Response-Headers (again, sent in an HTTP/2 HEADERS frame). Response-Headers includes (optional) Custom-Metadata, which is usually what we are talking about when we say "initial metadata".

Regardless of whether the Response-Headers includes anything in its Custom-Metadata, if the gRPC client library notifies the client application layer of what metadata is (or is not) included, we now have to view the RPC as committed, aka no longer retryable. This is the only option, as a later retry attempt could receive different Custom-Metadata, contradicting what we've already told the client application layer.

We cannot include gRPC status codes in the Response-Headers along with "initial metadata". It's perfectly valid according to the spec for a server to send metadata along a stream in its Response-Headers, wait for one hour, then (without having sent any messages), close the stream with a retryable error.

However, the proposal that a server include the gRPC status code (if known) in the initial response is still sound. Concretely, this means: if a gRPC server has not yet sent Response-Headers and receives an error response, it should send a Trailers-Only response containing the gRPC status code. This would allow retry attempts on the client-side to proceed, if applicable. This is going to be superior to sending Response-Headers immediately followed by Trailers, which would cause the RPC to become committed on the client side (if the Response-Header metadata is made available to the client application layer) and stop retry attempts.

We still can encounter the case where a server intentionally sends Response-Headers to open a stream, then eventually closes the stream with an error without ever sending any messages. Such cases would not be retryable, but I think it's fair to argue that if the server *has* to send metadata in advance of sending any responses, that metadata is actually a response, and should be treated as such (i.e., their metadata just ensured the RPC will be committed on the client-side).

Rather than either explicitly disallowing such behavior by modifying some specification (this behavior is currently entirely unspecified, so while specification is worthwhile, it should be separate from the retry policy design currently under discussion), we can just change the default server behavior of C++, and Go if necessary, to match Java. In Java servers, the Response-Headers are delayed until some response message is sent. If the server application returns an error status before sending a message, then Trailers-Only is sent instead of Response-Headers.

We can also leave it up to the gRPC client library implementation to decide when an RPC is committed based on received Response-Headers. If and while the client library can guarantee that the presence (or absence) of initial metadata is not visible to the client application layer, the RPC can be considered uncommitted. This is an implementation detail that should very rarely be necessary if the above change is made to default server behavior, but it would not violate anything in the retry spec or semantics.
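To make the client-side rule above concrete, here is a minimal sketch in Python. It is only an illustration of the commit decision described in this message, not code from any gRPC library: the names `Frame`, `Attempt`, `may_retry`, and the contents of `RETRYABLE_CODES` are all hypothetical, and a real client would also have to account for retry limits and backoff.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Frame(Enum):
    """What the client sees first on the stream (gRPC rule names)."""
    RESPONSE_HEADERS = auto()  # carries initial metadata, never a status
    TRAILERS_ONLY = auto()     # carries the gRPC status, no messages


# Hypothetical retry policy: the status codes this client treats as retryable.
RETRYABLE_CODES = {"UNAVAILABLE"}


@dataclass
class Attempt:
    first_frame: Frame
    status: Optional[str] = None  # only known up front for Trailers-Only


def may_retry(attempt: Attempt) -> bool:
    """Return True if the client may still start another attempt."""
    if attempt.first_frame is Frame.RESPONSE_HEADERS:
        # Initial metadata is (or may be) reported to the application,
        # so the RPC is committed regardless of what follows.
        return False
    # Trailers-Only: nothing was surfaced to the application yet, so the
    # retry policy decides based on the status code alone.
    return attempt.status in RETRYABLE_CODES
```

With this rule, a server that sends Response-Headers and then fails an hour later has committed the RPC, while a server that answers with Trailers-Only leaves the decision to the retry policy.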
On Wed, Mar 1, 2017 at 2:47 PM, 'Eric Gribkoff' via grpc.io <grp...@googlegroups.com> wrote:
> We can also leave it up to the gRPC client library implementation to decide when an RPC is committed based on received Response-Headers. If and while the client library can guarantee that the presence (or absence) of initial metadata is not visible to the client application layer, the RPC can be considered uncommitted. This is an implementation detail that should very rarely be necessary if the above change is made to default server behavior, but it would not violate anything in the retry spec or semantics.

I think that leaving this unspecified will lead to interoperability problems in the future. I would rather have the spec be explicit about this, so that all future client and server implementations can interoperate cleanly.
I've updated the gRFC document to include the latest discussions here.

On Thu, Mar 2, 2017 at 7:20 AM, Mark D. Roth <ro...@google.com> wrote:
> I think that leaving this unspecified will lead to interoperability problems in the future. I would rather have the spec be explicit about this, so that all future client and server implementations can interoperate cleanly.

It's fair to say in the retry design that we must count an RPC as committed as soon as the Response-Headers arrive, and the doc now states this explicitly.

If you mean that we also need to change the gRPC spec to say *when* the server sends Response-Headers, I disagree. This is outside of the scope of a retry design. Retries will work fine whenever servers choose to send Response-Headers: since Response-Headers include initial metadata, which can contain arbitrary information, this is exactly the same from a retry perspective as the server sending any other response, and it commits the RPC. We can go so far as saying servers *should* delay sending Response-Headers until a message is sent by the server application layer, and the doc now states this explicitly.

Changing the gRPC spec to say that servers *must* delay sending Response-Headers until a message is sent may be a good idea, but it is not a requirement for retries and, in my opinion, should be left to a separate discussion. The semantics and operations of a retry policy are already clear, regardless of when servers choose to send Response-Headers, and the existing spec already allows the desirable behavior for retries with the Trailers-Only frame.
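The recommended server behavior (delay Response-Headers until the first message, and fall back to Trailers-Only when no message is ever sent) can be sketched roughly as follows. This is an assumption-laden illustration in Python rather than the behavior of any particular gRPC implementation; the function name `frames_for_call` and the tuple representation of frames are made up for the example.

```python
from typing import List, Tuple


def frames_for_call(metadata: dict, messages: List[bytes], status: str) -> List[Tuple[str, object]]:
    """Return the sequence of gRPC-level frames a server would emit if it
    delays Response-Headers until its first response message."""
    frames: List[Tuple[str, object]] = []
    headers_sent = False
    for msg in messages:
        if not headers_sent:
            # Response-Headers go out only once there is a message to send.
            frames.append(("Response-Headers", metadata))
            headers_sent = True
        frames.append(("Length-Prefixed-Message", msg))
    if headers_sent:
        # Normal close: status travels in Trailers after the messages.
        frames.append(("Trailers", status))
    else:
        # No message was ever sent, so the status goes out as Trailers-Only
        # and the client is free to retry if its policy allows it.
        frames.append(("Trailers-Only", status))
    return frames
```

A call that fails before producing any message yields a single Trailers-Only frame, which keeps the RPC uncommitted on the client; a call that sends at least one message yields Response-Headers first and is committed from that point on.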
On Thu, Mar 2, 2017 at 8:09 AM, Eric Gribkoff <ericgr...@google.com> wrote:
> Changing the gRPC spec to say that servers *must* delay sending Response-Headers until a message is sent may be a good idea, but it is not a requirement for retries and, in my opinion, should be left to a separate discussion. The semantics and operations of a retry policy are already clear, regardless of when servers choose to send Response-Headers, and the existing spec already allows the desirable behavior for retries with the Trailers-Only frame.

I agree that we don't need to say anything about whether or not the server delays sending Response-Headers until a message is sent. However, I think we should say that if the server is going to immediately signal failure without sending any messages, it should send Trailers-Only instead of Response-Headers followed by Trailers.
On Thu, Mar 2, 2017 at 8:15 AM, Mark D. Roth <ro...@google.com> wrote:
> I agree that we don't need to say anything about whether or not the server delays sending Response-Headers until a message is sent. However, I think we should say that if the server is going to immediately signal failure without sending any messages, it should send Trailers-Only instead of Response-Headers followed by Trailers.

This is in the retry gRFC doc now (https://github.com/ncteisen/proposal/blob/ad060be281c45c262e71a56e5777d26616dad69f/A6.md#when-retries-are-valid). The wire spec almost says it: "Trailers-Only is permitted for calls that produce an immediate error" (https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md). Do you want this changed in the wire spec itself or is the inclusion in the gRFC for retries sufficient?
On Thu, Mar 2, 2017 at 8:24 AM, Eric Gribkoff <ericgr...@google.com> wrote:
> This is in the retry gRFC doc now (https://github.com/ncteisen/proposal/blob/ad060be281c45c262e71a56e5777d26616dad69f/A6.md#when-retries-are-valid). The wire spec almost says it: "Trailers-Only is permitted for calls that produce an immediate error" (https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md). Do you want this changed in the wire spec itself or is the inclusion in the gRFC for retries sufficient?

I think it would be good to also change the wire spec doc. We should do something like changing "is permitted" to "SHOULD be used". We may even want to specifically mention that this is important for retry functionality to work right.
On Thu, Mar 2, 2017 at 8:38 AM, Mark D. Roth <ro...@google.com> wrote:
> On Thu, Mar 2, 2017 at 8:24 AM, Eric Gribkoff <ericgr...@google.com> wrote:
>> This is in the retry gRFC doc now (https://github.com/ncteisen/proposal/blob/ad060be281c45c262e71a56e5777d26616dad69f/A6.md#when-retries-are-valid).

The language is still confusing: "The client receives a non-error response from the server. Because of the gRPC wire specification, this will always be a Response-Headers frame containing the initial metadata."

What does "non-error response" mean there? I would have expected that means receiving a Status in some way (which is part of Response), as otherwise how is "error" decided. But the next part shows that isn't the case since Status isn't in Response-Headers.

> I think it would be good to also change the wire spec doc. We should do something like changing "is permitted" to "SHOULD be used". We may even want to specifically mention that this is important for retry functionality to work right.

Changing to 'should' sounds fine. Although maybe there should be a note that clients can't decide if something is an 'immediate error' so there must not be any validation for it client-side.
On Thu, Mar 2, 2017 at 9:03 AM, 'Eric Anderson' via grpc.io <grp...@googlegroups.com> wrote:
> The language is still confusing: "The client receives a non-error response from the server. Because of the gRPC wire specification, this will always be a Response-Headers frame containing the initial metadata." What does "non-error response" mean there? I would have expected that means receiving a Status in some way (which is part of Response), as otherwise how is "error" decided. But the next part shows that isn't the case since Status isn't in Response-Headers.

The second sentence is defining what non-error response means: a Response-Headers frame. The only alternative (an "error" response) is Trailers-Only. I can choose a name other than "non-error response" to make this clear.
On Thu, Mar 2, 2017 at 9:13 AM, Eric Gribkoff <ericgr...@google.com> wrote:
On Thu, Mar 2, 2017 at 9:03 AM, 'Eric Anderson' via grpc.io <grp...@googlegroups.com> wrote:
The language is still confusing:
The client receives a non-error response from the server. Because of the gRPC wire specification, this will always be a Response-Headers frame containing the initial metadata.
What does "non-error response" mean there? I would have expected that means receiving a Status in some way (which is part of Response), as otherwise how is "error" decided. But the next part shows that isn't the case since Status isn't in Response-Headers.
The second sentence is defining what non-error response means: a Response-Headers frame. The only alternative (an "error" response) is Trailers-Only. I can choose a name other than "non-error response" to make this clear.
It would probably be simpler to say "The RPC is committed when the client receives Response-Headers."
gRPC servers must delay sending Response-Headers until the server's first response (a Length-Prefixed-Message) is to be sent on the stream.
Your quote is missing the first part of the sentence:
To avoid unnecessarily committing an RPC on the client, gRPC servers must delay sending Response-Headers until the server's first response (a Length-Prefixed-Message) is to be sent on the stream.
The intent was "in order to achieve A, you must do B", not "you must always do B". If Response-Headers are always immediately sent, retries will never be possible. Hence, gRPC servers "must" delay the Response-Headers to avoid unnecessarily committing an RPC.
Using "should" here instead would convey almost the same message, but needs further qualification.
How about:
If Response-Headers are always immediately sent, retries will never be possible. Hence, gRPC servers should delay the Response-Headers to avoid unnecessarily committing an RPC. Once Response-Headers are sent, retries will not be possible.
gRPC servers should delay the Response-Headers until the first response message or until the application code chooses to send headers. If the application code closes the stream with an error before sending headers or any response messages, gRPC servers should send the error in Trailers-Only.
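To make the proposed wording concrete, here is a minimal sketch of the frame sequences it implies. It is a toy model, not a real gRPC API: `ServerStream`, `client_committed`, and the string frame names are hypothetical, and sending headers before an OK close with no messages is an assumption the thread does not address.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (frame kind, payload); a hypothetical stand-in for real HTTP/2 frames.
Frame = Tuple[str, Optional[object]]


@dataclass
class ServerStream:
    """Toy model of a server-side stream that delays Response-Headers."""
    frames: List[Frame] = field(default_factory=list)
    headers_sent: bool = False

    def send_headers(self) -> None:
        # The application may choose to send headers early; this commits the RPC.
        if not self.headers_sent:
            self.frames.append(("Response-Headers", None))
            self.headers_sent = True

    def send_message(self, payload: bytes) -> None:
        # Headers are delayed until the first response message is sent.
        self.send_headers()
        self.frames.append(("Length-Prefixed-Message", payload))

    def close(self, status: str) -> None:
        if not self.headers_sent and status != "OK":
            # Immediate error: nothing else was sent, so use Trailers-Only.
            self.frames.append(("Trailers-Only", status))
        else:
            # Assumption: an OK close without messages still sends headers first.
            self.send_headers()
            self.frames.append(("Trailers", status))


def client_committed(frames: List[Frame]) -> bool:
    """The RPC is committed once the client has seen Response-Headers."""
    return any(kind == "Response-Headers" for kind, _ in frames)
```

In this model a stream closed with an immediate error produces a single Trailers-Only frame, so `client_committed` stays false and a retry remains possible. Once any message or explicit header has been sent, the call is committed.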
The value for this field will be a human-readable integer.
After much discussion with the DNS and security folks, we've decided on a way to address the potential security issue of allowing an attacker to inject a service config with a large number of retries or hedged requests. We will do this by imposing an upper bound on the max number of retries or hedged requests that are configurable via the service config. That upper bound will be 5 by default, but applications will be able to explicitly override it if needed via a channel argument.
This approach not only limits the damage that can be caused by a malicious attacker but also damage that can be caused by a simple typo.
Noah, can you please add a section about this to the design doc? Thanks!
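As a rough illustration of the bound described above, the sketch below caps a configured attempt count at 5 unless the application supplies its own cap. The names `DEFAULT_MAX_ATTEMPTS_CAP`, `effective_max_attempts`, and `cap_override` are hypothetical. Clamping an oversized value rather than rejecting the config is an assumption, since the message does not say which is intended.

```python
from typing import Optional

# Default upper bound stated in the message above.
DEFAULT_MAX_ATTEMPTS_CAP = 5


def effective_max_attempts(configured: int, cap_override: Optional[int] = None) -> int:
    """Return the attempt count actually used for a call.

    `configured` is the value from the service config; `cap_override`
    stands in for the channel argument an application could set.
    """
    if configured < 0:
        raise ValueError("configured attempt count must be non-negative")
    cap = DEFAULT_MAX_ATTEMPTS_CAP if cap_override is None else cap_override
    # Assumption: values above the cap are clamped, not rejected.
    return min(configured, cap)
```

With this shape, a typo such as 500 in the service config is reduced to 5 by default, while an application that really needs more can pass a larger `cap_override`.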
When gRPC receives a non-OK response status from a server, this status is checked against the set of retryable status codes in retryableStatusCodes to determine if a retry attempt should be made.
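The check described in that sentence can be sketched as a set membership test. This is only an illustration: `should_retry` and `EXAMPLE_POLICY` are hypothetical names, the status codes are plain strings rather than a real gRPC enum, and other conditions discussed in this thread (attempt limits, whether the RPC is already committed) are deliberately left out.

```python
from typing import Iterable

# Hypothetical policy fragment; only retryableStatusCodes comes from the text.
EXAMPLE_POLICY = {"retryableStatusCodes": ["UNAVAILABLE", "ABORTED"]}


def should_retry(status_code: str, retryable_status_codes: Iterable[str]) -> bool:
    """Return True if a non-OK status is listed as retryable."""
    if status_code == "OK":
        # A successful call is never retried.
        return False
    return status_code in set(retryable_status_codes)
```

For example, `should_retry("UNAVAILABLE", EXAMPLE_POLICY["retryableStatusCodes"])` is true, while a status that is not listed, such as `"INVALID_ARGUMENT"`, is not retried.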
Hi Mark,
Thanks for the update.
> The current implementation in C-core is only partially complete. The basic retry code is there, but there is still an outstanding design question of how we handle stats for retries, and there is not yet any support for transparent retries nor for hedging. And even the basic retry code is extremely invasive and has not yet received any production testing, so there are probably numerous bugs waiting to be found.
What do you mean by "there is not yet any support for transparent
retries"? I thought that retries were done under the hood transparently
when some conditions are met, for example based on the response codes.
Regarding the stats question: is it related to being able to retry
when the number of "errors" is lower than a specific threshold?
Thanks for the advice on not using this in production. Does it mean
that the gRPC community still believes that the way to cover all of
the needs for calling external dependencies (retrying, circuit
breakers, etc.) is to implement your own wrappers on top of the gRPC
clients? What is being done within Google right now?
Thanks!
--
--pau