--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/CAJgPXp5VBE%3DBVJq8JKXAdKV%3D3-nrjFDHFd0sTwUW%3DGOr%2B3q6Tw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
How does the percentage field work? Do clients roll a die to determine if they're in the canary subset? Or is there a deterministic way of determining this?
I've created a gRFC describing how service configs will be encoded in DNS:
https://github.com/grpc/proposal/pull/5
I'd welcome feedback, especially on the proposed use of TXT records.
Please keep discussion in this thread. Thanks!
It's obviously going to have to be a heuristic, since we don't have any way of knowing the full set of clients a priori. I was thinking that we would take a hash of the client's hostname and pid, which unfortunately wouldn't really be that deterministic. But I'd welcome suggestions for a more deterministic algorithm.
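The hashing heuristic described above could look roughly like the sketch below. The function names `canary_bucket` and `in_canary` are hypothetical, and SHA-256 is just one convenient hash, not something the gRFC specifies. Because the pid changes whenever a client restarts, the same host can land in a different bucket after a restart, which is the lack of determinism mentioned above.

```python
import hashlib


def canary_bucket(hostname: str, pid: int) -> int:
    """Map a (hostname, pid) pair to a stable bucket in the range 0..99."""
    key = f"{hostname}:{pid}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce to 0..99.
    return int.from_bytes(digest[:8], "big") % 100


def in_canary(percentage: int, hostname: str, pid: int) -> bool:
    """Return True if this client falls into the first `percentage` buckets."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be an integer between 0 and 100")
    return canary_bucket(hostname, pid) < percentage
```

A client would call something like `in_canary(10, socket.gethostname(), os.getpid())`. Comparing a fixed bucket against the percentage also means that a client selected at a low percentage stays selected when the percentage is raised.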
Initial thoughts:
* percentage needs to be declared to be an integer (as opposed to number). This will make it consistent internally and externally.
* TXT records are limited to ASCII chars. What will happen if the method name, programming language, or load balancing policy is not pure ASCII?
* Are TXT records for a superdomain applicable? For example, if there was a service config (SC) for foo.bar.com, but not sub.foo.bar.com, does it apply?
On Fri, Jan 20, 2017 at 9:47 AM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
> Initial thoughts:
> * percentage needs to be declared to be an integer (as opposed to number). This will make it consistent internally and externally.
Done. (There's no reason to require that these fields be consistent internally and externally, since they're each going to be read by independent resolver implementations. But I agree that integer makes more sense.)
> * TXT records are limited to ASCII chars. What will happen if the method name, programming language, or load balancing policy is not pure ASCII?
Good question. As a data point, do we know if protobuf allows non-ASCII chars in service or method names?
> * Are TXT records for a superdomain applicable? For example, if there was a SC for foo.bar.com, but not sub.foo.bar.com, does it apply?
No, the name has to exactly match the server name given to the client. Otherwise, we'd need to make a bunch of additional DNS lookups for each server name.
On Monday, January 23, 2017 at 7:31:51 AM UTC-8, Mark D. Roth wrote:
> Good question. As a data point, do we know if protobuf allows non-ASCII chars in service or method names?
I don't think protobuf allows it for method or service names that it generates, but protobuf may not be used as the IDL. Also, the other fields wouldn't be affected by protobuf restrictions.
> No, the name has to exactly match the server name given to the client. Otherwise, we'd need to make a bunch of additional DNS lookups for each server name.
What about FQDNs? Does "foo.bar.com" match "foo.bar.com."?
On Fri, Jan 20, 2017 at 12:47 PM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
> Initial thoughts:
> * percentage needs to be declared to be an integer (as opposed to number). This will make it consistent internally and externally.
What if 1% is too many? Can we have a deci-percent? I'm sure Mark will be thrilled to deal with deci-quantities. ;-)
More generally, is this "pull" design set in stone? By that, I mean that clients generate a random number and figure out whether they fall e.g. in the 1% that should try the new configuration. There's no guarantee that any clients will pick it up (or that 10 out of 100, i.e. 10%, won't).
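To put rough numbers on that concern, here is a small sketch that assumes each client decides independently with probability percentage/100 (a binomial model, which is my simplification rather than anything stated in the gRFC). The helper names `prob_exactly` and `prob_no_canary` are hypothetical.

```python
import math


def prob_exactly(num_clients: int, percentage: int, k: int) -> float:
    """Probability that exactly k of num_clients independently pick the canary."""
    p = percentage / 100.0
    return math.comb(num_clients, k) * (p ** k) * ((1.0 - p) ** (num_clients - k))


def prob_no_canary(num_clients: int, percentage: int) -> float:
    """Probability that no client at all picks the canary."""
    return prob_exactly(num_clients, percentage, 0)
```

Under this model, with 100 clients and a 1% canary, `prob_no_canary(100, 1)` is about 0.37, so in more than a third of such pushes nobody would try the new configuration.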
I started a similar discussion on canarying ConfigMaps under Kubernetes. While this design is the way at least one push mechanism inside Google works, there's also a more predictable, push-like one (GDP's), where you pick N candidates, tell them to get a new config and watch them for T minutes, making sure they don't die. That, of course, assumes you keep track of the clients, through e.g. grpclb, and can also track their health (Borg and Kubernetes both do). Would you consider that for future developments?
> * TXT records are limited to ASCII chars. What will happen if the method name, programming language, or load balancing policy is not pure ASCII?
To also tackle the 65K TXT record size limit, there's the old hack used by a service that ended up with enormous command lines and ran into the Linux 131072 byte command line limit: compress the whole thing, then encode that stream in Base64 or similar.
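A minimal sketch of that compress-then-encode hack, using only the Python standard library. The function names and the sample config are hypothetical; the gRFC does not define this encoding.

```python
import base64
import json
import zlib


def encode_config(config: dict) -> str:
    """Serialize to compact JSON, compress with zlib, then Base64-encode."""
    raw = json.dumps(config, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(zlib.compress(raw, 9)).decode("ascii")


def decode_config(encoded: str) -> dict:
    """Reverse encode_config: Base64-decode, decompress, parse JSON."""
    raw = zlib.decompress(base64.b64decode(encoded))
    return json.loads(raw.decode("utf-8"))


# Hypothetical config; non-ASCII values survive because the output is Base64.
sample = {"percentage": 10, "clientLanguage": ["java"], "note": "caf\u00e9"}
encoded = encode_config(sample)
```

The encoded string is pure ASCII, so it also sidesteps the character-set question, at the cost of no longer being human-readable.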
On Mon, Jan 23, 2017 at 2:54 PM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
> I don't think protobuf allows it for method or service names that it generates, but protobuf may not be used as the IDL. Also, the other fields wouldn't be affected by protobuf restrictions.
Right. I was just asking about this as one data point.
The simplest solution would be to impose a restriction that these fields all have to be ASCII. Are there any problems with that approach?
On Tuesday, January 24, 2017 at 10:21:20 AM UTC-8, Mark D. Roth wrote:
> The simplest solution would be to impose a restriction that these fields all have to be ASCII. Are there any problems with that approach?
I don't think there are any problems; I would just like it called out in the spec. The restriction isn't obvious from the spec.
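If the spec does impose an ASCII-only restriction, whoever publishes the record could check it up front. The sketch below is only an illustration: `non_ascii_fields` and the field names in the usage note are hypothetical, and `str.isascii()` requires Python 3.7 or later.

```python
def non_ascii_fields(fields: dict) -> list:
    """Return the names of fields whose string values contain non-ASCII chars."""
    bad = []
    for name, value in fields.items():
        # Treat a single string and a list of strings uniformly.
        values = value if isinstance(value, list) else [value]
        if any(isinstance(v, str) and not v.isascii() for v in values):
            bad.append(name)
    return bad
```

For example, `non_ascii_fields({"method": "Echo", "policy": "round_robin"})` returns an empty list, while a method name such as `"Écho"` would be reported.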
> I started a similar discussion on canarying ConfigMaps under Kubernetes. While this design is the way at least one push mechanism inside Google works, there's also a more predictable, push-like one (GDP's), where you pick N candidates, tell them to get a new config and watch them for T minutes, making sure they don't die. That, of course, assumes you keep track of the clients, through e.g. grpclb, and can also track their health (Borg and Kubernetes both do). Would you consider that for future developments?
The problem is that we have no mechanism for tracking clients that way. Tracking them via grpclb won't work, because clients should be able to use the service config without using grpclb. And more generally, any tracking mechanism like this would require a lot of communication between servers and clients, which would add a lot of complexity. I don't think that the advantage of getting slightly more accurate canary percentages is really worth that complexity.
> To also tackle the 65K TXT record size limit, there's the old hack used by a service that ended up with enormous command lines and ran into the Linux 131072 byte command line limit: compress the whole thing, then encode that stream in Base64 or similar.
I'd prefer to avoid that if possible, because I think it's valuable for debugging purposes to have the TXT record in human-readable form. However, we can certainly add support for something like this if/when people start running into the length limitation.
On Tue, Jan 24, 2017 at 1:37 PM, Mark D. Roth <ro...@google.com> wrote:
> The problem is that we have no mechanism for tracking clients that way. Tracking them via grpclb won't work, because clients should be able to use the service config without using grpclb. And more generally, any tracking mechanism like this would require a lot of communication between servers and clients, which would add a lot of complexity. I don't think that the advantage of getting slightly more accurate canary percentages is really worth that complexity.
Sorry, I don't know what I was thinking when I mentioned grpclb. And I should have explained things further.
For additional background, I also created an issue for TXT record support in Kubernetes' kube-dns, citing gRPC as a use case: https://github.com/kubernetes/dns/issues/38
My idea is that there would be a 1:1 mapping between a Kubernetes service and a gRPC service, something that developers seem to like. For example, you'd point clients to mysvc.myns.svc.mycluster.example.com (or a shorter relative name). That's a DNS hostname managed by kube-dns. With the new support, it would return the appropriate TXT record. In more detail:
There's a controller (daemon) in the Kubernetes cluster in charge of pushing gRPC config changes. It sends an update request on the mysvc service object. kube-dns, also running in the cluster, sees the object change and starts serving new TXT records. So far, gRPC itself doesn't need to be involved, i.e. there's no special protocol between servers and clients (or grpclb). Clients keep using DNS.
With your current design, the controller pushes a config with e.g. a 1% canary. Accurate percentages are not even the main issue; determinism and proper health check tracking are. Assuming that exactly 1% of clients pick up the change, now you have the issue of figuring out whether a replica that just crashed did so because of the new config or for unrelated reasons. Engineers really hate it when a configuration change gets rolled back automatically because of a random external event. Conversely, when an unhealthy service is seeing an elevated crash rate and you try to push a new config to stop the bleeding, it's very valuable to know if the new settings are making instances stable again.
In both cases, you need to track which configuration each client is using. You could do this through e.g. monitoring (each client reports the current config version) and correlate that with health checks.
Or, more simply, you could have the controller pick N victims, push a config with a field along the lines of
`"hosts": "host1:port,host2:port,..."`
wait n=TTL seconds and then keep tracking the health status for those clients. This is all done in the controller, which is responsible for discovery, tracking clients through the proper APIs. The example above involves Kubernetes, but, in general, the same mechanism applies to every other environment. The only client change is for the grpc client to match its own host:port against "hosts". A proper config should have either "percentage" or "hosts", not both. Or maybe the latter always wins. The idea is that you canary with a small number of clients, then switch to percentages, so that the config doesn't get bloated with tens or hundreds of hostnames.
This approach also has the advantage that, if you have N groups of clients (e.g. three different frontend services that talk to a shared database), you can treat them and push to each of them independently. "clientLanguage" might be too coarse, especially when all your clients are in the same language. :-)
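The client-side check proposed above might be sketched as follows, using the comma-separated `"hosts"` string from the example. The function name `selects_client` is hypothetical. The sketch takes the "hosts always wins" option, and treating a config with neither field as applying to everyone is my assumption, not something the thread settles.

```python
import random


def selects_client(config: dict, client_host: str, rng=random) -> bool:
    """Decide whether this client should use the canary config."""
    hosts = config.get("hosts")
    if hosts:
        # An explicit host list takes precedence over any percentage.
        wanted = [h.strip() for h in hosts.split(",")]
        return client_host in wanted
    # Assumption: with neither field present, the config applies to all clients.
    percentage = config.get("percentage", 100)
    return rng.randrange(100) < percentage
```

A controller would first publish something like `{"hosts": "host1:port,host2:port"}`, watch those clients, and only later switch to `{"percentage": 10}`.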
> I'd prefer to avoid that if possible, because I think it's valuable for debugging purposes to have the TXT record in human-readable form. However, we can certainly add support for something like this if/when people start running into the length limitation.
I agree. It's really ugly and a true last resort to be implemented only when the actual need arises.
--
Rudi Chiarito - Infrastructure - Clarifai, Inc.
"Trust me, I know what I'm doing." (Sledge Hammer!)
We might be able to ameliorate some of that by allowing some sort of simple pattern-matching language, although I'd prefer to avoid taking an external dependency on a regexp library, so it would probably need to be something very simple -- like maybe simple wildcard matching triggered by a '*' character. So, for example, if your three different frontend services run on hosts with the names as follows:
Frontend service "Foo": foofrontend1, foofrontend2, foofrontend3, ...
Frontend service "Bar": barfrontend1, barfrontend2, barfrontend3, ...
Frontend service "Baz": bazfrontend1, bazfrontend2, bazfrontend3, ...
Then you could select only the frontends from the first service by saying something like "foofrontend*". But we probably would not allow something like "foofrontend[12]" to get just the first two frontends of service "Foo"; instead, you would need to list them separately. Would something like that be useful in your use case?
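A dependency-free matcher for that kind of '*'-only wildcard could look like the sketch below. `wildcard_match` is a hypothetical name. In this sketch every character other than '*' is literal, so bracket expressions such as "[12]" are not interpreted.

```python
def wildcard_match(pattern: str, name: str) -> bool:
    """Match name against pattern; '*' matches any run of characters (even empty)."""
    parts = pattern.split("*")
    if len(parts) == 1:
        # No wildcard at all: require an exact match.
        return pattern == name
    first, last = parts[0], parts[-1]
    if not name.startswith(first) or not name.endswith(last):
        return False
    pos = len(first)
    end = len(name) - len(last)
    if end < pos:
        # The literal prefix and suffix would have to overlap.
        return False
    # Each middle literal must appear, in order, between prefix and suffix.
    for part in parts[1:-1]:
        idx = name.find(part, pos, end)
        if idx < 0:
            return False
        pos = idx + len(part)
    return True
```

With the host names from the example, `wildcard_match("foofrontend*", "foofrontend2")` is true, while `wildcard_match("foofrontend[12]", "foofrontend1")` is false because the brackets are compared literally.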
Do we actually want to select the client port in any of these cases? I'm not sure that's useful, since the client port would presumably be different for each backend it's connected to, and it would change any time it reconnected to a given backend. Is there a use-case where selecting on the client port is useful?
In terms of how this is encoded in JSON, I would probably want it to be a list of strings rather than a single string with a delimiter character. In other words, instead of 'hosts': 'host1,host2,...', it would be something like 'hosts': ['host1','host2',...].
What do you think?
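As a small illustration of the list-of-strings encoding, the sketch below parses a hypothetical config fragment and rejects the delimiter-separated form. `parse_hosts` is an illustrative name, and the fragment uses double quotes because JSON does not accept single-quoted strings.

```python
import json


def parse_hosts(fragment: str) -> list:
    """Extract 'hosts' from a JSON fragment, requiring a list of strings."""
    config = json.loads(fragment)
    hosts = config.get("hosts", [])
    if not isinstance(hosts, list) or not all(isinstance(h, str) for h in hosts):
        raise ValueError("'hosts' must be a list of strings")
    return hosts


# Hypothetical fragment in the proposed list form.
hosts = parse_hosts('{"hosts": ["host1", "host2"]}')
```

A list avoids having to choose a delimiter that can never appear in a host name or pattern.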
On Thu, Jan 26, 2017 at 12:52 PM, 'Mark D. Roth' via grpc.io <grp...@googlegroups.com> wrote:
> Then you could select only the frontends from the first service by saying something like "foofrontend*". But we probably would not allow something like "foofrontend[12]" to get just the first two frontends of service "Foo"; instead, you would need to list them separately. Would something like that be useful in your use case?
In our use case, pods (think Borg tasks) typically have their own IP and hostname, like myservice-RANDOMFIVELETTERHASH. The only exceptions are StatefulSets, an abstraction that lets pods in a replica set have unique per-instance settings, including, in this case, pod names: myservice-0, myservice-1, etc.
So the wildcard syntax would be of some use to us, while the [12] regex one wouldn't be yet, until the day we actually start using StatefulSets. I'm sure you'll find others that have more control over their IPs and would be interested. Even with StatefulSets, I don't think we'd use the feature, because they are typically used for services that have a small number of replicas.
The subnet notation is interesting, too, but in our case, IP addresses are, for most purposes, random. Plus, most people don't want to think about addresses. :-)
I would imagine that, in terms of scenarios covered, you'd probably see diminishing returns, from highest to lowest:
- raw list
- trailing wildcards
- free wildcards (anywhere in the string)
- regexes
- network masks
I would implement them in that order, but only when actual user demand materialises.
I agree with you that bloat is not just a hypothetical concern. That's why I suggested that pushes progress e.g. from one host, to two hosts, then three, then through percentages. One twist I forgot to mention is that once a client has been picked through explicit mention, the config should be sticky, i.e. it should keep the new one and shouldn't roll the dice when the config changes to a percentage. I guess you would have the same issue when you ramp from e.g. 1% to 10%: do you really want a client to potentially alternate between new and old config whenever the percentage changes? Unless you require people to always canary at only X%, then go straight to 100%.
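One way to get the stickiness described above is for each client to draw its random bucket once and to remember being named explicitly. The sketch below assumes that design; the class name `CanaryState` is hypothetical, it uses the list form of "hosts", and it does not model how a rollback would clear the pinned state.

```python
import random


class CanaryState:
    """Per-client canary decision that never flips back while ramping up."""

    def __init__(self, host: str, rng=None):
        self.host = host
        rng = rng if rng is not None else random.Random()
        # Drawn once per process, so raising the percentage only adds clients.
        self.bucket = rng.randrange(100)
        self.pinned = False

    def uses_canary(self, config: dict) -> bool:
        hosts = config.get("hosts")
        if hosts is not None and self.host in hosts:
            # Explicitly named once: stay on the new config from now on.
            self.pinned = True
        if self.pinned:
            return True
        return self.bucket < config.get("percentage", 0)
```

With a fixed bucket, a client selected at 1% is still selected at 10%, and a client named in "hosts" keeps the new config when the record later switches to a percentage.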
> Do we actually want to select the client port in any of these cases? I'm not sure that's useful, since the client port would presumably be different for each backend it's connected to, and it would change any time it reconnected to a given backend. Is there a use-case where selecting on the client port is useful?
I guess this would be theoretically useful if you run e.g. four different client processes on the same host. But I was really thinking of server host:ports, which are the ones that get advertised and discovered. Client ports are not advertised and are usually random, as you point out, unless you explicitly bind to them. So ignore the port part of my comment. (At some point, though, someone else will come up with the scenario I just described.)
> In terms of how this is encoded in JSON, I would probably want it to be a list of strings rather than a single string with a delimiter character. In other words, instead of 'hosts': 'host1,host2,...', it would be something like 'hosts': ['host1','host2',...].
> What do you think?
That sounds like a good start. Perhaps we can reserve the right in the future to add ports if enough people show compelling uses for it, but for now we don't parse them or use them, only mention that in docs.
Thanks!
--
Rudi Chiarito