gRFC A2: Service Configs in DNS


Mark D. Roth

Jan 19, 2017, 11:57:54 AM
to grpc.io, Abhishek Kumar, Yuchen Zeng, Carl Mastrangelo
I've created a gRFC describing how service configs will be encoded in DNS:

https://github.com/grpc/proposal/pull/5

I'd welcome feedback, especially on the proposed use of TXT records.

Please keep discussion in this thread.  Thanks!

--
Mark D. Roth <ro...@google.com>
Software Engineer
Google, Inc.

Craig Tiller

Jan 19, 2017, 12:18:57 PM
to Mark D. Roth, grpc.io, Abhishek Kumar, Yuchen Zeng, Carl Mastrangelo
How does the percentage field work?

Do clients roll a die to determine if they're in the canary subset? Or is there a deterministic way of determining this?

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/CAJgPXp5VBE%3DBVJq8JKXAdKV%3D3-nrjFDHFd0sTwUW%3DGOr%2B3q6Tw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Mark D. Roth

Jan 19, 2017, 2:42:29 PM
to Craig Tiller, grpc.io, Abhishek Kumar, Yuchen Zeng, Carl Mastrangelo
It's obviously going to have to be a heuristic, since we don't have any way of knowing the full set of clients a priori.  I was thinking that we would take a hash of the client's hostname and pid, which unfortunately wouldn't really be that deterministic.  But I'd welcome suggestions for a more deterministic algorithm.
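To make the heuristic concrete, a minimal sketch of what hashing the hostname and pid into a percentage bucket might look like (hypothetical code, not from the gRFC; the function name and bucket scheme are assumptions):

```python
import hashlib
import os
import socket

def in_canary_subset(percentage: int) -> bool:
    """Map this client into one of 100 buckets via a stable hash.

    Hostname+pid keeps the choice fixed for a process's lifetime,
    but a restart (new pid) may re-roll it -- hence "not really
    deterministic" across the fleet over time.
    """
    key = f"{socket.gethostname()}:{os.getpid()}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 100
    return bucket < percentage
```

Within a single process the answer is stable, which avoids a client flapping between old and new configs on every re-resolution.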

On Thu, Jan 19, 2017 at 9:18 AM, 'Craig Tiller' via grpc.io <grp...@googlegroups.com> wrote:
How does the percentage field work?

Do clients roll a die to determine if they're in the canary subset? Or is there a deterministic way of determining this?

On Thu, Jan 19, 2017 at 8:57 AM 'Mark D. Roth' via grpc.io <grp...@googlegroups.com> wrote:
I've created a gRFC describing how service configs will be encoded in DNS:

https://github.com/grpc/proposal/pull/5

I'd welcome feedback, especially on the proposed use of TXT records.

Please keep discussion in this thread.  Thanks!

--
Mark D. Roth <ro...@google.com>
Software Engineer
Google, Inc.


Carl Mastrangelo

Jan 20, 2017, 12:47:17 PM
to grpc.io, cti...@google.com, abhi...@google.com, z...@google.com, not...@google.com
Initial thoughts:

* percentage needs to be declared as an integer (as opposed to a number).  This will make it consistent internally and externally.
* TXT records are limited to ASCII characters.  What will happen if the method name, programming language, or load balancing policy is not pure ASCII?
* Are TXT records for a superdomain applicable?  For example, if there were a service config for foo.bar.com but not for sub.foo.bar.com, would it apply?


On Thursday, January 19, 2017 at 11:42:29 AM UTC-8, Mark D. Roth wrote:
It's obviously going to have to be a heuristic, since we don't have any way of knowing the full set of clients a priori.  I was thinking that we would take a hash of the client's hostname and pid, which unfortunately wouldn't really be that deterministic.  But I'd welcome suggestions for a more deterministic algorithm.
On Thu, Jan 19, 2017 at 9:18 AM, 'Craig Tiller' via grpc.io <grp...@googlegroups.com> wrote:
How does the percentage field work?

Do clients roll a die to determine if they're in the canary subset? Or is there a deterministic way of determining this?

On Thu, Jan 19, 2017 at 8:57 AM 'Mark D. Roth' via grpc.io <grp...@googlegroups.com> wrote:
I've created a gRFC describing how service configs will be encoded in DNS:

https://github.com/grpc/proposal/pull/5

I'd welcome feedback, especially on the proposed use of TXT records.

Please keep discussion in this thread.  Thanks!

--
Mark D. Roth <ro...@google.com>
Software Engineer
Google, Inc.


Mark D. Roth

Jan 23, 2017, 10:31:51 AM
to Carl Mastrangelo, grpc.io, Craig Tiller, Abhishek Kumar, Yuchen Zeng
On Fri, Jan 20, 2017 at 9:47 AM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
Initial thoughts:

* percentage needs to be declared as an integer (as opposed to a number).  This will make it consistent internally and externally.

Done.  (There's no reason to require that these fields be consistent internally and externally, since they're each going to be read by independent resolver implementations.  But I agree that integer makes more sense.)
 
* TXT records are limited to ASCII characters.  What will happen if the method name, programming language, or load balancing policy is not pure ASCII?

Good question.  As a data point, do we know if protobuf allows non-ASCII chars in service or method names?
 
* Are TXT records for a superdomain applicable?  For example, if there were a service config for foo.bar.com but not for sub.foo.bar.com, would it apply?

No, the name has to exactly match the server name given to the client.  Otherwise, we'd need to make a bunch of additional DNS lookups for each server name.
 

Carl Mastrangelo

Jan 23, 2017, 5:54:27 PM
to grpc.io, not...@google.com, cti...@google.com, abhi...@google.com, z...@google.com


On Monday, January 23, 2017 at 7:31:51 AM UTC-8, Mark D. Roth wrote:
On Fri, Jan 20, 2017 at 9:47 AM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
Initial thoughts:

* percentage needs to be declared as an integer (as opposed to a number).  This will make it consistent internally and externally.

Done.  (There's no reason to require that these fields be consistent internally and externally, since they're each going to be read by independent resolver implementations.  But I agree that integer makes more sense.)
 
* TXT records are limited to ASCII characters.  What will happen if the method name, programming language, or load balancing policy is not pure ASCII?

Good question.  As a data point, do we know if protobuf allows non-ASCII chars in service or method names?

I don't think protobuf allows it for method or service names that it generates, but protobuf may not be used as the IDL.  Also, the other fields wouldn't be affected by protobuf restrictions.
 
 
* Are TXT records for a superdomain applicable?  For example, if there were a service config for foo.bar.com but not for sub.foo.bar.com, would it apply?

No, the name has to exactly match the server name given to the client.  Otherwise, we'd need to make a bunch of additional DNS lookups for each server name.

What about FQDNs?  Does "foo.bar.com" match "foo.bar.com."?
 

Rudi Chiarito

Jan 23, 2017, 10:21:47 PM
to Carl Mastrangelo, grpc.io, cti...@google.com, abhi...@google.com, z...@google.com
On Fri, Jan 20, 2017 at 12:47 PM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
Initial thoughts:

* percentage needs to be declared as an integer (as opposed to a number).  This will make it consistent internally and externally.

What if 1% is too many? Can we have a deci-percent? I'm sure Mark will be thrilled to deal with deci-quantities. ;-)

More generally, is this "pull" design set in stone? By that, I mean that clients generate a random number and figure out whether they fall e.g. in the 1% that should try the new configuration. There's no guarantee that any clients will pick it up (or that 10 out of 100, i.e. 10%, won't). I started a similar discussion on canarying ConfigMaps under Kubernetes. While this design is the way at least one push mechanism inside Google works, there's also a more predictable, push-like one (GDP's), where you pick N candidates, tell them to get a new config and watch them for T minutes, making sure they don't die. That, of course, assumes you keep track of the clients, through e.g. grpclb, and can also track their health (Borg and Kubernetes both do). Would you consider that for future developments?
 
* TXT records are limited to ASCII characters.  What will happen if the method name, programming language, or load balancing policy is not pure ASCII?

To also tackle the 65K TXT record size limit, there's the old hack used by a service that ended up with enormous command lines and ran into the Linux 131072 byte command line limit: compress the whole thing, then encode that stream in Base64 or similar.
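For illustration, the compress-then-encode trick described above might look like this (a sketch only; the function names and the JSON config shape are assumptions, not part of the proposal):

```python
import base64
import json
import zlib

def encode_txt(config: dict) -> str:
    """Compress a service config and wrap it in Base64 so the result
    is pure ASCII and much smaller than the raw JSON."""
    raw = json.dumps(config, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(zlib.compress(raw, level=9)).decode("ascii")

def decode_txt(record: str) -> dict:
    """Reverse of encode_txt: Base64-decode, decompress, parse JSON."""
    return json.loads(zlib.decompress(base64.b64decode(record)))
```

The obvious cost, as noted below, is that the TXT record is no longer human-readable when debugging with ordinary DNS tools.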




--
Rudi Chiarito — Infrastructure — Clarifai, Inc.
"Trust me, I know what I'm doing." (Sledge Hammer!)

Mark D. Roth

Jan 24, 2017, 1:21:20 PM
to Carl Mastrangelo, grpc.io, Craig Tiller, Abhishek Kumar, Yuchen Zeng
On Mon, Jan 23, 2017 at 2:54 PM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:


On Monday, January 23, 2017 at 7:31:51 AM UTC-8, Mark D. Roth wrote:
On Fri, Jan 20, 2017 at 9:47 AM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
Initial thoughts:

* percentage needs to be declared as an integer (as opposed to a number).  This will make it consistent internally and externally.

Done.  (There's no reason to require that these fields be consistent internally and externally, since they're each going to be read by independent resolver implementations.  But I agree that integer makes more sense.)
 
* TXT records are limited to ASCII characters.  What will happen if the method name, programming language, or load balancing policy is not pure ASCII?

Good question.  As a data point, do we know if protobuf allows non-ASCII chars in service or method names?

I don't think protobuf allows it for method or service names that it generates, but protobuf may not be used as the IDL.  Also, the other fields wouldn't be affected by protobuf restrictions.

Right.  I was just asking about this as one data point.

The simplest solution would be to impose a restriction that these fields all have to be ASCII.  Are there any problems with that approach?
 
 
 
* Are TXT records for a superdomain applicable?  For example, if there were a service config for foo.bar.com but not for sub.foo.bar.com, would it apply?

No, the name has to exactly match the server name given to the client.  Otherwise, we'd need to make a bunch of additional DNS lookups for each server name.

What about FQDNs?  Does "foo.bar.com" match "foo.bar.com."?

The presence or absence of the trailing dot affects how DNS is searched for a match, but the same search rules apply for the addresses as for the TXT records -- if you find one, you find them both.
 

Mark D. Roth

Jan 24, 2017, 1:37:07 PM
to Rudi Chiarito, Carl Mastrangelo, grpc.io, Craig Tiller, Abhishek Kumar, Yuchen Zeng
On Mon, Jan 23, 2017 at 7:21 PM, Rudi Chiarito <ru...@clarifai.com> wrote:
On Fri, Jan 20, 2017 at 12:47 PM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
Initial thoughts:

* percentage needs to be declared as an integer (as opposed to a number).  This will make it consistent internally and externally.

What if 1% is too many? Can we have a deci-percent? I'm sure Mark will be thrilled to deal with deci-quantities. ;-)

More generally, is this "pull" design set in stone? By that, I mean that clients generate a random number and figure out whether they fall e.g. in the 1% that should try the new configuration. There's no guarantee that any clients will pick it up (or that 10 out of 100, i.e. 10%, won't).

It's true that there's no guarantee.  However, the service owner should be able to estimate how many clients they have (at least as an order of magnitude), which should allow setting the percentage to a reasonable value for getting the desired subset of canary clients.

For services with a large number of clients, even if we don't hit the target percentage exactly, we'll likely be close enough that it makes no difference.  For example, if there are 1000 clients and we set the percentage to 10, then it's probably fine if the actual number of canary clients is anywhere between 80 and 120 (which it very likely will be with any sort of decent hash algorithm).

For services with a very small number of clients, service owners will need to carefully set the percentage such that they can be confident that at least one client is selected.  For example, if there are 10 clients and we set the percentage to 10, it may not select any clients, but you can set it to 20 or 30 instead to avoid that.  However, at this kind of small scale, if you need greater precision, it's probably not that hard to directly coordinate changes to individual clients (i.e., it nullifies some of the advantages of the service config mechanism anyway).
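These back-of-the-envelope numbers can be checked with a small binomial calculation (illustrative only, stdlib math; the variable names are mine):

```python
from math import comb

def binom_pmf(n: int, k: int, p: float) -> float:
    """Probability that exactly k of n clients land in the canary,
    assuming each is selected independently with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# 1000 clients at 10%: the canary count lands between 80 and 120
# the vast majority of the time.
p_80_120 = sum(binom_pmf(1000, k, 0.1) for k in range(80, 121))

# 10 clients at 10%: roughly a 35% chance no client is selected at all,
# which is why small fleets need a higher percentage.
p_none = binom_pmf(10, 0, 0.1)
```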

 
I started a similar discussion on canarying ConfigMaps under Kubernetes. While this design is the way at least one push mechanism inside Google works, there's also a more predictable, push-like one (GDP's), where you pick N candidates, tell them to get a new config and watch them for T minutes, making sure they don't die. That, of course, assumes you keep track of the clients, through e.g. grpclb, and can also track their health (Borg and Kubernetes both do). Would you consider that for future developments?

The problem is that we have no mechanism for tracking clients that way.  Tracking them via grpclb won't work, because clients should be able to use the service config without using grpclb.  And more generally, any tracking mechanism like this would require a lot of communication between servers and clients, which would add a lot of complexity.  I don't think that the advantage of getting slightly more accurate canary percentages is really worth that complexity.
 
 
* TXT records are limited to ASCII characters.  What will happen if the method name, programming language, or load balancing policy is not pure ASCII?

To also tackle the 65K TXT record size limit, there's the old hack used by a service that ended up with enormous command lines and ran into the Linux 131072 byte command line limit: compress the whole thing, then encode that stream in Base64 or similar.

I'd prefer to avoid that if possible, because I think it's valuable for debugging purposes to have the TXT record in human-readable form.  However, we can certainly add support for something like this if/when people start running into the length limitation.
 


Carl Mastrangelo

Jan 24, 2017, 5:50:31 PM
to grpc.io, not...@google.com, cti...@google.com, abhi...@google.com, z...@google.com


On Tuesday, January 24, 2017 at 10:21:20 AM UTC-8, Mark D. Roth wrote:
On Mon, Jan 23, 2017 at 2:54 PM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:


On Monday, January 23, 2017 at 7:31:51 AM UTC-8, Mark D. Roth wrote:
On Fri, Jan 20, 2017 at 9:47 AM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
Initial thoughts:

* percentage needs to be declared as an integer (as opposed to a number).  This will make it consistent internally and externally.

Done.  (There's no reason to require that these fields be consistent internally and externally, since they're each going to be read by independent resolver implementations.  But I agree that integer makes more sense.)
 
* TXT records are limited to ASCII characters.  What will happen if the method name, programming language, or load balancing policy is not pure ASCII?

Good question.  As a data point, do we know if protobuf allows non-ASCII chars in service or method names?

I don't think protobuf allows it for method or service names that it generates, but protobuf may not be used as the IDL.  Also, the other fields wouldn't be affected by protobuf restrictions.

Right.  I was just asking about this as one data point.

The simplest solution would be to impose a restriction that these fields all have to be ASCII.  Are there any problems with that approach?

I don't think there are any problems; I'd just like it called out explicitly, since the restriction isn't obvious from the spec.

 

Mark D. Roth

Jan 25, 2017, 10:39:40 AM
to Carl Mastrangelo, grpc.io, Craig Tiller, Abhishek Kumar, Yuchen Zeng
On Tue, Jan 24, 2017 at 2:50 PM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:


On Tuesday, January 24, 2017 at 10:21:20 AM UTC-8, Mark D. Roth wrote:
On Mon, Jan 23, 2017 at 2:54 PM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:


On Monday, January 23, 2017 at 7:31:51 AM UTC-8, Mark D. Roth wrote:
On Fri, Jan 20, 2017 at 9:47 AM, 'Carl Mastrangelo' via grpc.io <grp...@googlegroups.com> wrote:
Initial thoughts:

* percentage needs to be declared as an integer (as opposed to a number).  This will make it consistent internally and externally.

Done.  (There's no reason to require that these fields be consistent internally and externally, since they're each going to be read by independent resolver implementations.  But I agree that integer makes more sense.)
 
* TXT records are limited to ASCII characters.  What will happen if the method name, programming language, or load balancing policy is not pure ASCII?

Good question.  As a data point, do we know if protobuf allows non-ASCII chars in service or method names?

I don't think protobuf allows it for method or service names that it generates, but protobuf may not be used as the IDL.  Also, the other fields wouldn't be affected by protobuf restrictions.

Right.  I was just asking about this as one data point.

The simplest solution would be to impose a restriction that these fields all have to be ASCII.  Are there any problems with that approach?

I don't think there are any problems; I'd just like it called out explicitly, since the restriction isn't obvious from the spec.

I've added a note about this to the spec.
 

Rudi Chiarito

Jan 25, 2017, 9:39:59 PM
to Mark D. Roth, Carl Mastrangelo, grpc.io, Craig Tiller, Abhishek Kumar, Yuchen Zeng
On Tue, Jan 24, 2017 at 1:37 PM, Mark D. Roth <ro...@google.com> wrote: 
I started a similar discussion on canarying ConfigMaps under Kubernetes. While this design is the way at least one push mechanism inside Google works, there's also a more predictable, push-like one (GDP's), where you pick N candidates, tell them to get a new config and watch them for T minutes, making sure they don't die. That, of course, assumes you keep track of the clients, through e.g. grpclb, and can also track their health (Borg and Kubernetes both do). Would you consider that for future developments?

The problem is that we have no mechanism for tracking clients that way.  Tracking them via grpclb won't work, because clients should be able to use the service config without using grpclb.  And more generally, any tracking mechanism like this would require a lot of communication between servers and clients, which would add a lot of complexity.  I don't think that the advantage of getting slightly more accurate canary percentages is really worth that complexity.


Sorry, I don't know what I was thinking when I mentioned grpclb. And I should have explained things further.

For additional background, I also created an issue for TXT record support in Kubernetes' kube-dns, citing gRPC as a use case: https://github.com/kubernetes/dns/issues/38

My idea is that there would be a 1:1 mapping between a Kubernetes service and a gRPC service, something that developers seem to like. For example, you'd point clients to mysvc.myns.svc.mycluster.example.com (or a shorter relative name). That's a DNS hostname managed by kube-dns. With the new support, it would return the appropriate TXT record. More in detail:

There's a controller (daemon) in the Kubernetes cluster in charge of pushing gRPC config changes. It sends an update request on the mysvc service object. kube-dns, also running in the cluster, sees the object change and starts serving new TXT records. So far, gRPC itself doesn't need to be involved, i.e. there's no special protocol between servers and clients (or grpclb). Clients keep using DNS.

With your current design, the controller pushes a config with e.g. a 1% canary. Accurate percentages are not even the main issue; determinism and proper health-check tracking are. Assuming that exactly 1% of clients pick up the change, you now have the issue of figuring out whether a replica that just crashed did so because of the new config or for unrelated reasons. Engineers really hate it when a configuration change gets rolled back automatically because of a random external event. Conversely, when an unhealthy service is seeing an elevated crash rate and you try to push a new config to stop the bleeding, it's very valuable to know if the new settings are making instances stable again.

In both cases, you need to track which configuration each client is using. You could do this through e.g. monitoring (each client reports the current config version) and correlate that with health checks.

Or, more simply, you could have the controller pick up N victims, push a config with a field along the lines of

`"hosts": "host1:port,host2:port,..."`

wait n=TTL seconds and then keep tracking the health status for those clients. This is all done in the controller, which is responsible for discovery, tracking clients through the proper APIs. The example above involves Kubernetes, but, in general, the same mechanism applies to every other environment. The only client change is for the grpc client to match its own host:port against "hosts". A proper config should have either "percentage" or "hosts", not both. Or maybe the latter always wins. The idea is that you canary with a small number of clients, then switch to percentages, so that the config doesn't get bloated with tens or hundreds of hostnames.
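The client-side change described above would be tiny; a sketch of matching against the hypothetical "hosts" field (the field and its comma-separated format come from the proposal in this message, the function name is mine):

```python
def hosts_selector_matches(hosts_field: str, my_host: str, my_port: int) -> bool:
    """Return True if this client's host:port is explicitly listed in a
    hypothetical "hosts" selector of the form "host1:port,host2:port,..."."""
    return f"{my_host}:{my_port}" in hosts_field.split(",")
```

A client listed in "hosts" would apply the config unconditionally, rather than rolling the percentage dice.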

This approach also has the advantage that, if you have N groups of clients (e.g. three different frontend services that talk to a shared database), you can treat them as separate groups and push to each of them independently. "clientLanguage" might be too coarse, especially when all your clients are in the same language. :-)

 
I'd prefer to avoid that if possible, because I think it's valuable for debugging purposes to have the TXT record in human-readable form.  However, we can certainly add support for something like this if/when people start running into the length limitation.

I agree. It's really ugly and a true last resort to be implemented only when the actual need arises.

Mark D. Roth

Jan 26, 2017, 12:52:48 PM
to Rudi Chiarito, Carl Mastrangelo, grpc.io, Craig Tiller, Abhishek Kumar, Yuchen Zeng
On Wed, Jan 25, 2017 at 6:39 PM, Rudi Chiarito <ru...@clarifai.com> wrote:
On Tue, Jan 24, 2017 at 1:37 PM, Mark D. Roth <ro...@google.com> wrote: 
I started a similar discussion on canarying ConfigMaps under Kubernetes. While this design is the way at least one push mechanism inside Google works, there's also a more predictable, push-like one (GDP's), where you pick N candidates, tell them to get a new config and watch them for T minutes, making sure they don't die. That, of course, assumes you keep track of the clients, through e.g. grpclb, and can also track their health (Borg and Kubernetes both do). Would you consider that for future developments?

The problem is that we have no mechanism for tracking clients that way.  Tracking them via grpclb won't work, because clients should be able to use the service config without using grpclb.  And more generally, any tracking mechanism like this would require a lot of communication between servers and clients, which would add a lot of complexity.  I don't think that the advantage of getting slightly more accurate canary percentages is really worth that complexity.


Sorry, I don't know what I was thinking when I mentioned grpclb. And I should have explained things further.

For additional background, I also created an issue for TXT record support in Kubernetes' kube-dns, citing gRPC as a use case: https://github.com/kubernetes/dns/issues/38

My idea is that there would be a 1:1 mapping between a Kubernetes service and a gRPC service, something that developers seem to like. For example, you'd point clients to mysvc.myns.svc.mycluster.example.com (or a shorter relative name). That's a DNS hostname managed by kube-dns. With the new support, it would return the appropriate TXT record. More in detail:

There's a controller (daemon) in the Kubernetes cluster in charge of pushing gRPC config changes. It sends an update request on the mysvc service object. kube-dns, also running in the cluster, sees the object change and starts serving new TXT records. So far, gRPC itself doesn't need to be involved, i.e. there's no special protocol between servers and clients (or grpclb). Clients keep using DNS.

With your current design, the controller pushes a config with e.g. a 1% canary. Accurate percentages are not even the main issue; determinism and proper health-check tracking are. Assuming that exactly 1% of clients pick up the change, you now have the issue of figuring out whether a replica that just crashed did so because of the new config or for unrelated reasons. Engineers really hate it when a configuration change gets rolled back automatically because of a random external event. Conversely, when an unhealthy service is seeing an elevated crash rate and you try to push a new config to stop the bleeding, it's very valuable to know if the new settings are making instances stable again.

In both cases, you need to track which configuration each client is using. You could do this through e.g. monitoring (each client reports the current config version) and correlate that with health checks.

Or, more simply, you could have the controller pick up N victims, push a config with a field along the lines of

`"hosts": "host1:port,host2:port,..."`

wait n=TTL seconds and then keep tracking the health status for those clients. This is all done in the controller, which is responsible for discovery, tracking clients through the proper APIs. The example above involves Kubernetes, but, in general, the same mechanism applies to every other environment. The only client change is for the grpc client to match its own host:port against "hosts". A proper config should have either "percentage" or "hosts", not both. Or maybe the latter always wins. The idea is that you canary with a small number of clients, then switch to percentages, so that the config doesn't get bloated with tens or hundreds of hostnames.

This approach also has the advantage that, if you have N groups of clients (e.g. three different frontend services that talk to a shared database), you can treat them as separate groups and push to each of them independently. "clientLanguage" might be too coarse, especially when all your clients are in the same language. :-)

Thanks for the detailed explanation of this use-case.  As I think I mentioned up-thread, I certainly agree that providing some mechanism to allow deterministic client selection would be useful.

I'm warming to the idea of adding a 'hosts' selector field, but I do worry that people could easily start running into the TXT record length limitation if they start creating very long lists of hosts.

We might be able to ameliorate some of that by allowing some sort of simple pattern-matching language, although I'd prefer to avoid taking an external dependency on a regexp library, so it would probably need to be something very simple -- like maybe simple wildcard matching triggered by a '*' character.  So, for example, if your three different frontend services run on hosts with the names as follows:

Frontend service "Foo": foofrontend1, foofrontend2, foofrontend3, ...
Frontend service "Bar": barfrontend1, barfrontend2, barfrontend3, ...
Frontend service "Baz": bazfrontend1, bazfrontend2, bazfrontend3, ...

Then you could select only the frontends from the first service by saying something like "foofrontend*".  But we probably would not allow something like "foofrontend[12]" to get just the first two frontends of service "Foo"; instead, you would need to list them separately.  Would something like that be useful in your use case?
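A single-'*' matcher of the kind described above needs no regexp library; a minimal sketch (the function name and the prefix/suffix semantics for a star mid-pattern are my assumptions):

```python
def wildcard_match(pattern: str, host: str) -> bool:
    """Match a hostname against a pattern containing at most one '*',
    which matches any (possibly empty) run of characters."""
    if "*" not in pattern:
        return pattern == host
    prefix, _, suffix = pattern.partition("*")
    return (
        len(host) >= len(prefix) + len(suffix)
        and host.startswith(prefix)
        and host.endswith(suffix)
    )
```

This covers the trailing-wildcard case ("foofrontend*") with only string operations, which keeps the resolver free of external dependencies.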

Another possibility here would be to select by IP address.  In that case, we could even allow subnet notation to select a whole range of IP addresses at once.  (Though there could be some complexity here with regard to multi-homed hosts and how they'd figure out which IP would apply to them.)  Would something like that be useful?
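Subnet selection could lean on standard CIDR matching; e.g. Python's stdlib already provides it (illustrative sketch only, ignoring the multi-homed-host question):

```python
import ipaddress

def ip_selected(cidr: str, client_ip: str) -> bool:
    """True if the client's address falls inside the selector's subnet,
    e.g. cidr="10.0.0.0/24" selects 10.0.0.0 through 10.0.0.255."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)
```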

Do we actually want to select the client port in any of these cases?  I'm not sure that's useful, since the client port would presumably be different for each backend it's connected to, and it would change any time it reconnected to a given backend.  Is there a use-case where selecting on the client port is useful?

In terms of how this is encoded in JSON, I would probably want it to be a list of strings rather than a single string with a delimiter character.  In other words, instead of 'hosts': 'host1,host2,...', it would be something like 'hosts': ['host1','host2',...].

What do you think?
 

 
I'd prefer to avoid that if possible, because I think it's valuable for debugging purposes to have the TXT record in human-readable form.  However, we can certainly add support for something like this if/when people start running into the length limitation.

I agree. It's really ugly and a true last resort to be implemented only when the actual need arises.


--
Rudi Chiarito — Infrastructure — Clarifai, Inc.
"Trust me, I know what I'm doing." (Sledge Hammer!)

Rudi Chiarito

Jan 26, 2017, 3:42:50 PM
to Mark D. Roth, Carl Mastrangelo, grpc.io, Craig Tiller, Abhishek Kumar, Yuchen Zeng
On Thu, Jan 26, 2017 at 12:52 PM, 'Mark D. Roth' via grpc.io <grp...@googlegroups.com> wrote:
We might be able to ameliorate some of that by allowing some sort of simple pattern-matching language, although I'd prefer to avoid taking an external dependency on a regexp library, so it would probably need to be something very simple -- like maybe simple wildcard matching triggered by a '*' character.  So, for example, if your three different frontend services run on hosts with the names as follows:

Frontend service "Foo": foofrontend1, foofrontend2, foofrontend3, ...
Frontend service "Bar": barfrontend1, barfrontend2, barfrontend3, ...
Frontend service "Baz": bazfrontend1, bazfrontend2, bazfrontend3, ...

Then you could select only the frontends from the first service by saying something like "foofrontend*".  But we probably would not allow something like "foofrontend[12]" to get just the first two frontends of service "Foo"; instead, you would need to list them separately.  Would something like that be useful in your use case?

In our use case, pods (think Borg tasks) typically have their own IP and hostname, like myservice-RANDOMFIVELETTERHASH. The only exception are StatefulSets, an abstraction that lets pods in a replica set have unique per-instance settings, including, in this case, pod names: myservice-0, myservice-1, etc.

So the wildcard syntax would be of some use to us, while the [12] regex one wouldn't be yet, until the day we actually start using StatefulSets. I'm sure you'll find others that have more control over their IPs and would be interested. Even with StatefulSets, I don't think we'd use the feature, because they are typically used for services that have a small number of replicas.

The subnet notation is interesting, too, but in our case, IP addresses are, for most purposes, random. Plus, most people don't want to think about addresses. :-)

I would imagine that, in terms of scenarios covered, you'd probably see diminishing returns, from highest to lowest:

 - raw list
 - trailing wildcards
 - free wildcards (anywhere in the string)
 - regexes
 - network masks

I would implement them in that order, but only when actual user demand materialises.

I agree with you that bloat is not just a hypothetical concern. That's why I suggested that pushes progress e.g. from one host, to two hosts, then three, then through percentages. One twist I forgot to mention is that once a client has been picked through explicit mention, the config should be sticky, i.e. it should keep the new one and shouldn't roll the dice when the config changes to a percentage. I guess you would have the same issue when you ramp from e.g. 1% to 10%: do you really want a client to potentially alternate between new and old config whenever the percentage changes? Unless you require people to always canary at only X%, then go straight to 100%.
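One way to get that stickiness is to make the percentage check a deterministic hash of a stable client identity rather than a random roll: a client that lands in the 1% bucket then stays selected as the ramp widens to 10% and 100%. A minimal sketch, assuming the client hashes its own hostname (just the heuristic floated earlier in the thread, not a settled design):

```python
import hashlib

def in_percentage(client_id, percentage):
    # The bucket is a pure function of client_id, so raising the
    # percentage only ever adds clients; a client already selected
    # never flips back to the old config as the ramp proceeds.
    digest = hashlib.md5(client_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percentage
```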
 
Do we actually want to select the client port in any of these cases?  I'm not sure that's useful, since the client port would presumably be different for each backend it's connected to, and it would change any time it reconnected to a given backend.  Is there a use-case where selecting on the client port is useful?

I guess this would be theoretically useful if you run e.g. four different client processes on the same host. But I was really thinking of server host:ports, which are the ones that get advertised and discovered. Client ports are not advertised and are usually random, as you point out, unless you explicitly bind to them. So ignore the port part of my comment. (At some point, though, someone else will come up with the scenario I just described.)
 
In terms of how this is encoded in JSON, I would probably want it to be a list of strings rather than a single string with a delimiter character.  In other words, instead of 'hosts': 'host1,host2,...', it would be something like 'hosts': ['host1','host2',...].

What do you think?

That sounds like a good start. Perhaps we can reserve the right in the future to add ports if enough people show compelling uses for it, but for now we don't parse them or use them, only mention that in docs.

Thanks!

Mark D. Roth

unread,
Jan 30, 2017, 1:15:10 PM1/30/17
to Rudi Chiarito, Carl Mastrangelo, grpc.io, Craig Tiller, Abhishek Kumar, Yuchen Zeng
On Thu, Jan 26, 2017 at 12:42 PM, Rudi Chiarito <ru...@clarifai.com> wrote:
On Thu, Jan 26, 2017 at 12:52 PM, 'Mark D. Roth' via grpc.io <grp...@googlegroups.com> wrote:
We might be able to ameliorate some of that by allowing some sort of simple pattern-matching language, although I'd prefer to avoid taking an external dependency on a regexp library, so it would probably need to be something very simple -- like maybe simple wildcard matching triggered by a '*' character.  So, for example, if your three different frontend services run on hosts with the names as follows:

Frontend service "Foo": foofrontend1, foofrontend2, foofrontend3, ...
Frontend service "Bar": barfrontend1, barfrontend2, barfrontend3, ...
Frontend service "Baz": bazfrontend1, bazfrontend2, bazfrontend3, ...

Then you could select only the frontends from the first service by saying something like "foofrontend*".  But we probably would not allow something like "foofrontend[12]" to get just the first two frontends of service "Foo"; instead, you would need to list them separately.  Would something like that be useful in your use case?

In our use case, pods (think Borg tasks) typically have their own IP and hostname, like myservice-RANDOMFIVELETTERHASH. The only exception are StatefulSets, an abstraction that lets pods in a replica set have unique per-instance settings, including, in this case, pod names: myservice-0, myservice-1, etc.

So the wildcard syntax would be of some use to us, while the [12] regex one wouldn't be yet, until the day we actually start using StatefulSets. I'm sure you'll find others that have more control over their IPs and would be interested. Even with StatefulSets, I don't think we'd use the feature, because they are typically used for services that have a small number of replicas.

The subnet notation is interesting, too, but in our case, IP addresses are, for most purposes, random. Plus, most people don't want to think about addresses. :-)

I would imagine that, in terms of scenarios covered, you'd probably see diminishing returns, from highest to lowest:

 - raw list
 - trailing wildcards
 - free wildcards (anywhere in the string)
 - regexes
 - network masks

I would implement them in that order, but only when actual user demand materialises.

Okay, it sounds like we should add the ability to select based on the client hostname for now, and then wait to see if we need the other options later.  I've added this to the doc.
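A '*'-only matcher (no character classes) can be tiny. A Python sketch for illustration — it uses the stdlib `re` module internally for brevity, though a C implementation would be a simple linear scan with no regexp dependency:

```python
import re

def wildcard_match(hostname, pattern):
    # '*' is the only metacharacter; everything else, including
    # '[' and ']', matches literally, so "foofrontend[12]" does
    # NOT behave as a character class.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, hostname) is not None
```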
 

I agree with you that bloat is not just a hypothetical concern. That's why I suggested that pushes progress e.g. from one host, to two hosts, then three, then through percentages. One twist I forgot to mention is that once a client has been picked through explicit mention, the config should be sticky, i.e. it should keep the new one and shouldn't roll the dice when the config changes to a percentage. I guess you would have the same issue when you ramp from e.g. 1% to 10%: do you really want a client to potentially alternate between new and old config whenever the percentage changes? Unless you require people to always canary at only X%, then go straight to 100%.

It sounds like the doc needs to be more explicit about the semantics of the selector fields, especially when they're used in combination.  Here's how I am expecting that it will work.

In order for a config choice to be selected, all of the selectors must be considered a match for the client.  If a selector field is unset (or is set to an empty list), then it is considered a match for all clients.  If a selector field is non-empty, then the client must match the value (or, in the case of a list, one of the values) in order to be considered a match.

In other words, the code to determine which choice to use will look something like this (pseudo-code):

for each choice {
  for each selector {
    if selector does not match, then skip to next choice
  }
  if we are still here (i.e., all selectors matched), then use this choice
}

So, the net effect of this is that if you use both the client host selector and the percentage selector in the same choice, then the choice will only be used if both selectors match, which means that the specified hosts are not guaranteed to keep using the choice as the percentage changes.  However, if you do want to guarantee that the specified hosts use the config data unconditionally, then you can specify two choices: first a choice that specifies the hosts, and then one that specifies the percentage.  This requires you to duplicate the config data, but it provides a generic way for the config author to control whether they get "AND" or "OR" semantics between multiple selector fields, which I suspect will be important as new selector fields are added in the future.
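Concretely, the matching rules and the duplicated-choice trick look like this in a Python sketch (the `hosts`/`percentage` field names are illustrative, not the final schema):

```python
def choose_config(choices, client_host, pct_match):
    # First choice whose selectors all match wins ("AND" within a
    # choice); an absent or empty selector matches every client.
    # Listing multiple choices with duplicated config gives "OR".
    for choice in choices:
        hosts = choice.get("hosts", [])
        if hosts and client_host not in hosts:
            continue
        pct = choice.get("percentage")
        if pct is not None and not pct_match(pct):
            continue
        return choice["config"]
    return None
```

With choices like `[{"hosts": ["canaryhost"], "config": new}, {"percentage": 10, "config": new}, {"config": old}]`, the named hosts get the new config unconditionally while everyone else is gated by the percentage.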

I've added some comments to the doc to describe the matching algorithm.
 
 
Do we actually want to select the client port in any of these cases?  I'm not sure that's useful, since the client port would presumably be different for each backend it's connected to, and it would change any time it reconnected to a given backend.  Is there a use-case where selecting on the client port is useful?

I guess this would be theoretically useful if you run e.g. four different client processes on the same host. But I was really thinking of server host:ports, which are the ones that get advertised and discovered. Client ports are not advertised and are usually random, as you point out, unless you explicitly bind to them. So ignore the port part of my comment. (At some point, though, someone else will come up with the scenario I just described.)

Okay, sounds like we don't need to worry about ports.

Just to be clear, the hostname selector we discussed above will be the client hostname, not the server hostname.  I don't think it makes sense to allow selecting on the server hostname, because the service config parameters are not things that we would want to be different depending on which backend the RPC happens to be sent to.  (Different RPCs can be sent to different backends at the discretion of the load balancing policy, and using different defaults for different backends would cause confusion for policies like round_robin.)
 
 
In terms of how this is encoded in JSON, I would probably want it to be a list of strings rather than a single string with a delimiter character.  In other words, instead of 'hosts': 'host1,host2,...', it would be something like 'hosts': ['host1','host2',...].

What do you think?

That sounds like a good start. Perhaps we can reserve the right in the future to add ports if enough people show compelling uses for it, but for now we don't parse them or use them, only mention that in docs.

I think that I won't bother mentioning this at all for now.  We can add this functionality later if and when it becomes necessary.
 

Thanks!

