Why is subdomain needed in Pods for DNS entries, and what are its typically suggested values?


krma...@gmail.com

Aug 15, 2018, 7:55:10 PM
to Kubernetes developer/contributor discussion
Hi 

From this documentation https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/ , it seems that the only way to add DNS entries for the Pods of a Deployment is to specify the subdomain field and add a headless Service of the same name in that namespace.

I have the following questions:

1: What is the rationale for always requiring the headless Service to be present for the DNS entries to be added? The only reason I can think of is that this allows you to add entries only for ready endpoints or Pods. Is there a better reason, or am I missing something?

2: What do people typically use in the subdomain field? If a team (identified by namespace) has many Deployments, do people use the Deployment name in the subdomain, or do they choose something else? Any guidance on this would be useful.

Thanks
Mayank

Tim Hockin

Aug 15, 2018, 8:09:08 PM
to krma...@gmail.com, Kubernetes developer/contributor discussion
Pods in a Deployment all have the same template so they all have the
same hostname. That is generally not very useful (and in fact
broken).

Kubernetes DNS is *about* Services. The DNS server does not watch
Pods. Pods can come and go to their hearts' content and DNS doesn't
care until you create a Service (headless or otherwise).

Subdomain must match the name of the headless Service. Remember that
a pod can be "in" many Services. Setting subdomain is effectively saying
"I nominate that service X is my canonical name".

This feature was designed for use in those applications that NEED
their hostname (as in running `hostname -f`) to be DNS resolvable.
The vast vast vast majority of apps do not need that. It was designed
to work with StatefulSet, not Deployment.
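[Editor's note: for concreteness, the pattern described here looks roughly like the following sketch; all names are illustrative, not from the thread.]

```yaml
# A headless Service whose name matches the Pods' subdomain.
apiVersion: v1
kind: Service
metadata:
  name: my-subdomain        # must equal Pod.spec.subdomain
spec:
  clusterIP: None           # headless
  selector:
    app: my-app
  ports:
  - port: 80                # at least one port (a requirement at the time of this thread)
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    app: my-app             # matched by the Service's selector
spec:
  hostname: my-host
  subdomain: my-subdomain   # nominates the Service above as the canonical name
  containers:
  - name: app
    image: nginx
```

With both objects in place, `my-host.my-subdomain.<namespace>.svc.cluster.local` resolves to the Pod's IP.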
> --
> You received this message because you are subscribed to the Google Groups "Kubernetes developer/contributor discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-de...@googlegroups.com.
> To post to this group, send email to kuberne...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-dev/fba292cd-3973-4506-9de1-9e5db8a74973%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Tim Xu

Aug 15, 2018, 11:08:59 PM
to Tim Hockin, krma...@gmail.com, Kubernetes developer/contributor discussion
In some cluster deployments, such as Hadoop / YARN, the worker nodes want to have their own hostnames and must know them in advance. So that is one use case for Pod FQDNs. I agree that it was designed to work with StatefulSet, not Deployment.

Steven Harris

Aug 16, 2018, 9:33:56 AM
to Kubernetes developer/contributor discussion
On Wednesday, August 15, 2018 at 7:55:10 PM UTC-4, krma...@gmail.com wrote:
The only reason i can think of is this allows you to add entries for only ready endpoints or Pods. Is there a better reason or something i am missing ?

On that note, I am reminded of this issue I filed, which complains that not only must you define a headless Service, but that Service must expose at least one port in order to get the A records for the pods:

kubernetes/dns 174: Pods and headless Services don't get DNS A records without at least one service port

krma...@gmail.com

Aug 21, 2018, 2:58:26 AM
to Kubernetes developer/contributor discussion
Thanks all for the responses. See my comments inline

> Pods in a Deployment all have the same template so they all have the 
> same hostname.  That is generally not very useful (and in fact 
> broken). 
All Pods of a Deployment get their hostname set to the Pod name. This works without specifying subdomain or hostname in the Pod spec, so I don't understand the comment about them all having the same hostname.

>Subdomain must match the name of the headless Service.  Remember that 
>a pod can be "in" many Services.  Setting subdomain effectively saying 
>"I nominate that service X is my canonical name". 
I agree that a Pod can be part of many Kubernetes Services. So let's say a Pod is part of svc-a and svc-b. If I want the Pod hostname to be in the subdomain of svc-a, then I should be able to specify the subdomain of the Pod to be svc-a or svc-b. I do not understand why we need to create a new headless Service with the right selector, which has no relation to the existing Services svc-a or svc-b, and then nominate the Pod to belong to this headless Service. Am I missing something?

> On that note, I am reminded of this issue I filed, which complains that not only must you define a headless Service, but that Service must expose at least one 
> port in order to get the A records for the pods:
Agreed, I have seen that too. I always add a dummy port.


>This feature was designed for use in those applications that NEED 
> their hostname (as in running `hostname -f`) to be DNS resolvable. 
> The vast vast vast majority of apps do not need that.  It was designed 
> to work with StatefulSet, not Deployment. 
Even with StatefulSet, why do we need the headless Service?
For StatefulSet, what are the recommended values people use when populating the subdomain field?

Thanks
Mayank

Tim Hockin

Aug 21, 2018, 11:57:27 AM
to krma...@gmail.com, Kubernetes developer/contributor discussion
On Mon, Aug 20, 2018 at 11:58 PM <krma...@gmail.com> wrote:
>
> Thanks all for the responses. See my comments inline
>
> > Pods in a Deployment all have the same template so they all have the
> > same hostname. That is generally not very useful (and in fact
> > broken).
>
> All Pods of a deployment get the hostname as the pod name. This works without specifying subdomain or the hostname in the pod spec. So I don't understand this comment about they all having the same hostname.

There's the hostname you get when you run "hostname" and there's the
hostname you get when you look at `Pod.spec.hostname`. I meant the
latter, you meant the former. In a Deployment which sets
`Deployment.spec.template.spec.hostname`, all child pods will have the
same `Pod.spec.hostname`.
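[Editor's note: a sketch of the pitfall being described; the manifest below is an illustrative assumption, not from the thread.]

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      hostname: my-host        # every replica gets Pod.spec.hostname == "my-host"
      subdomain: my-subdomain  # every replica claims the same DNS name
      containers:
      - name: app
        image: nginx
```

All three replicas would claim the same name `my-host.my-subdomain.<namespace>.svc.cluster.local`, which is the collision being called broken here.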

> >Subdomain must match the name of the headless Service. Remember that
> >a pod can be "in" many Services. Setting subdomain effectively saying
> >"I nominate that service X is my canonical name".
>
> I agree that a Pod can be part of many kubernetes Services. So lets say Pod is part of svc-a and svc-b. If i want the Pod hostname to be in subdomain of svc-a, then i should be able to specify the subdomain of Pod to be svc-a or svc-b. I do not understand why we need to create a new Headless service with the right selector which has no relation to the existing services svc-a or svc-b and then nominate Pod to belong to this Headless service. Am i missing something ?

DNS is about Services. We do not try to do DNS for Pods. DNS servers
are not watching Pods (it doesn't scale well). Besides that, while a
Pod can claim to be part of svc-a, I want svc-a to confirm that --
that is what the headless Service does. It triggers DNS servers to
do SOMETHING and completes the "bi-directional linkage" between a Pod
and its primary Service.

> > On that note, I am reminded of this issue I filed, which complains that not only must you define a headless Service, but that Service must expose at least one
> > port in order to get the A records for the pods:
>
> Agree i have seen that too. I always add a dummy port

This is fixed in PR #67622 - a shortcoming in Endpoint processing was
eliding Services with no ports (it was even unit tested that way --
evolution :)

> >This feature was designed for use in those applications that NEED
> > their hostname (as in running `hostname -f`) to be DNS resolvable.
> > The vast vast vast majority of apps do not need that. It was designed
> > to work with StatefulSet, not Deployment.
>
> Even with StatefulSet, Why do we need the Headless Service ?
> For StatefulSet, what is the recommended values people use when populating the subdomain field ?

See above :)

I'm told it matters to some apps - that's why we have it.
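[Editor's note: for the StatefulSet case asked about above, there is no subdomain field to fill in; `spec.serviceName` plays that role, and it must name a headless Service. A minimal sketch with illustrative names:]

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-headless
spec:
  clusterIP: None            # headless
  selector:
    app: my-app
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: my-headless   # fills the role that subdomain plays for bare Pods
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx
```

Each replica then gets a stable, per-Pod name such as `web-0.my-headless.<namespace>.svc.cluster.local`.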


krma...@gmail.com

Aug 21, 2018, 8:25:45 PM
to Kubernetes developer/contributor discussion
Thanks Tim. See inline.
>There's the hostname you get when you run "hostname" and there's the 
>hostname you get when you look at `Pod.spec.hostname`.  I meant the 
>latter, you meant the former.  In a Deployment which sets 
>`Deployment.spec.template.spec.hostname`, all child pods will have the 
>same `Pod.spec.hostname`. 
Thanks, this is clear now. Yes, I was talking about the former. It seems this was definitely all designed for StatefulSets. Pod.spec.hostname still doesn't give well-known hostnames in a Deployment (since all Pods will get the same hostname), so the user HAS to create different Pods with different hostnames to get well-known hostnames and an A record for each Pod, if they don't want to use StatefulSets. I don't understand why the Hadoop/YARN use case @Tim Xu pointed out couldn't have used a StatefulSet directly and avoided the per-Pod hostname feature. I think if StatefulSet provided the rolling-update feature with maxUnavailable/maxSurge, no one would ever use the Deployment object, but that's a different thread.


>DNS is about Services.  We do not try to do DNS for Pods.  DNS servers 
>are not watching Pods (it doesn't scale well).  Besides that, while a 
>Pod can claim to be part of svc-a, I want svc-a to confirm that -- 
>that is what the headless Service does.   It triggers DNS servers to 
>do SOMETHING and completes the "bi-directional linkage" between a pod 
>and it's primary service. 

I am still confused: how does creating a headless Service with a name different than svc-a, but having the same selector as svc-a, mean that svc-a is confirming that a Pod belongs to svc-a? I don't see how svc-a exercises any control here (sorry if I am missing a point). I do see that we have introduced an extra step for the cluster operator: by explicitly introducing a headless Service which has the same selector as svc-a or svc-b and the same name as the subdomain specified in the Pod, the operator explicitly acknowledges the intent, saying "I know what I am doing". This allows the operator to switch the addition of A records between Pods belonging to svc-a or svc-b, by switching the subdomain and headless Service combination, without bringing down svc-a or svc-b. Does this seem like the right intent of this decoupling? I would love to see a solid use case documented for our users (maybe the users who helped get this feature added can help here) which cannot be fulfilled by StatefulSets.

The sad part here is basically being required to name this extra thing called a headless Service and populate it in subdomain.

Tim Hockin

Aug 22, 2018, 1:48:33 PM
to krma...@gmail.com, Kubernetes developer/contributor discussion
On Tue, Aug 21, 2018 at 5:25 PM <krma...@gmail.com> wrote:
> Still confused about, how does creating a headless service with a name different than svc-a, but having the same selector as svc-a, means svc-a is confirming that a Pod belongs to svc-a ?

It doesn't. The name of the service has to exactly match the name of
the pod's subdomain. The pod says "I am part of svc-a" and the
Service says (by selector) "yes, you are".

Keep the goal in mind - to have a name that the Pod sees for itself
and that is also DNS resolvable. DNS does not look at Pods, per se -
just Services and Endpoints. If pod-a sets subdomain to "svc-a",
it sees its own name as part of svc-a, but that name is not DNS
resolvable yet. The DNS server sees svc-a and its endpoints, which
include pod-a. The endpoint for pod-a says "this is my canonical
name". We can now program DNS to forward resolve (A)
pod-a.svc-a.my-ns.svc.cluster.local to the Pod's IP. We can also
reverse resolve (PTR) that IP to that name.

What happens if svc-b also selects pod-a? We can forward resolve the
new name (and do) but what about reverse? We are not supposed to have
multiple PTR records for a given IP. This is part of why the pod's
self-nomination to be canonical under svc-a is important.
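[Editor's note: the svc-a / svc-b situation above can be sketched as two Services selecting the same Pod, with subdomain nominating one of them as canonical. The names come from the discussion; the manifests themselves are illustrative.]

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
  labels:
    app: my-app
spec:
  hostname: pod-a
  subdomain: svc-a           # canonical: A and PTR records hang off svc-a
  containers:
  - name: app
    image: nginx
---
apiVersion: v1
kind: Service
metadata:
  name: svc-a
spec:
  clusterIP: None            # headless; its selector confirms pod-a's nomination
  selector:
    app: my-app
  ports:
  - port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: svc-b
spec:
  clusterIP: None            # also headless, also selects pod-a; forward lookups
  selector:                  # work, but the single PTR record stays with svc-a
    app: my-app
  ports:
  - port: 80
```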

> I dont see how svc-a exercises any control here (Sorry if i am missing a point here.)?

If we only trust the pod, any pod can become a member of your DNS
record and they can collide. As it is, we have this problem with
Deployment (all pods use the same hostname == collision) and we handle
it badly. We really need to fix that.

> I do see that we have introduced an extra step for the cluster operator to say that by explicitly introducing a headless service, there is an explicit intent ack by the operator to say i know what i am doing by creating a headless svc which has the same selector as the svc-a or svc-b and the same name as subdomain specified in the Pod. This allows the operator to switch addition of A records for Pods belonging to svc-a or svc-b by switching subdomain and the headless service combination without bringing down svc-a or svc-b. Does this seem like the right intent of this decoupling ? I would love to see a solid use case documented for our users(may be the users who helped to get this feature added can help here.), which cannot be fulfilled by StatefulSets.
>
> The sad part here is basically requiring to name this extra thing called headless service and populating it in subdomain.

You have to internalize this rule - DNS can't watch Pods today. There
are too many of them and they change too fast for it to scale well.
Some sites run a DNS DaemonSet and that makes clusters cry. We have
other scalability problems with Services that need to be resolved,
too, but this rule has been in effect forever. This is what led to
this seemingly complicated design.

But at the end of the day, this whole feature was designed for a
relative corner-case -- pods who need to be able to DNS resolve
themselves. Most pods do not have this requirement, which means the
relatively complicated design is a fair tradeoff. The "normal" path
stays clean.

There are shortcomings here (especially wrt Deployments that qualify
for this handling, but are buggy). I'd like to see those fixed, but I
am not sure we want a dramatically different design. That said, we're
always open to ideas if they are motivated by real use-cases.