On Tue, Aug 21, 2018 at 5:25 PM <
krma...@gmail.com> wrote:
>
> Thanks, Tim. See inline.
> >There's the hostname you get when you run "hostname" and there's the
> >hostname you get when you look at `Pod.spec.hostname`. I meant the
> >latter, you meant the former. In a Deployment which sets
> >`Deployment.spec.template.spec.hostname`, all child pods will have the
> >same `Pod.spec.hostname`.
> Thanks, this is clear now. Yes, I was talking about the former. It seems this was definitely all designed for StatefulSets. Pod.spec.hostname still doesn't give well-known hostnames in a Deployment (since all pods get the same hostname), so a user who doesn't want StatefulSets HAS to create different Pods with different hostnames to get well-known hostnames and an A record for each Pod. I don't understand why the Hadoop/YARN use case @Tim Xu pointed to couldn't have used a StatefulSet directly and avoided the hostname-on-individual-Pods feature. I think if StatefulSet supported rolling updates with maxUnavailable/maxSurge, no one would ever use the Deployment object, but that's a different thread.
>
>
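To make that concrete, here's a minimal sketch (all names made up) -
every child pod of this Deployment ends up with the same
Pod.spec.hostname:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          # Every child pod gets this same hostname; there is no
          # per-pod templating, hence the collision.
          hostname: web-host
          subdomain: svc-a
          containers:
          - name: web
            image: nginx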
> >DNS is about Services. We do not try to do DNS for Pods. DNS servers
> >are not watching Pods (it doesn't scale well). Besides that, while a
> >Pod can claim to be part of svc-a, I want svc-a to confirm that --
> >that is what the headless Service does. It triggers DNS servers to
> >do SOMETHING and completes the "bi-directional linkage" between a pod
> >and its primary service.
>
> Still confused: how does creating a headless service with a name different from svc-a, but with the same selector as svc-a, mean that svc-a is confirming a Pod belongs to svc-a?
It doesn't. The name of the service has to exactly match the name of
the pod's subdomain. The pod says "I am part of svc-a" and the
Service says (by selector) "yes, you are".
Keep the goal in mind - to have a name that the pod sees for itself
and that is also DNS resolvable. DNS does not look at pods, per se -
just Services and Endpoints. If pod-a sets subdomain to "svc-a", it
sees its own name as part of svc-a, but that name is not DNS
resolvable yet. The DNS server sees svc-a and its endpoints, which
include pod-a. The endpoint for pod-a says "this is my canonical
name". We can now program DNS to forward resolve (A) for
pod-a.svc-a.my-ns.svc.cluster.local to the pod's IP. We can also
reverse resolve (PTR) that IP to that name.
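Concretely, that pairing looks something like this (a sketch; the
names match the example above, the image is arbitrary):

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-a
      namespace: my-ns
      labels:
        app: my-app
    spec:
      # The pod nominates its canonical name:
      # pod-a.svc-a.my-ns.svc.cluster.local
      hostname: pod-a
      subdomain: svc-a
      containers:
      - name: main
        image: nginx
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: svc-a        # must exactly match the pod's subdomain
      namespace: my-ns
    spec:
      clusterIP: None    # headless - no VIP; DNS returns pod IPs
      selector:
        app: my-app      # "yes, you are" - confirms membership
      ports:
      - port: 80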
What happens if svc-b also selects pod-a? We can forward resolve the
new name (and do) but what about reverse? We are not supposed to have
multiple PTR records for a given IP. This is part of why the pod's
self-nomination to be canonical under svc-a is important.
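As a sketch, the second service in that scenario is just another
headless Service with the same selector:

    apiVersion: v1
    kind: Service
    metadata:
      name: svc-b
      namespace: my-ns
    spec:
      clusterIP: None
      selector:
        app: my-app      # also selects pod-a
      ports:
      - port: 80
    # pod-a.svc-b.my-ns.svc.cluster.local can be forward resolved,
    # but the pod's IP should reverse resolve (PTR) only to the
    # canonical name under svc-a, per the pod's own subdomain.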
> I don't see how svc-a exercises any control here (sorry if I am missing a point)?
If we only trust the pod, any pod can become a member of your DNS
record, and names can collide. As it is, we have this problem with
Deployments (all pods use the same hostname == collision) and we
handle it badly. We really need to fix that.
> I do see that we have introduced an extra step for the cluster operator: by explicitly creating a headless service, the operator gives an explicit intent ack - "I know what I am doing by creating a headless svc that has the same selector as svc-a or svc-b and the same name as the subdomain specified in the Pod." This allows the operator to toggle the addition of A records for Pods belonging to svc-a or svc-b by switching the subdomain and headless service combination, without bringing down svc-a or svc-b. Does this seem like the right intent of this decoupling? I would love to see a solid use case documented for our users (maybe the users who helped get this feature added can help here) that cannot be fulfilled by StatefulSets.
>
> The sad part here is basically having to name this extra thing called a headless service and populate it in subdomain.
You have to internalize this rule - DNS can't watch pods today. There
are too many of them and they change too fast for it to scale well.
Some sites run a DNS DaemonSet and that makes clusters cry. We have
other scalability problems with Services that need to be resolved,
too, but this rule has been in effect forever. This is what led to
this seemingly complicated design.
But at the end of the day, this whole feature was designed for a
relative corner-case -- pods who need to be able to DNS resolve
themselves. Most pods do not have this requirement, which means the
relatively complicated design is a fair tradeoff. The "normal" path
stays clean.
There are shortcomings here (especially wrt Deployments that qualify
for this handling, but are buggy). I'd like to see those fixed, but I
am not sure we want a dramatically different design. That said, we're
always open to ideas if they are motivated by real use-cases.