kubectl logs fails when deployed on AWS

Adam Kunicki
Jan 8, 2018, 12:04:38 AM
to CoreOS User
I'm having issues running Tectonic 1.8 on AWS.

I can get the cluster up and running with Terraform. However, certain commands, such as kubectl logs, fail because of a DNS resolution failure.
I've noticed several long threads on GitHub about this, but haven't really found a solution.

For example:
Error from server: Get https://ip-172-31-xxx-xxx:10250/containerLogs/tectonic-system/tectonic-identity-97d874c57-fxz25/tectonic-identity: dial tcp: lookup ip-172-31-xxx-xxx on 172.31.0.2:53: no such host

It seems that for kubectl logs the master looks up the node's short hostname rather than its FQDN, and for some reason Go can't resolve it. The hostname does resolve from the master and worker nodes, and from my client machine, using both host and dig. /etc/resolv.conf is configured to use the Route53 DNS and includes search suffixes for the <region>.compute.internal domain and for the cluster's domain.
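A short name such as ip-172-31-xxx-xxx only resolves if the resolver appends a working search suffix to it. The Go sketch below is a simplified model of that search-list expansion, assuming the default ndots of 1 (the glibc and Go default). It is not Go's actual resolver code, and the function name candidateNames, the region, and the cluster domain are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames is a simplified model (not Go's real implementation) of how a
// stub resolver expands a hostname using the resolv.conf "search" list.
func candidateNames(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name} // rooted name: no search expansion
	}
	hasNdots := strings.Count(name, ".") >= ndots
	var names []string
	if hasNdots {
		names = append(names, name+".") // enough dots: try the name as-is first
	}
	for _, suffix := range search {
		names = append(names, name+"."+strings.TrimSuffix(suffix, ".")+".")
	}
	if !hasNdots {
		names = append(names, name+".") // short name: try as-is only last
	}
	return names
}

func main() {
	// Hypothetical search list: region domain plus a cluster domain.
	search := []string{"us-west-2.compute.internal", "example.internal"}
	for _, n := range candidateNames("ip-172-31-0-10", search, 1) {
		fmt.Println(n)
	}
}
```

With a healthy search list, the first candidate (ip-172-31-0-10.us-west-2.compute.internal.) is the name the Amazon-provided DNS can answer. If the search list is wrong, every candidate misses and the lookup ends in "no such host".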

In case it matters, I'm using a private hosted zone and only have internal ingress enabled in the Terraform variables file. I had previously followed a similar deployment model on Azure and did not experience this issue there.

Has anyone else run into this, or does anyone know of a (scalable) workaround?

Thanks!
Adam

Kyle Brown
Jan 8, 2018, 3:01:14 PM
to Adam Kunicki, CoreOS User
Hi Adam, 

I've seen this issue when the VPC doesn't have a proper DHCP options set attached. Our docs mention this:

The DHCP options set attached to the VPC must have an AWS private domain name. In the us-east-1 region, the AWS private domain name is ec2.internal, whereas other regions use <region>.compute.internal.

You might want to verify that the DHCP options set associated with your VPC is valid:

aws ec2 describe-dhcp-options
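Given the domain-name value that command reports (the entry with Key "domain-name" under DhcpConfigurations), the rule from the docs can be checked mechanically. A minimal Go sketch: the function names and sample values are hypothetical, and it assumes multiple domains in that value are space-separated.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedPrivateDomain applies the rule quoted from the docs: us-east-1 uses
// ec2.internal, every other region uses <region>.compute.internal.
func expectedPrivateDomain(region string) string {
	if region == "us-east-1" {
		return "ec2.internal"
	}
	return region + ".compute.internal"
}

// hasPrivateDomain reports whether a (possibly space-separated) domain-name
// value includes the region's AWS private domain name.
func hasPrivateDomain(domainName, region string) bool {
	want := expectedPrivateDomain(region)
	for _, d := range strings.Fields(domainName) {
		if strings.TrimSuffix(d, ".") == want {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical values as they might appear in describe-dhcp-options output.
	fmt.Println(hasPrivateDomain("us-west-2.compute.internal", "us-west-2")) // true
	fmt.Println(hasPrivateDomain("example.internal", "us-west-2"))           // false
	fmt.Println(hasPrivateDomain("ec2.internal", "us-east-1"))               // true
}
```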
Cheers,
Kyle Brown

Adam Kunicki
Jan 8, 2018, 3:06:07 PM
to Kyle Brown, CoreOS User
Thanks for the help!

It looks like CoreOS treats multiple search domains from an AWS DHCP options set as a single string (the entries are separated by the string "032" rather than by spaces). I'm redeploying with only a single search domain to verify that this fixes it.
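A minimal Go sketch of that failure mode, assuming the node's resolv.conf ends up with a line like search us-west-2.compute.internal\032example.internal (\032 being the DNS decimal escape for a space; the exact escaped form is an assumption based on the "032" described here). A parser that splits only on whitespace sees one bogus search domain, so no suffixed candidate can ever resolve. The function searchDomains and the sample file are hypothetical, and splitting on \032 is shown only to contrast the two readings, not as a fix for the node.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// searchDomains extracts the search list from resolv.conf-style text.
// splitEscaped=false models a whitespace-only parser (the failure mode);
// splitEscaped=true also treats the literal `\032` as a separator.
func searchDomains(resolvConf string, splitEscaped bool) []string {
	var domains []string
	sc := bufio.NewScanner(strings.NewReader(resolvConf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] != "search" {
			continue
		}
		// If several search lines appear, the last one wins (resolv.conf(5)).
		domains = nil
		for _, f := range fields[1:] {
			if !splitEscaped {
				domains = append(domains, f)
				continue
			}
			for _, d := range strings.Split(f, `\032`) {
				if d != "" {
					domains = append(domains, d)
				}
			}
		}
	}
	return domains
}

func main() {
	// Hypothetical resolv.conf reproducing the symptom described above.
	conf := "nameserver 172.31.0.2\nsearch us-west-2.compute.internal\\032example.internal\n"
	fmt.Println(len(searchDomains(conf, false))) // 1: a single bogus domain
	fmt.Println(searchDomains(conf, true))       // [us-west-2.compute.internal example.internal]
}
```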

For reference, this thread has some more information: https://github.com/coreos/bugs/issues/1934
Ultimately, it sounds like an AWS bug that other Linux distros handle safely.