I'm having trouble adding a GlusterFS volume to a pod in my GKE instance. I'm using K8S 1.5.2 with the gci image, so I have the newfangled containerized mount mechanism.
I've traced what /home/kubernetes/bin/mounter is attempting to do, and it seems to boil down to:
- use rkt to launch a debian image that mounts /var/lib/kubelet/pods from the host (the GCI system)
- run "mount -t glusterfs $HOST:/$VOLUME /var/lib/kubelet/pods/$UUID" inside the rkt container
- mount dispatches that to "/sbin/mount.glusterfs $HOST:/$VOLUME /var/lib/kubelet/pods/$UUID" (a shell script)
- which ends up calling "/usr/sbin/glusterfs --volfile-server=$HOST --volfile-id=/$VOLUME /var/lib/kubelet/pods/$UUID"
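For concreteness, with my values filled in (10.216.14.21 is the pod IP of glusterfs-0, gv0 is my volume name, and $UUID stands in for the actual pod UID), that last step is roughly:
/usr/sbin/glusterfs --volfile-server=10.216.14.21 --volfile-id=/gv0 /var/lib/kubelet/pods/$UUID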
When I run that final command myself, the glusterfs command exits with error code 107. If I pull the glusterfs mount logs (still in the rkt container), I see this:
[2017-02-09 15:29:24.549156] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-0: parent translators are ready, attempting connect on transport
[2017-02-09 15:29:24.550003] E [MSGID: 101075] [common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Temporary failure in name resolution)
[2017-02-09 15:29:24.550016] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-gv0-client-0: DNS resolution failed on host glusterfs-0.glusterfs
[2017-02-09 15:29:24.550063] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-1: parent translators are ready, attempting connect on transport
[2017-02-09 15:29:24.550161] E [MSGID: 101075] [common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Temporary failure in name resolution)
[2017-02-09 15:29:24.550170] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-gv0-client-1: DNS resolution failed on host glusterfs-1.glusterfs
One small twist (though I don't think this is the root cause) is that my glusterfs peer nodes are themselves running as pods in k8s, managed by a statefulset; that's where the glusterfs-0.glusterfs and glusterfs-1.glusterfs names in the log come from, and why DNS resolution fails for both of them.
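For reference, the names come from the usual headless-Service-plus-StatefulSet pairing; a sketch trimmed to the fields that matter for DNS (the real manifests obviously have the actual gluster containers, volumes, probes, etc.):
apiVersion: v1
kind: Service
metadata:
  name: glusterfs
spec:
  clusterIP: None        # headless, so each pod gets a glusterfs-N.glusterfs record
  selector:
    app: glusterfs
  ports:
  - name: glusterd
    port: 24007
---
apiVersion: apps/v1beta1 # StatefulSet API group as of k8s 1.5
kind: StatefulSet
metadata:
  name: glusterfs
spec:
  serviceName: glusterfs # ties the pod DNS names to the headless service above
  replicas: 2
  template:
    metadata:
      labels:
        app: glusterfs
    spec:
      containers:
      - name: glusterfs
        image: gluster/gluster-centos # placeholder image, not necessarily what I run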
These addresses resolve fine from other pods:
$ kubectl exec -i -t mariadb-3507467696-9j65w -- ping -c 1 glusterfs-0.glusterfs
PING glusterfs-0.glusterfs.default.svc.cluster.local (10.216.14.21): 56 data bytes
--- glusterfs-0.glusterfs.default.svc.cluster.local ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.171/0.171/0.171/0.000 ms
$ kubectl exec -i -t mariadb-3507467696-9j65w -- ping -c 1 glusterfs-1.glusterfs
PING glusterfs-1.glusterfs.default.svc.cluster.local (10.216.13.38): 56 data bytes
--- glusterfs-1.glusterfs.default.svc.cluster.local ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.896/0.896/0.896/0.000 ms
But inside the rkt container, there is no name resolution at all:
/home/kubernetes/bin/rkt run --stage1-path=/home/kubernetes/bin/stage1-fly.aci --insecure-options=image --volume=kubelet,kind=host,source=/var/lib/kubelet,readOnly=false,recursive=true --mount volume=kubelet,target=/var/lib/kubelet gcr.io/google_containers/gci-mounter:v2 --user=root --exec /bin/bash
run: group "rkt" not found, will use default gid when rendering images
groups: cannot find name for group ID 11
groups: cannot find name for group ID 1001
root@gke-something-d5c9d60b-6cnm:/# cat /etc/resolv.conf
root@gke-something-d5c9d60b-6cnm:/# grep hosts /etc/nsswitch.conf
hosts: files dns
In fact, before I got this far I was attempting to mount the glusterfs filesystem using the service name, and that failed entirely. I had to use the IP address of one of the two glusterfs pods in order to even get this error.
For glusterfs, the mounter grabs a volume file from the host you specify (which could be any of the peers in the pool, since they should all have a consistent view of a given volume). The volume file contains references to the individual nodes that host the brick(s) you want to access, and the client picks one of those (with fallback if it fails) to actually get the data from.
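The client translators in that volume file are where the hostnames come from; mine looks roughly like this (reconstructed from the 0-gv0-client-0/0-gv0-client-1 prefixes in the log rather than copied verbatim, and the brick paths are placeholders):
volume gv0-client-0
    type protocol/client
    option remote-host glusterfs-0.glusterfs
    option remote-subvolume /bricks/brick1
    option transport-type tcp
end-volume

volume gv0-client-1
    type protocol/client
    option remote-host glusterfs-1.glusterfs
    option remote-subvolume /bricks/brick1
    option transport-type tcp
end-volume
Those remote-host values are exactly what the client is failing to resolve inside the mount container.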
Since the containerized mount doesn't have DNS resolution of any sort, anything that wants to use hostnames as part of a network mount is going to fail (I imagine NFS would have the same issue).
I can't configure the volume file to export the bricks via IP address because a) that's kind of dumb and b) statefulsets don't guarantee IP stability when the pods get recycled, only network identity (i.e. hostname). I suspect that if I did reconfigure the volume to be entirely IP based, it would mount properly, until the first time a glusterfs pod died and was restarted, at which point it would start failing.
I realize I'm a bit of a walking edge case here, but does anyone have ideas? I assume it would be technically possible to change the mounter image to create a resolv.conf so that the mounter container had the same experience as a pod with "dnsPolicy: ClusterFirst", but I wanted to check before embarking on a patch.
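What I have in mind is roughly this resolv.conf (the nameserver would be the cluster's kube-dns service IP, written here as a placeholder, and the search list mirrors what a ClusterFirst pod in the default namespace gets):
# KUBE_DNS_CLUSTER_IP is a placeholder; e.g. kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
nameserver KUBE_DNS_CLUSTER_IP
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
The kubelet already knows the cluster DNS IP and domain via --cluster-dns/--cluster-domain, so plumbing the right values into the mounter invocation seems plausible, but that's the part I'd want a sanity check on before writing the patch.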
Cheers