CC @kubernetes/sig-storage-feature-requests
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale
Ran into this myself while attempting to configure my StorageClasses to speak to a heketi GlusterFS pod by service name. When I rebooted my cluster, the cluster IP changed, which broke my GlusterFS storage solution.
I believe the issue is that the node's resolv.conf needs to be configured to point to the Kubernetes DNS service.
This is a host configuration that needs to be done per deployment. I believe we do it automatically for GCE/GKE but I'm unsure about other environments. cc @jingxu97
Hm maybe @kubernetes/sig-network-misc knows something that can be done here.
The problem is that volume mounts are done by the kubelet, so the NFS server's IP/hostname needs to be accessible from the kubelet's network.
Which on a baremetal deployment should be the node's/host's network, correct?
Correct. So if your NFS server is being provided by a Pod, then you need the node/host network to be able to access the pod's network, which, like you pointed out, could be tricky depending on how you've configured your network.
Okay, so I see two solutions to this problem, then:
1. Change the Service for kube-dns from ClusterIP to NodePort and adjust the node's /etc/resolv.conf to point to the local IPs, to get name resolution working.
2. Change the Service for NFS or GlusterFS (in my case) from ClusterIP to NodePort, and then change the StorageClass to point to one of the node's static IPs in the node's subnet (see the sketch below).
Of the two, the first seems like the more generic solution for getting name resolution working across the cluster, but it may have unintended side effects if things are set up to expect it as a ClusterIP. The second solves this direct problem. I'll try the second option when I get home tonight and report back.
We may want to update the public-facing docs to mention that StorageClass definitions use the kubelet's network so that others don't run aground when trying to set this up.
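For illustration, a minimal sketch of what option 2 could look like for the heketi REST endpoint. The service name, selector, and ports below are hypothetical; the only change from a plain ClusterIP Service is the type and the nodePort field:

apiVersion: v1
kind: Service
metadata:
  name: heketi              # hypothetical name for the heketi REST service
  namespace: default
spec:
  type: NodePort            # was ClusterIP; NodePort also exposes the port on every node's IP
  selector:
    app: heketi
  ports:
    - name: rest
      port: 8080            # heketi's REST port inside the cluster (assumed)
      targetPort: 8080
      nodePort: 32708       # must fall in the default 30000-32767 NodePort range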
Did you have a chance to experiment a bit? For solution 2), NodePort defines ports in the 30000-32767 range, so it seems it would also be necessary to modify the NFS PV default ports (2049 & 111)?
I have a baremetal setup with flannel, and editing resolv.conf doesn't work because, like you pointed out, my nodes don't have access to the container network.
Unfortunately, I can't edit the restUrl for my StorageClass because "updating parameters is illegal"... I may end up losing data by deleting the StorageClass and recreating it with the right URL.
Fortunately my analysis was wrong -- didn't lose any data at all, thankfully.
So what I've done is update my Services to be NodePorts, exposed them on port 32708, and set my resturls in the StorageClasses to http://<random-node-ip-from-cluster>:32708. This allows things to continue to work, but there are two major downsides now:
But things work at least.
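For reference, a minimal sketch of that StorageClass workaround. The node IP below is a placeholder; resturl is the parameter name used by the in-tree kubernetes.io/glusterfs provisioner of that era:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-heketi
provisioner: kubernetes.io/glusterfs
parameters:
  # 192.0.2.10 stands in for one of the nodes' static IPs; 32708 is the NodePort from above
  resturl: "http://192.0.2.10:32708"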
When I rebooted my cluster, the cluster IP changed, which broke my glusterfs storage solution.
@jtgans Sorry that I might be missing the point, but why does rebooting the cluster change the cluster IP? Did the NFS service get deleted and recreated? If so, what about giving the NFS service a fixed IP in the manifest and keeping it as type=ClusterIP?
I actually didn't realize that ClusterIP services could specify their IPs in the definition. I'll give this a try today.
Pretty sure I didn't recreate the service post reboot, but it's been a while since I restarted the cluster.
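For anyone else trying this, a minimal sketch of pinning the Service's ClusterIP. The address shown is a placeholder and has to fall inside the cluster's service CIDR and be unallocated:

apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  type: ClusterIP
  clusterIP: 10.96.100.50   # pinned, so the address survives deleting and recreating the Service
  selector:
    app: nfs-server
  ports:
    - name: nfs
      port: 2049
    - name: rpcbind
      port: 111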
Use the full service name; it works fine:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-zuzu
spec:
  capacity:
    storage: 1Mi
  accessModes:
    - ReadWriteMany
  nfs:
    # <service>.<namespace>.svc.cluster.local
    server: nfs-server.build.svc.cluster.local
    path: "/"
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Closed #44528.
@mtricolici Tried. Didn't work.
@mtricolici @digglife 'build' should be replaced by the namespace.
E.g. when using no namespace (default), the value for server should be nfs-server.default.svc.cluster.local
I have also tried using the service name, and that did not work. Can you clarify the kubectl command that gets the "full" service name that resolves?
Take that back. I just got it to work using "{service-name}.{namespace}.svc.cluster.local". I did not realize that svc.cluster.local was always the same.
I just tried {service-name}.{namespace}.svc.cluster.local on EKS and it doesn't work. E.g. with service name nfs-service in the default namespace I got:
mount.nfs: Failed to resolve server nfs-service.default.svc.cluster.local: Name or service not known
Is there still no fix or clean workaround available?
I'm also having this issue on EKS.
Also having this issue on Digital Ocean.
/reopen
@rjohnson3: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Also having this issue on GKE
Same issue in OpenStack.
Same issue on NFS
I'm also having this issue on AKS.
@will-beta: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
Same issue on CDK.
From what I just learned, this is an issue only on non-GKE Kubernetes. Can't wait for an upstream fix so we can get proper service DNS name resolution on all providers.
Any progress perhaps?
Worth noting that the example highlighted here is now at: https://github.com/kubernetes/examples/tree/bdda4f31e60d171f77252e7d041e9c6877565f73/staging/volumes/nfs
Yeah, regardless, the question still applies, as there was no upstream solution at the time the docs were written.
What does GKE do that allows for this different behavior from upstream Kubernetes?
Has there been any work on this?
Did anyone find a solution for this? It really surprises me that this (IMO big) issue hasn't been resolved in 3 years.
How do people cope with this situation? If you want to run multiple NFS servers, you can't always hardcode the IPs.
Any suggestions on how to work around this?
Specifying nfs-service.default.svc.cluster.local didn't work out for us.
I'm not sure, but could using an ExternalName Service be a viable solution? I was under the impression that this particular object was created to solve these kinds of issues. I didn't try it yet, but I would welcome feedback from those who did, regardless of the outcome.
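For anyone who wants to try it, this is roughly what an ExternalName Service looks like (the names below are hypothetical). Note that it only adds a CNAME record in cluster DNS, so whatever performs the mount still has to be able to resolve that record:

apiVersion: v1
kind: Service
metadata:
  name: nfs-server
  namespace: default
spec:
  type: ExternalName
  # queries for nfs-server.default.svc.cluster.local return a CNAME to this name
  externalName: nfs.example.com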
FWIW I have worked around this by using the nfs-server-provisioner helm chart and moving on with my life. I will say, something has changed about helm's website and now there seem to be two (identical?) options for this, which is a bit weird.
https://artifacthub.io/packages/helm/kvaps/nfs-server-provisioner
Hope that helps! It has worked well enough for me! It would definitely be nice to have a fix though!
A coworker dug into the source and suspected the bug was here, in case it is any help: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/nfs/nfs.go#L256
The issue is that in some environments the kubelet's host network does not have access to the cluster DNS. Using https://github.com/kubernetes-csi/csi-driver-nfs should resolve this, because it runs as a Pod and so has access to cluster services.
This does not seem to apply to us, as far as I understand it: when I connect via a shell I can ping the NFS server directly via nfs-server-service, and the name resolution works fine. So kube-proxy knows where our NFS server is; it's just that the volume mount doesn't.
As @msau42 mentioned, I solved this issue using https://github.com/kubernetes-csi/csi-driver-nfs.
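For completeness, a minimal dynamic-provisioning sketch with csi-driver-nfs. The server and share values are placeholders; the provisioner name and the server/share parameters come from that driver's documentation:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  # in-cluster DNS name of the NFS server Service (placeholder name/namespace)
  server: nfs-server.default.svc.cluster.local
  share: /
reclaimPolicy: Delete
volumeBindingMode: Immediate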