This is a serious regression in 1.7 for disk management. We should first patch both 1.7 and 1.8 to disable the feature through the LocalStorageCapacityIsolation feature gate.
cc @kubernetes/sig-storage-bugs
[MILESTONENOTIFIER] Milestone Labels Incomplete
@dashpole @felipejfc @jingxu97
Action required: This issue requires label changes. If the required changes are not made within 2 days, the issue will be moved out of the v1.8 milestone.
priority: Must specify exactly one of priority/critical-urgent, priority/important-longterm, or priority/important-soon.
[MILESTONENOTIFIER] Milestone Issue Needs Approval
@dashpole @felipejfc @jingxu97 @kubernetes/sig-node-bugs @kubernetes/sig-storage-bugs
Action required: This issue must have the status/approved-for-milestone label applied by a SIG maintainer. If the label is not applied within 6 days, the issue will be moved out of the v1.8 milestone.
sig/node, sig/storage: Issue will be escalated to these SIGs if needed.
priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
kind/bug: Fixes a bug discovered during the current release.
@dashpole Can we reproduce the issue with overlay?
I am currently testing with aufs, since that was the first alternate image I could find. I can test with overlay soon.
It appears to work correctly on aufs on Ubuntu Xenial:
imageFsInfo.CapacityBytes: 20.7 GB, imageFsInfo.AvailableBytes: 1.91 GB
rootFsInfo.CapacityBytes: 20.7 GB, rootFsInfo.AvailableBytes: 1.91 GB
Pod: container-disk-hog-pod
--- summary Container: container-disk-hog-container UsedBytes: 16.9 GB
Pod: innocent-pod
--- summary Container: innocent-container UsedBytes: 12 KB
Looks like overlay on COS 59 (docker 1.11) is fine as well:
imageFsInfo.CapacityBytes: 16.7 GB, imageFsInfo.AvailableBytes: 1.62 GB
rootFsInfo.CapacityBytes: 16.7 GB, rootFsInfo.AvailableBytes: 1.62 GB
Pod: container-disk-hog-pod
--- summary Container: container-disk-hog-container UsedBytes: 13.8 GB
Pod: innocent-pod
--- summary Container: innocent-container UsedBytes: 49 KB
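The stats above can be read as a small worked check of what "working correctly" means. The Go sketch below assumes a hypothetical hard threshold of nodefs.available<10% and ranks pods by reported disk usage alone; the helper names are illustrative, and the actual kubelet ranking takes additional factors into account. With correct accounting on the aufs run, the node is under pressure (1.91 GB of 20.7 GB available, about 9.2%) and the disk-hog pod ranks first, so under this assumption the innocent pod is not the one evicted.

```go
package main

import (
	"fmt"
	"sort"
)

// containerUsage pairs a pod with the disk usage reported for its container.
type containerUsage struct {
	Pod       string
	UsedBytes float64
}

// belowThreshold reports whether available/capacity has dropped below the
// given fraction (e.g. 0.10 for a hypothetical "nodefs.available<10%").
func belowThreshold(availableGB, capacityGB, fraction float64) bool {
	return availableGB/capacityGB < fraction
}

// rankByUsage returns pod names ordered from highest to lowest disk usage.
// This is a simplification: only usage is considered here.
func rankByUsage(usages []containerUsage) []string {
	sorted := append([]containerUsage(nil), usages...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].UsedBytes > sorted[j].UsedBytes })
	names := make([]string, len(sorted))
	for i, u := range sorted {
		names[i] = u.Pod
	}
	return names
}

func main() {
	// Figures taken from the aufs run on Ubuntu Xenial above.
	pressure := belowThreshold(1.91, 20.7, 0.10)
	usages := []containerUsage{
		{Pod: "innocent-pod", UsedBytes: 12e3},
		{Pod: "container-disk-hog-pod", UsedBytes: 16.9e9},
	}
	fmt.Println(pressure, rankByUsage(usages))
}
```

The same check on the COS 59 overlay run (1.62 GB of 16.7 GB available) also falls below the assumed 10% line.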
My best guess is that this is an issue with docker 1.13, or an issue with overlay2. I'll do some more testing tomorrow.
@felipejfc @thomas-riccardi can you share your docker storage driver and docker version?
Another takeaway from this bug is that it is confusing to have some eviction signals and thresholds expressed in the format:
signal < eviction threshold
while the allocatable signals are expressed as:
signal - eviction threshold < 0
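The two formats are algebraically equivalent, so the confusion is in presentation rather than in the comparison itself. A minimal Go sketch, assuming a hypothetical parser that only understands percentage thresholds written the way the kubelet's --eviction-hard flag takes them (for example nodefs.available<10%), evaluates one threshold both ways. The type and function names are illustrative, not kubelet code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// threshold is a hypothetical parsed form of an expression such as
// "nodefs.available<10%". Only percentage thresholds are handled here.
type threshold struct {
	Signal   string
	Fraction float64 // 0.10 for "10%"
}

// parseThreshold splits "signal<N%" into its signal name and fraction.
func parseThreshold(expr string) (threshold, error) {
	parts := strings.SplitN(expr, "<", 2)
	if len(parts) != 2 || !strings.HasSuffix(parts[1], "%") {
		return threshold{}, fmt.Errorf("unsupported threshold %q", expr)
	}
	pct, err := strconv.ParseFloat(strings.TrimSuffix(parts[1], "%"), 64)
	if err != nil {
		return threshold{}, err
	}
	return threshold{Signal: parts[0], Fraction: pct / 100}, nil
}

// metDirect uses the "signal < eviction threshold" form.
func metDirect(available, capacity float64, t threshold) bool {
	return available < t.Fraction*capacity
}

// metDifference uses the "signal - eviction threshold < 0" form.
func metDifference(available, capacity float64, t threshold) bool {
	return available-t.Fraction*capacity < 0
}

func main() {
	t, err := parseThreshold("nodefs.available<10%")
	if err != nil {
		panic(err)
	}
	// 1.91 GB available out of 20.7 GB: both forms agree.
	fmt.Println(t.Signal, metDirect(1.91, 20.7, t), metDifference(1.91, 20.7, t))
}
```

Since both functions always return the same result, a single documented form (matching the CLI syntax) would remove the ambiguity.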
This is indeed confusing, and inconsistent with the kubelet CLI parameters that control eviction, as explained in a previous comment.
As for my docker version and info (I did not change anything from the default there):
$ docker version
Client:
Version: 1.12.6
API version: 1.24
Go version: go1.7.6
Git commit: a82d35e
Built: Wed Sep 20 22:27:13 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Go version: go1.7.6
Git commit: a82d35e
Built: Wed Sep 20 22:27:13 2017
OS/Arch: linux/amd64
$ docker info
Containers: 51
Running: 37
Paused: 0
Stopped: 14
Images: 16
Server Version: 1.12.6
Storage Driver: overlay
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null bridge host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp selinux
Kernel Version: 4.12.14-coreos
Operating System: Container Linux by CoreOS 1465.8.0 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 29.46 GiB
Name: gpu-europe-west-1-v177-minion-1
ID: 4JER:JG7I:KIYV:D36C:HCK3:QVCD:OPST:WX3H:EVGV:VB5F:BDIO:KDKL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
@dashpole I think we can check the feature gate here to disable the allocatable feature for local storage:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/container_manager_linux.go#L597
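A minimal sketch of gating the ephemeral storage capacity on the feature gate, in Go. The map-based featureGates type and the nodeCapacity helper are hypothetical stand-ins; the real check in container_manager_linux.go would go through the kubelet's own feature-gate machinery, and the resource key ephemeral-storage is used here for illustration. On a running node, the gate can be turned off with the kubelet flag --feature-gates=LocalStorageCapacityIsolation=false.

```go
package main

import "fmt"

// featureGates is a hypothetical stand-in for the kubelet's feature gate
// lookup; only the gate name LocalStorageCapacityIsolation comes from the thread.
type featureGates map[string]bool

// Enabled reports whether the named gate is on; unknown gates default to off.
func (g featureGates) Enabled(name string) bool { return g[name] }

// nodeCapacity returns the resources this sketch would advertise. The
// ephemeral-storage entry is added only when LocalStorageCapacityIsolation
// is enabled; with the gate off, the map contains cpu and memory only.
func nodeCapacity(g featureGates, cpuMilli, memBytes, rootfsBytes int64) map[string]int64 {
	capacity := map[string]int64{
		"cpu":    cpuMilli,
		"memory": memBytes,
	}
	if g.Enabled("LocalStorageCapacityIsolation") {
		capacity["ephemeral-storage"] = rootfsBytes
	}
	return capacity
}

func main() {
	off := featureGates{"LocalStorageCapacityIsolation": false}
	on := featureGates{"LocalStorageCapacityIsolation": true}
	fmt.Println(len(nodeCapacity(off, 8000, 29<<30, 20<<30))) // 2
	fmt.Println(len(nodeCapacity(on, 8000, 29<<30, 20<<30)))  // 3
}
```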
Re: #52336 (comment)
That is my guess as well. We had an issue accounting for overlay2's disk usage in cAdvisor. But that doesn't explain the overlay issue on the Container Linux (CoreOS) image reported by @thomas-riccardi above.
@thomas-riccardi Strange, I cannot reproduce this on our test CoreOS image:
coreos-alpha-1122-0-0-v20160727
I believe this is an issue with stats collection on overlay2.
I have tested a fix, and can confirm that it fixes issues found in my testing. However, the cause of @thomas-riccardi's issues is still unknown.
@davidopp, which of @thomas-riccardi's issues are you referring to in your last comment? Thanks!
Closed #52336 via google/cadvisor#1770.
Maybe reopen for other impacted storage drivers?
See google/cadvisor#1770 (comment)
@dashpole @derekwaynecarr ping: shouldn't we reopen this issue for other impacted storage drivers? cf google/cadvisor#1770 (comment)
@thomas-riccardi I need to take a closer look at overlay, but at first glance it looks like both overlay and aufs report correct disk stats. See my earlier comment.
👍
Same problem...
The node was low on resource: nodefs
The node was low on resource: [DiskPressure].
kube version: 1.8.5
docker version: 17.06.2-ce
@KeithTt can you open a new issue? This particular issue was fixed in 1.8.5, but you may have run into something different.
Will this issue also be patched into the 1.7.x branch?
@dashpole Sorry I missed your message. I have opened a new issue and cc'd you.
Is there any fix for this issue? I am still facing it with Kubernetes v1.20.4:
"Kubectl is evicting pods throwing failed to release ephemeral-storage and node is under disk pressure."