Collect disk usage for Kubernetes GCE-PD-backed PersistentVolumes


Joar Wandborg

Jun 29, 2017, 5:25:32 AM
to Prometheus Users
I need to collect metrics for PersistentVolumes backed by GCE-PDs, in order to find out whether the volumes are starting to fill up; if they are, I'll take manual action by resizing the GCE PD or removing data.

The metrics I need, per PV/GCE-PD, are:
- Bytes total
- Bytes used (preferably excluding ext "Reserved blocks")
- GCE-PD name *OR* k8s PV(C)? name *OR* k8s Pod name and path.

I've looked into existing solutions for both Prometheus and Stackdriver, but haven't been able to find anything.

I have made https://github.com/joar/disk-usage-exporter, which exports disk usage metrics to Prometheus, but this approach has a few issues:
- I can't easily get metrics for PVs in Pods created by Helm charts.
- I haven't figured out how to reliably get the PV/PVC name of a mounted path from inside the container.

Is there a better solution? Have I missed something that does what I want and is included by default?

Brian Brazil

Jun 29, 2017, 5:29:39 AM
to Joar Wandborg, Prometheus Users
The node exporter provides filesystem metrics under node_filesystem_*.
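
For example (a sketch, using the pre-0.16 metric names current in this thread; later node_exporter releases renamed these to node_filesystem_*_bytes):

    # Bytes total per mounted filesystem:
    node_filesystem_size

    # Bytes used, without counting the ext reserved blocks as used
    # (node_filesystem_free includes the reserved blocks,
    # node_filesystem_avail does not):
    node_filesystem_size - node_filesystem_free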

Joar Wandborg

Jun 29, 2017, 6:05:34 AM
to Brian Brazil, Prometheus Users
AFAICT node_filesystem_* doesn't catch GCE-PD PVs (GPPs) at all. GPPs are usually named ^/dev/sd[^a]$, and I haven't been able to find any node_filesystem_free metrics that look like a GPP; e.g., the max node_filesystem_free is 8 GB, while I have at least three GPPs over 500 GB.

Could GPPs or plain PVs be ignored, either intentionally or unintentionally, by the node-exporter?

P.S. I read your post about `predict_linear()`[0] while researching this :)

[0]: https://www.robustperception.io/reduce-noise-from-disk-space-alerts/
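
For reference, the shape of the alert expression from that post, adapted (as a sketch) to the metrics discussed here; the job label value is illustrative:

    # Fire if, extrapolating the last hour's trend linearly, the
    # filesystem runs out of free space within four hours:
    predict_linear(node_filesystem_free{job="node"}[1h], 4 * 3600) < 0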

Brian Brazil

Jun 29, 2017, 6:09:54 AM
to Joar Wandborg, Prometheus Users
On 29 June 2017 at 11:05, Joar Wandborg <jo...@wandborg.se> wrote:
AFAICT node_filesystem_* doesn't catch GCE-PD PVs (GPPs) at all. GPPs are usually named ^/dev/sd[^a]$, and I haven't been able to find any node_filesystem_free metrics that look like a GPP; e.g., the max node_filesystem_free is 8 GB, while I have at least three GPPs over 500 GB.

sda is a device, not a filesystem. Filesystems have bytes reserved, used, and total, and they need to be mounted for you to get those.

Brian
 

Joar Wandborg

Jun 29, 2017, 6:16:55 AM
to Brian Brazil, Prometheus Users
Note the [^a] in sd[^a]: GPPs are named something like /dev/sd[b-z]. You're correct that you'd usually expect /dev/sda and /dev/sd[b-z] to be devices; in the case of GPPs, however, the devices themselves seem to be mounted as filesystems.

From one of my GKE nodes:

# mount -l | grep sdb
/dev/sdb on /var/lib/kubelet/pods/2add5b97-5c20-11e7-ba69-42010af0012c/volumes/kubernetes.io~gce-pd/pvc-b28346bf-5687-11e7-ba69-42010af0012c type ext4 (rw,relatime,data=ordered)
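
(For what it's worth, a sketch for enumerating them all: every GPP mount shows up in /proc/mounts, and for dynamically provisioned volumes the PV name, pvc-<uid>, is embedded in the mount path.)

    # List every GCE-PD-backed mount on this node:
    grep gce-pd /proc/mounts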



Brian Brazil

Jun 29, 2017, 6:33:23 AM
to Joar Wandborg, Prometheus Users
On 29 June 2017 at 11:16, Joar Wandborg <jo...@wandborg.se> wrote:
Note the [^a] in sd[^a]: GPPs are named something like /dev/sd[b-z]. You're correct that you'd usually expect /dev/sda and /dev/sd[b-z] to be devices; in the case of GPPs, however, the devices themselves seem to be mounted as filesystems.

From one of my GKE nodes:

# mount -l | grep sdb
/dev/sdb on /var/lib/kubelet/pods/2add5b97-5c20-11e7-ba69-42010af0012c/volumes/kubernetes.io~gce-pd/pvc-b28346bf-5687-11e7-ba69-42010af0012c type ext4 (rw,relatime,data=ordered)

In this case /var/lib/kubelet/pods/2add5b97-5c20-11e7-ba69-42010af0012c/volumes/kubernetes.io~gce-pd/pvc-b28346bf-5687-11e7-ba69-42010af0012c is the filesystem you should look up the stats for.
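
In other words (a sketch; the regex is an assumption based on the mount output above), select on the mountpoint label rather than the device:

    # All GCE-PD-backed PV filesystems, matched on the kubelet mount path;
    # the pvc-... name is part of the mountpoint label:
    node_filesystem_free{mountpoint=~".*gce-pd.*"}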

Brian 



Joar Wandborg

Jun 29, 2017, 9:50:16 AM
to Brian Brazil, Prometheus Users
Maybe I've explained myself poorly regarding my issues with node_filesystem_*.

I'm using:

- A GCE-PD-backed Kubernetes PersistentVolume (PV), provisioned by a Kubernetes StorageClass from a PersistentVolumeClaim created by a Kubernetes StatefulSet's `.spec.volumeClaimTemplates`.
- Prometheus v1.5.2, via the prometheus Helm chart v3.0.2.

I'm trying to find metrics for my PV in node_filesystem_*; as an example I'll use node_filesystem_free.

  1. The PV is mounted on the Kubernetes Node as /dev/sdb:

    # mount -l | grep sdb
    /dev/sdb on /var/lib/kubelet/pods/2add5b97-5c20-11e7-ba69-42010af0012c/volumes/kubernetes.io~gce-pd/pvc-b28346bf-5687-11e7-ba69-42010af0012c type ext4 (rw,relatime,data=ordered)

  2. /dev/sdb can't be seen when running `df` as a non-superuser:

    joar@my-node ~ $ df
    Filesystem     1K-blocks     Used Available Use% Mounted on
    /dev/root        1249792   423556    826236  34% /
    devtmpfs        13379192        0  13379192   0% /dev
    tmp             13380112       24  13380088   1% /tmp
    run             13380112     1800  13378312   1% /run
    shmfs           13380112        0  13380112   0% /dev/shm
    /dev/sda1       98884832 24000500  74867948  25% /var
    /dev/sda8          11760       28     11408   1% /usr/share/oem
    tmpfs           13380112        0  13380112   0% /sys/fs/cgroup
    tmpfs                256        0       256   0% /mnt/disks
    tmpfs               1024      124       900  13% /var/lib/cloud
    overlayfs           1024      144       880  15% /etc

  3. /dev/sdb can be seen when running `df` as root:

    # df
    Filesystem      1K-blocks      Used Available Use% Mounted on
    /dev/root         1249792    423556    826236  34% /
    devtmpfs         13379192         0  13379192   0% /dev
    tmp              13380112        24  13380088   1% /tmp
    run              13380112      1804  13378308   1% /run
    shmfs            13380112         0  13380112   0% /dev/shm
    /dev/sda1        98884832  24000624  74867824  25% /var
    /dev/sda8           11760        28     11408   1% /usr/share/oem
    tmpfs            13380112         0  13380112   0% /sys/fs/cgroup
    tmpfs                 256         0       256   0% /mnt/disks
    tmpfs                1024       124       900  13% /var/lib/cloud
    overlayfs            1024       144       880  15% /etc
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/c59101d1-3587-11e7-93c4-42010af0012c/volumes/kubernetes.io~secret/default-token-plcbs
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/9689b7f0-45e7-11e7-ba69-42010af0012c/volumes/kubernetes.io~secret/elasticsearch-token-qwgk5
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/b739d477-45e7-11e7-ba69-42010af0012c/volumes/kubernetes.io~secret/default-token-q3932
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/c19122ce-45e7-11e7-ba69-42010af0012c/volumes/kubernetes.io~secret/default-token-q3932
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/cb271152-45e7-11e7-ba69-42010af0012c/volumes/kubernetes.io~secret/default-token-q3932
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/d3fcc248-45e7-11e7-ba69-42010af0012c/volumes/kubernetes.io~secret/elasticsearch-token-qwgk5
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/4bb9d022-5a63-11e7-ba69-42010af0012c/volumes/kubernetes.io~secret/default-token-q3932
    /dev/sdd        515010816    199860 488580172   1% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/gke-my-cluster-6d98ef61-dyn-pvc-4bb92cb4-5a63-11e7-ba69-42010af0012c
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/462c0c89-5a6d-11e7-ba69-42010af0012c/volumes/kubernetes.io~secret/default-token-q3932
    tmpfs            13380112        16  13380096   1% /var/lib/kubelet/pods/2add5b97-5c20-11e7-ba69-42010af0012c/volumes/kubernetes.io~secret/patroni-secrets
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/2add5b97-5c20-11e7-ba69-42010af0012c/volumes/kubernetes.io~secret/default-token-q3932
    /dev/sdb       1055840692 474273492 527863728  48% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/gke-my-cluster-6d98ef61-dyn-pvc-b28346bf-5687-11e7-ba69-42010af0012c
    tmpfs            13380112        76  13380036   1% /var/lib/kubelet/pods/54b31628-5c34-11e7-8bed-42010af0005c/volumes/kubernetes.io~secret/foo-app-secrets
    tmpfs            13380112        12  13380100   1% /var/lib/kubelet/pods/54b31628-5c34-11e7-8bed-42010af0005c/volumes/kubernetes.io~secret/default-token-q3932

  4. AFAICT all node_filesystem_free metrics have at least the device, mountpoint, and fstype labels.

  5. I can find metrics labeled device="/dev/sda1", e.g.

    node_filesystem_free{app="prometheus",chart="prometheus-3.0.2",component="node-exporter",device="/dev/sda1",fstype="ext4",heritage="Tiller",instance="10.132.0.5:9100",job="kubernetes-service-endpoints",kubernetes_name="my-umbrella-chart-prometheus-node-exporter",kubernetes_namespace="default",mountpoint="/etc/resolv.conf",release="my-umbrella-chart"}

    as well as metrics for cgroups, debugfs, etc.

  6. I can't find metrics labeled device="/dev/sdb".

It occurs to me now that this could be a root/non-root issue.
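
One way to check this from the Prometheus side (a sketch) is to list which devices node_filesystem_* reports at all:

    # If /dev/sdb is missing from this result, the node exporter never
    # managed to stat the filesystem behind it:
    count by (device) (node_filesystem_free)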

Brian Brazil

Jun 29, 2017, 10:11:24 AM
to Joar Wandborg, Prometheus Users
Yes, you need to be able to access the filesystem mount point in order to call statfs().
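
For example, from a shell on the node (a sketch; assumes stat(1) from GNU coreutils and sudo are available on the node image):

    # As root, statfs() on the mountpoint succeeds and reports block counts:
    stat -f /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/gke-my-cluster-6d98ef61-dyn-pvc-b28346bf-5687-11e7-ba69-42010af0012c

    # As an unprivileged user, path traversal fails before statfs() is ever
    # reached, so an unprivileged exporter has nothing to report here:
    sudo -u nobody stat -f /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/gke-my-cluster-6d98ef61-dyn-pvc-b28346bf-5687-11e7-ba69-42010af0012c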




Joar Wandborg

Jul 6, 2017, 9:15:55 AM
to Prometheus Users, jo...@wandborg.se
I have made some changes to disk-usage-exporter: it now runs as a DaemonSet with a privileged container and exports usage metrics for all PersistentVolumes attached to the Kubernetes Nodes.

I renamed it to k8s-pv-disk-usage-exporter; it's available at https://github.com/joar/k8s-pv-disk-usage-exporter, with a DaemonSet and Prometheus job example at https://github.com/joar/k8s-pv-disk-usage-exporter#deploy-as-daemonset