node_exporter - non-root user can't see all disks with the node_filesystem_files metric


Soph N

Aug 14, 2021, 12:38:38 AM
to Prometheus Users
Hello everyone,

I am struggling to identify the permission issue that forces me to run node_exporter as root instead of as its own user.

Here is my issue.

df shows me the two disks /dev/nvme0n1p1 and /dev/nvme1n1p1:
df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs  7.6G     0  7.6G   0% /dev
tmpfs          tmpfs     7.7G     0  7.7G   0% /dev/shm
tmpfs          tmpfs     7.7G  476K  7.7G   1% /run
tmpfs          tmpfs     7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/nvme0n1p1 xfs        30G  1.9G   29G   7% /
/dev/nvme1n1p1 ext4      184G  168G  6.2G  97% /home/ec2-user/data
tmpfs          tmpfs     1.6G     0  1.6G   0% /run/user/1000

However, the node_exporter metric node_filesystem_files does not show both of them:
curl -s "http://localhost:9100/metrics" | grep "node_filesystem_files"
# HELP node_filesystem_files Filesystem total file nodes.
# TYPE node_filesystem_files gauge
node_filesystem_files{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 1.57276e+07
node_filesystem_files{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 1.993667e+06
node_filesystem_files{device="tmpfs",fstype="tmpfs",mountpoint="/run/user/1000"} 1.993667e+06
# HELP node_filesystem_files_free Filesystem total free file nodes.
# TYPE node_filesystem_files_free gauge
node_filesystem_files_free{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 1.5679285e+07
node_filesystem_files_free{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 1.993227e+06
node_filesystem_files_free{device="tmpfs",fstype="tmpfs",mountpoint="/run/user/1000"} 1.993666e+06

It only works if I run node_exporter as root:
curl -s "http://localhost:9100/metrics" | grep "node_filesystem_files"
# HELP node_filesystem_files Filesystem total file nodes.
# TYPE node_filesystem_files gauge
node_filesystem_files{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 1.57276e+07
node_filesystem_files{device="/dev/nvme1n1p1",fstype="ext4",mountpoint="/home/ec2-user/data"} 1.2214272e+07
node_filesystem_files{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 1.993667e+06
node_filesystem_files{device="tmpfs",fstype="tmpfs",mountpoint="/run/user/1000"} 1.993667e+06
# HELP node_filesystem_files_free Filesystem total free file nodes.
# TYPE node_filesystem_files_free gauge
node_filesystem_files_free{device="/dev/nvme0n1p1",fstype="xfs",mountpoint="/"} 1.5679285e+07
node_filesystem_files_free{device="/dev/nvme1n1p1",fstype="ext4",mountpoint="/home/ec2-user/data"} 1.2124279e+07
node_filesystem_files_free{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 1.993227e+06
node_filesystem_files_free{device="tmpfs",fstype="tmpfs",mountpoint="/run/user/1000"} 1.993666e+06

A few more details that I hope will help with the issue:

My /etc/fstab doesn't have the mount information for the second disk, but /proc/mounts does. Could this be the issue?

Thanks in advance for the help,
Soph
 

Brian Candler

Aug 14, 2021, 8:22:05 AM
to Prometheus Users
I presume you're running the latest version 1.2.2?

Maybe strace will help you understand what's going on:

strace -f -s128 -p <pid-of-node-exporter> 2>strace.out

You can see which files it's trying to access and whether it's getting permission errors.
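
A minimal sketch of how the capture could be narrowed down to permission errors, assuming a POSIX shell and the strace.out file written by the command above. The function name denied_syscalls is made up for this illustration.

```shell
# Hypothetical helper: keep only syscalls that failed with a permission error.
# With "strace -f", the path may sit on an earlier "<unfinished ...>" line,
# so the resumed line is matched on the error code alone.
denied_syscalls() {
    grep -E -- '= -1 (EACCES|EPERM) ' "$@"
}

# Example (not run here): denied_syscalls strace.out
```

Each matching line carries a pid, which can then be looked up in the full output to find the path the call was made on. ENOENT lines are dropped on purpose, since missing /sys files are normal on many machines.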

FWIW, I just tried running node_exporter as root and normal users, and I got 3906 and 3900 lines of metrics respectively.  The only ones which were missing were:

# HELP node_rapl_core_joules_total Current RAPL core value in joules
# TYPE node_rapl_core_joules_total counter
node_rapl_core_joules_total{index="0"} 79345.889568
# HELP node_rapl_package_joules_total Current RAPL package value in joules
# TYPE node_rapl_package_joules_total counter
node_rapl_package_joules_total{index="0"} 129630.134976

Soph N

Aug 15, 2021, 11:51:55 PM
to Prometheus Users
Hi Brian,

I am running 1.2.2 yes.

Regarding strace, is there a keyword I could grep for to identify the permission issue? The output is enormous, and although I can see some errors when I grep for "file", I am not sure they would explain why I receive metrics for one disk and not the other.

Here is an example of the output:
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu13/thermal_throttle/core_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu13/thermal_throttle/package_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu14/thermal_throttle/package_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu15/thermal_throttle/core_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/thermal_throttle/core_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu3/thermal_throttle/core_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26439] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu4/thermal_throttle/package_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu5/thermal_throttle/core_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu6/thermal_throttle/core_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu7/thermal_throttle/core_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu7/thermal_throttle/package_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu8/thermal_throttle/core_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu8/thermal_throttle/package_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu9/thermal_throttle/core_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26439] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu9/thermal_throttle/package_throttle_count", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 26434] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26441] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26438] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26445] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26438] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26438] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26434] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26433] <... openat resumed> )      = -1 ENOENT (No such file or directory)
[pid 26445] <... newfstatat resumed> 0xc000606858, 0) = -1 ENOENT (No such file or directory)

Also, as explained, it just can't get the info for one given disk.

Brian Candler

Aug 16, 2021, 2:26:17 AM
to Prometheus Users
I would be grepping for /dev/nvme0n1p1, /dev/nvme1n1p1 and /home/ec2-user/data

A thought: does the user that you're running node_exporter as, have permissions to access /home/ec2-user/data?   Try:

sudo -u <username> ls -l /home/ec2-user/data

Soph N

Aug 19, 2021, 3:27:17 AM
to Prometheus Users
Hi Brian,

Grepping for /home/ec2-user/data showed a permission denied error, as we were expecting. See the logs:

[pid 26449] statfs("/home/ec2-user/data",  <unfinished ...>
[pid 26452] <... fstat resumed> {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
[pid 26449] <... statfs resumed> 0xc00010f5e8) = -1 EACCES (Permission denied)

However, even after running chmod 777 on /home/ec2-user/data I still have the issue.

Now I am wondering which permission I should set, and where, for this to work.

Thanks for the help,

Michael Ströder

Aug 19, 2021, 3:39:46 AM
to Prometheus Users
On 8/19/21 9:27 AM, Soph N wrote:
> grep on /home/ec2-user/data gave me a permission denied as what we were
> expecting, see the logs:
>
> [pid 26449] statfs("/home/ec2-user/data",  <unfinished ...>
> [pid 26452] <... fstat resumed> {st_mode=S_IFREG|0444, st_size=4096,
> ...}) = 0
> [pid 26449] <... statfs resumed> 0xc00010f5e8) = -1 EACCES (Permission
> denied)
>
> However, even after doing chmod 777 on /home/ec2-user/data I still have
> the issue. 
>
> now  I am wondering what permission and where should I set for this to work.

SELinux?
AppArmor?
systemd sand-boxing options?

Ciao, Michael.

Ben Kochie

Aug 19, 2021, 3:57:41 AM
to Brian Candler, Prometheus Users


Brian Candler

Aug 19, 2021, 7:53:26 AM
to Prometheus Users
> However, even after doing chmod 777 on /home/ec2-user/data I still have the issue. 

What about the parent directories: /home and /home/ec2-user ?
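
A minimal sketch of that parent-directory check, assuming a POSIX shell. Per the statfs(2) man page, the call fails with EACCES when search (execute) permission is denied for any component of the path prefix, so the mode bits on the mount point alone do not settle it. The helper names list_ancestors and show_path_perms are made up for this illustration.

```shell
# Hypothetical helper: print a path and each of its parent directories, one per line.
list_ancestors() {
    path=$1
    while :; do
        printf '%s\n' "$path"
        case $path in
            /|.) break ;;   # stop at the root (or at "." for a relative path)
        esac
        path=$(dirname "$path")
    done
}

# Hypothetical helper: show mode, owner and group for every directory on the way.
show_path_perms() {
    list_ancestors "$1" | while IFS= read -r dir; do
        ls -ld -- "$dir"
    done
}

# Example (not run here): show_path_perms /home/ec2-user/data
```

Any directory in the listing that lacks the x bit for the user running node_exporter would be enough to produce the EACCES seen in the strace output. Where util-linux is installed, namei -l /home/ec2-user/data prints a similar listing.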

If it's not that, then as Michael said, SELinux/AppArmor etc.  Have a look in the output of "dmesg" to see if there are any logs generated when your non-root user tries to access that directory.
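
A rough sketch of how those mechanisms could be checked in one go, assuming a POSIX shell. The function name report_confinement and the default unit name node_exporter.service are assumptions, and which of the tools exist depends on the distribution.

```shell
# Hypothetical helper: report which confinement mechanisms are present.
# The unit name is an assumption; pass the local service name as $1.
report_confinement() {
    unit=${1:-node_exporter.service}

    if command -v getenforce >/dev/null 2>&1; then
        echo "SELinux: $(getenforce 2>/dev/null || echo unknown)"
    else
        echo "SELinux: getenforce not installed"
    fi

    if command -v aa-status >/dev/null 2>&1; then
        # aa-status usually needs root; fall back to a note if it fails.
        aa-status 2>/dev/null || echo "AppArmor: aa-status failed (try as root)"
    else
        echo "AppArmor: aa-status not installed"
    fi

    if command -v systemctl >/dev/null 2>&1; then
        # ProtectHome and ProtectSystem are systemd sandboxing settings.
        systemctl show "$unit" -p User -p ProtectHome -p ProtectSystem 2>/dev/null \
            || echo "systemd: could not query $unit"
    else
        echo "systemd: systemctl not installed"
    fi
}

# Example (not run here): report_confinement node_exporter.service
```

A ProtectHome value other than "no" would be worth a closer look, since that setting restricts access to /home for the service.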

Also: you're not running node_exporter in any sort of container are you?  (docker, lxc/lxd etc)

Soph N

Aug 25, 2021, 10:17:20 AM
to Prometheus Users
SELinux, AppArmor and the systemd sandboxing options are all disabled or not used.

Soph N

Aug 25, 2021, 10:25:46 AM
to Prometheus Users
node_exporter is not running in a container.
dmesg shows me something interesting:
[ 4789.349058]  nvme1n1: p1
[ 4789.413067]  nvme1n1: p1
[ 5302.321060] EXT4-fs (nvme1n1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 5372.678396] EXT4-fs (nvme1n1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 5426.580451] EXT4-fs (nvme1n1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 5560.363303] EXT4-fs (nvme1n1p1): Unrecognized mount option "uid=1000" or missing value
[ 5585.173389] EXT4-fs (nvme1n1p1): Unrecognized mount option "uid=1000" or missing value
[ 5743.279786] EXT4-fs (nvme1n1p1): Unrecognized mount option "uid=1000" or missing value
[14960.754419] EXT4-fs (nvme1n1p1): mounted filesystem with ordered data mode. Opts: (null)

So what I realized after that is that the disk is mounted directly on /home/ec2-user/data:
df -Th | grep nvme
/dev/nvme0n1p1 xfs        30G   18G   13G  58% /
/dev/nvme1n1p1 ext4      367G  195G  154G  56% /home/ec2-user/data

with my /etc/fstab containing:
/dev/nvme1n1p1 /home/ec2-user/data ext4 defaults 0 0

I have another machine that runs the same setup but with the disk mounted on its own directory, i.e. /data instead of /home/ec2-user/data, and that works without needing to be root.

I can try to move the partition and everything to /data and retry, but it would have been nice to understand what is happening in this setup.

Thanks

