| We are seeing puppet run failures on some Solaris 10 hosts since we upgraded the agent from 6.26.0. The problem seems to come from Facter code exhausting the available file descriptors, mostly on systems with more than about 13 network interfaces. On Solaris 10, the default max file descriptors for a process is 256. From looking at the output of truss, it seems there is some file descriptor leakage in the facter code (not in our facts, same thing when all facts that come from any puppet module are deleted). Watching the truss output the open descriptors increases while iterating over the network interfaces, but never seem to go down The most common error we see is: [2023-03-30 14:56:27.378715 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: "" But it’s eventually accompanied by a “too many open files” error. I have noticed that there are some cases where Facter is getting an error from the kernel ("too many open files" - EMFILE), but doesn't report anything as a result. If a host has more than about 15 network interfaces (incl lo0), then it will go over the 256 file descriptor limit. Here I've deleted all the non-default facts from the host, but the problem shows up even more if they are present:
- Delete any facts from puppet modules:
denga01l:~# rm -rf /opt/puppetlabs/puppet/cache/lib/facter/*
- NOTE: only 3 interfaces
denga01l:~# ifconfig -a | grep '^[^ ]' lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1 vnet0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2 vnet1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
- Max open files:
denga01l:~# ulimit -n 256
- No errors on run:
denga01l:~# facter -p > /dev/null denga01l:~#
- Now, add some virtual interfaces
denga01l:~# for i in {1..15}; do ifconfig vnet1:$i plumb; done denga01l:~# ifconfig -a | grep '^[^ ]' | wc -l 23
- And we get errors on run
denga01l:~# facter -p > /dev/null [2023-04-04 14:27:05.611830 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: "" [2023-04-04 14:27:05.660223 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: ""
On further investigation, using the truss command, we can see many system calls that open new file descriptors, getting closer and closer to the 256 mark, and eventually reaches the limit. The kernel then returns EMFILE ("too many open files"), which is when the above errors are printed denga01l:~# truss -t open,openat,so_socket,pipe facter -p > /dev/null open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT [...] /1: pipe() = 243 [244] /1: pipe() = 245 [246] /1: pipe() = 247 [248] /1: pipe() = 249 [250] /1: Received signal #18, SIGCLD, in pollsys() [caught] /1: siginfo: SIGCLD CLD_EXITED pid=29802 status=0x0002 /1: so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 243 /1: so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 244 /1: so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 245 /1: pipe() = 246 [247] /1: pipe() = 248 [249] /1: pipe() = 250 [251] /1: pipe() = 252 [253] /1: Received signal #18, SIGCLD, in pollsys() [caught] /1: siginfo: SIGCLD CLD_EXITED pid=29803 status=0x0002 /1: so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 246 /1: so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 247 /1: so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 248 /1: pipe() = 249 [250] /1: pipe() = 251 [252] /1: pipe() = 253 [254] /1: pipe() Err#24 EMFILE /1: pipe() Err#24 EMFILE /1: open("/var/run/tzsync", O_RDONLY) = 120 /1: open("/usr/share/lib/zoneinfo/Australia/Victoria", O_RDONLY) = 120 [2023-04-04 14:35:08.302043 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: "" /1: open64("/opt/puppetlabs/puppet/VERSION", O_RDONLY) = 120 /1: pipe() = 120 [121] /1: pipe() = 122 [123] /1: pipe() = 124 [125] /1: pipe() Err#24 EMFILE /1: pipe() Err#24 EMFILE /1: pipe() = 249 [250] /1: pipe() = 251 [252] /1: pipe() = 253 [254] I've attached the output of these truss commands: denga01l:~# truss -o /tmp/few_network_interfaces.truss.out -t open,openat,so_socket,pipe facter -p > /dev/null
- Note highest file descriptor number:
denga01l:~# grep -v 'ENOENT' /tmp/few_network_interfaces.truss.out | awk '{print $NF}' | tr -d '[]' | sort -n | tail -1 76
denga01l:~# for i in {1..20}; do ifconfig vnet1:$i plumb; done denga01l:~# truss -o /tmp/many_network_interfaces.truss.out -t open,openat,so_socket,pipe facter -p > /dev/null [2023-04-04 14:39:00.768956 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: "" [2023-04-04 14:39:00.818385 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: ""
- Note highest file descriptor number:
grep -v 'ENOENT' /tmp/many_network_interfaces.truss.out | awk '{print $NF}' | tr -d '[]' | sort -n | tail -1 255
And this one, from puppet agent 6.23.0: denga01l:~# truss -o /tmp/many_network_interfaces.6.23.0.truss.out -t open,openat,so_socket,pipe facter -p > /dev/null 2023-04-04 14:47:21.180639 WARN puppetlabs.facter - skipping external facts for "/opt/puppetlabs/puppet/cache/facts.d": No such file or directory
- Note highest file descriptor number:
denga01l:~# grep -v 'ENOENT' /tmp/many_network_interfaces.6.23.0.truss.out | awk '{print $NF}' | tr -d '[]' | sort -n | tail -1 13
I note that when running the facter command from agent version 6.23.0, the highest file descriptor allocated is #13 |