Jira (FACT-3196) Facter on Solaris 10 seems to leak file descriptors, causing puppet run failures

57 views
Skip to first unread message

Vishal Kharade (Jira)

unread,
Apr 4, 2023, 11:14:02 PM4/4/23
to puppe...@googlegroups.com
Vishal Kharade created an issue
 
Facter / Bug FACT-3196
Facter on Solaris 10 seems to leak file descriptors, causing puppet run failures
Issue Type: Bug Bug
Assignee: Unassigned
Created: 2023/04/04 8:13 PM
Priority: Normal Normal
Reporter: Vishal Kharade

We are seeing puppet run failures on some Solaris 10 hosts since we upgraded the agent from 6.26.0. The problem seems to come from Facter code exhausting the available file descriptors, mostly on systems with more than about 13 network interfaces.

On Solaris 10, the default max file descriptors for a process is 256. From looking at the output of truss, it seems there is some file descriptor leakage in the facter code (not in our facts, same thing when all facts that come from any puppet module are deleted).    Watching the truss output the open descriptors increases while iterating over the network interfaces, but never seem to go down

The most common error we see is:

[2023-03-30 14:56:27.378715 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: ""

But it’s eventually accompanied by a “too many open files” error.

I have noticed that there are some cases where Facter is getting an error from the kernel ("too many open files" - EMFILE), but doesn't report anything as a result.

If a host has more than about 15 network interfaces (incl lo0), then it will go over the 256 file descriptor limit. Here I've deleted all the non-default facts from the host, but the problem shows up even more if they are present:

  1. Delete any facts from puppet modules:
    denga01l:~# rm -rf /opt/puppetlabs/puppet/cache/lib/facter/*
  2. NOTE: only 3 interfaces
    denga01l:~# ifconfig -a | grep '^[^     ]'
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
    vnet0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
    vnet1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3 
  3. Max open files:
    denga01l:~# ulimit -n
    256
  4. No errors on run:
    denga01l:~# facter -p > /dev/null
    denga01l:~#
  1. Now, add some virtual interfaces
    denga01l:~# for i in {1..15}; do ifconfig vnet1:$i plumb; done
    denga01l:~# ifconfig -a | grep '^[^     ]' | wc -l
          23
  2. And we get errors on run
    denga01l:~# facter -p > /dev/null
    [2023-04-04 14:27:05.611830 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: ""
    [2023-04-04 14:27:05.660223 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: ""

On further investigation, using the truss command, we can see many system calls that open new file descriptors, getting closer and closer to the 256 mark, and eventually reaches the limit. The kernel then returns EMFILE ("too many open files"), which is when the above errors are printed

denga01l:~# truss -t open,openat,so_socket,pipe facter -p > /dev/null
open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
[...]
/1:     pipe()                                          = 243 [244]
/1:     pipe()                                          = 245 [246]
/1:     pipe()                                          = 247 [248]
/1:     pipe()                                          = 249 [250]
/1:         Received signal #18, SIGCLD, in pollsys() [caught]
/1:           siginfo: SIGCLD CLD_EXITED pid=29802 status=0x0002
/1:     so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 243
/1:     so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 244
/1:     so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 245
/1:     pipe()                                          = 246 [247]
/1:     pipe()                                          = 248 [249]
/1:     pipe()                                          = 250 [251]
/1:     pipe()                                          = 252 [253]
/1:         Received signal #18, SIGCLD, in pollsys() [caught]
/1:           siginfo: SIGCLD CLD_EXITED pid=29803 status=0x0002
/1:     so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 246
/1:     so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 247
/1:     so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 248
/1:     pipe()                                          = 249 [250]
/1:     pipe()                                          = 251 [252]
/1:     pipe()                                          = 253 [254]
/1:     pipe()                                          Err#24 EMFILE
/1:     pipe()                                          Err#24 EMFILE
/1:     open("/var/run/tzsync", O_RDONLY)               = 120
/1:     open("/usr/share/lib/zoneinfo/Australia/Victoria", O_RDONLY) = 120
[2023-04-04 14:35:08.302043 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: ""
/1:     open64("/opt/puppetlabs/puppet/VERSION", O_RDONLY) = 120
/1:     pipe()                                          = 120 [121]
/1:     pipe()                                          = 122 [123]
/1:     pipe()                                          = 124 [125]
/1:     pipe()                                          Err#24 EMFILE
/1:     pipe()                                          Err#24 EMFILE
/1:     pipe()                                          = 249 [250]
/1:     pipe()                                          = 251 [252]
/1:     pipe()                                          = 253 [254]

I've attached the output of these truss commands:

denga01l:~# truss -o /tmp/few_network_interfaces.truss.out -t open,openat,so_socket,pipe facter -p > /dev/null

  1. Note highest file descriptor number:
    denga01l:~# grep -v 'ENOENT' /tmp/few_network_interfaces.truss.out | awk '{print $NF}' | tr -d '[]' | sort -n | tail -1
    76

denga01l:~# for i in {1..20}; do ifconfig vnet1:$i plumb; done
denga01l:~# truss -o /tmp/many_network_interfaces.truss.out -t open,openat,so_socket,pipe facter -p > /dev/null
[2023-04-04 14:39:00.768956 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: ""
[2023-04-04 14:39:00.818385 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: ""

  1. Note highest file descriptor number:
    grep -v 'ENOENT' /tmp/many_network_interfaces.truss.out | awk '{print $NF}' | tr -d '[]' | sort -n | tail -1
    255

And this one, from puppet agent 6.23.0:

denga01l:~# truss -o /tmp/many_network_interfaces.6.23.0.truss.out -t open,openat,so_socket,pipe facter -p > /dev/null
2023-04-04 14:47:21.180639 WARN  puppetlabs.facter - skipping external facts for "/opt/puppetlabs/puppet/cache/facts.d": No such file or directory

  1. Note highest file descriptor number:
    denga01l:~# grep -v 'ENOENT' /tmp/many_network_interfaces.6.23.0.truss.out | awk '{print $NF}' | tr -d '[]' | sort -n | tail -1
    13

I note that when running the facter command from agent version 6.23.0, the highest file descriptor allocated is #13

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.20.11#820011-sha1:0629dd8)
Atlassian logo

Vishal Kharade (Jira)

unread,
Apr 4, 2023, 11:19:01 PM4/4/23
to puppe...@googlegroups.com

Vishal Kharade (Jira)

unread,
Apr 4, 2023, 11:25:01 PM4/4/23
to puppe...@googlegroups.com

Vishal Kharade (Jira)

unread,
Apr 5, 2023, 12:00:02 AM4/5/23
to puppe...@googlegroups.com

Vishal Kharade (Jira)

unread,
Apr 5, 2023, 12:00:03 AM4/5/23
to puppe...@googlegroups.com

Vishal Kharade (Jira)

unread,
Apr 5, 2023, 12:02:01 AM4/5/23
to puppe...@googlegroups.com
Vishal Kharade updated an issue
Change By: Vishal Kharade
Zendesk Ticket Count: 1
Zendesk Ticket IDs: 51618

Vishal Kharade (Jira)

unread,
Apr 5, 2023, 12:02:02 AM4/5/23
to puppe...@googlegroups.com

Charmaine Pritchett (Jira)

unread,
Apr 5, 2023, 12:07:01 AM4/5/23
to puppe...@googlegroups.com

Charmaine Pritchett (Jira)

unread,
Apr 5, 2023, 12:07:02 AM4/5/23
to puppe...@googlegroups.com

Vishal Kharade (Jira)

unread,
Apr 5, 2023, 6:48:01 AM4/5/23
to puppe...@googlegroups.com

Vishal Kharade (Jira)

unread,
Apr 5, 2023, 7:11:02 AM4/5/23
to puppe...@googlegroups.com

Vishal Kharade (Jira)

unread,
Apr 5, 2023, 12:39:02 PM4/5/23
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Apr 10, 2023, 1:58:03 PM4/10/23
to puppe...@googlegroups.com
Josh Cooper commented on Bug FACT-3196
 
Re: Facter on Solaris 10 seems to leak file descriptors, causing puppet run failures

I ran reproduce locally:

$ bundle exec rake ci:test:setup SHA=7.24.0 HOSTS=solaris10-64a
$ ssh ...
-bash-3.2# ulimit -n 256
-bash-3.2# /opt/puppetlabs/puppet/bin/facter networking > /dev/null
-bash-3.2# ulimit -n 200
-bash-3.2# /opt/puppetlabs/puppet/bin/facter networking > /dev/null
[2023-04-10 10:50:40.385688 ] ERROR Facter::Resolvers::Solaris::Networking - can't modify frozen String: "" 

The problem is that facter does not close these socket descriptors:

-bash-3.2# truss -o truss.out -t open,openat,close,so_socket,pipe /opt/puppetlabs/puppet/bin/facter networking
-bash-3.2# export TERM=vt100
-bash-3.2# more truss.out 
...
/1:     so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 12
/1:     so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 13
/1:     so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 14
/1:     pipe()                                          = 15 [16]
/1:     pipe()                                          = 17 [18]
/1:     pipe()                                          = 19 [20]
/1:     pipe()                                          = 21 [22]
/1:     close(22)                                       = 0
/1:     close(21)                                       = 0
/1:     close(15)                                       = 0
/1:     close(18)                                       = 0
/1:     close(20)                                       = 0
/1:     close(16)                                       = 0
/1:     close(17)                                       = 0
/1:     close(19)                                       = 0

Note descriptors 12, 13, 14 are never closed. The problem occurs here https://github.com/puppetlabs/facter/blob/21917020bc48c612b490911e904f5dbddf7e6c3b/lib/facter/resolvers/solaris/ffi/functions.rb#L17-L19. If ioctl_base raises, then the descriptor is not closed.

Josh Cooper (Jira)

unread,
Apr 10, 2023, 7:36:01 PM4/10/23
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Apr 10, 2023, 7:36:01 PM4/10/23
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Apr 26, 2023, 6:54:02 PM4/26/23
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
May 10, 2023, 12:54:01 PM5/10/23
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
May 10, 2023, 12:54:01 PM5/10/23
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
May 18, 2023, 3:44:01 PM5/18/23
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Sprint: Phoenix 2023-06-07
This message was sent by Atlassian Jira (v8.20.21#820021-sha1:38274c8)
Atlassian logo

Christopher Thorn (Jira)

unread,
May 22, 2023, 11:22:01 AM5/22/23
to puppe...@googlegroups.com

Christopher Thorn (Jira)

unread,
May 22, 2023, 11:22:01 AM5/22/23
to puppe...@googlegroups.com

Michael Hashizume (Jira)

unread,
May 24, 2023, 1:08:03 PM5/24/23
to puppe...@googlegroups.com

Christopher Thorn (Jira)

unread,
Jun 8, 2023, 6:57:02 PM6/8/23
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Jun 9, 2023, 1:48:02 PM6/9/23
to puppe...@googlegroups.com
Josh Cooper updated an issue
 
Change By: Josh Cooper
Release Notes: Bug Fix
Release Notes Summary: Facter no longer leaks file descriptors when attempting to resolve network interfaces

Parker Leach (Jira)

unread,
Jun 14, 2023, 1:00:03 PM6/14/23
to puppe...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages