Facter via cron hanging on RHEL5

Paul Seymour

Oct 20, 2014, 4:41:06 AM
to puppet...@googlegroups.com
Hello,

We run facter via cron on all hosts regularly for mcollective filtering. This is now hanging on Facter 2.x (2.1 and 2.2 tried).

Facter run on various RHEL5 systems (both physical and virtual) hangs when executed via cron.

Crontab Entry:-
$ cat /etc/cron.d/mcollective-yaml-update
10 * * * * root /usr/bin/facter -p --yaml >/etc/mcollective/facts.yaml.tmp && /bin/mv /etc/mcollective/facts.yaml.tmp /etc/mcollective/facts.yaml

Process Table:-
$ ps -ef | grep fac
root     25137 25131  0 09:10 ?        00:00:00 /bin/sh -c /usr/bin/facter -p --yaml >/etc/mcollective/facts.yaml.tmp && /bin/mv /etc/mcollective/facts.yaml.tmp /etc/mcollective/facts.yaml
root     25138 25137  0 09:10 ?        00:00:00 /usr/bin/ruby /usr/bin/facter -p --yaml

lsof output:-
$ lsof | grep facter
facter    29899    root  cwd       DIR              253,0     4096     130433 /root
facter    29899    root  rtd       DIR              253,0     4096          2 /
facter    29899    root  txt       REG              253,0     5056    1349275 /usr/bin/ruby
… various misc libs …
facter    29899    root  mem       REG              253,0    12072    1506765 /usr/lib64/ruby/1.8/x86_64-linux/digest.so
facter    29899    root  mem       REG              253,0     5000    1506768 /usr/lib64/ruby/1.8/x86_64-linux/fcntl.so
facter    29899    root  mem       REG              253,0   263072    1506772 /usr/lib64/ruby/1.8/x86_64-linux/nkf.so
facter    29899    root  mem       REG              253,0    18192    1506785 /usr/lib64/ruby/1.8/x86_64-linux/strscan.so
facter    29899    root  mem       REG              253,0   111480     326115 /lib64/libnsl-2.5.so
facter    29899    root  mem       REG              253,0    53880     326141 /lib64/libnss_files-2.5.so
facter    29899    root  mem       REG              253,0    12440    1506790 /usr/lib64/ruby/1.8/x86_64-linux/syslog.so
facter    29899    root  mem       REG              253,0    45440    1506779 /usr/lib64/ruby/1.8/x86_64-linux/socket.so
facter    29899    root  mem       REG              253,0    15048    1506775 /usr/lib64/ruby/1.8/x86_64-linux/racc/cparse.so
facter    29899    root  mem       REG              253,0    20736    1601142 /usr/lib64/ruby/site_ruby/1.8/x86_64-linux/json/ext/parser.so
facter    29899    root  mem       REG              253,0    28152    1601141 /usr/lib64/ruby/site_ruby/1.8/x86_64-linux/json/ext/generator.so
facter    29899    root  mem       REG              253,0     5160    1598087 /usr/lib64/ruby/1.8/x86_64-linux/digest/sha1.so
facter    29899    root  mem       REG              253,0     5160    1598085 /usr/lib64/ruby/1.8/x86_64-linux/digest/md5.so
facter    29899    root  mem       REG              253,0    25464    1352515 /usr/lib64/gconv/gconv-modules.cache
facter    29899    root  mem       REG              253,0   217016    1504010 /var/db/nscd/hosts
facter    29899    root    0r     FIFO                0,6      0t0  850778561 pipe
facter    29899    root    1w      REG              253,0        0    2609309 /etc/mcollective/facts.yaml.tmp
facter    29899    root    2w     FIFO                0,6      0t0  850778562 pipe
facter    29899    root    3r      REG              253,0      139    2186506 /etc/sysconfig/appdynamics
facter    29899    root    4r     FIFO                0,6      0t0  850779144 pipe
facter    29899    root    5r      REG                0,0     4096       5829 /sys/block/hdc/size

Ruby Version
$ rpm -q --qf '%{VERSION}-%{RELEASE}\n' ruby
1.8.7.370-1.el5

This didn't happen with 1.6.x, which was our previous version, and I cannot find a RHEL6 host with this issue, so I suspect Ruby here. Any ideas, anyone?

Thanks
Paul

Wil Cooley

Oct 20, 2014, 12:20:23 PM
to puppet-users group


On Oct 20, 2014 1:41 AM, "Paul Seymour" <paul.s...@ig.com> wrote:
>
> This didn't happen with 1.6.x, which was our previous version, and I cannot find a RHEL6 host with this issue, so I suspect Ruby here. Any ideas, anyone?

Run `strace` on the hung process to try to see what it's doing?

Wil

Paul Seymour

Oct 21, 2014, 11:39:46 AM
to puppet...@googlegroups.com
Thanks. It's a little tricky to do as it always comes back with a "wait4" call if attaching to the hung process. Will try and capture it.

But looking at the open files, it's always trying to read the size of block devices from /sys.

Cheers
Paul

Wil Cooley

Oct 21, 2014, 7:07:32 PM
to puppet-users group


On Oct 21, 2014 8:41 AM, "Paul Seymour" <paul.s...@ig.com> wrote:

> Thanks. It's a little tricky to do as it always comes back with a "wait4" call if attaching to the hung process. Will try and capture it.

`wait4` would indicate it's waiting for a child process; I would try to figure out what the child is doing by having strace run with "-f" to follow forks.

You could run facter from cron with strace prepended so that it traces from the beginning. Use the "-o" option to have strace write its output to a file.
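
As a rough sketch of that approach: the commented cron line below is a hypothetical debugging variant of the original entry (the strace path and the /tmp/facter-cron.strace output file are assumptions, not something from this thread). The helper function last_calls is also made up for illustration. It prints the last traced line for each PID, which should show what the parent and each child were doing when things stopped.

```shell
#!/bin/sh
# Hypothetical debugging variant of the cron entry (paths are assumptions):
#   10 * * * * root /usr/bin/strace -f -tt -o /tmp/facter-cron.strace /usr/bin/facter -p --yaml >/etc/mcollective/facts.yaml.tmp
#
# -f follows forked children, -tt adds timestamps, -o writes the trace to a file.
# With -f and -o together, strace prefixes every line with the PID that made the call.

# Print the last traced line for each PID in the given trace file,
# sorted by PID. Blank lines are skipped.
last_calls() {
    awk 'NF { last[$1] = $0 } END { for (pid in last) print last[pid] }' "$1" | sort -n
}

# Usage: last_calls /tmp/facter-cron.strace
```

A PID whose last line is an unfinished wait4 is just waiting on a child; the interesting line is the last one from the child PID.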

Wil

Throwe, Jesse

Oct 22, 2014, 6:19:48 AM
to puppet...@googlegroups.com
We had this problem on EL5 with a few facts that were reading proc and
sysfs entries. The exec utility function would hang on EL5 on these
facts, and I eventually had to change to backticks to get it to run to
completion. It's been a while since I had to deal with the bug, but I
seem to recall the ultimate issue was something to do with a lack of
tty and the <2.6.2x kernel.
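
If the missing tty really is the trigger, it may be possible to reproduce the hang without waiting for cron. The sketch below is only an approximation of cron's environment and rests on some assumptions: run_like_cron is a hypothetical helper name, the PATH and SHELL values are typical cron defaults rather than values taken from the affected hosts, and stdin comes from /dev/null whereas the lsof output shows cron handing facter a pipe. It uses setsid from util-linux to start the command in a new session, which has no controlling terminal.

```shell
#!/bin/sh
# Run a command roughly the way cron would: stripped-down environment,
# stdin not a terminal, and no controlling tty (new session via setsid).
# First argument is the file that receives stdout and stderr.
run_like_cron() {
    out=$1; shift
    env -i PATH=/usr/bin:/bin SHELL=/bin/sh HOME="$HOME" \
        setsid "$@" </dev/null >"$out" 2>&1
}

# Usage (hypothetical output path):
#   run_like_cron /tmp/facter-notty.out /usr/bin/facter -p --yaml
#
# From an interactive shell, setsid may fork and return immediately,
# so check with ps whether the command is still running afterwards.
```

If facter hangs when started this way but not from a normal terminal session, that would point at the tty rather than at cron itself.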