puppet freezes on FUTEX_WAKE_PRIVATE


Ernest Beinrohr

Jun 25, 2012, 4:16:35 AM
to puppet...@googlegroups.com
Hi, I just want to ask whether anybody else has this problem and whether it can be solved.

Many of my (30+) puppet installations freeze up after some time. The process waits on a private futex and stays like that forever. This is what strace looks like when the problem occurs:

[pid 29173] futex(0x3d35ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 43406739, {1340611695, 739433265}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 29173] clock_gettime(CLOCK_REALTIME, {1340611695, 741431552}) = 0
[pid 29173] futex(0x3d35ce7ab0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 29173] futex(0x3d35ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 43406741, {1340611695, 751431552}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 29173] clock_gettime(CLOCK_REALTIME, {1340611695, 753429831}) = 0
[pid 29173] futex(0x3d35ce7ab0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 29173] futex(0x3d35ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 43406743, {1340611695, 763429831}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 29173] clock_gettime(CLOCK_REALTIME, {1340611695, 765427460}) = 0
[pid 29173] futex(0x3d35ce7ab0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 29173] futex(0x3d35ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 43406745, {1340611695, 775427460}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 29173] clock_gettime(CLOCK_REALTIME, {1340611695, 777424282}) = 0
[pid 29173] futex(0x3d35ce7ab0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 29173] futex(0x3d35ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 43406747, {1340611695, 787424282}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 29173] clock_gettime(CLOCK_REALTIME, {1340611695, 789423203}) = 0
[pid 29173] futex(0x3d35ce7ab0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 29173] futex(0x3d35ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 43406749, {1340611695, 799423203}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 29173] clock_gettime(CLOCK_REALTIME, {1340611695, 801422477}) = 0
[pid 29173] futex(0x3d35ce7ab0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 29173] futex(0x3d35ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 43406751, {1340611695, 811422477}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 29173] clock_gettime(CLOCK_REALTIME, {1340611695, 813420142}) = 0
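
One way to read the trace above is to compare each futex deadline with the preceding clock_gettime result. The sketch below does that for four lines copied from the trace; the helper names (`to_ns`, `wait_lengths_ms`) are made up for illustration. If the deadlines are always a fixed interval in the future and each wait ends with ETIMEDOUT, this thread looks like a periodic ticker rather than the thread that is actually stuck. That interpretation is an assumption; the trace alone does not show what the other threads are doing.

```python
import re

# Four lines copied verbatim from the strace output above.
TRACE = """\
[pid 29173] clock_gettime(CLOCK_REALTIME, {1340611695, 741431552}) = 0
[pid 29173] futex(0x3d35ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 43406741, {1340611695, 751431552}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 29173] clock_gettime(CLOCK_REALTIME, {1340611695, 753429831}) = 0
[pid 29173] futex(0x3d35ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 43406743, {1340611695, 763429831}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
"""

# Matches the {seconds, nanoseconds} pair that strace prints for a timespec.
TIMESPEC = re.compile(r"\{(\d+), (\d+)\}")


def to_ns(line):
    """Return the first timespec on the line as integer nanoseconds."""
    sec, nsec = TIMESPEC.search(line).groups()
    return int(sec) * 1_000_000_000 + int(nsec)


def wait_lengths_ms(trace):
    """For each clock_gettime / futex-wait pair, return deadline minus 'now' in ms."""
    lengths = []
    now = None
    for line in trace.splitlines():
        if "clock_gettime(" in line:
            now = to_ns(line)
        elif "FUTEX_WAIT_BITSET_PRIVATE" in line and now is not None:
            lengths.append((to_ns(line) - now) / 1_000_000)
            now = None
    return lengths


print(wait_lengths_ms(TRACE))
```

For these lines every deadline is exactly 10 ms after the clock reading, so the loop wakes about 100 times per second and costs very little CPU.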



This happens on RHEL6 with puppet from RF (puppet-2.7.9-1.el6.rf, ruby-1.8.7.352-7.el6_2.x86_64) and also on a Mandriva system I happen to have (puppet-2.7.13-1mdv2010.2, ruby-1.8.7.p249-4.2mdv2010.2).


Stefan Schulte

Jul 13, 2012, 12:24:35 PM
to puppet...@googlegroups.com
On Fri, Jul 13, 2012 at 06:30:41AM -0700, Thomas Sturm wrote:
> We have the same problem on Ubuntu 12.04 with kernel 3.2.0-24 and puppet
> 2.7.11. This occurs just after "info: Retrieving plugin" and before loading
> the facter facts. It occurs every 100th or 200th puppet run. Any hint much
> appreciated!
>
> cheers,
> Thomas
>

Is this a relatively new issue for you? FUTEX_WAIT reminds me of the leap
second kernel bug. If that's the case, setting the time will fix the issue.

http://serverfault.com/questions/407224/java-process-opends-consumes-all-cpu-futex-flood-how-to-debug-futex

-Stefan

Thomas Sturm

Jul 16, 2012, 3:56:15 AM
to puppet...@googlegroups.com

> Is this a relatively new issue for you? FUTEX_WAIT reminds me of the leap
> second kernel bug. If that's the case, setting the time will fix the issue.
>
> http://serverfault.com/questions/407224/java-process-opends-consumes-all-cpu-futex-flood-how-to-debug-futex
>
> -Stefan


No, we already noticed this some weeks ago, so I don't think it has to do with the leap second bug. The process also doesn't consume much CPU; it just waits.

Thomas

Ernest Beinrohr

Jul 16, 2012, 4:00:54 AM
to puppet...@googlegroups.com
Same here, we have had this issue from the beginning (~3m). I am now forced to restart the service every hour :(

Richard Leitner

Oct 23, 2012, 10:37:14 AM
to puppet...@googlegroups.com
Hi everybody,
just for info:
I solved this issue after hours of strace'ing, tcp- and ssldump'ing.
And as you may expect it was simple.

In my case the /usr/share/puppet-dashboard/bin/external_node timed out, because it was trying to connect to ::1 port 3000 (the dashboard).
The timeout was caused by my ip6tables DROP policy.
So two lines of code resolved my issues:
ip6tables -A INPUT -i lo -j ACCEPT
ip6tables -A OUTPUT -o lo -j ACCEPT

Then one simple "service ip6tables save" and everything was in tall cotton.

A puppet agent --test now takes around 15 seconds again (instead of nearly 3 minutes) *hooray!*
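
To check for the same cause on another machine, a short probe can distinguish a silently dropped connection from a refused one. This is only a sketch: the function name `probe` and the 3 second default timeout are my own choices, and port 3000 on ::1 is simply the dashboard address mentioned above.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all, which is what a DROP policy typically produces.
        return "timeout"
    except ConnectionRefusedError:
        # The packet got through, but nothing listens on that port.
        return "refused"
    except OSError:
        # Anything else, for example no IPv6 support on this host.
        return "error"
```

Calling `probe("::1", 3000)` and `probe("127.0.0.1", 3000)` on the puppet master would be the comparison of interest: a "timeout" on ::1 next to "open" or "refused" on 127.0.0.1 points at loopback IPv6 filtering like the ip6tables DROP policy described here.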

regards,
Richard

On Wednesday, October 17, 2012 5:23:07 PM UTC+2, Richard Leitner wrote:
Hi,
I don't know if this issue is still current... but I'm suffering from the same thing.

My puppet agent hangs for ~1 minute with these messages:
26297 16:49:19.735059 futex(0x33e6ce7ab0, FUTEX_WAKE_PRIVATE, 1) = 0
26297 16:49:19.735127 futex(0x33e6ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 7, {1350485359, 745041001}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
26297 16:49:19.745244 futex(0x33e6ce7ab0, FUTEX_WAKE_PRIVATE, 1) = 0
26297 16:49:19.745325 futex(0x33e6ce7a84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 9, {1350485359, 755228509}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

These lines repeat about 6000 times...
Then the agent continues with:
26265 16:50:22.898764 <... select resumed> ) = 1 (in [4], left {56, 826791})
26265 16:50:22.898853 read(4, "\27\3\1\0\300", 5) = 5
26265 16:50:22.898920 read(4, "\f\233\301\212\366\332X\277Q\273\n5\351\222\27\262\321#2*\350\260xPL\230\372\377!\366\270\355"..., 192) = 192
26265 16:50:22.899105 select(0, [], [], [], {0, 0}) = 0 (Timeout)
26265 16:50:22.899271 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
26265 16:50:22.899495 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
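
The timestamps in the two excerpts above can be cross-checked against the "about 6000 times" figure. The sketch below assumes the first futex line shown is close to the start of the hang, which the excerpt does not confirm; `seconds_between` is a helper name made up for this example.

```python
from datetime import datetime


def seconds_between(start, end):
    """Difference between two strace wall-clock timestamps, in seconds."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# One FUTEX_WAKE_PRIVATE to the next: a single wake/wait cycle of thread 26297.
period = seconds_between("16:49:19.735059", "16:49:19.745244")
# First futex line shown until select() resumes in thread 26265.
stalled = seconds_between("16:49:19.735059", "16:50:22.898764")
cycles = stalled / period

print(f"period = {period * 1000:.3f} ms, stalled = {stalled:.1f} s, cycles = {cycles:.0f}")
```

A cycle of roughly 10 ms over roughly 63 s gives a bit more than 6000 cycles, so the futex lines account for the whole wait. The interesting event is the select() on file descriptor 4 in thread 26265, which only returns once data finally arrives.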

Does anybody have any idea?

One thing: It's definitely not the leap-second bug, the machine was born afterwards ;-)

best regards,
Richard