Bug with puppetmaster and with puppet

Larry Ludwig

Jun 8, 2008, 10:11:33 AM
to Puppet Users
I've been having this issue with puppetmaster and the file serving.
It seems to stop working after a few hours of serving downloads. Initially
it works and I don't have any issues. I then get "Connection reset by
peer" some time later.

In addition, in this case puppet couldn't download the host-specific
cert and replaced it with the default one, which for some odd reason it
was able to download. I've seen this happen quite a few times.

Jun 8 06:16:37 pan puppetd[29172]: Could not call fileserver.describe: #<Errno::ECONNRESET: Connection reset by peer>
Jun 8 06:16:37 pan puppetd[29172]: (//Node[pan]/template-directadmin/directadmin-exim/File[exim.cert]/source) Could not describe /directadmin-exim/exim.pan.cert: Connection reset by peer
Jun 8 06:16:59 pan puppetd[29172]: (//Node[pan]/template-directadmin/directadmin-exim/File[exim.cert]/source) replacing from source puppet:///directadmin-exim/exim.cert with contents {md5}b156c01be54b8c0a285d9d1bccc4c3b6

It appears two bugs are happening:
1 - puppetmaster, for whatever reason, stops serving files.
2 - puppet replaces a file even though it knows the first one in the
source list is the correct one. It should not do anything if it sees
the file but can't download it.

Larry Ludwig

Jun 8, 2008, 10:56:36 AM
to Puppet Users
Let me add the code for #2 that this happened in:

file { "exim.cert":
    name     => "/etc/exim.cert",
    checksum => md5,
    ensure   => present,
    owner    => 'root',
    group    => 'root',
    mode     => '0444',
    require  => [ Package["da_exim"], File["exim.conf"] ],
    source   => $config_reseller ? {
        true    => "puppet:///directadmin-exim/exim.reseller.cert",
        default => [ "puppet:///directadmin-exim/exim.${hostname}.cert",
                     "puppet:///directadmin-exim/exim.cert" ],
    },
}

I think I remember reading a post about issue #2. I have not seen
anyone describe issue #1.

Larry Ludwig

Jun 8, 2008, 7:40:40 PM
to Puppet Users
Here is more log info to show what I mean. I restarted puppetmasterd
at 11:42 AM, then let it run as usual. Keep in mind that neither
puppetmaster nor puppet was touched after this time, yet it failed again.
No new recipes were introduced.

I don't know if this makes a difference, but I have custom facter plugins.
The problem seems to have started when these were put into use.

Jun 8 10:39:19 pan puppetd[4959]: Could not call puppetmaster.freshness: #<Errno::ECONNRESET: Connection reset by peer>
Jun 8 10:39:19 pan puppetd[4959]: Could not retrieve catalog: Connection reset by peer
Jun 8 11:09:27 pan puppetd[4959]: Starting catalog run
Jun 8 11:09:54 pan puppetd[4959]: Finished catalog run in 27.31 seconds
Jun 8 11:41:54 pan puppetd[4959]: Could not retrieve plugins: execution expired
Jun 8 11:42:41 pan puppetd[4959]: Starting catalog run
Jun 8 11:44:14 pan puppetd[4959]: Finished catalog run in 92.70 seconds
Jun 8 12:14:21 pan puppetd[4959]: Starting catalog run
Jun 8 12:14:53 pan puppetd[4959]: Finished catalog run in 32.44 seconds
Jun 8 12:44:58 pan puppetd[4959]: Starting catalog run
Jun 8 12:45:32 pan puppetd[4959]: Finished catalog run in 33.74 seconds
Jun 8 13:17:32 pan puppetd[4959]: Could not retrieve plugins: execution expired
Jun 8 13:20:43 pan puppetd[4959]: Starting catalog run
Jun 8 13:29:08 pan puppetd[4959]: Could not call fileserver.describe:

Luke Kanies

Jun 9, 2008, 1:31:30 PM
to puppet...@googlegroups.com
On Jun 8, 2008, at 9:11 AM, Larry Ludwig wrote:
>
> It appears two bugs are happening:
> 1 - puppetmaster is for whatever reason stops file serving

What messages is the server producing? It's the daemon having
problems, so its logs are the only ones that matter.

>
> 2 - puppet replaces a file even though it knows the first one in the
> source list is the correct one. It should not do anything if it sees
> the file but can't download it.

This sounds similar to the problem someone else posted -- if Puppet
has a list of files to download, it continues looking even if the
first file produces an error. This can result in it using the second
file even if the first exists (because there was an error while
downloading the first).

--
If you can't be a good example, then you'll just have to be a
horrible warning. -- Catherine Aird
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
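
The fall-through Luke describes can be illustrated with a minimal Ruby sketch. This is not Puppet's actual implementation: SourceError, first_source_lenient, first_source_strict and the describe callable are hypothetical names invented for the example. The lenient version treats a transport error like a missing file and moves on to the next source; the strict version lets the error abort the lookup, which is roughly the behaviour Larry is asking for.

```ruby
# Hypothetical sketch only; none of these names exist in Puppet.
class SourceError < StandardError; end

# `describe` is any callable that returns a Hash of metadata when the file
# exists, nil when it does not exist, and raises SourceError on a
# transport failure such as "Connection reset by peer".

# Lenient: an error is handled like "not found", so the next source wins.
def first_source_lenient(sources, describe)
  sources.each do |src|
    begin
      meta = describe.call(src)
    rescue SourceError
      next # swallow the error and fall through to the next source
    end
    return src if meta
  end
  nil
end

# Strict: only a genuine "not found" (nil) moves on; errors propagate.
def first_source_strict(sources, describe)
  sources.each do |src|
    meta = describe.call(src) # SourceError aborts the whole lookup
    return src if meta
  end
  nil
end
```

With a describe callable that raises for exim.pan.cert and succeeds for exim.cert, the lenient version returns the generic exim.cert source (the replacement seen in the log above), while the strict version raises and leaves the file on disk alone.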

Larry Ludwig

Jun 9, 2008, 2:39:45 PM
to Puppet Users


On Jun 9, 1:31 pm, Luke Kanies <l...@madstop.com> wrote:
> On Jun 8, 2008, at 9:11 AM, Larry Ludwig wrote:
>
>
>
> > It appears two bugs are happening:
> > 1 - puppetmaster is for whatever reason stops file serving
>
> What messages is the server producing? It's the daemon having
> problems, so its logs are the only ones that matter.

Nothing I could find.

I'll have to run with --trace the next time the problem starts up again.


> > 2 - puppet replaces a file even though it knows the first one in the
> > source list is the correct one. It should not do anything if it sees
> > the file but can't download it.
>
> This sounds similar to the problem someone else posted -- if Puppet
> has a list of files to download, it continues looking even if the
> first file produces an error. This can result it in using the second
> file even if the first exists (because there was an error while
> downloading the first).

Yes, right now this is definitely an undesirable result. It shouldn't
do anything, or at least that should be an option for the File type.


Larry Ludwig

Jun 11, 2008, 12:04:21 PM
to Puppet Users
Issue #1 appears to be a webrick issue. We have over 100
nodes.

I have since replaced it with an apache/mongrel setup and the performance
is much better. I believe it will also solve this issue.

As far as #2 goes, if it's already an open bug then I won't add any
additional info, except that it should be fixed in some fashion.

-L
--
Larry Ludwig
Empowering Media
1-866-792-0489 x600
Managed and Unmanaged Xen VPSes
http://www.hostcube.com/

Larry Ludwig

Jun 16, 2008, 10:49:54 PM
to Puppet Users
Luke,

There is definitely some bug going on, even with the new
apache/mongrel/puppetmaster setup.

puppetmaster seems to stop responding to requests after a few hours or
days, and restarting it is the only way to make it work again. I'm not
sure if it's a memory leak, but I do see memory usage increase on the
puppetmaster server.

How can I help track down this bug? It's affecting our production.
What do you recommend?

-L

Luke Kanies

Jun 16, 2008, 10:54:06 PM
to puppet...@googlegroups.com
On Jun 16, 2008, at 9:49 PM, Larry Ludwig wrote:

> Luke,
>
> There is definitely some bug going on. Even with the new setup using
> apache/mongrel/puppetmaster setup.
>
> puppetmaster seems to stop responding to requests after a few hours/
> days. Where restarting it is the only way to make it work again. Not
> sure if it's a memory leak but I do see memory usage increase on the
> puppetmaster server.
>
> How can I help track down this bug as it's affecting our production.
> What do you recommend?


I guess run strace or truss or dtrace or whatever on it to figure out
what the heck it's doing. That's really all you can do, right?

Maybe check lsof to see if it's somehow got too many files open, too.

--
The covers of this book are too far apart. -- Ambrose Bierce

Frank Sweetser

Jun 16, 2008, 10:56:58 PM
to puppet...@googlegroups.com
Larry Ludwig wrote:
> Luke,
>
> There is definitely some bug going on. Even with the new setup using
> apache/mongrel/puppetmaster setup.
>
> puppetmaster seems to stop responding to requests after a few hours/
> days. Where restarting it is the only way to make it work again. Not
> sure if it's a memory leak but I do see memory usage increase on the
> puppetmaster server.
>
> How can I help track down this bug as it's affecting our production.
> What do you recommend?

Larry,

are you by any chance seeing the same set of symptoms that I have been?

http://reductivelabs.com/redmine/issues/show/1095

--
Frank Sweetser fs at wpi.edu | For every problem, there is a solution that
WPI Senior Network Engineer | is simple, elegant, and wrong. - HL Mencken
GPG fingerprint = 6174 1257 129E 0D21 D8D4 E8A3 8E39 29E3 E2E8 8CEC

Luke Kanies

Jun 16, 2008, 11:00:42 PM
to puppet...@googlegroups.com
On Jun 16, 2008, at 9:56 PM, Frank Sweetser wrote:

>
> Larry Ludwig wrote:
>> Luke,
>>
>> There is definitely some bug going on. Even with the new setup using
>> apache/mongrel/puppetmaster setup.
>>
>> puppetmaster seems to stop responding to requests after a few hours/
>> days. Where restarting it is the only way to make it work again.
>> Not
>> sure if it's a memory leak but I do see memory usage increase on the
>> puppetmaster server.
>>
>> How can I help track down this bug as it's affecting our production.
>> What do you recommend?
>
> Larry,
>
> are you by any chance seeing the same set of symptoms that I have
> been?
>
> http://reductivelabs.com/redmine/issues/show/1095

That's probably what it is.

Any ideas for how to possibly fix this? I just use the http libs from
Ruby, so I'm not doing any magic, but it also means that I haven't had
to learn much about their guts.

--
I wanna hang a map of the world in my house. Then I'm gonna put pins
into all the locations that I've traveled to. But first, I'm gonna
have to travel to the top two corners of the map so it won't fall
down. -- Mitch Hedberg

Frank Sweetser

Jun 16, 2008, 11:22:34 PM
to puppet...@googlegroups.com
Luke Kanies wrote:
> On Jun 16, 2008, at 9:56 PM, Frank Sweetser wrote:
>
>> Larry Ludwig wrote:
>>> Luke,
>>>
>>> There is definitely some bug going on. Even with the new setup using
>>> apache/mongrel/puppetmaster setup.
>>>
>>> puppetmaster seems to stop responding to requests after a few hours/
>>> days. Where restarting it is the only way to make it work again.
>>> Not
>>> sure if it's a memory leak but I do see memory usage increase on the
>>> puppetmaster server.
>>>
>>> How can I help track down this bug as it's affecting our production.
>>> What do you recommend?
>> Larry,
>>
>> are you by any chance seeing the same set of symptoms that I have
>> been?
>>
>> http://reductivelabs.com/redmine/issues/show/1095
>
> That's probably what it is.
>
> Any ideas for how to possibly fix this? I just use the http libs from
> Ruby, so I'm not doing any magic, but it also means that I haven't had
> to learn much about their guts.

I'm afraid that I'm used to coming at it from the bottom up, looking through a
network sniffer, but I'm equally ignorant about the Ruby http libs.

The too many open files idea you mentioned does look promising, though. My
puppetmaster is currently stuck, and lsof shows that it's got exactly 256 open
files. In addition to the normal suspects (library files, /dev/null, etc.)
there is what looks to me like an excessive number of these two lines:

puppetmas 31780 puppet 42w REG 253,4 216 21954578 /var/log/puppet/rails.log

This line appears 102 times...

puppetmas 31780 puppet 41u sock 0,4 2479117 can't identify protocol

... and this one 99 times.

I don't know enough to say if they're really relevant or not, but googling
around I found these promising looking discussions:

http://www.ruby-forum.com/topic/127663

http://www.ruby-forum.com/topic/154667
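
Counting repeated lsof entries by hand gets tedious, so a small Ruby sketch along the following lines could do the tally. It is only an illustration: tally_open_files is a hypothetical helper, it assumes the usual lsof column order (COMMAND PID USER FD TYPE ...), and it groups regular files by their path and everything else by the TYPE column. Paths containing spaces would be miscounted.

```ruby
# Hypothetical helper: tally lsof output so leaked descriptors stand out.
# Assumes the default lsof column order: COMMAND PID USER FD TYPE ... NAME.
def tally_open_files(lsof_text)
  counts = Hash.new(0)
  lsof_text.each_line do |line|
    fields = line.split
    next if fields.length < 5 || fields[0] == "COMMAND" # blank or header line
    type = fields[4]
    # Regular files are keyed by path (last field); sockets, pipes and so on
    # are keyed by their TYPE because their NAME column is not a stable path.
    key = type == "REG" ? fields.last : type
    counts[key] += 1
  end
  counts
end

# Print the most frequent entries first.
def print_tally(counts, limit = 10)
  counts.sort_by { |_, n| -n }.first(limit).each do |key, n|
    puts format("%5d  %s", n, key)
  end
end
```

On the master this could be fed with something like print_tally(tally_open_files(`lsof -p #{pid}`)), where pid is the puppetmasterd process ID.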

Luke Kanies

Jun 16, 2008, 11:26:55 PM
to puppet...@googlegroups.com, Blake Barnett

Hmm. If it's got 102 rails log files open, that's a good bet for
being a real problem, even if it's maybe not the actual source of the
problem.

I *thought* we killed the problem of a new log file being opened on
every connection. Blake -- do you remember if that's the case? If
so, could it have gotten reborn somehow?

--
I have never met a man so ignorant that I couldn't learn something
from him. --Galileo Galilei

Larry Ludwig

Jun 17, 2008, 6:29:29 AM
to Puppet Users
I'll start looking down this path and let you know what I find.

-L

Larry Ludwig

Jun 17, 2008, 4:34:05 PM
to Puppet Users
I'm not sure if this is the issue yet.

At least from my end.

-L

Blake Barnett

Jun 18, 2008, 5:10:14 PM
to Luke Kanies, puppet...@googlegroups.com
On Jun 16, 2008, at 8:26 PM, Luke Kanies wrote:
> Hmm. If it's got 102 rails log files open, that's a good bet for
> being a real problem, even if it's maybe not the actual source of
> the problem.
>
> I *thought* we killed the problem of a new log file being opened on
> every connection. Blake -- do you remember if that's the case? If
> so, could it have gotten reborn somehow?

The Keep-Alive stuff fixed it, but that's reverted now from #1010.

-Blake

Larry Ludwig

Jun 18, 2008, 6:41:12 PM
to Puppet Users
From it occurring again today, it is in fact the same error as the one
listed in the bug above.

There are many CLOSE_WAIT entries in netstat.

-L
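
A socket in CLOSE_WAIT means the remote side has closed the connection but the local process has not yet closed its end, so a growing count points at the server not releasing sockets. A rough Ruby sketch for counting states is shown below. It is an assumption-laden illustration: count_tcp_states is a hypothetical helper, and it expects the Linux net-tools layout of netstat -tan, where the state is the sixth column.

```ruby
# Hypothetical helper: count TCP connections per state.
# Assumes Linux `netstat -tan` columns:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
def count_tcp_states(netstat_text)
  counts = Hash.new(0)
  netstat_text.each_line do |line|
    fields = line.split
    # Skip banners, headers and blank lines; keep tcp and tcp6 rows only.
    next unless fields[0].to_s.start_with?("tcp") && fields.length >= 6
    counts[fields[5]] += 1
  end
  counts
end
```

Running count_tcp_states(`netstat -tan`)["CLOSE_WAIT"] periodically on the puppetmaster host would show whether the number keeps climbing until the daemon stops responding.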

Luke Kanies

Jun 18, 2008, 9:15:48 PM
to puppet...@googlegroups.com


As in the keep-alive stuff magically fixed it, or there was something
else in that code that provided the fix?

This is certainly an annoyingly big deal.

--
I have an answering machine in my car. It says, "I'm home now. But
leave a message and I'll call when I'm out. -- Stephen Wright

Blake Barnett

Jun 19, 2008, 12:45:09 AM
to puppet...@googlegroups.com

On Jun 18, 2008, at 6:15 PM, Luke Kanies wrote:

>
> On Jun 18, 2008, at 4:10 PM, Blake Barnett wrote:
>
>> On Jun 16, 2008, at 8:26 PM, Luke Kanies wrote:
>>> Hmm. If it's got 102 rails log files open, that's a good bet for
>>> being a real problem, even if it's maybe not the actual source of
>>> the problem.
>>>
>>> I *thought* we killed the problem of a new log file being opened on
>>> every connection. Blake -- do you remember if that's the case? If
>>> so, could it have gotten reborn somehow?
>>
>> The Keep-Alive stuff fixed it, but that's reverted now from #1010.
>
>
> As in the keep-alive stuff magically fixed it, or there was something
> else in that code that provided the fix?
>
> This is certainly an annoyingly big deal.

The reason it happens is that ActiveRecord opens a new connection for
every client that connects. There's no clean way to make ActiveRecord
use a connection pool without wrapping every call to it in some
handling code. I think if we continue to use something like this for
direct DB access we should look at Sequel[1] or DataMapper[2]. Both
are thread safe and can deal with situations like this much more
cleanly.

-Blake

1. http://sequel.rubyforge.org/
2. http://datamapper.org/ (appears to be down)
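
To make the connection-pool idea concrete, here is a minimal Ruby sketch of a fixed-size, thread-safe pool built on the standard Queue class. It is not ActiveRecord, Sequel or DataMapper code; TinyPool and its with method are hypothetical names, and Object.new stands in for a real database connection. The point is that every caller borrows one of a fixed set of connections and hands it back, instead of opening a new one per client.

```ruby
require "thread" # Queue; already built in on modern Rubies

# Hypothetical fixed-size pool: connections are created once, up front.
class TinyPool
  def initialize(size, &factory)
    @available = Queue.new
    size.times { @available << factory.call }
  end

  # Borrow a connection for the duration of the block, then return it.
  def with
    conn = @available.pop # blocks while every connection is checked out
    yield conn
  ensure
    @available << conn if conn # hand it back even if the block raised
  end
end

# Usage: two "connections" shared by any number of callers.
pool = TinyPool.new(2) { Object.new }
pool.with { |conn| conn.inspect }
```

This is also the "wrapping every call in some handling code" cost Blake mentions: each database access has to go through something like pool.with.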

Larry Ludwig

Jun 19, 2008, 12:27:38 PM
to Puppet Users

> The reason it happens is that ActiveRecord opens a new connection for  
> every client that connects.  There's no clean way to make ActiveRecord  
> use a connection pool without wrapping every call to it in some  
> handling code.  I think if we continue to use something like this for  
> direct DB access we should look at Sequel[1] or DataMapper[2].  Both  
> are thread safe and can deal with situations like this much more  
> cleanly.

Just to let you know, in our case we aren't storing our data in MySQL
(yet), though ActiveRecord is installed (I don't know if that makes a
difference).

-L

Blake Barnett

Jun 19, 2008, 5:07:59 PM
to puppet...@googlegroups.com

Ah, that's different entirely. It shouldn't even load ActiveRecord
unless storeconfigs is enabled. Try commenting out lines 41 - 49 in
lib/puppet/feature/rails.rb and restart puppetmasterd. Hopefully it's
that simple.

-Blake
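
The general idea behind "don't load it unless the feature is enabled" can be sketched in Ruby as below. This is not the contents of lib/puppet/feature/rails.rb; optional_require and its arguments are hypothetical, and the sketch only shows the guard pattern of requiring a library when a setting asks for it.

```ruby
# Hypothetical helper: load a library only when the feature needing it is on.
# Returns true when the library is loaded, false when it was skipped or
# is not installed.
def optional_require(enabled, library)
  return false unless enabled # feature off: never touch the library
  require library
  true
rescue LoadError
  false # library missing: report the feature as unavailable
end
```

A call such as optional_require(storeconfigs_enabled, "active_record"), with storeconfigs_enabled standing in for however the setting is read, would leave ActiveRecord unloaded on masters like Larry's that do not store configs.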

