What messages is the server producing? It's the daemon having
problems, so its logs are the only ones that matter.
>
> 2 - puppet replaces a file even though it knows the first one in the
> source list is the correct one. It should not do anything if it sees
> the file but can't download it.
This sounds similar to the problem someone else posted -- if Puppet
has a list of files to download, it continues looking even if the
first file produces an error. This can result in it using the second
file even if the first exists (because there was an error while
downloading the first).
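
A minimal sketch of the behaviour being asked for, in plain Ruby. This is
not Puppet's code: the method name, the two exception classes and the
fetch callable are all hypothetical, made up for illustration. The idea is
that only a genuinely missing source falls through to the next entry, while
any other download failure stops the search so nothing gets replaced.

```ruby
# Hypothetical sketch, not Puppet's implementation.
class SourceMissing < StandardError; end # the source does not exist
class SourceError < StandardError; end   # it exists, but fetching failed

# fetch is any callable that returns the content, raises SourceMissing
# when the source is absent, and raises SourceError on other failures.
def first_available(sources, fetch)
  sources.each do |source|
    begin
      return [source, fetch.call(source)]
    rescue SourceMissing
      next # genuinely absent: try the next source in the list
    end
    # SourceError is deliberately not rescued, so a failed download
    # aborts the search instead of silently using a later source.
  end
  nil # no source in the list exists
end
```

With this shape, the caller sees the error from the first source and can
leave the existing file alone rather than overwrite it with the second.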
--
If you can't be a good example, then you'll just have to be a
horrible warning. -- Catherine Aird
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com
> Luke,
>
> There is definitely some bug going on. Even with the new setup using
> apache/mongrel/puppetmaster setup.
>
> puppetmaster seems to stop responding to requests after a few hours/
> days. Where restarting it is the only way to make it work again. Not
> sure if it's a memory leak but I do see memory usage increase on the
> puppetmaster server.
>
> How can I help track down this bug as it's affecting our production.
> What do you recommend?
I guess run strace or truss or dtrace or whatever on it to figure out
what the heck it's doing. That's really all you can do, right?
Maybe check lsof to see if it's somehow got too many files open, too.
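
If lsof is not handy, a rough equivalent of the open-files check can be
sketched in Ruby. This assumes a Linux host, where /proc/<pid>/fd has one
entry per open descriptor, and it assumes you run it as root or as the
user that owns the process. The helper name is made up for illustration.

```ruby
# Sketch only; assumes Linux, where /proc/<pid>/fd lists one entry per
# open file descriptor of the given process.
def open_fd_count(pid)
  Dir.children("/proc/#{pid}/fd").length
end

# Example use, with the puppetmaster pid filled in by hand:
#   puts open_fd_count(12345)
```

Sampling this every few minutes and watching whether the number only ever
goes up is a cheap way to tell a descriptor leak from ordinary load.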
--
The covers of this book are too far apart. -- Ambrose Bierce
Larry,
are you by any chance seeing the same set of symptoms that I have been?
http://reductivelabs.com/redmine/issues/show/1095
--
Frank Sweetser fs at wpi.edu | For every problem, there is a solution that
WPI Senior Network Engineer | is simple, elegant, and wrong. - HL Mencken
GPG fingerprint = 6174 1257 129E 0D21 D8D4 E8A3 8E39 29E3 E2E8 8CEC
>
> Larry Ludwig wrote:
>> Luke,
>>
>> There is definitely some bug going on. Even with the new setup using
>> apache/mongrel/puppetmaster setup.
>>
>> puppetmaster seems to stop responding to requests after a few hours/
>> days. Where restarting it is the only way to make it work again. Not
>> sure if it's a memory leak but I do see memory usage increase on the
>> puppetmaster server.
>>
>> How can I help track down this bug as it's affecting our production.
>> What do you recommend?
>
> Larry,
>
> are you by any chance seeing the same set of symptoms that I have
> been?
>
> http://reductivelabs.com/redmine/issues/show/1095
That's probably what it is.
Any ideas for how to possibly fix this? I just use the http libs from
Ruby, so I'm not doing any magic, but it also means that I haven't had
to learn much about their guts.
--
I wanna hang a map of the world in my house. Then I'm gonna put pins
into all the locations that I've traveled to. But first, I'm gonna
have to travel to the top two corners of the map so it won't fall
down. -- Mitch Hedberg
I'm afraid that I'm used to coming at it from the bottom up, looking through a
network sniffer, but I'm equally ignorant about the Ruby http libs.
The too many open files idea you mentioned does look promising, though. My
puppetmaster is currently stuck, and lsof shows that it's got exactly 256 open
files. In addition to the normal suspects (library files, /dev/null, etc)
there is what looks to me like an excessive number of these two lines:
puppetmas 31780 puppet 42w REG 253,4 216 21954578 /var/log/puppet/rails.log
This line appears 102 times...
puppetmas 31780 puppet 41u sock 0,4 2479117 can't identify protocol
... and this one 99 times.
I don't know enough to say if they're really relevant or not, but googling
around I found these promising looking discussions:
http://www.ruby-forum.com/topic/127663
http://www.ruby-forum.com/topic/154667
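
The counting above can be sketched in Ruby. This assumes the input is
lsof's field output for one process (something like `lsof -p <pid> -Fn`),
where each file name sits on its own line prefixed with "n". The method
name is hypothetical.

```ruby
# Sketch only; expects text produced by lsof's field output (-Fn),
# in which every open file's name is on a line starting with "n".
def tally_names(field_output)
  counts = Hash.new(0)
  field_output.each_line do |line|
    next unless line.start_with?("n")
    counts[line[1..-1].chomp] += 1 # drop the "n" prefix and the newline
  end
  counts.sort_by { |_name, count| -count } # most frequent first
end

# Example use:
#   tally_names(File.read("lsof.txt")).first(5).each do |name, count|
#     puts "#{count}\t#{name}"
#   end
```

Anything that shows up dozens of times at the top of that list, such as
one log file, is a reasonable first suspect.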
Hmm. If it's got 102 rails log files open, that's a good bet for
being a real problem, even if it's maybe not the actual source of the
problem.
I *thought* we killed the problem of a new log file being opened on
every connection. Blake -- do you remember if that's the case? If
so, could it have gotten reborn somehow?
--
I have never met a man so ignorant that I couldn't learn something
from him. --Galileo Galilei
The Keep-Alive stuff fixed it, but that's reverted now from #1010.
-Blake
As in the keep-alive stuff magically fixed it, or there was something
else in that code that provided the fix?
This is certainly an annoyingly big deal.
--
I have an answering machine in my car. It says, "I'm home now. But
leave a message and I'll call when I'm out." -- Steven Wright
>
> On Jun 18, 2008, at 4:10 PM, Blake Barnett wrote:
>
>> On Jun 16, 2008, at 8:26 PM, Luke Kanies wrote:
>>> Hmm. If it's got 102 rails log files open, that's a good bet for
>>> being a real problem, even if it's maybe not the actual source of
>>> the problem.
>>>
>>> I *thought* we killed the problem of a new log file being opened on
>>> every connection. Blake -- do you remember if that's the case? If
>>> so, could it have gotten reborn somehow?
>>
>> The Keep-Alive stuff fixed it, but that's reverted now from #1010.
>
>
> As in the keep-alive stuff magically fixed it, or there was something
> else in that code that provided the fix?
>
> This is certainly an annoyingly big deal.
The reason it happens is that ActiveRecord opens a new connection for
every client that connects. There's no clean way to make ActiveRecord
use a connection pool without wrapping every call to it in some
handling code. I think if we continue to use something like this for
direct DB access we should look at Sequel[1] or DataMapper[2]. Both
are thread-safe and can deal with situations like this much more
cleanly.
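
To make the "wrapping every call" point concrete, here is a minimal sketch
of a fixed-size pool in plain Ruby. It is not ActiveRecord's, Sequel's or
DataMapper's API; the TinyPool class and its `with` method are made up for
illustration, and the "connection" is whatever object the block builds.

```ruby
require "thread" # Queue is thread-safe; the require is a no-op on newer Rubies

# Hypothetical sketch: a fixed set of connections shared by all clients,
# instead of one new connection per client.
class TinyPool
  def initialize(size, &factory)
    @slots = Queue.new
    size.times { @slots << factory.call } # open every connection up front
  end

  # Borrow a connection for the duration of the block, then hand it back
  # even if the block raises.
  def with
    conn = @slots.pop # blocks until a connection is free
    yield conn
  ensure
    @slots << conn if conn
  end
end
```

Every database call then has to go through `pool.with { |conn| ... }`,
which is exactly the sort of handling code that would need to be wrapped
around each use.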
-Blake
1. http://sequel.rubyforge.org/
2. http://datamapper.org/ (appears to be down)
Ah, that's different entirely. It shouldn't even load ActiveRecord
unless storeconfigs is enabled. Try commenting out lines 41 - 49 in
lib/puppet/feature/rails.rb and restart puppetmasterd. Hopefully it's
that simple.
-Blake