[PATCH/puppet 1/1] Possible workaround for #2824 (MRI GC bug)

Markus Roberts

Nov 17, 2009, 1:45:27 AM
to puppe...@googlegroups.com
This is a moderately ugly workaround for the MRI garbage collection
bug (see the ticket for details).

I explored several other potential solutions (notably, monkey
patching the routines that trigger the bug) but none of them were
satisfactory. Monkey patching sub, gsub, sub!, gsub!, etc., for
example, either changes the scoping of $~, $1, etc. in a way that
could potentially subtly change the meaning of programs or (if you
are clever) faithfully reproduces the behaviour of MRI--including
the memory leak.
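
For the curious, here is a minimal sketch (mine, not part of this patch)
of why the naive monkey patch breaks: $~ and its friends are
method-frame-local, so a Ruby-level wrapper binds the match data in its
own frame instead of the caller's.

class String
  alias_method :original_sub, :sub

  # Naive wrapper: delegates to the original implementation, but the
  # regexp match now happens in *this* frame, so the caller's $~, $1,
  # etc. are no longer set.
  def sub(*args, &block)
    original_sub(*args, &block)
  end
end

"abc".sub(/(b)/) { 'X' } # => "aXc"
p $1                     # => nil with the wrapper in place; "b" without it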

I decided to go with the standardized and somewhat obnoxious never-
used optional argument as it was easy to automatically insert and
should be even easier to automatically find and remove if a better
fix is developed. It also should be obtrusive enough to escape
accidental removal in refactoring.

Signed-off-by: Markus Roberts <Mar...@reality.com>
---
lib/puppet/application/puppetdoc.rb | 2 +-
lib/puppet/file_serving/base.rb | 2 +-
lib/puppet/indirector/node/ldap.rb | 2 +-
lib/puppet/parser/ast/leaf.rb | 2 +-
lib/puppet/provider/service/daemontools.rb | 2 +-
lib/puppet/provider/service/runit.rb | 2 +-
lib/puppet/provider/zone/solaris.rb | 2 +-
lib/puppet/rails/resource.rb | 2 +-
lib/puppet/sslcertificates/ca.rb | 4 ++--
lib/puppet/type/k5login.rb | 2 +-
lib/puppet/util/subclass_loader.rb | 2 +-
11 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/puppet/application/puppetdoc.rb b/lib/puppet/application/puppetdoc.rb
index 5656112..0a4e0c3 100644
--- a/lib/puppet/application/puppetdoc.rb
+++ b/lib/puppet/application/puppetdoc.rb
@@ -192,7 +192,7 @@ Puppet::Application.new(:puppetdoc) do
end
end

- def setup_rdoc
+ def setup_rdoc(dummy_argument=:work_arround_for_ruby_GC_bug)
# consume the unknown options
# and feed them as settings
if @unknown_args.size > 0
diff --git a/lib/puppet/file_serving/base.rb b/lib/puppet/file_serving/base.rb
index 02132e8..a7ab9b6 100644
--- a/lib/puppet/file_serving/base.rb
+++ b/lib/puppet/file_serving/base.rb
@@ -22,7 +22,7 @@ class Puppet::FileServing::Base
end

# Return the full path to our file. Fails if there's no path set.
- def full_path
+ def full_path(dummy_argument=:work_arround_for_ruby_GC_bug)
(if relative_path.nil? or relative_path == "" or relative_path == "."
path
else
diff --git a/lib/puppet/indirector/node/ldap.rb b/lib/puppet/indirector/node/ldap.rb
index 4600a0d..dd8cebf 100644
--- a/lib/puppet/indirector/node/ldap.rb
+++ b/lib/puppet/indirector/node/ldap.rb
@@ -73,7 +73,7 @@ class Puppet::Node::Ldap < Puppet::Indirector::Ldap

# The attributes that Puppet will stack as array over the full
# hierarchy.
- def stacked_attributes
+ def stacked_attributes(dummy_argument=:work_arround_for_ruby_GC_bug)
Puppet[:ldapstackedattrs].split(/\s*,\s*/)
end

diff --git a/lib/puppet/parser/ast/leaf.rb b/lib/puppet/parser/ast/leaf.rb
index b73c781..c8ac6f7 100644
--- a/lib/puppet/parser/ast/leaf.rb
+++ b/lib/puppet/parser/ast/leaf.rb
@@ -101,7 +101,7 @@ class Puppet::Parser::AST
end
end

- def to_classname
+ def to_classname(dummy_argument=:work_arround_for_ruby_GC_bug)
to_s.downcase.gsub(/[^-\w:.]/,'').sub(/^\.+/,'')
end

diff --git a/lib/puppet/provider/service/daemontools.rb b/lib/puppet/provider/service/daemontools.rb
index 2c6c6db..3749f9c 100644
--- a/lib/puppet/provider/service/daemontools.rb
+++ b/lib/puppet/provider/service/daemontools.rb
@@ -45,7 +45,7 @@ Puppet::Type.type(:service).provide :daemontools, :parent => :base do
attr_writer :defpath

# Determine the daemon path.
- def defpath
+ def defpath(dummy_argument=:work_arround_for_ruby_GC_bug)
unless defined?(@defpath) and @defpath
["/var/lib/service", "/etc"].each do |path|
if FileTest.exist?(path)
diff --git a/lib/puppet/provider/service/runit.rb b/lib/puppet/provider/service/runit.rb
index b313fc7..b8b444e 100644
--- a/lib/puppet/provider/service/runit.rb
+++ b/lib/puppet/provider/service/runit.rb
@@ -38,7 +38,7 @@ Puppet::Type.type(:service).provide :runit, :parent => :daemontools do
class << self
# this is necessary to autodetect a valid resource
# default path, since there is no standard for such directory.
- def defpath
+ def defpath(dummy_argument=:work_arround_for_ruby_GC_bug)
unless defined?(@defpath) and @defpath
["/etc/sv", "/var/lib/service"].each do |path|
if FileTest.exist?(path)
diff --git a/lib/puppet/provider/zone/solaris.rb b/lib/puppet/provider/zone/solaris.rb
index 52007bb..b047f69 100644
--- a/lib/puppet/provider/zone/solaris.rb
+++ b/lib/puppet/provider/zone/solaris.rb
@@ -64,7 +64,7 @@ Puppet::Type.type(:zone).provide(:solaris) do
@property_hash.clear
end

- def install
+ def install(dummy_argument=:work_arround_for_ruby_GC_bug)
if @resource[:install_args]
zoneadm :install, @resource[:install_args].split(" ")
else
diff --git a/lib/puppet/rails/resource.rb b/lib/puppet/rails/resource.rb
index 12d3211..984bdc0 100644
--- a/lib/puppet/rails/resource.rb
+++ b/lib/puppet/rails/resource.rb
@@ -199,7 +199,7 @@ class Puppet::Rails::Resource < ActiveRecord::Base
result
end

- def ref
+ def ref(dummy_argument=:work_arround_for_ruby_GC_bug)
"%s[%s]" % [self[:restype].split("::").collect { |s| s.capitalize }.join("::"), self.title.to_s]
end

diff --git a/lib/puppet/sslcertificates/ca.rb b/lib/puppet/sslcertificates/ca.rb
index b5a2469..f6bcbc1 100644
--- a/lib/puppet/sslcertificates/ca.rb
+++ b/lib/puppet/sslcertificates/ca.rb
@@ -134,7 +134,7 @@ class Puppet::SSLCertificates::CA

# List certificates waiting to be signed. This returns a list of hostnames, not actual
# files -- the names can be converted to full paths with host2csrfile.
- def list
+ def list(dummy_argument=:work_arround_for_ruby_GC_bug)
return Dir.entries(Puppet[:csrdir]).find_all { |file|
file =~ /\.pem$/
}.collect { |file|
@@ -144,7 +144,7 @@ class Puppet::SSLCertificates::CA

# List signed certificates. This returns a list of hostnames, not actual
# files -- the names can be converted to full paths with host2csrfile.
- def list_signed
+ def list_signed(dummy_argument=:work_arround_for_ruby_GC_bug)
return Dir.entries(Puppet[:signeddir]).find_all { |file|
file =~ /\.pem$/
}.collect { |file|
diff --git a/lib/puppet/type/k5login.rb b/lib/puppet/type/k5login.rb
index 20c0324..5526fda 100644
--- a/lib/puppet/type/k5login.rb
+++ b/lib/puppet/type/k5login.rb
@@ -56,7 +56,7 @@ Puppet::Type.newtype(:k5login) do
end

# Return the principals
- def principals
+ def principals(dummy_argument=:work_arround_for_ruby_GC_bug)
if File.exists?(@resource[:name])
File.readlines(@resource[:name]).collect { |line| line.chomp }
else
diff --git a/lib/puppet/util/subclass_loader.rb b/lib/puppet/util/subclass_loader.rb
index 8776e85..b71ec72 100644
--- a/lib/puppet/util/subclass_loader.rb
+++ b/lib/puppet/util/subclass_loader.rb
@@ -72,7 +72,7 @@ module Puppet::Util::SubclassLoader
end

# Retrieve or calculate a name.
- def name
+ def name(dummy_argument=:work_arround_for_ruby_GC_bug)
unless defined? @name
@name = self.to_s.sub(/.+::/, '').intern
end
--
1.6.4

Luke Kanies

Nov 17, 2009, 5:33:26 PM
to puppe...@googlegroups.com
I'd love to see someone with memory problems try this out and see how
much of an impact it has.

Maybe put this in the testing branch for a while?


--
Barrow's first law:
Any Universe simple enough to be understood is too simple to produce
a mind able to understand it.
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com

Markus Roberts

Nov 17, 2009, 6:05:21 PM
to puppe...@googlegroups.com
> I'd love to see someone with memory problems try this out and see how
> much of an impact it has.

Yes.  Dan and I are working on it.
 
> Maybe put this in the testing branch for a while?

Any reason not to put it in both (0.25.x & testing)?

-- Markus
 

Luke Kanies

Nov 18, 2009, 6:52:42 PM
to puppe...@googlegroups.com
Hmm. Not really. I guess my main concern is that if it doesn't have
significant real-world impact it might not be worth the complexity.

But I suppose I'm being silly there.

--
It's not that I'm afraid to die. I just don't want to be there when it
happens. -- Woody Allen

Peter Meier

Nov 27, 2009, 9:10:39 AM
to puppe...@googlegroups.com

Hi

I have now had this running for around a week on the master and the clients.

For the clients I can't really say anything, as I'm still running the
clients by cron.

However, on the server I can say that in my opinion it didn't really
make any *big* difference. :( I'm running five instances which get
restarted every 4 hours. I might read the graphs as saying that overall
usage dropped by a few hundred MB (maybe 100-200), but not really more.
Maybe I'm trying to see a difference where there isn't really one. Or
maybe there is one, but it's a bit too small for me to spot in the
weekly Munin graphs.

For sure, restarting the masters pollutes the graphs a lot and
makes it very hard to read trends. But it's definitely not as severe a
change as going from 0.24.8 to 0.25.x, which saved me so much memory
that I can now run one more instance and still use a few hundred MB
less memory. :)

So, as I think it doesn't really hurt and it may help (maybe other
people can give other reports), it's not that bad an idea.

If you'd like additional information, please let me know; maybe
I'm able to provide it.

cheers pete

Markus

Nov 27, 2009, 2:42:11 PM
to puppe...@googlegroups.com
Peter --

Thanks for the feedback!

Your results sound reasonable given that the errors this was addressing
probably wouldn't show up in your environment--they aren't fast leaks
but they are inexorable, beyond any mechanism in the system to reclaim
them short of restarting the process. Even if they are only losing 10 MB
per puppetmaster over the course of your 4-hour lifecycles (which would
be consistent with the results you reported), they would collectively eat
up around 2 GB a week if left to run unchecked.

One question (since you nicely offered to provide additional
information): why are you running the clients by cron and restarting the
puppetmasters every four hours? Are there persistent memory problems
(not corrected by this patch) or did some other consideration drive the
decision?

-- Markus



Peter Meier

Dec 1, 2009, 5:47:38 PM
to puppe...@googlegroups.com
Hi Markus

> Your results sound reasonable given that the errors this was addressing
> probably wouldn't show up in your environment--they aren't fast leaks
> but they are inexorable, beyond any mechanism in the system to reclaim
> them short of restarting the process. Even if they are only losing 10 MB
> per puppetmaster over the course of your 4-hour lifecycles (which would
> be consistent with the results you reported), they would collectively eat
> up around 2 GB a week if left to run unchecked.

That makes sense, so I decided to give it a try, and it really looks
like it stays quite stable once every master instance has loaded code
for every node:
https://durito.cronopios.org/puppetmaster-091201-memory-day.png

> One question (since you nicely offered to provide additional
> information): why are you running the clients by cron and restarting the
> puppetmasters every four hours? Are there persistent memory problems
> (not corrected by this patch) or did some other consideration drive the
> decision?

Mainly due to the following reason:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22319 root 18 0 530m 437m 3808 R 98.2 29.1 6:37.71 puppetd

and this will stay in memory at this size between puppet runs. I can use
this memory during non-puppet runs for other things. ;)

This host has a localconfig.yaml of 3.3MB, doing a lot of file stuff
(creating and managing directories/vhost configs from templates/etc.
for tiny webhostings).
0.25 improved the situation heavily on the master and the client. But on
the client I would say it improved more on the performance side than on
the memory side.
Beware: I'm really speaking in general terms here.
On the other side, there have been other reports about checksumming
problems, reading content twice, etc. For example the "Checksum Behavior
Change" thread on this list, or bug #2630. But I haven't yet been able
to look into them in detail, so again I'm speaking in general terms.
Question: has the bug discussed in "Checksum Behavior Change" been
fixed? Paul Lathrop?

At least it's on my todo list... ;)

Anyway, I would be happy to provide you with additional
information if you have specific questions.

cheers pete

Luke Kanies

Dec 1, 2009, 11:34:44 PM
to puppe...@googlegroups.com
This is pretty gross. :/

Do you by chance have the ability to do a dump of what's in memory or
anything, so we can get an idea of what the heck is being held there?

I don't think MRI will ever entirely reclaim it, but Puppet should be
holding essentially nothing in memory between runs, other than the
code required to make it all work.

> Beware: I'm really speaking in general terms here.
> On the other side, there have been other reports about checksumming
> problems, reading content twice, etc. For example the "Checksum Behavior
> Change" thread on this list, or bug #2630. But I haven't yet been able
> to look into them in detail, so again I'm speaking in general terms.
> Question: has the bug discussed in "Checksum Behavior Change" been
> fixed? Paul Lathrop?

Not sure if it's been fixed yet, but it will be for 0.25.2.

> At least it's on my todo list... ;)
>
> Anyway I would be happy to provide you with more or additional
> information if you have specific questions.


--
Between two evils, I always pick the one I never tried before.
-- Mae West

Brice Figureau

Dec 2, 2009, 3:28:38 AM
to puppe...@googlegroups.com
Disclaimer: I never checked MRI code, so what follows is what I gathered
reading books and blogs. As such it might be completely wrong.

The issue to me is the following: the MRI memory allocator (like most
allocators) never returns memory to the OS. So basically your ruby
process will _always_ consume the memory you needed at peak, even if at
a given time you don't need any objects.

But it's even worse: each time you do an allocation and the allocator is
already full (or, because of fragmentation, it can't find enough
contiguous free space), it will ask for more memory from the OS.

For performance reasons, instead of asking for a little more memory when
needed, the allocator will extend the heap by a factor of 1.8. That
means, if you have bad luck, you can consume 1.8 times more than
what you needed at peak (grr).

I might be wrong, but I think the MRI GC can only fire when the
interpreter is not too busy. That means it won't trigger much in a
puppetd catalog evaluation (we're busy), but unfortunately that's when we
need it the most (we're consuming memory). So if this proves to be true
(I hope to be contradicted here), we might accumulate
ready-to-be-trashed objects, still using heap space where we could have
a better profile.
Can we ask the GC to trigger manually? If yes that would be interesting
to put a few GC calls in the transaction evaluation (like after
completing file resources), to see if that's better.

The solutions:
* use another allocator (difficult of course :-) )

* use ruby enterprise edition (which contains the MBari patch and
various allocator patches, and allows tuning of the 1.8 factor; it also
helps with fragmentation)

* reduce as much as we can the puppetd memory footprint at peak
(including never reading files, never storing collections...). This is a
difficult task of course.

* use another ruby interpreter/vm, like JRuby (which BTW has the same
issue as MRI: it never returns anything to the OS, but at least its
garbage collector is top-notch _and_ observable and tunable).

HTH,
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!

Luke Kanies

Dec 2, 2009, 12:33:41 PM
to puppe...@googlegroups.com
My understanding is that that is correct.

> Can we ask the GC to trigger manually? If yes that would be
> interesting
> to put a few GC calls in the transaction evaluation (like after
> completing file resources), to see if that's better.

This probably wouldn't help that much because we build up memory usage
over the transaction and then get rid of objects at the end. I mean,
we shouldn't ever build up 500mb of memory, but still.

I think one of the main things we could do to help things is to switch
file recursion to not generate resources, somehow (i.e., to have a
single resource capable of recursing, rather than having each recursed
file result in a new resource instance). This is complicated on its
own, but it's even more complicated once we start looking at the
granular reporting - we still need to generate per-resource status
information, otherwise we're losing too much data.

> The solutions:
> * use another allocator (difficult of course :-) )
>
> * use ruby enterprise edition (which contains the MBari patch and
> various allocator patches and allows to tune the 1.8 factor, it also
> helps with fragmentation)

Has anyone tried this?

> * reduce as much as we can the puppetd memory footprint at peak
> (including never reading files, never store collections...). This is a
> difficult task of course.
>
> * use another ruby interpreter/vm, like JRuby (which BTW has the same
> issue as MRI: it never returns anything to the OS, but at least its
> garbage collector is top-notch _and_ observable and tunable).


Another option is something like Ohad's puppetlisten - have a much
thinner client that forks when it triggers a run - that's probably the
best way to ensure that the memory will get reclaimed. The client
itself could be incredibly thin, just loading the network code into
memory.

Some things would be done in-process, I expect, such as status response.

Any objections to this move?

--
The one thing more difficult than following a regimen is not imposing it
on others. -- Marcel Proust

Brice Figureau

Dec 2, 2009, 1:56:01 PM
to puppe...@googlegroups.com
3.3MB of template shouldn't translate to 500MB of ruby objects :-(
What ruby version are you running there?
Cool.

>> Can we ask the GC to trigger manually? If yes that would be
>> interesting
>> to put a few GC calls in the transaction evaluation (like after
>> completing file resources), to see if that's better.
>
> This probably wouldn't help that much because we build up memory usage
> over the transaction and then get rid of objects at the end. I mean,
> we shouldn't ever build up 500mb of memory, but still.

Yes, but maybe we needed only 277MB (500MB/1.8), and with bad luck,
heap fragmentation, and so on, maybe we're allocating only 200MB
worth of data (pure guess of course). What I mean is that our memory
profile might not be so high, but the MRI allocator/heap/GC doesn't help
us be good citizens.

> I think one of the main things we could do to help things is to switch
> file recursion to not generate resources, somehow (i.e., to have a
> single resource capable of recursing, rather than having each recursed
> file result in a new resource instance).

We should try to see what is the impact of a file resource (without
content/template) in terms of memory. I don't think it's that much.

Peter: can you roughly evaluate how many files you let puppet manage on
the host you sent the ps output for?

I don't really have the time, but it would be interesting to run
puppetd with only one deep local (recursive or not) file resource and
see the impact on memory (and verify the theory that the heap is large
but we use only a small part of it between runs)...
Running with various "depth" settings, and measuring the process size
after one or several runs, would let us know roughly how much we consume
per file resource... Something like the sketch below, maybe.

Any volunteers?
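
A rough harness for that experiment (untested; the /tmp paths and the
standalone "puppet <manifest>" invocation are assumptions on my part):

#!/usr/bin/env ruby
# Build directory trees of increasing depth, apply a single recursive
# file resource to each via a standalone puppet run, and record the
# child process's peak resident set size.
require 'fileutils'

def build_tree(root, depth, width = 3)
  return if depth.zero?
  width.times do |i|
    dir = File.join(root, "d#{i}")
    FileUtils.mkdir_p(dir)
    File.open(File.join(dir, 'f'), 'w') { |f| f.puts 'x' }
    build_tree(dir, depth - 1, width)
  end
end

def rss_kb(pid)
  `ps -o rss= -p #{pid}`.to_i # resident set size, in KB
end

[2, 4, 6, 8].each do |depth|
  root = '/tmp/puppet-mem-test'
  FileUtils.rm_rf(root)
  build_tree(root, depth)
  File.open('/tmp/mem-test.pp', 'w') do |f|
    f.puts "file { '#{root}': ensure => directory, recurse => true }"
  end
  pid = fork { exec('puppet', '/tmp/mem-test.pp') }
  peak = 0
  while Process.waitpid(pid, Process::WNOHANG).nil?
    kb = rss_kb(pid)
    peak = kb if kb > peak
    sleep 0.2
  end
  puts "depth=#{depth}: peak RSS ~#{peak} KB"
end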


> This is complicated on its
> own, but it's even more complicated once we start looking at the
> granular reporting - we still need to generate per-resource status
> information, otherwise we're losing too much data.
>
>> The solutions:
>> * use another allocator (difficult of course :-) )
>>
>> * use ruby enterprise edition (which contains the MBari patch and
>> various allocator patches and allows to tune the 1.8 factor, it also
>> helps with fragmentation)
>
> Has anyone tried this?

I'm running it (like most of the people I know) on the master. It has a
good impact (both in terms of performance and memory consumption).
I didn't bother to tune the allocator though, but I don't get any
never-ending growing processes (on the other hand, when I was running
pure MRI 1.8.7, I didn't either).

>> * reduce as much as we can the puppetd memory footprint at peak
>> (including never reading files, never store collections...). This is a
>> difficult task of course.
>>
>> * use another ruby interpreter/vm, like JRuby (which BTW has the same
>> issue as MRI: it never returns anything to the OS, but at least its
>> garbage collector is top-notch _and_ observable and tunable).
>
>
> Another option is something like Ohad's puppetlisten - have a much
> thinner client that forks when it triggers a run - that's probably the
> best way to ensure that the memory will get reclaimed. The client
> itself could be incredibly thin, just loading the network code into
> memory.

Yes, it's a good idea to have a small "master" process that does the
listening (if you are a puppetrunner) and the sleeping (for the regular
case and splay), and then fires a new process to do the run when needed.

> Some things would be done in-process, I expect, such as status response.
>
> Any objections to this move?

Absolutely none, but this doesn't address the transient memory we need
for a catalog evaluation.
--
Brice Figureau
My Blog: http://www.masterzen.fr/

Luke Kanies

Dec 2, 2009, 2:10:46 PM
to puppe...@googlegroups.com
On Dec 2, 2009, at 10:56 AM, Brice Figureau wrote:

>> Another option is something like Ohad's puppetlisten - have a much
>> thinner client that forks when it triggers a run - that's probably
>> the
>> best way to ensure that the memory will get reclaimed. The client
>> itself could be incredibly thin, just loading the network code into
>> memory.
>
> Yes, that's a good idea to have a small "master" process doing the
> listen (if you are a puppetrunner) and sleep (for the regular case and
> splay) which then fires a new process to do the run when needed.
>
>> Some things would be done in-process, I expect, such as status
>> response.
>>
>> Any objections to this move?
>
> Absolutely none, but this doesn't address the transient memory we need
> for a catalog evaluation.

I concur.

It should be essentially trivial to trigger the GC after every
resource; if someone with a large process is in a position to test, I
can provide a quick patch.

There are multiple other things that have been kicking around in my
head that could help to varying degrees. At the least, I'd like to
start seeing file contents stored out of memory, in a filebucket or
whatever (this is for files whose content is specified directly).
Additionally (although I know this is a small amount) I'd like to see
a trigger to strip docs from memory in those cases where they're not
used, on both server and client. We normally should just ignore docs,
but instead they sit around in memory for no good reason.

Most of our memory usage isn't this kind of problem, though, it's some
other leak-like behaviour, so this won't help a lot.

--
The most dangerous strategy is to jump a chasm in two leaps.
-- Benjamin Disraeli

Peter Meier

Dec 2, 2009, 5:50:31 PM
to puppe...@googlegroups.com
>>>>> This host has a localconfig.yml of 3.3MB, doing a lot of file stuff
>>>>> (creating and managing directories/vhosts-configs from templates/
>>>>> etc.
>>>>> for tiny webhostings).
>
> 3.3MB of template shouldn't translate to 500MB of ruby objects :-(
> What ruby version are you running there?

# ruby --version
ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]

I fear that this is the problem. :/ Maybe I have to look for a good way
to keep ruby on CentOS a bit more sane.
What are other people doing?

>> I think one of the main things we could do to help things is to switch
>> file recursion to not generate resources, somehow (i.e., to have a
>> single resource capable of recursing, rather than having each recursed
>> file result in a new resource instance).
>
> We should try to see what is the impact of a file resource (without
> content/template) in terms of memory. I don't think it's that much.
>
> Peter: can you roughly evaluate how many files you let puppet manage on
> the host you sent the ps output?

# grep -c "type: file" /var/lib/puppet/state/localconfig.yaml
1375

> I don't really have the time, but that would be interesting to run
> puppetd with only one deep local (recursive or not) file resource and
> see the impact on memory (and verify the theory that the heap is large
> but we use only a small part of it between runs)...
> Running with various "depth", and measuring the process size after one
> or several runs, would let us know roughly how much we consume per file
> resource...
>
> Any volunteers?

I could, but let's first look at the basics (ruby version etc.; see below).

>> This is complicated on its
>> own, but it's even more complicated once we start looking at the
>> granular reporting - we still need to generate per-resource status
>> information, otherwise we're losing too much data.
>>
>>> The solutions:
>>> * use another allocator (difficult of course :-) )
>>>
>>> * use ruby enterprise edition (which contains the MBari patch and
>>> various allocator patches and allows to tune the 1.8 factor, it also
>>> helps with fragmentation)
>> Has anyone tried this?
>
> I'm running (like most of the people I know) it on the master. It has a
> good impact (both in terms of performance and memory consumption).
> I didn't bother to tune the allocator though, but I don't get any never
> ending growing processes (on the other hand, when I was running pure MRI
> 1.8.7, I didn't either).

Well, I'm currently looking for newer rpms for ruby on CentOS. What are
people usually running? 1.8.7? I have seen various posts discussing
various aspects of different ruby versions. CentOS still ships
with 1.8.5; I got a newer version from thoughtworks
(http://rubyworks.rubyforge.org/) but maybe there are even newer
and better-maintained ones.

There has even been ongoing discussion that x86_64 uses twice as much
memory. But I have to admit that I have never really seen a solution in
this jungle of ruby versions and memory issues.

>>> * reduce as much as we can the puppetd memory footprint at peak
>>> (including never reading files, never store collections...). This is a
>>> difficult task of course.
>>>
>>> * use another ruby interpreter/vm, like JRuby (which BTW has the same
>>> issue as MRI: it never returns anything to the OS, but at least its
>>> garbage collector is top-notch _and_ observable and tunable).
>>
>> Another option is something like Ohad's puppetlisten - have a much
>> thinner client that forks when it triggers a run - that's probably the
>> best way to ensure that the memory will get reclaimed. The client
>> itself could be incredibly thin, just loading the network code into
>> memory.
>
> Yes, that's a good idea to have a small "master" process doing the
> listen (if you are a puppetrunner) and sleep (for the regular case and
> splay) which then fires a new process to do the run when needed.

+100, regardless of what we find out in the other areas. This would
be a good solution anyway.

cheers pete

Peter Meier

Dec 2, 2009, 5:51:38 PM
to puppe...@googlegroups.com
>>> Any objections to this move?
>> Absolutely none, but this doesn't address the transient memory we need
>> for a catalog evaluation.
>
> I concur.
>
> It should be essentially trivial to trigger the GC after every
> resource; if someone with a large process is in a position to test, I
> can provide a quick patch.

I'm happy to test it.

cheers pete

Peter Meier

Dec 2, 2009, 6:01:13 PM
to puppe...@googlegroups.com
>> Mainly due to the following reason:
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>
>> 22319 root 18 0 530m 437m 3808 R 98.2 29.1 6:37.71 puppetd
>>
>>
>> and this will stay in memory at this size between puppet runs. I can
>> use
>> this memory during non-puppet runs for other things. ;)
>>
>> This host has a localconfig.yml of 3.3MB, doing a lot of file stuff
>> (creating and managing directories/vhosts-configs from templates/etc.
>> for tiny webhostings).
>> 0.25 improved the situation heavily on the master and the client.
>> But on
>> the client I would say it improved more on the performance side than
>> on
>> the memory side.
>
> This is pretty gross. :/
>
> Do you by chance have the ability to do a dump of what's in memory or
> anything, so we can get an idea of what the heck is being held there?

if you point me to some documentation that can guide me (or at least
give me some idea of how to do that), I could give it a try.

>> Beware: I'm really speaking in general terms here.
>> On the other side, there have been other reports about checksumming
>> problems, reading content twice, etc. For example the "Checksum Behavior
>> Change" thread on this list, or bug #2630. But I haven't yet been able
>> to look into them in detail, so again I'm speaking in general terms.
>> Question: has the bug discussed in "Checksum Behavior Change" been
>> fixed? Paul Lathrop?
>
> Not sure if it's been fixed yet, but it will be for 0.25.2.

actually it is targeted for Rowlf based on Markus' scope.

cheers pete

Luke Kanies

Dec 2, 2009, 6:45:37 PM
to puppe...@googlegroups.com
On Dec 2, 2009, at 3:01 PM, Peter Meier wrote:

>>> Mainly due to the following reason:
>>>
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>
>>> 22319 root 18 0 530m 437m 3808 R 98.2 29.1 6:37.71 puppetd
>>>
>>>
>>> and this will stay in memory at this size between puppet runs. I can
>>> use
>>> this memory during non-puppet runs for other things. ;)
>>>
>>> This host has a localconfig.yml of 3.3MB, doing a lot of file stuff
>>> (creating and managing directories/vhosts-configs from templates/
>>> etc.
>>> for tiny webhostings).
>>> 0.25 improved the situation heavily on the master and the client.
>>> But on
>>> the client I would say it improved more on the performance side than
>>> on
>>> the memory side.
>>
>> This is pretty gross. :/
>>
>> Do you by chance have the ability to do a dump of what's in memory or
>> anything, so we can get an idea of what the heck is being held there?
>
> if you point me to some documentation that can guide me (or at least
> give me some idea of how to do that), I could give it a try.

http://reductivelabs.com/trac/puppet/wiki/PuppetIntrospection

>>> Beware: I'm really speaking in general terms here.
>>> On the other side, there have been other reports about checksumming
>>> problems, reading content twice, etc. For example the "Checksum Behavior
>>> Change" thread on this list, or bug #2630. But I haven't yet been able
>>> to look into them in detail, so again I'm speaking in general terms.
>>> Question: has the bug discussed in "Checksum Behavior Change" been
>>> fixed? Paul Lathrop?
>>
>> Not sure if it's been fixed yet, but it will be for 0.25.2.
>
> actually it is targeted for rowlf based on Markus' scope.

Ah, ok.


--
Writing is not necessarily something to be ashamed of, but do it in
private and wash your hands afterwards. --Robert Heinlein

Luke Kanies

Dec 2, 2009, 6:52:07 PM
to puppe...@googlegroups.com
diff --git a/lib/puppet/transaction.rb b/lib/puppet/transaction.rb
index d04856d..7e25ff7 100644
--- a/lib/puppet/transaction.rb
+++ b/lib/puppet/transaction.rb
@@ -103,6 +103,8 @@ class Transaction
end
end

+ GC.start
+
resourceevents
end


Note that this should add a significant amount of CPU time to your run.

--
Life is too short for traffic. --Dan Bellack

Peter Meier

Dec 2, 2009, 7:15:41 PM
to puppe...@googlegroups.com
> Note that this should add a significant amount of CPU time to your run.

It did. But it also didn't really help: the memory usage was already
as high as reported while caching the catalog, before even applying
it. So in my opinion the problem isn't in the transaction.

I'm currently building 1.8.6 from fc12 on CentOS. I have found some
fedora bugs talking about memory leaks in the version I'm currently running.

Fedora/RedHat is skipping 1.8.7, but there isn't yet anything planned
for 1.9. According to the Fedora devs, 1.8.7 is just a transition version
to 1.9, and the latest 1.8.6 should be just as stable and good. Let's see.

cheers pete

Luke Kanies

Dec 2, 2009, 7:37:16 PM
to puppe...@googlegroups.com
On Dec 2, 2009, at 4:15 PM, Peter Meier wrote:

>> Note that this should add a significant amount of CPU time to your
>> run.
>
> It did. But it also didn't really help: the memory usage was already
> as high as reported while caching the catalog, before even applying
> it. So in my opinion the problem isn't in the transaction.

Wait... your client is taking 500MB by the time its catalog is
downloaded?

Erm. I don't really know what to say to that. Could instantiating
the catalog really be that expensive memory-wise?

Taking this to IRC for more info.

> I'm currently building 1.8.6 from fc12 on centos. I have found some
> fedora bugs talking about memory leaks in the version I'm currently
> running.
>
> Fedora/RedHat is skipping 1.8.7, but there isn't yet anything planned
> for 1.9. According to the Fedora devs, 1.8.7 is just a transition version
> to 1.9, and the latest 1.8.6 should be just as stable and good. Let's see.


--
The conception of two people living together for twenty-five years
without having a cross word suggests a lack of spirit only to be
admired in sheep. --Alan Patrick Herbert

Peter Meier

Dec 2, 2009, 8:45:24 PM
to puppe...@googlegroups.com

> I'm currently building 1.8.6 from fc12 on centos. I have found some
> fedora bugs talking about memory leaks in the version I'm currently running.

A first test shows that it doesn't change anything, at least not on the
client.

But we made some more progress on IRC. Maybe we'll find something out
there. (Memory is allocated during catalog caching...)

cheers pete

Peter Meier

Dec 3, 2009, 8:18:13 AM
to puppe...@googlegroups.com
>> I'm currently building 1.8.6 from fc12 on centos. I have found some
>> fedora bugs talking about memory leaks in the version I'm currently running.
>
> a first test shows that it doesn't change anything, at least not on the
> client.
>
> But we have done some other progress during IRC. Maybe we find out
> something there. (Memory is allocated during catalog caching...)

So Brice had the idea to comment out line 40 (writelock...) in
lib/puppet/indirector/yaml.rb, and the result was that the memory
footprint was reduced to 300M VIRT and 200M RES, compared to 530M VIRT
and 430M RES before.

Brice then assumed (and I share this opinion) that yaml is duplicating
the objects.

So far, some more information...

cheers pete


Markus Roberts

Dec 3, 2009, 3:14:13 PM
to puppet-dev
> so brice had the idea to comment out line 40 (writelock...) in
> lib/puppet/indirector/yaml.rb and the result was that memory footprint
> was reduced to: 300M VIRT and 200M RES comparing to 530M VIRT  and
> 430M RES before.
>
> Brice then assumed (and I share this opinion) that yaml is duplicating
> the objects.

I'd concur.

IIRC the syck/yaml process can effectively wind up making two copies of
the data (the internalized form and the in-RAM serialized form; I
forget their specific names for these), plus the loop-detection hash,
thus roughly tripling the memory usage. You're supposed to be able to
hook in at the various translation points (see the YAML spec for
details) and transform the data if you like, but so far as I've ever
heard, no one ever does.
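
You can eyeball the overhead with a throwaway script along these lines
(numbers will vary with Ruby version and allocator; the 50,000-element
structure is just a stand-in for a large catalog):

require 'yaml'

def rss_kb
  `ps -o rss= -p #{Process.pid}`.to_i # our own resident set size, in KB
end

# A large, boring tree of hashes, standing in for a cached catalog.
data = (1..50_000).map { |i| { 'name' => "file#{i}", 'mode' => '0644' } }

before = rss_kb
text = YAML.dump(data) # builds the emitter's intermediate form plus the string
after = rss_kb
puts "serialized #{text.size} bytes; RSS grew by #{after - before} KB"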

If your data is a tree/forest (no loops, no shared refs) and you know
this up front, it's possible to YAML-serialize in O(1) memory (well,
actually O(max tree depth), but that's generally so small as not to
matter), but not with Ruby's built-in syck/yaml setup.

-- Markus

Brice Figureau

Dec 3, 2009, 4:46:45 PM
to puppe...@googlegroups.com
On 03/12/09 21:14, Markus Roberts wrote:
>> so brice had the idea to comment out line 40 (writelock...) in
>> lib/puppet/indirector/yaml.rb and the result was that memory footprint
>> was reduced to: 300M VIRT and 200M RES comparing to 530M VIRT and
>> 430M RES before.
>>
>> Brice then assumed (and I share this opinion) that yaml is duplicating
>> the objects.
>
> I'd concur.
>
> IIRC syck/yaml process can effectively wind up making two copies of
> the data (the internalized form and the in-ram serialized form--I
> forget their specific names for these), plus the loop-detection hash
> thus roughly tripling the memory usage. You're supposed to be able to
> hook in at the various translation points (see the YAML spec for
> details) and transform the data if you like, but so far as I've ever
> heard, no one ever does.

That's more or less what I was thinking.

> If your data is a tree/forest (no loops, no shared refs) and you know
> this up front it's possible to YAML-serialize in O(1) (well, actually
> O(max tree depth), but that's generally so small as not to matter) but
> not with Ruby's built in syck/yaml setup.

Is it time to dump YAML in favor of something else (the pson catalog
comes to mind since we already have it in the serialized form) for local
catalog persistence?

James Turnbull

Dec 3, 2009, 6:05:49 PM
to puppe...@googlegroups.com

Brice Figureau wrote:
>> If your data is a tree/forest (no loops, no shared refs) and you know
>> this up front it's possible to YAML-serialize in O(1) (well, actually
>> O(max tree depth), but that's generally so small as not to matter) but
>> not with Ruby's built in syck/yaml setup.
>
> Is it time to dump YAML in favor of something else (the pson catalog
> comes to mind since we already have it in the serialized form) for local
> catalog persistence?

+1

Regards

James Turnbull

--
Author of:
* Pro Linux System Administration (http://tinyurl.com/linuxadmin)
* Pulling Strings with Puppet (http://tinyurl.com/pupbook)
* Pro Nagios 2.0 (http://tinyurl.com/pronagios)
* Hardening Linux (http://tinyurl.com/hardeninglinux)

Markus Roberts

Dec 3, 2009, 6:30:23 PM
to puppet-dev
> Is it time to dump YAML in favor of something else (the pson catalog
> comes to mind since we already have it in the serialized form) for local
> catalog persistence?

Call me cynical (I'm having one of those days), but we should probably
at the very least 1) directly confirm that this is the problem or a
significant component thereof, 2) confirm that whatever we're
thinking of going to doesn't have analogous problems, and 3) consider
other ramifications, before we jump.

-- Markus

Luke Kanies

Dec 3, 2009, 10:48:41 PM
to puppe...@googlegroups.com
While I agree that we should do this based on testing, we
definitely should dump yaml on the client. Switching to storing the
catalogs as json should be downright easy, although we annoyingly have
to create new terminus classes for it.

--
I worry that the person who thought up Muzak may be thinking up
something else. -- Lily Tomlin

Markus Roberts

Dec 4, 2009, 1:51:25 PM
to puppet-dev
On Thu, Dec 3, 2009 at 7:48 PM, Luke Kanies <lu...@madstop.com> wrote:
> On Dec 3, 2009, at 3:30 PM, Markus Roberts wrote:
>
>>> Is it time to dump YAML in favor of something else (the pson catalog
>>> comes to mind since we already have it in the serialized form) for
>>> local
>>> catalog persistence?
>>
>> Call me cynical (I'm having one of those days), but we should probably
>> at the very least 1) directly confirm that this is the problem or a
>> significant component thereof and 2) confirm that whatever we're
>> thinking of going to doesn't have analogous problems 3) consider other
>> ramifications before we jump.
>
>
> While I agree that we should do this based on testing, we
> definitely should dump yaml on the client.  Switching to storing the
> catalogs as json should be downright easy, although we annoyingly have
> to create new terminus classes for it.

Just looking through the code, it appears json/pson will have the same
problem--it looks as if it constructs a duplicate of the data as a hash,
then produces a copy of that as a string, which it emits, for a total
of ~3x memory use, same as yaml.

-- Markus

br...@reductivelabs.com

Dec 4, 2009, 1:59:54 PM
to puppe...@googlegroups.com


It's too bad that (presumably for the sake of being aggressively cross-platform) we can't use the YAJL[1] binding, which parses (and possibly emits) in streams, and seems to have a much better memory profile.
 

--
Bruce Williams
Developer @ Reductive Labs, http://reductivelabs.com

Brice Figureau

Dec 4, 2009, 2:00:01 PM
to puppe...@googlegroups.com
We receive the catalog in json/pson format, right?
Couldn't we cache that large string instead of re-formatting what we
just unformatted?

That way there's no need to consume the extra memory.

mar...@reality.com

Dec 4, 2009, 2:17:45 PM
to puppe...@googlegroups.com
Until we get rid of the preferred serialization format it could come in as who-knows-what (Marshal, anyone?)

--Markus
Sent via BlackBerry from T-Mobile

Luke Kanies

Dec 4, 2009, 2:22:46 PM
to puppe...@googlegroups.com
On Dec 4, 2009, at 11:17 AM, mar...@reality.com wrote:

> Until we get rid of the preferred serialization format it could come
> in as who-knows-what (Marshal, anyone?)

That's only for old code, right? If you're using the old system, then
essentially all bets are off. I'm not concerned about fixing this for
that.

--
Ninety-eight percent of the adults in this country are decent,
hard-working, honest Americans. It's the other lousy two percent that
get all the publicity. But then--we elected them. --Lily Tomlin

Luke Kanies

Dec 4, 2009, 2:28:21 PM
to puppe...@googlegroups.com
Well, no need beyond the first time we do it.

I agree, though, our caching system needs to somehow be triggered
during deserialization; it's an architecture problem I've known was
there for a while but I don't see a way around it, unless we special-
case catalogs so they don't use the normal caching mechanism.

At this point, we have these layers:

REST (receives json, returns a Catalog)
Indirector (receives Catalog, caches if necessary)
Configurer (receives Catalog)

We would need the caching to happen at the REST layer and then *not*
the Indirector layer, and we'd need the caching to be a more direct
"write this json to disk" rather than the normal method, which would
involve a File terminus of some kind receiving a Catalog instance,
serializing it, and writing it.

At this point, my temptation is to say that everything is different
enough about the catalogs that it probably makes more sense to special-
case them, rather than try to modify the architecture to fit their
needs. I'd just have a special hook in the Catalog REST terminus that
knows how to use the cached copy if necessary.

Comments?


--
A nation is a society united by delusions about its ancestry and by
common hatred of its neighbors. -- William Ralph Inge

Brice Figureau

Dec 4, 2009, 2:36:09 PM
to puppe...@googlegroups.com
Of course. Only using YAJL as Bruce suggested would allow us to not
consume the memory, but since that's not really an option...

> I agree, though, our caching system needs to somehow be triggered
> during deserialization; it's an architecture problem I've known was
> there for a while but I don't see a way around it, unless we special-
> case catalogs so they don't use the normal caching mechanism.
>
> At this point, we have these layers:
>
> REST (receives json, returns a Catalog)
> Indirector (receives Catalog, caches if necessary)
> Configurer (receives Catalog)

Correct, but that doesn't prevent us from keeping the serialized form in
the current request. That way, in the caching layer, we just have to
write it out (roughly as sketched below). If the current serialized
format doesn't suit our needs, we can trash it and re-render the "live"
instance.
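
Something like this is what I have in mind (class and method names here
are purely illustrative, not the real indirector API):

# The request carries the raw wire text alongside the deserialized model.
class Request
  attr_accessor :key, :instance
  attr_accessor :serialized_body # e.g. the pson text exactly as received
end

# The cache writes that text verbatim when it's available, and only
# falls back to re-serializing the live instance when it isn't.
class CatalogFileCache
  def initialize(dir)
    @dir = dir
  end

  def save(request)
    text = request.serialized_body ||
           request.instance.to_pson # or whatever render call applies
    File.open(File.join(@dir, "#{request.key}.pson"), 'w') do |f|
      f.write(text)
    end
  end
end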

> We would need the caching to happen at the REST layer and then *not*
> the Indirector layer, and we'd need the caching to be a more direct
> "write this json to disk" rather than the normal method, which would
> involve a File terminus of some kind receiving a Catalog instance,
> serializing it, and writing it.
>
> At this point, my temptation is to say that everything is different
> enough about the catalogs that it probably makes more sense to special-
> case them, rather than try to modify the architecture to fit their
> needs. I'd just have a special hook in the Catalog REST terminus that
> knows how to use the cached copy if necessary.
>
> Comments?

Obviously I prefer my idea :-)
But I understand why we'd want to exempt the catalog from caching
through the indirector.

So I bet it's up to you to choose what you prefer :-)

Luke Kanies

Dec 4, 2009, 2:43:00 PM
to puppe...@googlegroups.com
Yeah, I don't know how we could; I guess preferentially load it and
alias all of the methods it creates? I have no idea.

If we could ever get Rails out of the Puppet memory space, this would
be much easier.

>> I agree, though, our caching system needs to somehow be triggered
>> during deserialization; it's an architecture problem I've known was
>> there for a while but I don't see a way around it, unless we special-
>> case catalogs so they don't use the normal caching mechanism.
>>
>> At this point, we have these layers:
>>
>> REST (receives json, returns a Catalog)
>> Indirector (receives Catalog, caches if necessary)
>> Configurer (receives Catalog)
>
> Correct, but that doesn't prevent us from keeping the serialized form
> in the current request. That way, in the caching layer, we just have
> to write it out. If the current serialized format doesn't suit our
> needs, we can trash it and re-render the "live" instance.

Ah. Yeah, that should be pretty straightforward, I think.

>> We would need the caching to happen at the REST layer and then *not*
>> the Indirector layer, and we'd need the caching to be a more direct
>> "write this json to disk" rather than the normal method, which would
>> involve a File terminus of some kind receiving a Catalog instance,
>> serializing it, and writing it.
>>
>> At this point, my temptation is to say that everything is different
>> enough about the catalogs that it probably makes more sense to
>> special-
>> case them, rather than try to modify the architecture to fit their
>> needs. I'd just have a special hook in the Catalog REST terminus
>> that
>> knows how to use the cached copy if necessary.
>>
>> Comments?
>
> Obviously I prefer my idea :-)
> But I understand why we'd want to exempt the catalog from caching
> through the indirector.
>
> So I bet it's up to you to choose what you prefer :-)

No, your method is better. Care to open a ticket for it?

--
Commit suicide. A hundred thousand lemmings can't be wrong.

Brice Figureau

Dec 4, 2009, 4:34:15 PM
to puppe...@googlegroups.com
Done in #2892:
http://projects.reductivelabs.com/issues/2892

I didn't set the target; I don't know whether you want this for
0.25.2 or Rowlf.

James Turnbull

Dec 4, 2009, 4:52:41 PM
to puppe...@googlegroups.com

Brice Figureau wrote:
> Done in #2892:
> http://projects.reductivelabs.com/issues/2892
>
> I didn't set the target, I don't know if you want to have this for
> 0.25.2 or Rowlf.

I vote Rowlf, given we're already behind schedule on .2 and it's a
reasonably big change. Unless someone thinks otherwise and can
pitch a good argument.

Regards

James Turnbull

--
Author of:
* Pro Linux System Administration (http://tinyurl.com/linuxadmin)
* Pulling Strings with Puppet (http://tinyurl.com/pupbook)
* Pro Nagios 2.0 (http://tinyurl.com/pronagios)
* Hardening Linux (http://tinyurl.com/hardeninglinux)

Luke Kanies

Dec 4, 2009, 5:01:58 PM
to puppe...@googlegroups.com
On Dec 4, 2009, at 1:52 PM, James Turnbull wrote:

> Brice Figureau wrote:
>> Done in #2892:
>> http://projects.reductivelabs.com/issues/2892
>>
>> I didn't set the target, I don't know if you want to have this for
>> 0.25.2 or Rowlf.
>
> I vote Rowlf given we're already behind schedule on .2 and its a
> reasonably big change. Unless someone thinks otherwise and can
> pitch a good argument.


I agree, with the caveat that it should be a simple patch that people
could apply if they wanted, I think.

--
Hegel was right when he said that we learn from history that man can
never learn anything from history. -- George Bernard Shaw

Luke Kanies

Dec 6, 2009, 3:44:07 PM
to puppe...@googlegroups.com
On Dec 4, 2009, at 1:34 PM, Brice Figureau wrote:

>>> Obviously I prefer my idea :-)
>>> But I understand why we'd want to exempt the catalog from caching
>>> through the indirector.
>>>
>>> So I bet it's up to you to choose what you prefer :-)
>>
>> No, your method is better. Care to open a ticket for it?
>
> Done in #2892:
> http://projects.reductivelabs.com/issues/2892
>
> I didn't set the target, I don't know if you want to have this for
> 0.25.2 or Rowlf.


Ok, I've done the first little bit here (tickets/master/2892 in my
repo), but it just adds the necessary support to the request.

The hard part is what's left - modifying the Indirection's use of
caching to make it cache this serialized data instead of redoing it all.

I've started work on #1943, which will create a 'file' terminus
capable of storing content serialized in any form. This will enable
us to skip further one-off hacks in the indirection. It looks like
it's relatively straightforward; the only complicated bit was that the
unused Checksum class was already using the File terminus type, but
for now I just removed it and it will be redone when we add that
functionality.

--
The covers of this book are too far apart. -- Ambrose Bierce

Markus Roberts

Dec 6, 2009, 7:42:22 PM
to puppet-dev
> Of course. Only using YAJL as Bruce suggested would allow us to not
> consume the memory, but since that's not really an option...

There's nothing stopping us from doing stream serialization ourselves;
it's easy to write and (IMHO) easier to maintain than an object-mungling
system. For tree-shaped data it can be as small as the sketch below.
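
For instance, a minimal streaming emitter (assuming the data really is a
tree, so no loop detection, and with deliberately simplistic string
escaping):

# Walks a tree of hashes/arrays/scalars and writes pson/json straight to
# an IO, so memory use is O(tree depth) rather than a full in-RAM copy.
def stream_pson(obj, out)
  case obj
  when Hash
    out << '{'
    first = true
    obj.each do |k, v|
      out << ',' unless first
      first = false
      stream_pson(k.to_s, out)
      out << ':'
      stream_pson(v, out)
    end
    out << '}'
  when Array
    out << '['
    obj.each_with_index do |v, i|
      out << ',' if i > 0
      stream_pson(v, out)
    end
    out << ']'
  when Numeric then out << obj.to_s
  when NilClass then out << 'null'
  when TrueClass, FalseClass then out << obj.to_s
  else
    out << obj.to_s.inspect # String#inspect quoting is JSON-close for ASCII
  end
end

File.open('catalog.pson', 'w') do |f|
  stream_pson({ 'document_type' => 'Catalog', 'data' => [1, 2, nil] }, f)
end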

-- Markus