This patch makes it possible to write manifests such as:
file {
    "/path/to/deep/hierarchy":
        owner => brice, recurse => true, checksum => none
}
Then puppet(d) won't checksum those files, just manage their ownership.
Signed-off-by: Brice Figureau <brice-...@daysofwonder.com>
---
lib/puppet/file_serving/fileset.rb | 4 ++--
lib/puppet/file_serving/terminus_helper.rb | 1 +
lib/puppet/type/file.rb | 9 ++++++++-
lib/puppet/type/file/checksum.rb | 3 ++-
lib/puppet/util/checksums.rb | 5 +++++
spec/unit/file_serving/fileset.rb | 6 ++++++
spec/unit/file_serving/terminus_helper.rb | 10 ++++++++++
spec/unit/type/file.rb | 7 +++++++
spec/unit/type/file/checksum.rb | 28 ++++++++++++++++++++++++++++
spec/unit/util/checksums.rb | 8 +++++++-
10 files changed, 76 insertions(+), 5 deletions(-)
create mode 100644 spec/unit/type/file/checksum.rb
diff --git a/lib/puppet/file_serving/fileset.rb b/lib/puppet/file_serving/fileset.rb
index 50e4e1e..a66d935 100644
--- a/lib/puppet/file_serving/fileset.rb
+++ b/lib/puppet/file_serving/fileset.rb
@@ -9,7 +9,7 @@ require 'puppet/file_serving/metadata'
# Operate recursively on a path, returning a set of file paths.
class Puppet::FileServing::Fileset
attr_reader :path, :ignore, :links
- attr_accessor :recurse, :recurselimit
+ attr_accessor :recurse, :recurselimit, :checksum_type
# Produce a hash of files, with merged so that earlier files
# with the same postfix win. E.g., /dir1/subfile beats /dir2/subfile.
@@ -105,7 +105,7 @@ class Puppet::FileServing::Fileset
end
def initialize_from_request(request)
- [:links, :ignore, :recurse, :recurselimit].each do |param|
+ [:links, :ignore, :recurse, :recurselimit, :checksum_type].each do |param|
if request.options.include?(param) # use 'include?' so the values can be false
value = request.options[param]
elsif request.options.include?(param.to_s)
diff --git a/lib/puppet/file_serving/terminus_helper.rb b/lib/puppet/file_serving/terminus_helper.rb
index c88bacc..6f5d52b 100644
--- a/lib/puppet/file_serving/terminus_helper.rb
+++ b/lib/puppet/file_serving/terminus_helper.rb
@@ -16,6 +16,7 @@ module Puppet::FileServing::TerminusHelper
Puppet::FileServing::Fileset.merge(*filesets).collect do |file, base_path|
inst = model.new(base_path, :relative_path => file)
+ inst.checksum_type = request.options[:checksum_type] if request.options[:checksum_type]
inst.links = request.options[:links] if request.options[:links]
inst.collect
inst
diff --git a/lib/puppet/type/file.rb b/lib/puppet/type/file.rb
index 2f5b5df..fdb5a35 100644
--- a/lib/puppet/type/file.rb
+++ b/lib/puppet/type/file.rb
@@ -592,7 +592,14 @@ module Puppet
end
def perform_recursion(path)
- Puppet::FileServing::Metadata.search(path, :links => self[:links], :recurse => (self[:recurse] == :remote ? true : self[:recurse]), :recurselimit => self[:recurselimit], :ignore => self[:ignore])
+ params = {
+ :links => self[:links],
+ :recurse => (self[:recurse] == :remote ? true : self[:recurse]),
+ :recurselimit => self[:recurselimit],
+ :ignore => self[:ignore]
+ }
+ params[:checksum_type] = self[:checksum] if self[:checksum] == :none
+ Puppet::FileServing::Metadata.search(path, params)
end
# Remove any existing data. This is only used when dealing with
diff --git a/lib/puppet/type/file/checksum.rb b/lib/puppet/type/file/checksum.rb
index 23a3e5a..0c45aad 100755
--- a/lib/puppet/type/file/checksum.rb
+++ b/lib/puppet/type/file/checksum.rb
@@ -23,7 +23,7 @@ Puppet::Type.type(:file).newproperty(:checksum) do
@unmanaged = true
- @validtypes = %w{md5 md5lite timestamp mtime time}
+ @validtypes = %w{md5 md5lite timestamp mtime time none}
def self.validtype?(type)
@validtypes.include?(type)
@@ -52,6 +52,7 @@ Puppet::Type.type(:file).newproperty(:checksum) do
cache(type, sum)
return type
else
+ return :none if value.nil? or value.to_s == "" or value.to_s == "none"
if FileTest.directory?(@resource[:path])
return :time
elsif @resource[:source] and value.to_s != "md5"
diff --git a/lib/puppet/util/checksums.rb b/lib/puppet/util/checksums.rb
index 98bf5de..39477ee 100644
--- a/lib/puppet/util/checksums.rb
+++ b/lib/puppet/util/checksums.rb
@@ -68,6 +68,11 @@ module Puppet::Util::Checksums
File.stat(filename).send(:ctime)
end
+ # Return an empty checksum, used for the :none checksum type.
+ def none_file(filename)
+ ""
+ end
+
private
# Perform an incremental checksum on a file.
diff --git a/spec/unit/file_serving/fileset.rb b/spec/unit/file_serving/fileset.rb
index 55cc2a9..c03522d 100755
--- a/spec/unit/file_serving/fileset.rb
+++ b/spec/unit/file_serving/fileset.rb
@@ -42,6 +42,12 @@ describe Puppet::FileServing::Fileset, " when initializing" do
set.links.should == :manage
end
+ it "should accept a 'checksum_type' option" do
+ File.expects(:lstat).with("/some/file").returns stub("stat")
+ set = Puppet::FileServing::Fileset.new("/some/file", :checksum_type => :test)
+ set.checksum_type.should == :test
+ end
+
it "should fail if 'links' is set to anything other than :manage or :follow" do
proc { Puppet::FileServing::Fileset.new("/some/file", :links => :whatever) }.should raise_error(ArgumentError)
end
diff --git a/spec/unit/file_serving/terminus_helper.rb b/spec/unit/file_serving/terminus_helper.rb
index d0c06f1..98c64b7 100755
--- a/spec/unit/file_serving/terminus_helper.rb
+++ b/spec/unit/file_serving/terminus_helper.rb
@@ -76,6 +76,16 @@ describe Puppet::FileServing::TerminusHelper do
@helper.path2instances(@request, "/my/file")
end
+ it "should set the request checksum_type if one is provided" do
+ @one.expects(:checksum_type=).with :test
+ @two.expects(:checksum_type=).with :test
+ @model.expects(:new).returns(@one)
+ @model.expects(:new).returns(@two)
+
+ @request.options[:checksum_type] = :test
+ @helper.path2instances(@request, "/my/file")
+ end
+
it "should collect the instance's attributes" do
@one.expects(:collect)
@two.expects(:collect)
diff --git a/spec/unit/type/file.rb b/spec/unit/type/file.rb
index 1b3fe6a..b5963a6 100755
--- a/spec/unit/type/file.rb
+++ b/spec/unit/type/file.rb
@@ -320,6 +320,13 @@ describe Puppet::Type.type(:file) do
@file.expects(:newchild).with("my/file").returns "fiebar"
@file.recurse_local.should == {"my/file" => "fiebar"}
end
+
+ it "should set checksum_type to none if this file checksum is none" do
+ @file[:checksum] = :none
+ Puppet::FileServing::Metadata.expects(:search).with { |path,params| params[:checksum_type] == :none }.returns [@metadata]
+ @file.expects(:newchild).with("my/file").returns "fiebar"
+ @file.recurse_local
+ end
end
it "should have a method for performing link recursion" do
diff --git a/spec/unit/type/file/checksum.rb b/spec/unit/type/file/checksum.rb
new file mode 100644
index 0000000..5d715d1
--- /dev/null
+++ b/spec/unit/type/file/checksum.rb
@@ -0,0 +1,28 @@
+#!/usr/bin/env ruby
+
+Dir.chdir(File.dirname(__FILE__)) { (s = lambda { |f| File.exist?(f) ? require(f) : Dir.chdir("..") { s.call(f) } }).call("spec/spec_helper.rb") }
+
+checksum = Puppet::Type.type(:file).attrclass(:checksum)
+describe checksum do
+ before do
+ # Wow that's a messy interface to the resource.
+ @resource = stub 'resource', :[] => nil, :[]= => nil, :property => nil, :newattr => nil, :parameter => nil
+ end
+
+ it "should be a subclass of Property" do
+ checksum.superclass.must == Puppet::Property
+ end
+
+ it "should have default checksum of :md5" do
+ @checksum = checksum.new(:resource => @resource)
+ @checksum.checktype.should == :md5
+ end
+
+ [:none, nil, ""].each do |ck|
+ it "should use a none checksum for #{ck.inspect}" do
+ @checksum = checksum.new(:resource => @resource)
+ @checksum.should = "none"
+ @checksum.checktype.should == :none
+ end
+ end
+end
diff --git a/spec/unit/util/checksums.rb b/spec/unit/util/checksums.rb
index d31d7a0..615ed90 100755
--- a/spec/unit/util/checksums.rb
+++ b/spec/unit/util/checksums.rb
@@ -14,7 +14,7 @@ describe Puppet::Util::Checksums do
end
content_sums = [:md5, :md5lite, :sha1, :sha1lite]
- file_only = [:ctime, :mtime]
+ file_only = [:ctime, :mtime, :none]
content_sums.each do |sumtype|
it "should be able to calculate %s sums from strings" % sumtype do
@@ -104,4 +104,10 @@ describe Puppet::Util::Checksums do
end
end
end
+
+ describe "when using the none checksum" do
+ it "should return an empty string" do
+ @summer.none_file("/my/file").should == ""
+ end
+ end
end
--
1.6.6.1
Indeed, but I think this patch is close to this.
> That said, if this hot patch
> works around the 100% CPU usage on recursive chowns bug in the user list
> thread, I think it's desirable to get this into the 0.25.x branch ASAP.
The real underlying issue is that the file metadata terminus doesn't care
about the file {} checksum setting; it always computes the md5 checksum
of said files. Why? Because that is the correct behaviour for remote file
metadata.
Unfortunately the same code is called in the direct file serving
terminus, where the checksum should not be computed (it will be
done by the checksum property).
This patch is not 100% correct (but my minimal testing shows that it
works), and I don't expect it to be merged directly into 0.25.x (on which
it is based).
It basically does two things:
* allows setting checksum => none
* makes sure file metadata doesn't compute anything if the checksum is
none (but only for the direct file server terminus).
A more correct patch would be to prevent checksums from being computed
during local file metadata searches.
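For illustration, here is a minimal standalone sketch of that more correct approach (the class and flag names are hypothetical, not the actual 0.25.x classes): metadata collected for a local fileset skips content checksumming entirely, while metadata destined for remote comparison still computes the md5.

require 'digest/md5'

# Standalone model of the alternative fix (illustrative names only):
# metadata collected for a *local* fileset never reads file contents,
# while metadata destined for remote comparison still computes the md5.
class MetadataSketch
  attr_reader :checksum

  def initialize(path, local = false)
    @path  = path
    @local = local
  end

  def collect
    stat   = File.stat(@path)
    @owner = stat.uid
    @mode  = stat.mode & 07777
    # Local recursion only manages owner/group/mode; the md5 would be
    # thrown away, so don't pay for reading the whole file.
    @checksum = "{md5}" + Digest::MD5.file(@path).hexdigest unless @local
  end
end

remote = MetadataSketch.new(__FILE__)
remote.collect
local = MetadataSketch.new(__FILE__, true)
local.collect
puts remote.checksum        # => "{md5}..."
puts local.checksum.inspect # => nil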
--
Brice Figureau
My Blog: http://www.masterzen.fr/
I think the "real" fix for the underlying problem is fixing the regression bug introduced in the file serving refactor. That said, if this hot patch works around the 100% CPU usage on recursive chowns bug in the user list thread, I think it's desirable to get this into the 0.25.x branch ASAP.+1
local file metadata search.
> On 13/03/10 21:33, Rein Henrichs wrote:
>> Brice,
>>
>> I think the "real" fix for the underlying problem is fixing the
>> regression
>> bug introduced in the file serving refactor.
>
> Indeed, but I think this patch is close to this.
>
>> That said, if this hot patch
>> works around the 100% CPU usage on recursive chowns bug in the user
>> list
>> thread, I think it's desirable to get this into the 0.25.x branch
>> ASAP.
>
> The real underlying issue is that file metadata terminus doesn't care
> about the file {} checksum settings, it always compute the md5
> checksum
> of said files. Why? Because this is what is correct for remote files
> metadatas.
This actually shouldn't always need to be true - it's definitely
unlikely that we wouldn't be interested in the remote md5, but it's at
least possible.
> Unfortunately the same code is called in the direct file serving
> terminus system, where the checksum should not be computed (it will be
> done by the checksum property).
>
> This patch is not 100% correct (but my minimal testing show that it
> works), and I don't expect it to be merged directly in 0.25.x (on
> which
> it is based).
Why don't you expect that?
> It basically does two things:
> * allow setting checksum => none
> * make sure file metadata doesn't compute anything if checksum is none
> (but only for the direct file server terminus).
>
> A more correct patch would be to prevent checksums to be computed in
> local file metadata search.
I suppose that would be sufficient - just have a 'local' flag,
effectively, and an additional flag saying "I don't care about the
checksum", but... then the latter becomes a superset of the former, so
skip the former, use what you have here, and move on.
Or am I missing something?
--
SCSI is *not* magic. There are fundamental technical reasons
why it is necessary to sacrifice a young goat to your SCSI chain
now and then.
Local metadata is always requested by the direct file server. That
instance can set a flag on each fileset to not compute checksums, and
we're done.
OK, it makes sense, but I have a hard time thinking of a case (in the
current Puppet system) where we wouldn't want remote metadata to include
the checksum.
>> Unfortunately the same code is called in the direct file serving
>> terminus system, where the checksum should not be computed (it will be
>> done by the checksum property).
>>
>> This patch is not 100% correct (but my minimal testing show that it
>> works), and I don't expect it to be merged directly in 0.25.x (on which
>> it is based).
>
> Why don't you expect that?
My sentence wasn't complete: it is missing an "as is".
When I wrote this, I had just been thinking that having the direct file
server tell the filesets that they are local (and as such don't have to
compute a checksum) would be a better direction.
I must also add that I did only minimal testing of said patch (i.e. I
tried it only in the lab with various files: sourced, local, recursive).
But the file resource is so vast that I didn't check that managing links,
copying files, etc. still work.
>> It basically does two things:
>> * allow setting checksum => none
>> * make sure file metadata doesn't compute anything if checksum is none
>> (but only for the direct file server terminus).
>>
>> A more correct patch would be to prevent checksums to be computed in
>> local file metadata search.
>
> I suppose that would be sufficient - just have a 'local' flag,
> effectively, and an additional flag saying "i don't care about the
> checksum", but... then the latter becomes a superset of the former, so
> skip the former, use what you have here, and move on.
>
> Or am I missing something?
I don't think so, but I'm quite ignorant about the interaction between
the checksum file property and the rest of the system, so I want to make
sure I'm not introducing any new regressions.
Yes, this was the least intrusive way of doing it.
> The only change I would make is to change it so this is the default -
> checksum should default to 'none', and only default to 'md5' if
> something like 'source' is specified.
That would make sense. I'll try to see if I can come up with something
that acts like this.
> This behaviour will be a bit harder to test all the behaviours of, but
> it doesn't require that people change their Puppet code, and it would
> behave as people expect.
That's correct, and it looks better on the drawing board.
I just need to see how the checksum property can know whether there is a
source or a content property in the file resource.
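A rough sketch of what that might look like, using the standard `defaultto` block from Puppet's parameter DSL (the shape is hypothetical, not the shipped property; as suggested further down the thread, the property can simply consult resource[:source] and resource[:content]):

require 'puppet'

# Hypothetical sketch: a context-sensitive default for the checksum
# type. The property defaults to :none unless the resource actually
# manages content (via source or content), in which case md5 is needed.
Puppet::Type.type(:file).newproperty(:checksum) do
  defaultto do
    if @resource[:source] or @resource[:content]
      :md5   # content is synchronized, so a real checksum is required
    else
      :none  # only owner/group/mode are managed; skip checksumming
    end
  end
end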
> I think the "real" fix for the underlying problem is fixing the regression
> bug introduced in the file serving refactor. That said, if this hot patch
> works around the 100% CPU usage on recursive chowns bug in the user list
> thread, I think it's desirable to get this into the 0.25.x branch ASAP.
I would really appreciate that. Thanks Brice for the patch!
cheers pete
*We* wouldn't, but there's someone out there who's using remote file
recursion just for setting owner/mode/etc, I promise. Crazy, but
probably true. (Note: I don't know that this is true, I'm just
estimating it based on experience.)
That being said, if we have a simple fix that only works for local
file serving, I'm perfectly happy with that and we should do it.
>>> Unfortunately the same code is called in the direct file serving
>>> terminus system, where the checksum should not be computed (it
>>> will be
>>> done by the checksum property).
>>>
>>> This patch is not 100% correct (but my minimal testing show that it
>>> works), and I don't expect it to be merged directly in 0.25.x (on
>>> which
>>> it is based).
>>
>> Why don't you expect that?
>
> My sentence wasn't complete: it is missing an "as is".
>
> When I wrote this I just had thought that directing the direct file
> server to say to the fileset that they are local (and as such don't
> have
> to compute a checksum) would be a better direction.
>
> I must also add that I did only minimal testing of said patch (ie
> tried
> only in the lab with various files (ie sourced, local, recursive). But
> the file resource is so vast, I didn't check managing links, copying
> files, etc, were still working.
The basic model is sound; the only confusing bits are around how
checksum is used by source and content. That needs to be refactored
anyway (e.g., I somehow broke Puppet's Tripwire-like functionality in
0.25), partially because it can be so hard to track.
I've been thinking that 'checksum' needs the treatment 'source' got -
turned into a parameter instead of a property. Theoretically,
checksum and content should be isomorphic, as long as you have source
specified (or a filebucket), so checksum should just be a utility
parameter rather than a real property. Given that, we should probably
also fix the confusion where the checksum parameter sets the checksum
type but also determines and returns the actual checksum. This is
especially stupid now that we only ever print a file's checksum rather
than its contents.
So, seems like we should switch the 'checksum' parameter to just
support the various values (md5 et al), and move all checksum
calculating into utility methods that the rest of the file resource
uses. This should hopefully clean everything up enough that it
actually becomes maintainable.
Then our tripwire-like behaviour (which is that it passively monitors
files, logging when a file's contents change) comes from tracking the
content rather than the checksum:
file { "/tmp/foobar": check => content }
This doesn't actually work right now, but I haven't had the time to
figure out why.
>>> It basically does two things:
>>> * allow setting checksum => none
>>> * make sure file metadata doesn't compute anything if checksum is
>>> none
>>> (but only for the direct file server terminus).
>>>
>>> A more correct patch would be to prevent checksums to be computed in
>>> local file metadata search.
>>
>> I suppose that would be sufficient - just have a 'local' flag,
>> effectively, and an additional flag saying "i don't care about the
>> checksum", but... then the latter becomes a superset of the former,
>> so
>> skip the former, use what you have here, and move on.
>>
>> Or am I missing something?
>
> I don't think so, but I'm quite ignorant about the interaction between
> the checksum file property and the rest of the system, so I want to
> make
> sure I'm not introducing any new regressions.
Ok.
Do you want to attempt to tackle the above refactor, would you like to
collaborate with me on it, or would you prefer to skip it for now and
just get this fix in?
--
It's not that I'm afraid to die. I just don't want to be there when it
happens. -- Woody Allen
resource[:source] and resource[:content] should work, right?
--
You will notice that BeOS has taken the best parts from all the major
operating systems and made them its own. We've got the power of the
Unix command line, the ease of use of the Macintosh interface, and
Minesweeper from Windows. -- Tyler Riti
I'll make another attempt later this week, but my first attempt failed
miserably. There is a complex entanglement between content, source and
checksum, which I'm not sure I completely understand...
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
I think so too. I'd just appreciate it if someone else could test the
current patch (which is available in my github repository at
tickets/0.25.x/2929).
Yes, I think this would be much simpler.
> Theoretically,
> checksum and content should be isomorphic, as long as you have source
> specified (or a filebucket), so checksum should just be a utility
> parameter rather than a real property.
That would make sense. Checksum should just be an indication of the
checksum type. And the real checksum would just be driven by either
content or source.
> Given that, we should probably
> also fix the confusion where the checksum parameter sets the checksum
> type but also determines and returns the actual checksum. This is
> especially stupid now that we only ever print a file's checksum rather
> than its contents.
That's what made my second attempt at fixing this issue so difficult.
> So, seems like we should switch the 'checksum' parameter to just
> support the various values (md5 et al), and move all checksum
> calculating into utility methods that the rest of the file resource
> uses. This should hopefully clean everything up enough that it
> actually becomes maintainable.
>
> Then our tripwire-like behaviour (which is that it passively monitors
> files, logging when a file's contents change) comes from tracking the
> content rather than the checksum:
>
> file { "/tmp/foobar": check => content }
I'm not sure about this syntax: do you mean we should add a new property
(i.e. check) to reintroduce the tripwire functionality?
> This doesn't actually work right now, but I haven't had the time to
> figure out why.
>
> >>> It basically does two things:
> >>> * allow setting checksum => none
> >>> * make sure file metadata doesn't compute anything if checksum is
> >>> none
> >>> (but only for the direct file server terminus).
> >>>
> >>> A more correct patch would be to prevent checksums to be computed in
> >>> local file metadata search.
> >>
> >> I suppose that would be sufficient - just have a 'local' flag,
> >> effectively, and an additional flag saying "i don't care about the
> >> checksum", but... then the latter becomes a superset of the former,
> >> so
> >> skip the former, use what you have here, and move on.
> >>
> >> Or am I missing something?
> >
> > I don't think so, but I'm quite ignorant about the interaction between
> > the checksum file property and the rest of the system, so I want to
> > make
> > sure I'm not introducing any new regressions.
>
> Ok.
>
> Do you want to attempt to tackle the above refactor, would you like to
> collaborate with me on it, or would you prefer to skip it for now and
> just get this fix in?
Frankly, I'd prefer we fix the issue in 0.25.x with this patch (granted
that someone else can test it with success).
I don't really have a lot of time until June for Puppet dev (except those
occasional fixes), but I can at least try to start something. If you
want to do it, then go ahead; that will certainly be much more effective
than my procrastination :-)
I'll be testing it this week, definitely.
This is actually a metaparam that's existed since the earliest days of
Puppet - its intent is that Puppet will not manage those properties
but will instead just track changes to them over time and alert when
changes happen.
Ok. I'll get this patch tested, and if I have some airplane time or
something see what a full checksum refactor would look like. The
'source' refactor ended up being relatively easy in the end and it had
a huge impact on maintainability.
--
The world is round; it has no point.
-- Adrienne E. Gusoff
Peter tested my patch today and found that it was working as intended
(there is no more md5 checksumming of all the files).
But he also discovered something that we hadn't even thought about: the
first run of:
file {
    "/deep/hierarchy/full/of/files":
        recurse => true, checksum => none, mode => 0600
}
produces many changes in the transaction (assuming the files are not
0600, of course). This in turn generates a large consumption of memory
and 100% CPU use for a looong time (i.e. more than 30 minutes for 7.5k
files according to Peter, several minutes for 2.5k on my fast computer).
Why?
After a quick search, we found that most of the time was spent around
Puppet::SimpleGraph#matching_edges or around line 218 of
Puppet::Transaction (i.e. I'm not sure whether it's the call itself or
the iteration over the result).
I didn't do a deep analysis of the problem, but it seems to me that we
are generating tons of events and propagating them up to the root file
(the one that spawned all those sub-file resources). I don't think that
is necessary in this case, unless there is a notify/subscribe (which
there isn't).
One of the problems, I think (no evidence to back this up, it's only my
understanding), is that we dup the current edge in a fast-paced loop. The
poor Ruby GC has no time to collect the previous edge instances, so we
pile up new edge instances at a really fast pace, spending most of our
time in the allocator (the Ruby one or the libgc one, which is a costly
operation).
I'll do more tests on this code later this week (I first have to
understand it, of course :-)).
Anyway, with line 218 of transaction.rb commented out, the
aforementioned manifest becomes really fast (if not instantaneous).
Any comments?
OK, I identified the issue (patch coming soon):
Puppet::SimpleGraph#matching_edges is simply inefficient when faced with
lots of events for a given resource (which is our case).
The problem is that matching_edges returns the list of edges for which we
should trigger an event. In the aforementioned manifest, since there is
no subscribe/notify, the list is empty.
Unfortunately, with the current code it takes about 22s to arrive at this
result for about 1100 events. With the (soon to be published) patch, I
was able to reduce this time to 4s.
The whole issue was that, to find this list of edges, the process was
abusing memory allocation (i.e. appending to arrays again and again),
producing a large list of empty arrays which finally ended up being
flattened and then compacted.
I rewrote this part (sorry, the code is not as beautiful as it was before
I touched it :-)) to never create those transient arrays, limiting array
appends as much as possible.
This patch rewrites the code so that it never appends to or allocates an
array when there is no matching edge, and limits those operations to the
strict minimum in the other cases.
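Schematically (a standalone illustration, not the patch itself), the change replaces the collect/compact/flatten pipeline with a single accumulator:

# Standalone illustration of the allocation pattern: the old shape
# builds one (usually empty) array per event and flattens at the end;
# the new shape appends matches into a single accumulator.
events = (1..1100).to_a

# Old shape: ~1100 transient arrays, plus the compact/flatten copies.
old_result = events.collect { |event| [] }.compact.flatten

# New shape: one array for the whole pass, nothing for the GC to chase.
new_result = events.inject([]) { |matching, event| matching }

puts old_result == new_result # => true: same result, far fewer allocations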
Results for matching 1100 events:
* old code: 22s
* new code: 4s
This patch also helps on the memory consumption side since the GC has
almost no work to perform.
Signed-off-by: Brice Figureau <brice-...@daysofwonder.com>
---
lib/puppet/simple_graph.rb | 20 ++++++++++++++++----
lib/puppet/transaction.rb | 18 ++++++++++--------
2 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/lib/puppet/simple_graph.rb b/lib/puppet/simple_graph.rb
index 5e8f5cd..591c75d 100644
--- a/lib/puppet/simple_graph.rb
+++ b/lib/puppet/simple_graph.rb
@@ -64,6 +64,14 @@ class Puppet::SimpleGraph
end
end
+ def each_out_edges
+ @adjacencies[:out].values.each do |edges|
+ edges.each do |edge|
+ yield edge
+ end
+ end
+ end
+
# The other vertex in the edge.
def other_vertex(direction, edge)
case direction
@@ -146,7 +154,7 @@ class Puppet::SimpleGraph
# Collect all of the edges that the passed events match. Returns
# an array of edges.
def matching_edges(events, base = nil)
- events.collect do |event|
+ events.inject([]) do |matching,event|
source = base || event.source
unless vertex?(source)
@@ -156,10 +164,14 @@ class Puppet::SimpleGraph
# Get all of the edges that this vertex should forward events
# to, which is the same thing as saying all edges directly below
# This vertex in the graph.
- adjacent(source, :direction => :out, :type => :edges).find_all do |edge|
- edge.match?(event.name)
+
+ if wrapper = @vertices[source]
+ wrapper.each_out_edges do |edge|
+ matching << edge if edge.match?(event.name)
+ end
end
- end.compact.flatten
+ matching
+ end
end
# Return a reversed version of this graph.
diff --git a/lib/puppet/transaction.rb b/lib/puppet/transaction.rb
index e132b72..3d8e38e 100644
--- a/lib/puppet/transaction.rb
+++ b/lib/puppet/transaction.rb
@@ -215,15 +215,17 @@ class Transaction
# Collect the targets of any subscriptions to those events. We pass
# the parent resource in so it will override the source in the events,
# since eval_generated children can't have direct relationships.
- relationship_graph.matching_edges(events, resource).each do |orig_edge|
- # We have to dup the label here, else we modify the original edge label,
- # which affects whether a given event will match on the next run, which is,
- # of course, bad.
- edge = orig_edge.class.new(orig_edge.source, orig_edge.target, orig_edge.label)
- edge.event = events.collect { |e| e.name }
- set_trigger(edge)
+ Puppet::Util.benchmark(:debug, "Time for triggering events to edges") do
+ relationship_graph.matching_edges(events, resource).each do |orig_edge|
+ # We have to dup the label here, else we modify the original edge label,
+ # which affects whether a given event will match on the next run, which is,
+ # of course, bad.
+ puts "new edge: #{orig_edge.target} <- #{orig_edge.source} #{orig_edge.label}"
+ edge = orig_edge.class.new(orig_edge.source, orig_edge.target, orig_edge.label)
+ edge.event = events.collect { |e| e.name }
+ set_trigger(edge)
+ end
end
-
# And return the events for collection
events
end
--
1.6.6.1
Note: 22s doesn't seem like much, but I also tested with about 5000
events and got about 2 minutes (new code), and stopped the run after more
than 15 minutes (old code).
It seems the previous code was at least quadratic (damn Ruby allocator,
not the code itself).
Peter, if you have some spare time, I suggest you try this new patch on
top of the previous one (although they're not related).
Warning: I didn't really test the new code, and it isn't covered by
tests (it shouldn't modify any behavior).
Do you think this might be just the wrong way of doing this matching
overall? My instincts around graphs seem to be particularly horrible,
so there's probably a much better way to do this.
--
Man is the only animal that can remain on friendly terms with the
victims he intends to eat until he eats them.
-- Samuel Butler (1835-1902)
Still... 2 minutes just for the graph operations? That's pretty gross.
I suppose this wouldn't be mitigated at all if we switched to not
creating individual file resources per file, since you'd still need
the same number of events.
> It seems the previous code was at least quadratic (damn ruby
> allocator,
> not the code itself).
>
> Peter, if you have some spare time, I suggest you try this new patch
> on
> top of the previous one (although they're not related).
>
> Warning: I didn't really test the new code, and it isn't covered with
> tests (it shouldn't modify any behavior).
I think that code is pretty well covered, so it should be a straight
refactor.
--
'Tis better to be silent and be thought a fool, than to speak and
remove all doubt. --Abraham Lincoln
IMO, you're the best positioned (because you actually understand
graphs) to see whether my original approach to the problem was flawed.
--
We all have strength enough to endure the misfortunes of others.
-- Francois de La Rochefoucauld
Though it's a massive PITA, would anyone be up to trying a comparison
with RGL?
I've had good luck with it and, in theory, it's as optimized as you can
get in Ruby.
Trevor
--
Trevor Vaughan
Vice President, Onyx Point, Inc.
>
> Though it's a massive PITA, would anyone be up to trying a comparison
> with RGL?
>
> I've had good luck with it and, in theory, it's as optimized as you
> can
> get in Ruby.
It's been a while since I looked, but AFAIR RGL is just a ruby skin
over a C/C++ implementation of a graph library and, while relatively
fast, is also heinously non-ruby. As in, it feels really bad actually
writing code in it.
Really, though, I'm probably misinformed, because I know more about
GRATR -- which was essentially a response to RGL -- than I do about
RGL itself, so we could probably do well to back-end off of it.
I've also been considering experimenting with the java graph db. But
that's just because I ran into Emil again last night.
--
Computers are not intelligent. They only think they are.
OK, I'll remove this in a new version.
> Do you think this might be just the wrong way of doing this matching
> overall? My instincts around graphs seem to be particularly horrible,
> so there's probably a much better way to do this.
I must admit I'm not good with graphs at all. I didn't even try to
understand the whole concept of the SimpleGraph; I just tried to
micro-optimize the piece that was giving me trouble.
But I think that even though the current implementation can be slow
sometimes (and fine in most cases), it works correctly, so I doubt the
implementation is incorrect. Maybe there is a better implementation or
layout, but since I'm more than ignorant about graph data
structures/algorithms, my advice won't be that useful :-)
Are you talking about Neo4j or something else?
--
Brice, thanks for that follow-up. I tested your patch on a directory
containing ~8k files and directories. Puppet finished chmodding and
chowning the files, as before, in about 1-2 minutes. Still, it hangs
afterwards, burning CPU. However, it no longer grows an enormous
amount of memory! The amount of memory remains stable now, which is
great.
After those 1-2 minutes, though, it hangs for about 60 minutes burning
CPU as before, and finishes after 62 minutes. As discussed on IRC, this
is IMHO more a matter of traversing the graph.
To be clear: this is really only a problem during the first run, when
puppet touches _every_ file. If it has to touch only a few files, it
finishes much faster: for example, about 7 minutes to fix 1k files.
Which means, IMHO, that the main problem remains in passing the
events up.
Still, given the features and advantages you get by chowning/chmodding
such a structure with puppet resources, it would be great if that
could happen in a reasonable amount of time.
cheers pete
60 minutes is still too long for managing 8k change events that won't
ever match (or be used in the end) :-(
It's certainly better (time-wise) than it would be without the patch
(by the way, do you have that number, assuming the server is not
swapping?).
I think we need a smarter way of matching those events, something
involving pruning the graph of some sort. I'll try to think about this.
> To be clear: this is really only a problem within the first run when
> puppet touches _every_ file. If it has to touch only a few files it
> finishes much faster. For example in about 7 minutes to fix 1k of
> files. Which means imho that the main problem remains in passing the
> events up.
>
> Still regarding the features and advantages you get by chown/chmoding
> such a structure with puppet resources, it would be great if that
> could be passed in a reasonable amount of time.
The problem is not only with chowning/chmodding. It just means we are
inefficient at managing many events (granted, there might not be many
real situations generating that many events).
+1
> It's certainly better (in time) to what it would be without the patch
> (ie do you have this number, assuming the server is not swapping)?
Yes, I think that might be the best explanation.
> I think we need a smarter way of matching those events, something
> involving pruning the graph of some sort. I'll try to think about this.
Thanks! I'm happy to test anything you bring up. Unfortunately I don't
have enough time to support you in finding an appropriate proposal.
>> To be clear: this is really only a problem within the first run when
>> puppet touches _every_ file. If it has to touch only a few files it
>> finishes much faster. For example in about 7 minutes to fix 1k of
>> files. Which means imho that the main problem remains in passing the
>> events up.
>>
>> Still regarding the features and advantages you get by chown/chmoding
>> such a structure with puppet resources, it would be great if that
>> could be passed in a reasonable amount of time.
>
> The problem is not only for chown/chmoding. It just means we are
> inefficient at managing many events (granted there might not be too many
> real situation generating that many events).
yes.
cheers pete
There's something I'm not sure I completely understood:
In the case of interest, all the "change" events are coming from the
generated sub-file resources.
If I read the code correctly, we are collecting events from all those
generated resources.
Then we take the source (i.e. the root file of the recursive file
resource) and, for each event, scan all the edges out of this resource
(most if not all of which lead to the auto-required generated sub-file
resources) to find those that match (this cross join between events and
out edges is what is killing our performance).
As I understand it, we're trying to flow the events back to where they
were created, am I correct?
If so, this should be completely unneeded, and we should prevent the
events from flowing along those edges.
I haven't used any of the other implementations, but I agree that a pure
Ruby solution would be preferred.
I was using it for the native Graphviz capabilities at the time.
Trevor
Would it be possible to detect a recursive file operation, stop at that
point in the graph, and then simply call the corresponding Ruby code the
rest of the way down for specific operations?
I can, unfortunately, see this approach wreaking havoc with my recent
symbolic mode patch, as an example of where it wouldn't work.
Alternatively, instead of merging all of the sub-file objects into the
main graph, could you create a subgraph that is built off a DFS from
the originating recurse point?
This would allow you to keep everything the same all the way down, but
*should* vastly speed things up, since this would be a linear DAG
operation on the subgraph instead of a full graph merge.
Trevor
>
> I think we need a smarter way of matching those events, something
> involving pruning the graph of some sort. I'll try to think about this.
>
>
I thought about this. It all depends on how we want to treat those
generated resources: do they deserve to be regular resources, or are
they somewhat special?
> This would allow you to keep everything the same all the way, but
> *should* vastly speed things up since this will be a subgraph liner DAG
> operation instead of a full graph merge.
For the moment I'll fix the current issue, which is that we don't need
to propagate events from generated sub-resources back to those generated
resources (at least for file resources; if there are other generated
resources needing this, let me know).
This is clearly a bug in the current implementation, coming from the
fact that the code masquerades the real event generator as the topmost
file resource. If we were sending the events from the generated
sub-resources themselves, we wouldn't do that at all. Unfortunately
there are "auto-require" relationships from every generated sub-file
resource to this topmost node, so the code "sends" the events back to
every generated sub-file resource. This makes the algorithm something
along the lines of O(kn^2), with n the number of generated resources and
k the number of different kinds of change events.
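To put numbers on that (taking Peter's figures above as an assumption): with n ≈ 8000 generated sub-file resources each emitting a change event, an O(n^2) cross join is on the order of 64 million edge inspections, which is consistent with the hour-long runs he observed.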
I believe it is not necessary to flow those sub-events back to where
they were spawned. The very next fix (coming during the weekend, I
think) will make sure those events are propagated only to non-generated
resources. This should fix the issue once and for all (we can still keep
my previous patch, because it is clearly nicer in terms of memory usage).
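In rough outline, the fix could amount to one extra filter in the trigger loop quoted in the patch above (a sketch; the `generated?` membership test on the transaction's generated-resource list is hypothetical):

relationship_graph.matching_edges(events, resource).each do |orig_edge|
  # Never flow events back into the sub-file resources this transaction
  # generated itself; `generated?` is a hypothetical membership test.
  next if generated?(orig_edge.target)
  edge = orig_edge.class.new(orig_edge.source, orig_edge.target, orig_edge.label)
  edge.event = events.collect { |e| e.name }
  set_trigger(edge)
end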
Too bad, your comments are always insightful, and you might have some
clever idea to fix the aforementioned issue.
> I have two favors to ask:
>
> * Since we appear agreed the previous patch is a Good Thing, even if
> it's not the ultimate solution to everything, could you put it in a
> branch, add it to the ticket, and mark it "Ready for Testing"
The branch is currently against 0.25.x; should I create a new one
against master and/or testing?
> * Could you open a new ticket for the forthcoming more general fix,
> and do the same with that change as soon as it's available?
Will do.
> I'm trying to get all the things the we'd like to see in Rowlf rolled
> into the testing branch in the next week, so we can bang them against
> each other.
OK, I understand.
More generally, if, say, I want to produce a new patch for Rowlf (I'm
thinking about some random puppetdoc fix or feature), which is more
convenient for you:
* basing it on top of the latest available testing (or somewhere in the
middle)
* basing it on master
Thanks,
Done.
> * Could you open a new ticket for the forthcoming more general fix,
> and do the same with that change as soon as it's available?
Done, for reference it's ticket: #3396
http://projects.reductivelabs.com/issues/3396
You are right - we could probably do a quick test before we collect
events, to determine if there are any subscriptions. If not, we can
just ignore the events entirely and save a bunch of time.
There's probably also some kind of 'break' we can do - functionally,
one event is equivalent to 100 events, in terms of the behaviour of
the system, so we could just keep a boolean and pass it around, or
maybe just count the events or something.
I think there are a couple of different optimizations available.
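As a rough sketch of the first idea, reusing the `adjacent` API visible in the patch above (the assumption that subscription edges carry a :callback in their label is mine):

# Sketch: skip the expensive per-event matching entirely when nothing
# subscribes to the source, i.e. no out edge carries a callback.
def matching_edges(events, base = nil)
  matching = []
  # Group events by source so the subscription test runs once per
  # source rather than once per event.
  events.group_by { |event| base || event.source }.each do |source, evs|
    next unless vertex?(source)
    out_edges = adjacent(source, :direction => :out, :type => :edges)
    # Early exit: recursive-file auto-requires carry no callback, so
    # for the manifest above this skips all the change events wholesale.
    next unless out_edges.any? { |edge| edge.label and edge.label[:callback] }
    evs.each do |event|
      out_edges.each { |edge| matching << edge if edge.match?(event.name) }
    end
  end
  matching
end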
--
The one thing more difficult than following a regimen is not imposing it
on others. -- Marcel Proust
Yep. It's the only one around, AFAIK.
--
The point of living and of being an optimist, is to be foolish enough
to believe the best is yet to come. -- Peter Ustinov
I agree.
>> To be clear: this is really only a problem within the first run when
>> puppet touches _every_ file. If it has to touch only a few files it
>> finishes much faster. For example in about 7 minutes to fix 1k of
>> files. Which means imho that the main problem remains in passing the
>> events up.
>>
>> Still regarding the features and advantages you get by chown/chmoding
>> such a structure with puppet resources, it would be great if that
>> could be passed in a reasonable amount of time.
>
> The problem is not only for chown/chmoding. It just means we are
> inefficient at managing many events (granted there might not be too
> many
> real situation generating that many events).
I agree, again. This is unacceptable.
--
It has recently been discovered that research causes cancer in
labratory rats.
> On Thu, 2010-03-18 at 20:22 -0400, Trevor Vaughan wrote:
>> Would it be possible to detect a recursive file operation, stop at
>> that
>> point in the graph and then simply call the corresponding Ruby code
>> the
>> rest of the way down for specific operations?
>>
>> I can, unfortunately, see this approach wreaking havok with my recent
>> symbolic mode patch as an example of where it wouldn't work.
>>
>> Alternatively, instead of merging all of the sub-file objects into
>> the
>> main graph, could you create a subgraph that is built off of a DFS
>> from
>> that originating recurse point?
>
> I thought about this. It all depends on how we want to consider those
> generated resources: do they deserve being regular resources, or are
> they somewhat special?
IMO, this is the right approach, with some caveats. The only real
concerns I see with it are that we begin to lose some clarity over
what's specifically happening (because we're treating all of the files
as a collection, rather than individually), and if we do have rollback
at some point this will considerably complicate it.
However, I think that as long as we produce events with sufficient
information (i.e., they have to mention both the actual file and the
parent file), we should be able to do rollback without too much
effort, albeit it will require some hackery. That hackery will likely
rarely get used, though, and at this point we're doing way more
hackery in order to get recursive file management in its current form.
>> This would allow you to keep everything the same all the way, but
>> *should* vastly speed things up since this will be a subgraph liner
>> DAG
>> operation instead of a full graph merge.
>
> For the moment I'll fix the current issue, which is that we don't need
> to propagate generated sub-resource events back to them (at least for
> the file resources, if there are other generated resources needing
> this
> let me know).
There are no other resource types that generate resources mid-
transaction.
Are you sure we don't need to do any propagation? If a service is
subscribed to a file tree and only one change is made but it's lower
in the tree, don't we want that event to propagate?
> This is clearly a bug in the current implementation, coming from the
> fact that the code masquerades the real event generator with the
> topmost
> file resource. If we were sending the events from the subgenerated
> resources we wouldn't do that at all. Unfortunately there are
> "auto-require" relationships to this topmost node to every generated
> sub-file-resources, thus the code "sends" the events back to every
> generated sub-file-resources. This makes this algorithm something
> along
> of O(kn^2) with n = number of generated resources, and k number of
> different kind of changed events.
>
> I believe it is not necessary to flow those sub-events back to where
> they were spawned. The very next fix (coming during the week-end I
> think) will make sure those events are propagated only to non-
> generated
> resources. This should fix this issue one for all (we can still keep
> my
> previous patch because it is clearly nicer in terms of memory usage).
Ah; that makes sense.
--
Calamities are of two kinds: misfortunes to ourselves, and good
fortune to others. -- Ambrose Bierce