[PATCH/puppet 1/1] Fix #2929 - Allow checksum to be "none"


Brice Figureau

Mar 13, 2010, 8:59:51 AM
to puppe...@googlegroups.com
File checksum is "md5" by default. When managing local files (neither
sourced nor given content), it might be desirable to not checksum files,
especially when managing deep hierarchies containing many files.

This patch allows writing manifests such as:

file { "/path/to/deep/hierarchy":
  owner => brice, recurse => true, checksum => none
}

Then puppet(d) won't checksum those files, just manage their ownership.

Signed-off-by: Brice Figureau <brice-...@daysofwonder.com>
---
lib/puppet/file_serving/fileset.rb | 4 ++--
lib/puppet/file_serving/terminus_helper.rb | 1 +
lib/puppet/type/file.rb | 9 ++++++++-
lib/puppet/type/file/checksum.rb | 3 ++-
lib/puppet/util/checksums.rb | 5 +++++
spec/unit/file_serving/fileset.rb | 6 ++++++
spec/unit/file_serving/terminus_helper.rb | 10 ++++++++++
spec/unit/type/file.rb | 7 +++++++
spec/unit/type/file/checksum.rb | 28 ++++++++++++++++++++++++++++
spec/unit/util/checksums.rb | 8 +++++++-
10 files changed, 76 insertions(+), 5 deletions(-)
create mode 100644 spec/unit/type/file/checksum.rb

diff --git a/lib/puppet/file_serving/fileset.rb b/lib/puppet/file_serving/fileset.rb
index 50e4e1e..a66d935 100644
--- a/lib/puppet/file_serving/fileset.rb
+++ b/lib/puppet/file_serving/fileset.rb
@@ -9,7 +9,7 @@ require 'puppet/file_serving/metadata'
# Operate recursively on a path, returning a set of file paths.
class Puppet::FileServing::Fileset
attr_reader :path, :ignore, :links
- attr_accessor :recurse, :recurselimit
+ attr_accessor :recurse, :recurselimit, :checksum_type

# Produce a hash of files, with merged so that earlier files
# with the same postfix win. E.g., /dir1/subfile beats /dir2/subfile.
@@ -105,7 +105,7 @@ class Puppet::FileServing::Fileset
end

def initialize_from_request(request)
- [:links, :ignore, :recurse, :recurselimit].each do |param|
+ [:links, :ignore, :recurse, :recurselimit, :checksum_type].each do |param|
if request.options.include?(param) # use 'include?' so the values can be false
value = request.options[param]
elsif request.options.include?(param.to_s)
diff --git a/lib/puppet/file_serving/terminus_helper.rb b/lib/puppet/file_serving/terminus_helper.rb
index c88bacc..6f5d52b 100644
--- a/lib/puppet/file_serving/terminus_helper.rb
+++ b/lib/puppet/file_serving/terminus_helper.rb
@@ -16,6 +16,7 @@ module Puppet::FileServing::TerminusHelper

Puppet::FileServing::Fileset.merge(*filesets).collect do |file, base_path|
inst = model.new(base_path, :relative_path => file)
+ inst.checksum_type = request.options[:checksum_type] if request.options[:checksum_type]
inst.links = request.options[:links] if request.options[:links]
inst.collect
inst
diff --git a/lib/puppet/type/file.rb b/lib/puppet/type/file.rb
index 2f5b5df..fdb5a35 100644
--- a/lib/puppet/type/file.rb
+++ b/lib/puppet/type/file.rb
@@ -592,7 +592,14 @@ module Puppet
end

def perform_recursion(path)
- Puppet::FileServing::Metadata.search(path, :links => self[:links], :recurse => (self[:recurse] == :remote ? true : self[:recurse]), :recurselimit => self[:recurselimit], :ignore => self[:ignore])
+ params = {
+ :links => self[:links],
+ :recurse => (self[:recurse] == :remote ? true : self[:recurse]),
+ :recurselimit => self[:recurselimit],
+ :ignore => self[:ignore]
+ }
+ params[:checksum_type] = self[:checksum] if self[:checksum] == :none
+ Puppet::FileServing::Metadata.search(path, params)
end

# Remove any existing data. This is only used when dealing with
diff --git a/lib/puppet/type/file/checksum.rb b/lib/puppet/type/file/checksum.rb
index 23a3e5a..0c45aad 100755
--- a/lib/puppet/type/file/checksum.rb
+++ b/lib/puppet/type/file/checksum.rb
@@ -23,7 +23,7 @@ Puppet::Type.type(:file).newproperty(:checksum) do

@unmanaged = true

- @validtypes = %w{md5 md5lite timestamp mtime time}
+ @validtypes = %w{md5 md5lite timestamp mtime time none}

def self.validtype?(type)
@validtypes.include?(type)
@@ -52,6 +52,7 @@ Puppet::Type.type(:file).newproperty(:checksum) do
cache(type, sum)
return type
else
+ return :none if value.nil? or value.to_s == "" or value.to_s == "none"
if FileTest.directory?(@resource[:path])
return :time
elsif @resource[:source] and value.to_s != "md5"
diff --git a/lib/puppet/util/checksums.rb b/lib/puppet/util/checksums.rb
index 98bf5de..39477ee 100644
--- a/lib/puppet/util/checksums.rb
+++ b/lib/puppet/util/checksums.rb
@@ -68,6 +68,11 @@ module Puppet::Util::Checksums
File.stat(filename).send(:ctime)
end

+ # Return a "no checksum"
+ def none_file(filename)
+ ""
+ end
+
private

# Perform an incremental checksum on a file.
diff --git a/spec/unit/file_serving/fileset.rb b/spec/unit/file_serving/fileset.rb
index 55cc2a9..c03522d 100755
--- a/spec/unit/file_serving/fileset.rb
+++ b/spec/unit/file_serving/fileset.rb
@@ -42,6 +42,12 @@ describe Puppet::FileServing::Fileset, " when initializing" do
set.links.should == :manage
end

+ it "should accept a 'checksum_type' option" do
+ File.expects(:lstat).with("/some/file").returns stub("stat")
+ set = Puppet::FileServing::Fileset.new("/some/file", :checksum_type => :test)
+ set.checksum_type.should == :test
+ end
+
it "should fail if 'links' is set to anything other than :manage or :follow" do
proc { Puppet::FileServing::Fileset.new("/some/file", :links => :whatever) }.should raise_error(ArgumentError)
end
diff --git a/spec/unit/file_serving/terminus_helper.rb b/spec/unit/file_serving/terminus_helper.rb
index d0c06f1..98c64b7 100755
--- a/spec/unit/file_serving/terminus_helper.rb
+++ b/spec/unit/file_serving/terminus_helper.rb
@@ -76,6 +76,16 @@ describe Puppet::FileServing::TerminusHelper do
@helper.path2instances(@request, "/my/file")
end

+ it "should set the request checksum_type if one is provided" do
+ @one.expects(:checksum_type=).with :test
+ @two.expects(:checksum_type=).with :test
+ @model.expects(:new).returns(@one)
+ @model.expects(:new).returns(@two)
+
+ @request.options[:checksum_type] = :test
+ @helper.path2instances(@request, "/my/file")
+ end
+
it "should collect the instance's attributes" do
@one.expects(:collect)
@two.expects(:collect)
diff --git a/spec/unit/type/file.rb b/spec/unit/type/file.rb
index 1b3fe6a..b5963a6 100755
--- a/spec/unit/type/file.rb
+++ b/spec/unit/type/file.rb
@@ -320,6 +320,13 @@ describe Puppet::Type.type(:file) do
@file.expects(:newchild).with("my/file").returns "fiebar"
@file.recurse_local.should == {"my/file" => "fiebar"}
end
+
+ it "should set checksum_type to none if this file checksum is none" do
+ @file[:checksum] = :none
+ Puppet::FileServing::Metadata.expects(:search).with { |path,params| params[:checksum_type] == :none }.returns [@metadata]
+ @file.expects(:newchild).with("my/file").returns "fiebar"
+ @file.recurse_local
+ end
end

it "should have a method for performing link recursion" do
diff --git a/spec/unit/type/file/checksum.rb b/spec/unit/type/file/checksum.rb
new file mode 100644
index 0000000..5d715d1
--- /dev/null
+++ b/spec/unit/type/file/checksum.rb
@@ -0,0 +1,28 @@
+#!/usr/bin/env ruby
+
+Dir.chdir(File.dirname(__FILE__)) { (s = lambda { |f| File.exist?(f) ? require(f) : Dir.chdir("..") { s.call(f) } }).call("spec/spec_helper.rb") }
+
+checksum = Puppet::Type.type(:file).attrclass(:checksum)
+describe checksum do
+ before do
+ # Wow that's a messy interface to the resource.
+ @resource = stub 'resource', :[] => nil, :[]= => nil, :property => nil, :newattr => nil, :parameter => nil
+ end
+
+ it "should be a subclass of Property" do
+ checksum.superclass.must == Puppet::Property
+ end
+
+ it "should have default checksum of :md5" do
+ @checksum = checksum.new(:resource => @resource)
+ @checksum.checktype.should == :md5
+ end
+
+ [:none, nil, ""].each do |ck|
+ it "should use a none checksum for #{ck.inspect}" do
+ @checksum = checksum.new(:resource => @resource)
+ @checksum.should = "none"
+ @checksum.checktype.should == :none
+ end
+ end
+end
diff --git a/spec/unit/util/checksums.rb b/spec/unit/util/checksums.rb
index d31d7a0..615ed90 100755
--- a/spec/unit/util/checksums.rb
+++ b/spec/unit/util/checksums.rb
@@ -14,7 +14,7 @@ describe Puppet::Util::Checksums do
end

content_sums = [:md5, :md5lite, :sha1, :sha1lite]
- file_only = [:ctime, :mtime]
+ file_only = [:ctime, :mtime, :none]

content_sums.each do |sumtype|
it "should be able to calculate %s sums from strings" % sumtype do
@@ -104,4 +104,10 @@ describe Puppet::Util::Checksums do
end
end
end
+
+ describe "when using the none checksum" do
+ it "should return an empty string" do
+ @summer.none_file("/my/file").should == ""
+ end
+ end
end
--
1.6.6.1

Rein Henrichs

Mar 13, 2010, 3:33:30 PM
to puppe...@googlegroups.com
Brice,

I think the "real" fix for the underlying problem is fixing the regression bug introduced in the file serving refactor. That said, if this hot patch works around the 100% CPU usage on recursive chowns bug in the user list thread, I think it's desirable to get this into the 0.25.x branch ASAP.

+1



Brice Figureau

Mar 13, 2010, 4:57:42 PM
to puppe...@googlegroups.com
On 13/03/10 21:33, Rein Henrichs wrote:
> Brice,
>
> I think the "real" fix for the underlying problem is fixing the regression
> bug introduced in the file serving refactor.

Indeed, but I think this patch is close to this.

> That said, if this hot patch
> works around the 100% CPU usage on recursive chowns bug in the user list
> thread, I think it's desirable to get this into the 0.25.x branch ASAP.

The real underlying issue is that the file metadata terminus doesn't care
about the file {} checksum settings; it always computes the md5 checksum
of said files. Why? Because this is what is correct for remote file
metadata.
Unfortunately the same code is called in the direct file serving
terminus system, where the checksum should not be computed (it will be
done by the checksum property).

This patch is not 100% correct (but my minimal testing shows that it
works), and I don't expect it to be merged directly into 0.25.x (on which
it is based).

It basically does two things:
* allow setting checksum => none
* make sure file metadata doesn't compute anything if checksum is none
(but only for the direct file server terminus).

A more correct patch would be to prevent checksums from being computed in
local file metadata search.
--
Brice Figureau
My Blog: http://www.masterzen.fr/

Markus Roberts

Mar 13, 2010, 4:58:37 PM
to puppet-dev
On Sat, Mar 13, 2010 at 12:33 PM, Rein Henrichs <re...@reductivelabs.com> wrote:
> I think the "real" fix for the underlying problem is fixing the regression bug introduced in the file serving refactor. That said, if this hot patch works around the 100% CPU usage on recursive chowns bug in the user list thread, I think it's desirable to get this into the 0.25.x branch ASAP.
>
> +1

Agreed on both counts.

Markus Roberts

Mar 13, 2010, 5:00:42 PM
to puppet-dev
Brice --


> A more correct patch would be to prevent checksums from being computed in
> local file metadata search.

I had looked briefly for a clean way to do this and drew a blank.  Do you have any insights as to the best way to go about it?

-- Markus 

Luke Kanies

Mar 14, 2010, 3:46:44 AM
to puppe...@googlegroups.com
I actually think this *is* the fix for the regression.

You basically need some way to signal the metadata class that you're not interested in the checksum; this is one such way.

The only change I would make is to change it so this is the default - checksum should default to 'none', and only default to 'md5' if something like 'source' is specified.

This behaviour will be a bit harder to test all the behaviours of, but it doesn't require that people change their Puppet code, and it would behave as people expect.
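
Something like this inside the existing checksum property, maybe (an
untested sketch; I'm assuming the standard defaultto-with-a-block
parameter DSL applies here):

# Untested sketch: keep :md5 as the default only when the file is sourced
# or given explicit content; otherwise don't checksum at all.
defaultto do
  (@resource[:source] or @resource[:content]) ? :md5 : :none
end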


-- 
The Ninety-Ninety Rule of Project Schedules:
    The first 90% of the task takes 90% of the time, and the last
    10% takes the other 90%.
---------------------------------------------------------------------
Luke Kanies  -|-   http://reductivelabs.com   -|-   +1(615)594-8199

Luke Kanies

Mar 14, 2010, 3:48:38 AM
to puppe...@googlegroups.com
On Mar 13, 2010, at 1:57 PM, Brice Figureau wrote:

> On 13/03/10 21:33, Rein Henrichs wrote:
>> Brice,
>>
>> I think the "real" fix for the underlying problem is fixing the
>> regression
>> bug introduced in the file serving refactor.
>
> Indeed, but I think this patch is close to this.
>
>> That said, if this hot patch
>> works around the 100% CPU usage on recursive chowns bug in the user
>> list
>> thread, I think it's desirable to get this into the 0.25.x branch
>> ASAP.
>
> The real underlying issue is that file metadata terminus doesn't care
> about the file {} checksum settings, it always computes the md5
> checksum
> of said files. Why? Because this is what is correct for remote file
> metadata.

This actually shouldn't always need to be true - it's definitely
unlikely that we wouldn't be interested in the remote md5, but it's at
least possible.

> Unfortunately the same code is called in the direct file serving
> terminus system, where the checksum should not be computed (it will be
> done by the checksum property).
>
> This patch is not 100% correct (but my minimal testing shows that it
> works), and I don't expect it to be merged directly in 0.25.x (on
> which
> it is based).

Why don't you expect that?

> It basically does two things:
> * allow setting checksum => none
> * make sure file metadata doesn't compute anything if checksum is none
> (but only for the direct file server terminus).
>
>> A more correct patch would be to prevent checksums from being computed in
> local file metadata search.

I suppose that would be sufficient - just have a 'local' flag,
effectively, and an additional flag saying "i don't care about the
checksum", but... then the latter becomes a superset of the former, so
skip the former, use what you have here, and move on.

Or am I missing something?

--
SCSI is *not* magic. There are fundamental technical reasons
why it is necessary to sacrifice a young goat to your SCSI chain
now and then.

Brice Figureau

Mar 14, 2010, 5:37:35 AM
to puppe...@googlegroups.com

Local metadata is always requested by the direct file server. This
instance can set a flag on each fileset to not compute checksums, and
we're done.

Brice Figureau

Mar 14, 2010, 5:42:55 AM
to puppe...@googlegroups.com
On 14/03/10 08:48, Luke Kanies wrote:
> On Mar 13, 2010, at 1:57 PM, Brice Figureau wrote:
>
>> On 13/03/10 21:33, Rein Henrichs wrote:
>>> Brice,
>>>
>>> I think the "real" fix for the underlying problem is fixing the
>>> regression
>>> bug introduced in the file serving refactor.
>>
>> Indeed, but I think this patch is close to this.
>>
>>> That said, if this hot patch
>>> works around the 100% CPU usage on recursive chowns bug in the user list
>>> thread, I think it's desirable to get this into the 0.25.x branch ASAP.
>>
>> The real underlying issue is that file metadata terminus doesn't care
>> about the file {} checksum settings, it always computes the md5 checksum
>> of said files. Why? Because this is what is correct for remote file
>> metadata.
>
> This actually shouldn't always need to be true - it's definitely
> unlikely that we wouldn't be interested in the remote md5, but it's at
> least possible.

OK, it makes sense, but I have a hard time thinking of a case (in the
current puppet system) where we wouldn't want remote metadata to include
the checksum.

>> Unfortunately the same code is called in the direct file serving
>> terminus system, where the checksum should not be computed (it will be
>> done by the checksum property).
>>
>> This patch is not 100% correct (but my minimal testing shows that it
>> works), and I don't expect it to be merged directly in 0.25.x (on which
>> it is based).
>
> Why don't you expect that?

My sentence wasn't complete: it is missing an "as is".

When I wrote this I just had thought that directing the direct file
server to say to the fileset that they are local (and as such don't have
to compute a checksum) would be a better direction.

I must also add that I did only minimal testing of said patch (ie I tried
it only in the lab with various files: sourced, local, recursive). But
the file resource is so vast that I didn't check that managing links,
copying files, etc., still work.

>> It basically does two things:
>> * allow setting checksum => none
>> * make sure file metadata doesn't compute anything if checksum is none
>> (but only for the direct file server terminus).
>>
>> A more correct patch would be to prevent checksums from being computed in
>> local file metadata search.
>
> I suppose that would be sufficient - just have a 'local' flag,
> effectively, and an additional flag saying "i don't care about the
> checksum", but... then the latter becomes a superset of the former, so
> skip the former, use what you have here, and move on.
>
> Or am I missing something?

I don't think so, but I'm quite ignorant about the interaction between
the checksum file property and the rest of the system, so I want to make
sure I'm not introducing any new regressions.

Brice Figureau

Mar 14, 2010, 5:46:13 AM
to puppe...@googlegroups.com
On 14/03/10 08:46, Luke Kanies wrote:
> I actually think this *is* the fix for the regression.
>
> You basically need some way to signal the metadata class that you're not
> interested in the checksum; this is one such way.

Yes, this was the least intrusive way of doing this.

> The only change I would make is to change it so this is the default -
> checksum should default to 'none', and only default to 'md5' if
> something like 'source' is specified.

That would make sense. I'll try to see if I can come up with something
that acts like this.

> This behaviour will be a bit harder to test all the behaviours of, but
> it doesn't require that people change their Puppet code, and it would
> behave as people expect.

That's correct, and it looks better on the drawing board.

I just need to see how the checksum property can know if there is a
source or a content property in the file resource.

Peter Meier

Mar 14, 2010, 8:54:05 AM
to puppe...@googlegroups.com

> I think the "real" fix for the underlying problem is fixing the regression
> bug introduced in the file serving refactor. That said, if this hot patch
> works around the 100% CPU usage on recursive chowns bug in the user list
> thread, I think it's desirable to get this into the 0.25.x branch ASAP.

I would really appreciate that. Thanks Brice for the patch!

cheers pete

Luke Kanies

Mar 14, 2010, 6:49:54 PM
to puppe...@googlegroups.com

*We* wouldn't, but there's someone out there who's using remote file
recursion just for setting owner/mode/etc, I promise. Crazy, but
probably true. (Note: I don't know that this is true, I'm just
estimating it based on experience.)

That being said, if we have a simple fix that only works for local
file serving, I'm perfectly happy with that and we should do it.

>>> Unfortunately the same code is called in the direct file serving
>>> terminus system, where the checksum should not be computed (it
>>> will be
>>> done by the checksum property).
>>>
>>> This patch is not 100% correct (but my minimal testing shows that it
>>> works), and I don't expect it to be merged directly in 0.25.x (on
>>> which
>>> it is based).
>>
>> Why don't you expect that?
>
> My sentence wasn't complete: it is missing an "as is".
>
> When I wrote this I just had thought that directing the direct file
> server to say to the fileset that they are local (and as such don't
> have
> to compute a checksum) would be a better direction.
>
> I must also add that I did only minimal testing of said patch (ie
> tried
> only in the lab with various files (ie sourced, local, recursive). But
> the file resource is so vast, I didn't check managing links, copying
> files, etc, were still working.

The basic model is sound; the only confusing bits are around how
checksum is used by source and content. That needs to be refactored
anyway (e.g., I somehow broke Puppet's Tripwire-like functionality in
0.25), partially because it can be so hard to track.

I've been thinking that 'checksum' needs the treatment 'source' got -
turned into a parameter instead of a property. Theoretically,
checksum and content should be isomorphic, as long as you have source
specified (or a filebucket), so checksum should just be a utility
parameter rather than a real property. Given that, we should probably
also fix the confusion where the checksum parameter sets the checksum
type but also determines and returns the actual checksum. This is
especially stupid now that we only ever print a file's checksum rather
than its contents.

So, seems like we should switch the 'checksum' parameter to just
support the various values (md5 et al), and move all checksum
calculating into utility methods that the rest of the file resource
uses. This should hopefully clean everything up enough that it
actually becomes maintainable.
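
Roughly like this, perhaps (a sketch only; the value list and the
checksum_for helper are illustrations, not working code):

# Sketch: 'checksum' as a plain parameter that only selects a sum type.
newparam(:checksum) do
  newvalues(:md5, :md5lite, :mtime, :ctime, :none)
  defaultto :md5
end

# ...with the rest of the file type asking a utility method for the sum:
def checksum_for(path)
  return nil if self[:checksum] == :none
  # md5_file and friends come from Puppet::Util::Checksums
  send("#{self[:checksum]}_file", path)
end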

Then our tripwire-like behaviour (which is that it passively monitors
files, logging when a file's contents change) comes from tracking the
content rather than the checksum:

file { "/tmp/foobar": check => content }

This doesn't actually work right now, but I haven't had the time to
figure out why.

>>> It basically does two things:
>>> * allow setting checksum => none
>>> * make sure file metadata doesn't compute anything if checksum is
>>> none
>>> (but only for the direct file server terminus).
>>>
>>> A more correct patch would be to prevent checksums from being computed in
>>> local file metadata search.
>>
>> I suppose that would be sufficient - just have a 'local' flag,
>> effectively, and an additional flag saying "i don't care about the
>> checksum", but... then the latter becomes a superset of the former,
>> so
>> skip the former, use what you have here, and move on.
>>
>> Or am I missing something?
>
> I don't think so, but I'm quite ignorant about the interaction between
> the checksum file property and the rest of the system, so I want to
> make
> sure I'm not introducing any new regressions.

Ok.

Do you want to attempt to tackle the above refactor, would you like to
collaborate with me on it, or would you prefer to skip it for now and
just get this fix in?

--
It's not that I'm afraid to die. I just don't want to be there when it
happens. -- Woody Allen

Luke Kanies

Mar 14, 2010, 10:58:53 PM
to puppe...@googlegroups.com
On Mar 14, 2010, at 1:46 AM, Brice Figureau wrote:
[...]

>> This behaviour will be a bit harder to test all the behaviours of,
>> but
>> it doesn't require that people change their Puppet code, and it would
>> behave as people expect.
>
> That's correct and looks better on drawing board.
>
> I just need to see how the checksum property can know if there is a
> source or a content property in the file resource.

resource[:source] and resource[:content] should work, right?

--
You will notice that BeOS has taken the best parts from all the major
operating systems and made them its own. We've got the power of the
Unix command line, the ease of use of the Macintosh interface, and
Minesweeper from Windows. -- Tyler Riti

Brice Figureau

Mar 15, 2010, 4:56:45 AM
to puppe...@googlegroups.com
On Sun, 2010-03-14 at 19:58 -0700, Luke Kanies wrote:
> On Mar 14, 2010, at 1:46 AM, Brice Figureau wrote:
> [...]
> >> This behaviour will be a bit harder to test all the behaviours of,
> >> but
> >> it doesn't require that people change their Puppet code, and it would
> >> behave as people expect.
> >
> > That's correct and looks better on drawing board.
> >
> > I just need to see how the checksum property can know if there is a
> > source or a content property in the file resource.
>
> resource[:source] and resource[:content] should work, right?

I'll make another attempt later this week, but my first attempt failed
miserably. There is a complex entanglement between content, source and
checksum, which I'm not sure I completely understand....
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!

Brice Figureau

Mar 15, 2010, 5:06:17 AM
to puppe...@googlegroups.com

I think so too. I'd just appreciate it if someone else could test the
current patch (which is available in my github repository at
tickets/0.25.x/2929).

Yes, I think this would be much simpler.

> Theoretically,
> checksum and content should be isomorphic, as long as you have source
> specified (or a filebucket), so checksum should just be a utility
> parameter rather than a real property.

That would make sense. Checksum should just be an indication of the
checksum type. And the real checksum would just be driven by either
content or source.

> Given that, we should probably
> also fix the confusion where the checksum parameter sets the checksum
> type but also determines and returns the actual checksum. This is
> especially stupid now that we only ever print a file's checksum rather
> than its contents.

That's what made my second attempt at fixing this issue so difficult.

> So, seems like we should switch the 'checksum' parameter to just
> support the various values (md5 et al), and move all checksum
> calculating into utility methods that the rest of the file resource
> uses. This should hopefully clean everything up enough that it
> actually becomes maintainable.
>
> Then our tripwire-like behaviour (which is that it passively monitors
> files, logging when a file's contents change) comes from tracking the
> content rather than the checksum:
>
> file { "/tmp/foobar": check => content }

I'm not sure about this syntax: do you mean we should add a new property
(ie check) to reintroduce tripwire functionality?

> This doesn't actually work right now, but I haven't had the time to
> figure out why.
>
> >>> It basically does two things:
> >>> * allow setting checksum => none
> >>> * make sure file metadata doesn't compute anything if checksum is
> >>> none
> >>> (but only for the direct file server terminus).
> >>>
> >>> A more correct patch would be to prevent checksums from being computed in
> >>> local file metadata search.
> >>
> >> I suppose that would be sufficient - just have a 'local' flag,
> >> effectively, and an additional flag saying "i don't care about the
> >> checksum", but... then the latter becomes a superset of the former,
> >> so
> >> skip the former, use what you have here, and move on.
> >>
> >> Or am I missing something?
> >
> > I don't think so, but I'm quite ignorant about the interaction between
> > the checksum file property and the rest of the system, so I want to
> > make
> > sure I'm not introducing any new regressions.
>
> Ok.
>
> Do you want to attempt to tackle the above refactor, would you like to
> collaborate with me on it, or would you prefer to skip it for now and
> just get this fix in?

Frankly I'd prefer we fix the issue in 0.25.x with this patch (granted
someone else can test it successfully).

I don't really have a lot of time until June for puppet dev (except those
occasional fixes), but I can at least try to start something. If you
want to do it, then go ahead; that will certainly be much more effective
than my procrastination :-)

Luke Kanies

Mar 15, 2010, 4:44:17 PM
to puppe...@googlegroups.com

I'll be testing it this week, definitely.

This is actually a metaparam that's existed since the earliest days of
Puppet - its intent is that Puppet will not manage those properties
but will instead just track changes to them over time and alert when
changes happen.

Ok. I'll get this patch tested, and if I have some airplane time or
something see what a full checksum refactor would look like. The
'source' refactor ended up being relatively easy in the end and it had
a huge impact on maintainability.

--
The world is round; it has no point.
-- Adrienne E. Gusoff

Brice Figureau

Mar 16, 2010, 2:55:09 PM
to puppe...@googlegroups.com
On 15/03/10 21:44, Luke Kanies wrote:
> On Mar 15, 2010, at 2:06 AM, Brice Figureau wrote:
>
>> On Sun, 2010-03-14 at 15:49 -0700, Luke Kanies wrote:
>>> On Mar 14, 2010, at 1:42 AM, Brice Figureau wrote:
[snipped]

>>
>> Frankly I'd prefer we fix the issue in 0.25.x with this patch (granted
>> someone else can test it with success).
>>
>> I don't really have lot of time until June for puppet dev (except those
>> occasional fixes), but I can at least try to start something. If you
>> want to do it, then go ahead, that will certainly be much more effective
>> than my procrastination :-)
>
> Ok. I'll get this patch tested, and if I have some airplane time or
> something see what a full checksum refactor would look like. The
> 'source' refactor ended up being relatively easy in the end and it had a
> huge impact on maintainability.

Peter tested my patch today and found that it was working as intended
(there is no more md5 checksumming of all the files).

But he also discovered something that we didn't even think about: the
first run of:

file { "/deep/hierarchy/full/of/files":
  recurse => true, checksum => none, mode => 0600
}

produces many changes in the transaction (assuming the files are not
0600, of course). This in turn generates a large consumption of memory
and 100% cpu use for a looong time (ie more than 30 min for 7.5k
files, says Peter; several minutes for 2.5k on my fast computer).

Why?

After a quick search, we found that most of the time was spent around
Puppet::SimpleGraph#matching_edges or around line 218 of
Puppet::Transaction (ie I'm not sure whether it's the call itself or the
iteration over the result).

I didn't do a deep analysis of the problem, but to me we are generating
tons of events that we are propagating up to the root file (the one that
spawned all those sub-file-resources). I don't think that is necessary in
this case, unless there is a notify/subscribe (which is not the case).

One of the problems, I think (no evidence to back this up, it's only my
understanding), is that we dup the current edge in a fast-paced loop. The
poor ruby GC really has no time to collect the previous edge instance, so
we pile up new edge instances at a really fast pace, spending most of
our time in the allocator (the ruby one or the libgc one, which is a
costly operation).

I'll do more tests later this week on this code (I first have to
understand it of course :-)).

Anyway, with line 218 of transaction.rb commented out, the aforementioned
manifest becomes really fast (if not instantaneous).

Any comments?

Brice Figureau

Mar 17, 2010, 3:09:46 PM
to puppe...@googlegroups.com

OK, I identified the issue (patch coming soon):

Puppet::SimpleGraph#matching_edges is simply completely inefficient in
the face of lots of events for a given resource (which is our case).
The problem is that matching_edges returns the list of edges for which we
should trigger an event. In the aforementioned manifest, since there is
no subscribe/notify, the list is empty.

Unfortunately with the current code, it takes about 22s to find this
result for about 1100 events. With the (soon to be published) patch, I
was able to reduce this time to 4s.

The whole issue was that, to find this list of edges, the process was
abusing memory allocations (ie appending to arrays again and again),
producing a large list of empty arrays which finally ended up being
flattened and then compacted.

I rewrote this part (sorry, the code is not as beautiful as it was before
I touched it :-)) to never create those transient arrays, limiting array
appending as much as we can.

Brice Figureau

Mar 17, 2010, 3:14:55 PM
to puppe...@googlegroups.com
With recursive file resources, many change events can be generated.
The method used to find the matching ones is pretty inefficient,
allocating arrays and/or appending to arrays, which is a slow operation.

This patch rewrites the code to never append to or allocate an array when
there is no matching edge, and to limit those operations to the strict
minimum needed in the other cases.

Results for matching 1100 events:
* old code: 22s
* new code: 4s

This patch also helps on the memory consumption side since the GC has
almost no work to perform.

Signed-off-by: Brice Figureau <brice-...@daysofwonder.com>
---

lib/puppet/simple_graph.rb | 20 ++++++++++++++++----
lib/puppet/transaction.rb | 18 ++++++++++--------
2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/lib/puppet/simple_graph.rb b/lib/puppet/simple_graph.rb
index 5e8f5cd..591c75d 100644
--- a/lib/puppet/simple_graph.rb
+++ b/lib/puppet/simple_graph.rb
@@ -64,6 +64,14 @@ class Puppet::SimpleGraph
end
end

+ def each_out_edges
+ @adjacencies[:out].values.each do |edges|
+ edges.each do |edge|
+ yield edge
+ end
+ end
+ end
+
# The other vertex in the edge.
def other_vertex(direction, edge)
case direction
@@ -146,7 +154,7 @@ class Puppet::SimpleGraph
# Collect all of the edges that the passed events match. Returns
# an array of edges.
def matching_edges(events, base = nil)
- events.collect do |event|
+ events.inject([]) do |matching,event|
source = base || event.source

unless vertex?(source)
@@ -156,10 +164,14 @@ class Puppet::SimpleGraph
# Get all of the edges that this vertex should forward events
# to, which is the same thing as saying all edges directly below
# This vertex in the graph.
- adjacent(source, :direction => :out, :type => :edges).find_all do |edge|
- edge.match?(event.name)
+
+ if wrapper = @vertices[source]
+ wrapper.each_out_edges do |edge|
+ matching << edge if edge.match?(event.name)
+ end
end
- end.compact.flatten
+ matching
+ end
end

# Return a reversed version of this graph.
diff --git a/lib/puppet/transaction.rb b/lib/puppet/transaction.rb
index e132b72..3d8e38e 100644
--- a/lib/puppet/transaction.rb
+++ b/lib/puppet/transaction.rb
@@ -215,15 +215,17 @@ class Transaction
# Collect the targets of any subscriptions to those events. We pass
# the parent resource in so it will override the source in the events,
# since eval_generated children can't have direct relationships.
- relationship_graph.matching_edges(events, resource).each do |orig_edge|
- # We have to dup the label here, else we modify the original edge label,
- # which affects whether a given event will match on the next run, which is,
- # of course, bad.
- edge = orig_edge.class.new(orig_edge.source, orig_edge.target, orig_edge.label)
- edge.event = events.collect { |e| e.name }
- set_trigger(edge)
+ Puppet::Util.benchmark(:debug, "Time for triggering events to edges") do
+ relationship_graph.matching_edges(events, resource).each do |orig_edge|
+ # We have to dup the label here, else we modify the original edge label,
+ # which affects whether a given event will match on the next run, which is,
+ # of course, bad.
+ puts "new edge: #{orig_edge.target} <- #{orig_edge.source} #{orig_edge.label}"
+ edge = orig_edge.class.new(orig_edge.source, orig_edge.target, orig_edge.label)
+ edge.event = events.collect { |e| e.name }
+ set_trigger(edge)
+ end
end
-
# And return the events for collection
events
end
--
1.6.6.1

Brice Figureau

Mar 17, 2010, 3:27:34 PM
to puppe...@googlegroups.com

Note: 22s doesn't seem that many, but I also tested with about 5000
events and got about 2min (new code), and stopped it after more than
15min (old code).

It seems the previous code was at least quadratic (damn ruby allocator,
not the code itself).

Peter, if you have some spare time, I suggest you try this new patch on
top of the previous one (although they're not related).

Warning: I didn't really test the new code, and it isn't covered with
tests (it shouldn't modify any behavior).

Markus Roberts

Mar 17, 2010, 6:03:43 PM
to puppet-dev
Tentative +1
I want to convince myself of its correctness in an edge case or two (it probably is) but other than that it looks great.


Luke Kanies

Mar 17, 2010, 7:38:17 PM
to puppe...@googlegroups.com
You've got a spare 'puts' in there, but otherwise, +1.

Do you think this might be just the wrong way of doing this matching
overall? My instincts around graphs seem to be particularly horrible,
so there's probably a much better way to do this.



--
Man is the only animal that can remain on friendly terms with the
victims he intends to eat until he eats them.
-- Samuel Butler (1835-1902)

Luke Kanies

Mar 17, 2010, 7:39:45 PM
to puppe...@googlegroups.com

Still... 2 minutes just for the graph operations? That's pretty gross.

I suppose this wouldn't be mitigated at all if we switched to not
creating individual file resources per file, since you'd still need
the same number of events.

> It seems the previous code was at least quadratic (damn ruby
> allocator,
> not the code itself).
>
> Peter, if you have some spare time, I suggest you try this new patch
> on
> top of the previous one (although they're not related).
>
> Warning: I didn't really test the new code, and it isn't covered with
> tests (it shouldn't modify any behavior).

I think that code is pretty well covered, so it should be a straight
refactor.

--
'Tis better to be silent and be thought a fool, than to speak and
remove all doubt. --Abraham Lincoln

Luke Kanies

Mar 17, 2010, 9:15:49 PM
to puppe...@googlegroups.com

IMO, you're the best positioned (because you actually understand
graphs) to see whether my original approach to the problem was flawed.

--
We all have strength enough to endure the misfortunes of others.
-- Francois de La Rochefoucauld

Trevor Vaughan

Mar 17, 2010, 10:17:47 PM
to puppe...@googlegroups.com

Though it's a massive PITA, would anyone be up to trying a comparison
with RGL?

I've had good luck with it and, in theory, it's as optimized as you can
get in Ruby.
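
Even a crude micro-benchmark might tell us something (a sketch from
memory, assuming the 'rgl' gem; I haven't verified it against RGL's
current API):

require 'benchmark'
require 'rgl/adjacency'

# Build a fan-out graph shaped like a recursive file resource: one root
# vertex with many auto-required children.
graph = RGL::DirectedAdjacencyGraph.new
1.upto(10_000) { |i| graph.add_edge(0, i) }

# Time a scan over the root's out-edges, as a stand-in for edge matching.
puts Benchmark.measure {
  matching = []
  graph.each_adjacent(0) { |child| matching << child if child.odd? }
}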

Trevor


--
Trevor Vaughan
Vice President, Onyx Point, Inc.
email: tvau...@onyxpoint.com
phone: 410-541-ONYX (6699)

-- This account not approved for unencrypted sensitive information --



Luke Kanies

Mar 18, 2010, 3:05:56 AM
to puppe...@googlegroups.com
On Mar 17, 2010, at 7:17 PM, Trevor Vaughan wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Though it's a massive PITA, would anyone be up to trying a comparison
> with RGL?
>
> I've had good luck with it and, in theory, it's as optimized as you
> can
> get in Ruby.

It's been a while since I looked, but AFAIR RGL is just a ruby skin
over a C/C++ implementation of a graph library and, while relatively
fast, is also heinously non-ruby. As in, it feels really bad actually
writing code in it.

Really, though, I'm probably misinformed, because I know more about
GRATR -- which was essentially a response to RGL -- than I do about
RGL itself, so we could probably do well to back-end off of it.

I've also been considering experimenting with the java graph db. But
that's just because I ran into Emil again last night.

--
Computers are not intelligent. They only think they are.

Brice Figureau

Mar 18, 2010, 4:27:45 AM
to puppe...@googlegroups.com
On Wed, 2010-03-17 at 16:38 -0700, Luke Kanies wrote:
> You've got a spare 'puts' in there, but otherwise, +1.

OK, I'll remove this in a new version.

> Do you think this might be just the wrong way of doing this matching
> overall? My instincts around graphs seem to be particularly horrible,
> so there's probably a much better way to do this.

I must admit I'm not good with graphs at all. I didn't even try to
understand the whole concept of the SimpleGraph; I just tried to
micro-optimize the piece that was giving me trouble.

But I think that even if the current implementation can be slow
sometimes (and fine in most cases), it is working, so I doubt the
implementation is incorrect. Maybe there can be a better implementation
or layout, but since I'm more than ignorant about graph data
structures/algorithms, my advice won't be that useful :-)

Brice Figureau

Mar 18, 2010, 4:29:19 AM
to puppe...@googlegroups.com
On Thu, 2010-03-18 at 00:05 -0700, Luke Kanies wrote:
> On Mar 17, 2010, at 7:17 PM, Trevor Vaughan wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Though it's a massive PITA, would anyone be up to trying a comparison
> > with RGL?
> >
> > I've had good luck with it and, in theory, it's as optimized as you
> > can
> > get in Ruby.
>
> It's been a while since I looked, but AFAIR RGL is just a ruby skin
> over a C/C++ implementation of a graph library and, while relatively
> fast, is also heinously non-ruby. As in, it feels really bad actually
> writing code in it.
>
> Really, though, I'm probably misinformed, because I know more about
> GRATR -- which was essentially a response to RGL -- than I do about
> RGL itself, so we could probably do well to back-end off of it.
>
> I've also been considering experimenting with the java graph db. But
> that's just because I ran into Emil again last night.

Are you talking about Neo4J or something else?

Rein Henrichs

Mar 18, 2010, 5:56:45 AM
to puppe...@googlegroups.com
Bruce's fork of GRATR, graphy, would be a better place to start any
graphy algorithmy experimentation. Also, I was talking to the creator of
Nanoc, a Ruby static site generation tool, who recently reimplemented its
DAG class with pretty impressive performance improvements. There are
definitely some examples in Ruby of graphing being done reasonably well.
Also, I'm quite certain that Markus has already forgotten more about
graph theory than I will ever learn.

Peter Meier

Mar 18, 2010, 6:11:17 AM
to puppe...@googlegroups.com
>> OK, I identified the issue (patch coming soon):
>>
>> Puppet::SimpleGraph#matching_edges is simply completely inefficient in
>> the face of lots of events for a given resource (which is our case).
>> The problem is that matching_edges returns the list of edges for which we
>> should trigger an event. In the aforementioned manifest, since there is
>> no subscribe/notify, the list is empty.
>>
>> Unfortunately with the current code, it takes about 22s to find this
>> result for about 1100 events. With the (soon to be published patch), I
>> was able to reduce this time to 4s.
>
> Note: 22s doesn't seem that many, but I also tested with about 5000
> events and got about 2min (new code), and stopped it after more than
> 15min (old code).
>
> It seems the previous code was at least quadratic (damn ruby allocator,
> not the code itself).
>
> Peter, if you have some spare time, I suggest you try this new patch on
> top of the previous one (although they're not related).

Brice, thanks for that follow-up. I tested your patch on a directory
containing ~8k files and directories. Puppet finished chmodding and
chowning the files as before, in about 1-2 minutes. Still, it hangs
afterwards burning cpu. However, it no longer grows to an enormous
amount of memory! The amount of memory remains stable now, which is
great.
However, after those 1-2 minutes it hangs for about 60 minutes burning
cpu as before, and finishes after 62 minutes. However, as discussed on
IRC, this is imho more a matter of traversing the graph.

To be clear: this is really only a problem within the first run when
puppet touches _every_ file. If it has to touch only a few files it
finishes much faster. For example in about 7 minutes to fix 1k of
files. Which means imho that the main problem remains in passing the
events up.

Still regarding the features and advantages you get by chown/chmoding
such a structure with puppet resources, it would be great if that
could be passed in a reasonable amount of time.

cheers pete

Brice Figureau

Mar 18, 2010, 6:21:57 AM
to puppe...@googlegroups.com

60 minutes is still too high for managing 8k change events that won't
ever match (or be used in the end) :-(

It's certainly better (in time) than it would be without the patch
(ie do you have that number, assuming the server is not swapping)?

I think we need a smarter way of matching those events, something
involving pruning the graph of some sort. I'll try to think about this.

> To be clear: this is really only a problem within the first run when
> puppet touches _every_ file. If it has to touch only a few files it
> finishes much faster. For example in about 7 minutes to fix 1k of
> files. Which means imho that the main problem remains in passing the
> events up.
>
> Still regarding the features and advantages you get by chown/chmoding
> such a structure with puppet resources, it would be great if that
> could be passed in a reasonable amount of time.

The problem is not only for chown/chmoding. It just means we are
inefficient at managing many events (granted there might not be too many
real situations generating that many events).

Peter Meier

Mar 18, 2010, 6:27:32 AM
to puppe...@googlegroups.com
> 60 minutes is still too high for managing 8k change events that won't
> ever match (or be used in the end) :-(

+1

> It's certainly better (in time) to what it would be without the patch
> (ie do you have this number, assuming the server is not swapping)?

yes, I think that might be the best explanation.

> I think we need a smarter way of matching those events, something
> involving pruning the graph of some sort. I'll try to think about this.

Thanks! I'm happy to test anything you bring up. Unfortunately I don't
have enough time to support you in finding an appropriate proposal.

>> To be clear: this is really only a problem within the first run when
>> puppet touches _every_ file. If it has to touch only a few files it
>> finishes much faster. For example in about 7 minutes to fix 1k of
>> files. Which means imho that the main problem remains in passing the
>> events up.
>>
>> Still regarding the features and advantages you get by chown/chmoding
>> such a structure with puppet resources, it would be great if that
>> could be passed in a reasonable amount of time.
>
> The problem is not only for chown/chmoding. It just means we are
> inefficient at managing many events (granted there might not be too many
> real situations generating that many events).

yes.

cheers pete

Brice Figureau

Mar 18, 2010, 7:06:01 AM
to puppe...@googlegroups.com
On Wed, 2010-03-17 at 16:38 -0700, Luke Kanies wrote:
> You've got a spare 'put's in there, but otherwise, +1.
>
> Do you think this might be just the wrong way of doing this matching
> overall? My instincts around graphs seem to be particularly horrible,
> so there's probably a much better way to do this.

There's something I'm not sure I completely understand:
In the case of interest, all the "change" events are coming from the
generated sub-file-resources.
If I read the code correctly, we are collecting events from all those
generated resources.

Then we take the source (ie the root file of the recursive file) and, for
each event, scan all the edges out of this resource (most if not all
will be the auto-required generated sub-file-resources) to find those
that match (this cross join between events and out edges is what is
killing our performance).

As I understand it, we're trying to flow the events back to where they
were created, am I correct?
If yes, this should be completely unneeded, and we should prevent the
events from flowing along those edges.

Trevor Vaughan

Mar 18, 2010, 8:18:36 PM
to puppe...@googlegroups.com

I haven't used any of the other implementations, but I agree that a pure
Ruby solution would be preferred.

I was using it for the native Graphviz capabilities at the time.

Trevor

On 03/18/2010 03:05 AM, Luke Kanies wrote:
> On Mar 17, 2010, at 7:17 PM, Trevor Vaughan wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Though it's a massive PITA, would anyone be up to trying a comparison
>> with RGL?
>>
>> I've had good luck with it and, in theory, it's as optimized as you can
>> get in Ruby.
>
> It's been a while since I looked, but AFAIR RGL is just a ruby skin over
> a C/C++ implementation of a graph library and, while relatively fast, is
> also heinously non-ruby. As in, it feels really bad actually writing
> code in it.
>
> Really, though, I'm probably misinformed, because I know more about
> GRATR -- which was essentially a response to RGL -- than I do about RGL
> itself, so we could probably do well to back-end off of it.
>
> I've also been considering experimenting with the java graph db. But
> that's just because I ran into Emil again last night.
>

--

Trevor Vaughan
Vice President, Onyx Point, Inc.
email: tvau...@onyxpoint.com
phone: 410-541-ONYX (6699)

-- This account not approved for unencrypted sensitive information --

Trevor Vaughan

Mar 18, 2010, 8:22:39 PM
to puppe...@googlegroups.com, Brice Figureau

Would it be possible to detect a recursive file operation, stop at that
point in the graph and then simply call the corresponding Ruby code the
rest of the way down for specific operations?

I can, unfortunately, see this approach wreaking havoc with my recent
symbolic mode patch as an example of where it wouldn't work.

Alternatively, instead of merging all of the sub-file objects into the
main graph, could you create a subgraph that is built off of a DFS from
that originating recurse point?

This would allow you to keep everything the same all the way, but
*should* vastly speed things up since this will be a subgraph linear DAG
operation instead of a full graph merge.

Trevor


>
> I think we need a smarter way of matching those events, something
> involving pruning the graph of some sort. I'll try to think about this.
>
>

--

Trevor Vaughan
Vice President, Onyx Point, Inc.
email: tvau...@onyxpoint.com
phone: 410-541-ONYX (6699)

-- This account not approved for unencrypted sensitive information --

Brice Figureau

Mar 19, 2010, 4:13:50 AM
to puppe...@googlegroups.com
On Thu, 2010-03-18 at 20:22 -0400, Trevor Vaughan wrote:
> Would it be possible to detect a recursive file operation, stop at that
> point in the graph and then simply call the corresponding Ruby code the
> rest of the way down for specific operations?
>
> I can, unfortunately, see this approach wreaking havoc with my recent
> symbolic mode patch as an example of where it wouldn't work.
>
> Alternatively, instead of merging all of the sub-file objects into the
> main graph, could you create a subgraph that is built off of a DFS from
> that originating recurse point?

I thought about this. It all depends on how we want to consider those
generated resources: do they deserve to be regular resources, or are
they somewhat special?

> This would allow you to keep everything the same all the way, but
> *should* vastly speed things up since this will be a subgraph linear DAG
> operation instead of a full graph merge.

For the moment I'll fix the current issue, which is that we don't need
to propagate generated sub-resource events back to them (at least for
file resources; if there are other generated resources needing this,
let me know).

This is clearly a bug in the current implementation, coming from the
fact that the code masquerades the real event generator as the topmost
file resource. If we were sending the events from the sub-generated
resources we wouldn't do that at all. Unfortunately there are
"auto-require" relationships from this topmost node to every generated
sub-file-resource, thus the code "sends" the events back to every
generated sub-file-resource. This makes this algorithm something along
the lines of O(kn^2), with n = number of generated resources and k =
number of different kinds of change events.

I believe it is not necessary to flow those sub-events back to where
they were spawned. The very next fix (coming during the week-end, I
think) will make sure those events are propagated only to non-generated
resources. This should fix this issue once and for all (we can still keep
my previous patch because it is clearly nicer in terms of memory usage).
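
In pseudo-terms, something like this (untested; 'generated?' is an
invented predicate, just to illustrate the filter):

# Untested sketch: when flowing events, skip edges whose target is a
# resource the transaction itself generated (eg recursive file children).
matching = relationship_graph.matching_edges(events, resource).reject do |edge|
  edge.target.respond_to?(:generated?) and edge.target.generated?
end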

Markus Roberts

Mar 19, 2010, 10:41:40 AM
to puppet-dev
Brice --

Though I've been quiet on this thread I've been following it with interest.  I have two favors to ask:

* Since we appear agreed the previous patch is a Good Thing, even if it's not the ultimate solution to everything, could you put it in a branch, add it to the ticket, and mark it "Ready for Testing"

* Could you open a new ticket for the forthcoming more general fix, and do the same with that change as soon as it's available?

I'm trying to get all the things that we'd like to see in Rowlf rolled into the testing branch in the next week, so we can bang them against each other.

-- Markus

Brice Figureau

Mar 19, 2010, 10:54:56 AM
to puppe...@googlegroups.com
On Fri, 2010-03-19 at 07:41 -0700, Markus Roberts wrote:
> Brice --
>
> Though I've been quiet on this thread I've been following it with
> interest.

Too bad, your comments are always insightful, and you might have some
clever idea to fix the aforementioned issue.

> I have two favors to ask:
>
> * Since we appear agreed the previous patch is a Good Thing, even if
> it's not the ultimate solution to everything, could you put it in a
> branch, add it to the ticket, and mark it "Ready for Testing"

The branch is currently against 0.25.x, should I create a new one
against master and/or testing?

> * Could you open a new ticket for the forthcoming more general fix,
> and do the same with that change as soon as it's available?

Will do.

> I'm trying to get all the things the we'd like to see in Rowlf rolled
> into the testing branch in the next week, so we can bang them against
> each other.

OK, I understand.

More generally, if, say, I want to produce a new patch for Rowlf (I'm
thinking about some random puppetdoc fix or feature), what is the more
convenient for you:
* base it on top of the latest available testing (or somewhere in the
middle)
* base it on master

Thanks,

Brice Figureau

Mar 19, 2010, 11:07:42 AM
to puppe...@googlegroups.com
On Fri, 2010-03-19 at 07:41 -0700, Markus Roberts wrote:
> Brice --
>
> Though I've been quiet on this thread I've been following it with
> interest. I have two favors to ask:
>
> * Since we appear agreed the previous patch is a Good Thing, even if
> it's not the ultimate solution to everything, could you put it in a
> branch, add it to the ticket, and mark it "Ready for Testing"

Done.

> * Could you open a new ticket for the forthcoming more general fix,
> and do the same with that change as soon as it's available?

Done, for reference it's ticket: #3396
http://projects.reductivelabs.com/issues/3396

Markus Roberts

Mar 19, 2010, 11:12:52 AM
to puppet-dev
> > Though I've been quiet on this thread I've been following it with
> > interest.
>
> Too bad, your comments are always insightful, and you might have some
> clever idea to fix the aforementioned issue.

I'd love to have some clever ideas, but at present I've mostly got to-do list items.  :(

> > I have two favors to ask:
> >
> > * Since we appear agreed the previous patch is a Good Thing, even if
> > it's not the ultimate solution to everything, could you put it in a
> > branch, add it to the ticket, and mark it "Ready for Testing"
>
> The branch is currently against 0.25.x, should I create a new one
> against master and/or testing?

In general against master, though 0.25.x is probably fine for this.

> More generally, if, say, I want to produce a new patch for Rowlf (I'm
> thinking about some random puppetdoc fix or feature), what is the more
> convenient for you:
>  * base it on top of the latest available testing (or somewhere in the
> middle)
>  * base it on master

Against master unless I say otherwise; if I ask you to base it against testing then shoot me and do it against master anyway.

We've played around with various ways of basing things mid-testing but with an ephemeral branch it seems to bring nothing but grief.  The closest thing that works is, if there is a topic branch that needs a fix, to base the fix off the topic branch, but even that is problematic.

-- Markus

Luke Kanies

Mar 19, 2010, 1:01:22 PM
to puppe...@googlegroups.com

You are right - we could probably do a quick test before we collect
events, to determine if there are any subscriptions. If not, we can
just ignore the events entirely and save a bunch of time.

There's probably also some kind of 'break' we can do - functionally,
one event is equivalent to 100 events, in terms of the behaviour of
the system, so we could just keep a boolean and pass it around, or
maybe just count the events or something.

I think there are a couple of different optimizations available.
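
The deduplication could look something like this (an untested sketch
against the existing SimpleGraph methods; only the 'seen' hash is new):

# Untested sketch: since N identical events behave like one, match each
# distinct (source, event name) pair only once instead of once per event.
def matching_edges(events, base = nil)
  matching = []
  seen = {}
  events.each do |event|
    source = base || event.source
    next unless vertex?(source)
    key = [source, event.name]
    next if seen[key] # this pair was already matched against the out-edges
    seen[key] = true
    adjacent(source, :direction => :out, :type => :edges).each do |edge|
      matching << edge if edge.match?(event.name)
    end
  end
  matching
end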

--
The one thing more difficult than following a regimen is not imposing it
on others. -- Marcel Proust

Luke Kanies

Mar 19, 2010, 1:13:32 PM
to puppe...@googlegroups.com

Yep. It's the only one around, AFAIK.

--
The point of living and of being an optimist, is to be foolish enough
to believe the best is yet to come. -- Peter Ustinov

Luke Kanies

Mar 19, 2010, 1:15:34 PM
to puppe...@googlegroups.com

I agree.

>> To be clear: this is really only a problem within the first run when
>> puppet touches _every_ file. If it has to touch only a few files it
>> finishes much faster. For example in about 7 minutes to fix 1k of
>> files. Which means imho that the main problem remains in passing the
>> events up.
>>
>> Still regarding the features and advantages you get by chown/chmoding
>> such a structure with puppet resources, it would be great if that
>> could be passed in a reasonable amount of time.
>
> The problem is not only for chown/chmoding. It just means we are
> inefficient at managing many events (granted there might not be too
> many
> real situations generating that many events).

I agree, again. This is unacceptable.

--
It has recently been discovered that research causes cancer in
labratory rats.

Luke Kanies

Mar 19, 2010, 1:18:03 PM
to puppe...@googlegroups.com
The main problem here, I think, isn't the performance of the graph but rather the fact that we're collecting huge numbers of events and trying to match them against graph vertices for no good reason.

I'd bet that our graph library actually performs pretty comparably to most others, other than, of course, GRATR, against which it will look like a GT-R pitted against a child's wooden car.  We don't have all the algorithms, but this is more of a conceptual shortcoming we have right now, rather than an algorithmic one.


-- 
I can win an argument on any topic, against any opponent. People know
this, and steer clear of me at parties. Often, as a sign of their
great respect, they don't even invite me. -- Dave Barry

Luke Kanies

Mar 19, 2010, 1:24:42 PM
to puppe...@googlegroups.com
On Mar 19, 2010, at 1:13 AM, Brice Figureau wrote:

> On Thu, 2010-03-18 at 20:22 -0400, Trevor Vaughan wrote:
>> Would it be possible to detect a recursive file operation, stop at
>> that
>> point in the graph and then simply call the corresponding Ruby code
>> the
>> rest of the way down for specific operations?
>>
>> I can, unfortunately, see this approach wreaking havoc with my recent
>> symbolic mode patch as an example of where it wouldn't work.
>>
>> Alternatively, instead of merging all of the sub-file objects into
>> the
>> main graph, could you create a subgraph that is built off of a DFS
>> from
>> that originating recurse point?
>
> I thought about this. It all depends on how we want to consider those
> generated resources: do they deserve to be regular resources, or are
> they somewhat special?

IMO, this is the right approach, with some caveats. The only real
concerns I see with it are that we begin to lose some clarity over
what's specifically happening (because we're treating all of the files
as a collection, rather than individually), and if we do have rollback
at some point this will considerably complicate it.

However, I think that as long as we produce events with sufficient
information (i.e., they have to mention both the actual file and the
parent file), we should be able to do rollback without too much
effort, albeit it will require some hackery. That hackery will likely
rarely get used, though, and at this point we're doing way more
hackery in order to get recursive file management in its current form.

>> This would allow you to keep everything the same all the way, but
>> *should* vastly speed things up since this will be a subgraph linear
>> DAG
>> operation instead of a full graph merge.
>
> For the moment I'll fix the current issue, which is that we don't need
> to propagate generated sub-resource events back to them (at least for
> the file resources, if there are other generated resources needing
> this
> let me know).

There are no other resource types that generate resources mid-
transaction.

Are you sure we don't need to do any propagation? If a service is
subscribed to a file tree and only one change is made but it's lower
in the tree, don't we want that event to propagate?
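
E.g. (a hypothetical manifest; the paths and names are invented):

file { "/etc/app":
  recurse => true,
  source  => "puppet:///modules/app/etc",
}
service { "app":
  ensure    => running,
  subscribe => File["/etc/app"],
}

If only a file deep under /etc/app changes, the service presumably still
needs to see that event.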

> This is clearly a bug in the current implementation, coming from the
> fact that the code masquerades the real event generator with the
> topmost
> file resource. If we were sending the events from the subgenerated
> resources we wouldn't do that at all. Unfortunately there are
> "auto-require" relationships to this topmost node to every generated
> sub-file-resources, thus the code "sends" the events back to every
> generated sub-file-resources. This makes this algorithm something
> along
> of O(kn^2) with n = number of generated resources, and k number of
> different kind of changed events.
>
> I believe it is not necessary to flow those sub-events back to where
> they were spawned. The very next fix (coming during the week-end I
> think) will make sure those events are propagated only to non-
> generated
> resources. This should fix this issue one for all (we can still keep
> my
> previous patch because it is clearly nicer in terms of memory usage).

Ah; that makes sense.

--
Calamities are of two kinds: misfortunes to ourselves, and good
fortune to others. -- Ambrose Bierce
