<<<<<<<
def matching_edges(events, base = nil)
events.inject([]) do |matching,event|
source = base || event.source
unless vertex?(source)
Puppet.warning "Got an event from invalid vertex %s" %
source.ref
next
end
# Get all of the edges that this vertex should forward events
# to, which is the same thing as saying all edges directly below
# This vertex in the graph.
if wrapper = @vertices[source]
wrapper.each_out_edges do |edge|
matching << edge if edge.match?(event.name)
end
end
matching
end
end
|||||||
def matching_edges(events, base = nil)
events.collect do |event|
source = base || event.source
unless vertex?(source)
Puppet.warning "Got an event from invalid vertex %s" %
source.ref
next
end
# Get all of the edges that this vertex should forward events
# to, which is the same thing as saying all edges directly below
# This vertex in the graph.
adjacent(source, :direction => :out, :type =>
:edges).find_all do |edge|
edge.match?(event.name)
end
end.compact.flatten
end
=======
def matching_edges(event, base = nil)
source = base || event.resource
unless vertex?(source)
Puppet.warning "Got an event from invalid vertex #{source.ref}"
return []
end
# Get all of the edges that this vertex should forward events
# to, which is the same thing as saying all edges directly below
# This vertex in the graph.
adjacent(source, :direction => :out, :type => :edges).find_all do |edge|
edge.match?(event.name)
end
end
>>>>>>>
-- Markus
-----------------------------------------------------------
The power of accurate observation is
commonly called cynicism by those
who have not got it. ~George Bernard Shaw
------------------------------------------------------------
> In one corner, we have Luke's event patch (#2759); in the other,
> Brice's checksum => none patch (#2929). In the middle, the current
> champion, the code from master. So what should the final merged
> result look like?
It's unfortunately not that simple - if I'm reading this correctly, I
changed the method profile in 'matching_edges' to accept just one
event instead of a list of events.
In looking briefly at my code, I've split what a moderately ugly but
all-in-one-place method into a bunch of different methods, and I think
that splitting quite likely invalidated Brice's patch.
In looking briefly, I think the fix is to change
EventManager#queue_event to accept an array of events, and then change
SimpleGraph#matching_edges back to accepting an array, also. This
should allow Brice's patch to apply relatively cleanly, I think.
I also see two easy optimizations that should probably actually
drastically reduce the effort in this loop (both Brice's and my code):
1) find the edges-per-source outside the per-event loop - for file
recursion, this will all be the same source, which means you're only
doing one test for adjacency - and 2) collect the event names and just
test on that, since we're likely (again, for large numbers of events)
to have few event names (e.g., file_changed is generated by most of
the file operations, I think). My guess is that those two together
will eliminate the problems we're having here.
Brice, would you and Peter be able to test this on the 0.25.x version
of the patch, and then I can make my recommended changes to my events
branch?
> --
> You received this message because you are subscribed to the Google
> Groups "Puppet Developers" group.
> To post to this group, send email to puppe...@googlegroups.com.
> To unsubscribe from this group, send email to puppet-dev+...@googlegroups.com
> .
> For more options, visit this group at http://groups.google.com/group/puppet-dev?hl=en
> .
>
--
He attacked everything in life with a mix of extraordinary genius and
naive incompetence, and it was often difficult to tell which was
which. -- Douglas Adams
---------------------------------------------------------------------
Luke Kanies -|- http://reductivelabs.com -|- +1(615)594-8199
I'll test and try to give feedback on whatever comes up for 0.25.x.
I'm now running anyway on 0.25.x (with brice's and bryan's patches)
and I have some use cases where I would be happy to use rather puppet
to chown files and directories recursively.
cheers pete
I already started doing this more or less.
2) collect the event names and just
> test on that, since we're likely (again, for large numbers of events) to
> have few event names (e.g., file_changed is generated by most of the
> file operations, I think). My guess is that those two together will
> eliminate the problems we're having here.
I hadn't thought of this one, but it's a good idea. In the recursive
file resource case, we're propagating one event type (ie file_changed),
that should drastically reduce the matching.
> Brice, would you and Peter be able to test this on the 0.25.x version of
> the patch, and then I can make my recommended changes to my events branch?
I'll send a patch later in the week-end.
--
Brice Figureau
My Blog: http://www.masterzen.fr/
I've a good news for you. I removed from #2929 the patch that produces
the conflict and isolated it in brice:tickets/0.25.x/3396.
That way, when Luke will modify his events branch, you'll be able to
merge relatively easily #3396.
Here is my attempt at fixing this issue and taking into account Luke's
ideas. I did only some minimal testing, but it looks like it works fine
(and is now ultra fast for the case Peter is concerned about).
The patch is still against 0.25.x, and can be located in my github repo-
sitory in the branch tickets/0.25.x/3396. This branch also contains the
checksum => none patch of #2929.
Please review,
Thanks,
Brice
Original commit msg:
This method has two issues:
* it is inefficient when there are many events
* it tries to match edges that shouldn't be matched
With recursive file resources, many change events can be generated.
The method used to find the good ones is pretty inefficient, allocating
arrays and/or appending to arrays which is a slow operation that can
consume lot of memory.
Still with recursife file resources, the current code tries to match the
events with edges pointing to generated sub-file-resources, which is not
necessary. In fact this all happens because we masquerade the sub-generated
resources with the topmost resource whic itself has auto-required links
to them. There is no reason to send back those events to where they were
generated.
This patch tries to minimize allocations or array appending, it also collect
event names (instead of events themselve) while matching since there are
great chances there are way less events names than events (and we're matchin
by name).
This patch also makes sure we select only edges that don't point back to
the event sources.
Results for matching 1100 events:
* old code: 22s
* new code: 0.02s
This patch also helps on the memory consumption side since the GC has
almost no work to perform.
Signed-off-by: Brice Figureau <brice-...@daysofwonder.com>
---
lib/puppet/simple_graph.rb | 49 +++++++++++++++++++++++++++++++++----------
lib/puppet/transaction.rb | 18 +++++++++-------
spec/unit/simple_graph.rb | 14 ++++++++++++
3 files changed, 61 insertions(+), 20 deletions(-)
diff --git a/lib/puppet/simple_graph.rb b/lib/puppet/simple_graph.rb
index 5e8f5cd..37a08e7 100644
--- a/lib/puppet/simple_graph.rb
+++ b/lib/puppet/simple_graph.rb
@@ -64,6 +64,14 @@ class Puppet::SimpleGraph
end
end
+ def each_out_edges
+ @adjacencies[:out].values.each do |edges|
+ edges.each do |edge|
+ yield edge
+ end
+ end
+ end
+
# The other vertex in the edge.
def other_vertex(direction, edge)
case direction
@@ -146,20 +154,37 @@ class Puppet::SimpleGraph
# Collect all of the edges that the passed events match. Returns
# an array of edges.
def matching_edges(events, base = nil)
- events.collect do |event|
- source = base || event.source
-
- unless vertex?(source)
- Puppet.warning "Got an event from invalid vertex %s" % source.ref
- next
+ # collect edges out from sources
+ if base
+ # consider only edges which are not pointing to any event sources
+ # which is what a recursive file resources produces
+ event_sources = events.inject({}) { |hash, event| hash[event.source] = event.source ; hash}
+ edges = (adjacent(base, :direction => :out, :type => :edges) - event_sources.keys)
+ else
+ edges = events.inject([]) do |edges,event|
+ if wrapper = @vertices[event.source]
+ wrapper.each_out_edges do |edge|
+ edges << edge
+ end
+ else
+ Puppet.warning "Got an event from invalid #{event.source.ref}"
+ end
+ edges
end
- # Get all of the edges that this vertex should forward events
- # to, which is the same thing as saying all edges directly below
- # This vertex in the graph.
- adjacent(source, :direction => :out, :type => :edges).find_all do |edge|
- edge.match?(event.name)
+ end
+
+ # find all the different event names, we assume there won't be that many
+ # which should greatly shorten the next loop and reduce the cross-join
+ # between events x out-edges
+ event_names = events.inject({}) { |hash, event| hash[event.name] = event.name ; hash }
+
+ # match all our events to the edges
+ event_names.keys.inject([]) do |matching, event_name|
+ edges.each do |edge|
+ matching << edge if edge.match?(event_name)
end
- end.compact.flatten
+ matching
+ end
end
# Return a reversed version of this graph.
diff --git a/lib/puppet/transaction.rb b/lib/puppet/transaction.rb
index e132b72..53b1c51 100644
--- a/lib/puppet/transaction.rb
+++ b/lib/puppet/transaction.rb
@@ -215,15 +215,17 @@ class Transaction
# Collect the targets of any subscriptions to those events. We pass
# the parent resource in so it will override the source in the events,
# since eval_generated children can't have direct relationships.
- relationship_graph.matching_edges(events, resource).each do |orig_edge|
- # We have to dup the label here, else we modify the original edge label,
- # which affects whether a given event will match on the next run, which is,
- # of course, bad.
- edge = orig_edge.class.new(orig_edge.source, orig_edge.target, orig_edge.label)
- edge.event = events.collect { |e| e.name }
- set_trigger(edge)
+ Puppet::Util.benchmark(:debug, "Time for triggering #{events.size} events to edges") do
+ b = relationship_graph.matching_edges(events, resource)
+ b.each do |orig_edge|
+ # We have to dup the label here, else we modify the original edge label,
+ # which affects whether a given event will match on the next run, which is,
+ # of course, bad.
+ edge = orig_edge.class.new(orig_edge.source, orig_edge.target, orig_edge.label)
+ edge.event = events.collect { |e| e.name }
+ set_trigger(edge)
+ end
end
-
# And return the events for collection
events
end
diff --git a/spec/unit/simple_graph.rb b/spec/unit/simple_graph.rb
index 22d6ebe..7919d0c 100755
--- a/spec/unit/simple_graph.rb
+++ b/spec/unit/simple_graph.rb
@@ -372,6 +372,20 @@ describe Puppet::SimpleGraph do
edges.should be_include(@edges["a/b"])
edges.should be_include(@edges["a/c"])
end
+
+ describe "from generated resources" do
+ before :each do
+ @topsource = stub 'source'
+ @edges["a/topsource"] = Puppet::Relationship.new(@topsource, "a", {:event => :yay, :callback => :refresh})
+ @graph.add_edge(@edges["a/topsource"])
+ end
+
+ it "should not match with edges pointing back to events sources" do
+ @edges["a/b"].expects(:match?).never
+ @edges["a/topsource"].expects(:match?)
+ @graph.matching_edges([@event], @topsource)
+ end
+ end
end
describe "when determining dependencies" do
--
1.6.6.1
Do you expect cases where it shouldn't be ultra fast now?
> The patch is still against 0.25.x, and can be located in my github repo-
> sitory in the branch tickets/0.25.x/3396. This branch also contains the
> checksum => none patch of #2929.
>
> Please review,
debug: Time for triggering 3 events to edges in 0.00 seconds
debug: Time for triggering 23044 events to edges in 13.70 seconds
debug: Time for triggering 0 events to edges in 0.00 seconds
debug: Time for triggering 0 events to edges in 0.00 seconds
debug: Time for triggering 0 events to edges in 0.00 seconds
debug: Time for triggering 0 events to edges in 0.00 seconds
debug: Finishing transaction 23456251255100 with 23044 changes
real 4m9.660s
user 3m3.727s
sys 0m25.846s
Great Work!
It was still hanging for about 30s after reporting to have finished
the transaction and still burning cpu, but this is rather negligible
concerning the amount of time it wasted before and this might also not
be related to that issue.
cheers pete
> Brice's checksum => none patch (#2929). In the middle, the currentI've a good news for you. I removed from #2929 the patch that produces
> champion, the code from master.
the conflict and isolated it in brice:tickets/0.25.x/3396.
That way, when Luke will modify his events branch, you'll be able to
merge relatively easily #3396.
We're actually getting to the point where recursive file management is
reasonable to do on large file sets. Not quite, but we're a heckuva
lot closer. Wish we'd known earlier what a barrier this (relatively
simple, in the end) problem was to efficiency.
--
Today at work an ethernet switch decided to take the 'N' out of NVRAM
-- Richard Letts
btw: ohad mentioned that puppet might be calculating the yaml for the
report while still burning cpu. This might actually be the case.
However 1) does puppet standalone actually do any reports? b) if yes,
looks like it can't be turned off with --report false
cheers pete
I think it does the transaction cleanup which involves removing from the
graph all the generated resources. Granted it has no use in puppet, but
not puppetd.
Wish we'd known earlier what a barrier this (relatively simple, in the end) problem was to efficiency.
It's unfortunately not that simple - if I'm reading this correctly, I changed the method profile in 'matching_edges' to accept just one event instead of a list of events.
In looking briefly at my code, I've split what a moderately ugly but all-in-one-place method into a bunch of different methods, and I think that splitting quite likely invalidated Brice's patch.
In looking briefly, I think the fix is to change EventManager#queue_event to accept an array of events, and then change SimpleGraph#matching_edges back to accepting an array, also. This should allow Brice's patch to apply relatively cleanly, I think.
Yes, I added the benchmark as a way to monitor if we did progress or
not. It's absolutely not necessary and we (at least I) should remove it.
Especially if that makes merging easier.
Yes, the first 3 are failing because they're waiting for an array but
Luke's patch sends events one by one (apparently, I didn't re-read the
code).
The last one is more problematic, since if Luke removed the source
accessor from the event, we won't be able to prune our relationship
graph (which is where we got all the gain) in the case of the recursive
file resource.
> From Luke's comments
> on the cage-match thread:
>
>
>> It's unfortunately not that simple - if I'm reading this correctly, I
>> changed the method profile in 'matching_edges' to accept just one event
>> instead of a list of events.
>>
>> In looking briefly at my code, I've split what a moderately ugly but
>> all-in-one-place method into a bunch of different methods, and I think that
>> splitting quite likely invalidated Brice's patch.
>>
>> In looking briefly, I think the fix is to change EventManager#queue_event
>> to accept an array of events, and then change SimpleGraph#matching_edges
>> back to accepting an array, also. This should allow Brice's patch to apply
>> relatively cleanly, I think.
>>
>
> I gather that when I get that patch from him it should all go smoothly, with
> the possible exception of the event.source test. Does that sound correct?
I have the same analysis. I'm going to read Luke's event patch again
(was it published on puppet-dev?) and see what happened to this source
accessor.
See the discussion thread titled "Events and reports refactor" from
last November and the code storm on 20 January of this year.
-- Markus
I explored Lak's repositories and found that event.source was renamed
into event.resource (apparently).
> On 21/03/10 15:48, Markus Roberts wrote:
>>> I have the same analysis. I'm going to read Luke's event patch again
>>> (was it published on puppet-dev?) and see what happened to this
>>> source
>>> accessor.
>>
>> See the discussion thread titled "Events and reports refactor" from
>> last November and the code storm on 20 January of this year.
>
> I explored Lak's repositories and found that event.source was renamed
> into event.resource (apparently).
Something like that - it looks like that should work for the purposes
of this, although I'd definitely test it. I think for most cases both
the resource and property will be set (to strings), but I think there
are some cases where just the source_description is set. For some
reason I've made 'log_source' a private method; this would probably
actually be the best thing to use, since it picks the best value.
--
I believe that if it were left to artists to choose their own labels,
most would choose none. -- Ben Shahn
Wish we'd known earlier what a barrier this (relatively simple, in the end) problem was to efficiency.
I had it drummed into me thirty some years ago that the very first step in solving performance problems is _always_ nailing down what's causing the problem with hard data and repeatable cases, and I've never seen a counter example. Or, to turn it around, all our performance problems will turn out to be "relatively simple" once someone tracks them down and nails them to the floor with a hot rivet gun.
Great work Brice!
That's not actually done any more, is it? I don't see it in the
testing branch, at least, and I don't see a 'finish' call or anything
similar in 0.25.x.
And FTR - yes, puppet (stand-alone) does both reporting and graphing
if you ask it to.
I don't know the status of testing, but in 0.25.x we call
Transaction#cleanup which removes all the generated resources from the
graph.
> And FTR - yes, puppet (stand-alone) does both reporting and graphing
> if you ask it to.
OK, good to know.
--
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
Hmm, you're right; I missed it because it's called by the catalog
rather than the transaction.
We should be able to get rid of that. Peter - could you test
commenting out that call in Catalog.rb and see if that removes the
large wait you have?
>> And FTR - yes, puppet (stand-alone) does both reporting and graphing
>> if you ask it to.
>
> OK, good to know.
--
To have a right to do a thing is not at all the same as to be right
in doing it. -- G. K. Chesterton
>>>>> btw: ohad mentioned that puppet might be calculating the yaml for the
>>>>> report while still burning cpu. This might actually be the case.
>>>>> However
>>>>> 1) does puppet standalone actually do any reports? b) if yes, looks
>>>>> like
>>>>> it can't be turned off with --report false
>>>>
>>>> I think it does the transaction cleanup which involves removing from
>>>> the
>>>> graph all the generated resources. Granted it has no use in puppet,
>>>> but
>>>> not puppetd.
>>>
>>> That's not actually done any more, is it? I don't see it in the
>>> testing branch, at least, and I don't see a 'finish' call or anything
>>> similar in 0.25.x.
>>
>> I don't know the status of testing, but in 0.25.x we call
>> Transaction#cleanup which removes all the generated resources from the
>> graph.
>
> Hmm, you're right; I missed it because it's called by the catalog rather
> than the transaction.
>
> We should be able to get rid of that. Peter - could you test commenting
> out that call in Catalog.rb and see if that removes the large wait you
> have?
I owed you all an answer for that an while cleaning up my marked e-mails
in the inbox (ha! sunday) I commented the two cleanup statements out and
it was really not hanging anymore at the end. So yes looks like it hangs
at the cleanup.
Do you need any more information, should I file a bug or is that anyway
removed in rowfl?
cheers pete
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkvBzR4ACgkQbwltcAfKi3/IIQCdFCt0NW225Y3PIXbVk0rp+wDr
pqEAn3n7jPZhlWbafyV8qk/WRCgRQh8i
=iZ6Q
-----END PGP SIGNATURE-----
If you could file a high-priority bug, I'd appreciate it (unless Brice
can just knock it out today). I can't fix it right this minute, and
we need a bug associated with it anyway.
--
Never confuse movement with action. -- Ernest Hemingway
---------------------------------------------------------------------
Luke Kanies -|- http://puppetlabs.com -|- +1(615)594-8199
http://projects.puppetlabs.com/issues/3533
cheers pete