Trying to consolidate/batch command execution in a provider


Shawn Ferry

Jan 27, 2016, 1:52:15 PM
to puppe...@googlegroups.com
I have a command that takes one or more effectively free-form arguments and executes somewhat slowly, and sometimes, if things are changing, much more slowly. Let's say 30s nominal execution in the simple case.

I’m not finding a list of hooks to see if anything is appropriate. Flush would be great if I were processing a bunch of individual arguments for a single resource, but I need something that lets me defer these updates until the end of processing for a provider and flush them all together.

Am I missing an alternate method to do something like this or the correct place in the docs?


Thanks
Shawn


If I have something like the following, it takes at least 2 minutes:

slow_command { ['thing1', 'thing2', 'thing3', 'thing4']:
  value => true,
}

/tmp/slow_command thing1=true
/tmp/slow_command thing2=true
/tmp/slow_command thing3=true
/tmp/slow_command thing4=true

If I invoke it manually, or via an exec, it takes 30s:

/tmp/slow_command thing1=true thing2=true thing3=true thing4=true


:::slow_command:::
#!/bin/sh
sleep 30
echo $*
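
The difference between the two invocation styles above can be sketched in plain Ruby. This is only an illustration: `build_commands` is a hypothetical helper, not part of Puppet, and `/tmp/slow_command` is the stand-in script from this post. Nothing is executed; the sketch only builds the command lines.

```ruby
# Hypothetical helper (not a Puppet API): turn a hash of pending changes
# into command lines, either one per change or a single batched call.
COMMAND = '/tmp/slow_command'

def build_commands(changes, batch: true)
  args = changes.map { |name, value| "#{name}=#{value}" }
  return [] if args.empty?

  if batch
    # One invocation carrying every argument: the ~30s cost is paid once.
    [[COMMAND, *args]]
  else
    # One invocation per argument: the ~30s cost is paid for each change.
    args.map { |arg| [COMMAND, arg] }
  end
end

changes = { 'thing1' => true, 'thing2' => true, 'thing3' => true, 'thing4' => true }

build_commands(changes, batch: false).each { |cmd| puts cmd.join(' ') }
puts build_commands(changes).first.join(' ')
```

Each command is kept as an argument array rather than a single string so that it could be handed to something like `system(*cmd)` without shell quoting concerns.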

Michael Smith

Jan 27, 2016, 4:03:25 PM
to puppe...@googlegroups.com
I'm not aware of anything that supports this explicitly. If you're writing your own type and provider, you could provide a parameter that takes an array of instances to apply and use that (kind of like how the file resource has an overloaded 'path' parameter). You could then write:

slow_command { 'things_1_2_3_4':
  things => ['thing1', 'thing2', 'thing3', 'thing4'],
  value  => true,
}

but any dependencies would have to be expressed against the namevar 'things_1_2_3_4'.

----

I've been part of discussions about how this would be nice for package providers (`yum install a b c` is faster than calling `yum install a; yum install b; yum install c`). However, some extensions to the provider interface would be needed to make it happen.

Conceptually providers would need a way to state they support merging individual declarations, some way of representing a pluralization of the type, and processing in the graph to merge instances of the same type/provider pair that don't have intermediate dependencies.

Catalog ordering limits the use of this to your use case, where you declare several instances of the same type consecutively. It would be more interesting without catalog ordering, where we could merge instances of a type declared in different modules as long as they don't have intermediate dependencies.
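
The merging step described above could look roughly like this. It is a plain-Ruby sketch under strong assumptions: `Resource` is a hypothetical struct rather than Puppet's resource class, and adjacency in an already-ordered list stands in for "no intermediate dependencies". A real implementation would have to work on the dependency graph.

```ruby
# Plain-Ruby model; Resource is a hypothetical struct, not Puppet's class.
Resource = Struct.new(:type, :provider, :title)

# Group runs of adjacent resources that share a type/provider pair.
# Adjacency in the ordered list stands in for "no intermediate dependencies".
def mergeable_batches(ordered)
  ordered.chunk_while { |a, b| a.type == b.type && a.provider == b.provider }.to_a
end

catalog = [
  Resource.new(:slow_command, :default, 'thing1'),
  Resource.new(:slow_command, :default, 'thing2'),
  Resource.new(:file, :posix, '/etc/motd'),
  Resource.new(:slow_command, :default, 'thing3'),
]

mergeable_batches(catalog).each do |batch|
  puts "#{batch.first.type}: #{batch.map(&:title).join(', ')}"
end
```

Note that 'thing3' ends up in its own batch because the file resource sits between it and the others, which is the limitation that catalog ordering imposes.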


--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/9D1DB354-8D30-448E-A461-54C67D4B8B26%40oracle.com.
For more options, visit https://groups.google.com/d/optout.



--
Michael Smith
Developer, Puppet Labs

Luke Kanies

Jan 28, 2016, 1:31:07 PM
to puppe...@googlegroups.com
We’ve tried multiple times to build something to support this within the RAL, but we’ve never been able to come up with something sufficiently robust.

We do have ‘prefetch’ for combining discovery commands, but nothing for combining the execution of the commands, AFAIK.
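
For readers unfamiliar with it, the prefetch idiom can be modeled in plain Ruby as below. This sketch does not load Puppet: `FakeResource`, the hard-coded discovery data, and the `discovery_calls` counter are all made up for illustration. Only the method names `instances` and `prefetch` follow the provider convention.

```ruby
# Plain-Ruby model of the prefetch idiom; it does not load Puppet.
FakeResource = Struct.new(:name, :provider)

class SlowCommandProvider
  attr_reader :name, :value

  class << self
    attr_reader :discovery_calls
  end
  @discovery_calls = 0

  def initialize(name, value)
    @name = name
    @value = value
  end

  # Stand-in for one (possibly slow) query that returns every setting at once.
  def self.instances
    @discovery_calls += 1
    { 'thing1' => true, 'thing2' => false }.map { |name, value| new(name, value) }
  end

  # Attach the discovered state to each managed resource in a single pass.
  def self.prefetch(resources)
    instances.each do |prov|
      resource = resources[prov.name]
      resource.provider = prov if resource
    end
  end
end

resources = {
  'thing1' => FakeResource.new('thing1'),
  'thing2' => FakeResource.new('thing2'),
}
SlowCommandProvider.prefetch(resources)
puts SlowCommandProvider.discovery_calls
```

The point is that discovery runs once for all resources; nothing comparable exists for the write side, which is the gap this thread is about.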


Shawn Ferry

Jan 28, 2016, 2:49:21 PM
to puppe...@googlegroups.com
Yeah, I don’t like how the composite resource removes the individual resource-y nature of each thing. It would work, but ‘all_the_true_things’, ‘all_the_false_things’, and ‘all_the_removed_things’ is unpalatable.

That is what I’m finding as well, and prefetch is working. I was hoping that there were higher-level hooks I was missing that would let me carry some execution across resources, something like a ‘flush_provider’.

I’m wondering if I can dynamically create resources for the required changes (something like a ‘batched_slow_command’), execute, update, and fall back to individual resource modification. Unless that really sounds viable, I’ll just shelve this as something we need to live with. I’m afraid that while it could probably be made to work, it would be overly messy and confusing, and the applicability is too specific to this one case.

Thanks
Shawn

Gary Larizza

Jan 28, 2016, 7:05:48 PM
to puppe...@googlegroups.com
I don't know if it would help, but did anything ever get decided from the post_xxx_eval propositions in this thread? --> https://groups.google.com/forum/#!msg/puppet-dev/8j2IjZ1Ilog/KTsMaATo4DAJ

That would give you a method that could be called at specific times, BUT, again, it's just another hook AFTER every resource was synchronized (and so wouldn't stop the slow command from being synchronized every time).





--
Gary Larizza
Professional Services Engineer
Puppet Labs

Shawn Ferry

Jan 28, 2016, 9:18:59 PM
to puppe...@googlegroups.com

I had looked at post_resource_eval but discarded it because the docs describe it as a cleanup hook. If I define ‘cleanup’ as actually making the desired changes, it would seem to work.

I oversimplified slow_command: it’s not particularly slow on query, just slow on change, so if I’ve already determined that nothing needs to change I’m not really paying a penalty for doing it.
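
A minimal sketch of that idea, modeled in plain Ruby without loading Puppet: the property setter only queues the change, and a class-level hook named after post_resource_eval applies everything in one call. `BatchedSlowCommand`, `pending`, `executed`, and `run_command` are hypothetical names; `run_command` records the command line instead of running /tmp/slow_command.

```ruby
# Plain-Ruby model of deferring changes to a class-level hook.
# It does not load Puppet; run_command is a hypothetical stand-in that
# records the command line instead of executing /tmp/slow_command.
class BatchedSlowCommand
  @pending = {}
  @executed = []

  class << self
    attr_reader :pending, :executed

    def run_command(*argv)
      @executed << argv
    end

    # Modeled on the provider class hook named post_resource_eval:
    # called once after every resource of this provider has been evaluated.
    def post_resource_eval
      return if @pending.empty?

      run_command('/tmp/slow_command', *@pending.map { |k, v| "#{k}=#{v}" })
    ensure
      # Always drop queued state so nothing leaks into a later run.
      @pending.clear
    end
  end

  def initialize(name)
    @name = name
  end

  # Property setter: queue the change instead of running the command now.
  def value=(new_value)
    self.class.pending[@name] = new_value
  end
end

%w[thing1 thing2 thing3 thing4].each { |n| BatchedSlowCommand.new(n).value = true }
BatchedSlowCommand.post_resource_eval
puts BatchedSlowCommand.executed.length
```

One trade-off visible even in this model: each resource's setter returns before the batched command has run, so a failure in the hook cannot be tied back to an individual resource.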

Shawn

Erik Dalén

Jan 29, 2016, 5:04:23 AM
to puppe...@googlegroups.com
Well, there was an old thread here about this. The conclusions were summarized here: http://projects.puppetlabs.com/issues/2198#note-41

AFAIK none of that has been implemented though. And perhaps bigger refactorings of the RAL would be better to allow concurrent processing as well as batching.

Kylo Ginsberg

Jan 29, 2016, 10:04:30 AM
to puppe...@googlegroups.com
On Fri, Jan 29, 2016 at 2:04 AM, Erik Dalén <erik.gus...@gmail.com> wrote:
> Well, there was an old thread here about this. The conclusions were summarized here: http://projects.puppetlabs.com/issues/2198#note-41

Yes, and that thinking was somewhat captured in this epic in Jira: https://tickets.puppetlabs.com/browse/PUP-146
 

> AFAIK none of that has been implemented though.

That's correct. Not lack of will, just lack of time :(

However, in PUP-146 someone pointed out there's an old patch that could perhaps be revived and used as a starting point for getting this functionality into mainline puppet. If someone wants to take a pass at reviving/rebasing that patch, I'd love to collaborate on getting this into upstream.
 
> And perhaps bigger refactorings of the RAL would be better to allow concurrent processing as well as batching.

Absolutely. Even more work, but very desirable IMHO, would be to allow both parallel catalog application and batching (depending on how you look at it, batching could be seen as a special case of parallel application). True parallel catalog application is something I've always seen as a potential win from a native (C++) rewrite of the agent.

Kylo





--
Kylo Ginsberg | ky...@puppetlabs.com | irc: kylo | twitter: @kylog

Trevor Vaughan

Feb 6, 2016, 4:09:59 PM
to puppe...@googlegroups.com
Hi Shawn,

This is very much possible, but the implementation is going to be a true hack that pollutes the global namespace until https://tickets.puppetlabs.com/browse/PUP-4002 is fixed.

Ideally, it would be something that is GC'd with the catalog, but we'll see where things go.

I've done this for both IPTables and CGroups in the following modules, mainly because any error in the application of their chain would be a disaster, so we wanted to wait until we could check everything and then apply it all to the system seamlessly.


The CGroups example is going to be MUCH easier to follow.

Essentially, I create a class variable that compiles the end result and then execute it on the last resource in the catalog. The last resource is detected by comparing a running count of the resources processed so far against the total number of resources of that type in the catalog.

The main downside to this (besides the global namespace nastiness) is that you only see the last resource in the catalog change in your report. Not ideal, but certainly functional.

Make sure that you undef the class variable at the end of your application so that you don't end up with memory leaks.
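
The counting approach can be modeled in plain Ruby roughly as follows. This is a sketch, not code from the modules mentioned above: `LastResourceBatcher` is a hypothetical name, the catalog is modeled as an array of [type, title] pairs rather than Puppet's catalog object, and class-level instance variables stand in for the class variable.

```ruby
# Plain-Ruby model of the "apply on the last resource" approach.
# The catalog is an array of [type, title] pairs, not Puppet's Catalog.
class LastResourceBatcher
  @seen = 0
  @compiled = []
  @applied = []

  class << self
    attr_reader :applied

    # Called once per resource; applies everything when the last one arrives.
    def evaluate(title, catalog)
      total = catalog.count { |type, _| type == :slow_command }
      @compiled << "#{title}=true"
      @seen += 1
      return unless @seen == total

      @applied << @compiled.dup # stand-in for running the real command once
      # Reset the shared state so nothing leaks into a later run.
      @seen = 0
      @compiled = []
    end
  end
end

catalog = [[:slow_command, 'thing1'], [:file, '/etc/motd'], [:slow_command, 'thing2']]
catalog.each do |type, title|
  LastResourceBatcher.evaluate(title, catalog) if type == :slow_command
end
puts LastResourceBatcher.applied.inspect
```

The model also shows the reporting downside: only the call for the final resource does any real work, so that is the only place a change would be visible.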

Trevor




--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699

-- This account not approved for unencrypted proprietary information --

Trevor Vaughan

Feb 6, 2016, 4:13:40 PM
to puppe...@googlegroups.com
I'm looking forward to seeing what can be done with catalog parallelism but, in reality, I don't think it will be used all that often in production.

Pegging 1 CPU at 100% for a catalog apply is bad enough; pegging more than one (even for a run that takes half the time) will be controversial in many environments.

Obviously, you would add a CPU limiter onto this, and being able to set the nice level would be great, but that tends to propagate down to the execs and services that are called and wreak havoc on the system.

Trevor



Shawn Ferry

Mar 2, 2016, 3:31:27 PM
to puppe...@googlegroups.com
On Feb 6, 2016, at 4:09 PM, Trevor Vaughan <tvau...@onyxpoint.com> wrote:

> Hi Shawn,
>
> This is very much possible, but the implementation is going to be a true hack that pollutes the global namespace until https://tickets.puppetlabs.com/browse/PUP-4002 is fixed.

Thank you for the examples.

The nature of the work is different so I don’t see how to use the num_runs == num_resources approach to determine when to actually apply the changes. Instead I ended up using self.post_resource_eval which I think is sub-optimal. Is there some other method that I’m missing which is called at the end of each resource even if it doesn’t change?

I’m seeing two different approaches:
$iptables_rule_classvars = {
@@cgroup_rule_classvars = { 

@@classvar doesn’t work for me without going through the class itself; otherwise I get 'class variable access from toplevel' warnings and later 'uninitialized class variable' errors.
e.g. Puppet::Type::Pkg_facet::ProviderPkg_facet.send(:class_variable_set works, while self.send(:class_variable_set fails with undefined method `class_variable_get'

I can remove_class_variable when I’m done and also not pollute the global namespace this way. I don’t really like needing to use the full class name instead of self.send but I’m not seeing why it fails otherwise.
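
One possible explanation, assuming the failing call was made from an instance method (the post does not say): class_variable_set and class_variable_get are defined on Module, so an instance does not respond to them, and the call has to go through `self.class` or the class name. The sketch below shows that plain-Ruby behavior; `DemoProvider`, `queue`, and `finish` are hypothetical names and no Puppet code is involved.

```ruby
# Hypothetical provider-like class used only to illustrate the Ruby behavior.
class DemoProvider
  # Instance method: self is an instance here, and instances do not respond
  # to class_variable_set, so the call has to go through the class.
  def queue(name, value)
    klass = self.class
    klass.class_variable_set(:@@pending, {}) unless klass.class_variable_defined?(:@@pending)
    klass.class_variable_get(:@@pending)[name] = value
  end

  # Class-level hook: remove the variable so no state outlives the run.
  # remove_class_variable returns the value that was stored.
  def self.finish
    return {} unless class_variable_defined?(:@@pending)

    remove_class_variable(:@@pending)
  end
end

provider = DemoProvider.new
provider.queue('thing1', true)
puts provider.respond_to?(:class_variable_set)
puts DemoProvider.finish.inspect
puts DemoProvider.class_variable_defined?(:@@pending)
```

Using `self.class` inside instance methods avoids spelling out the full class name while still keeping the state off the global namespace.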


Thanks
Shawn

ruby 2.1.6p336 (2015-04-13 revision 50298) [amd64-solaris2.12]
Puppet v3.6.2

Trevor Vaughan

Mar 2, 2016, 3:52:54 PM
to puppe...@googlegroups.com
Ugh, I forgot about that. I fixed it over in IPTables but may not have gone back to cgroups since it was pretty much deprecated under systemd in EL7.


Basically, nailing up the fact that these are globals.

I think I've figured out how to do this properly, but I haven't had time to test it.

Trevor

