Puppet & Automatic Resource state changes...


Gavin Williams

unread,
Feb 18, 2013, 3:13:01 AM2/18/13
to puppe...@googlegroups.com
Morning All

I posted this on Puppet-users a few days ago, but I thought I'd post it here as well to get a dev's viewpoint...

Firstly, apologies for the length of this post; however, I thought it most useful to fully outline the challenge and the desired result...

Ok, so we're in the process of Puppetizing our Oracle/NetApp platform for Live/DR running.

The current manual process is as follows: upon setting up a new database, a set of volumes is created to contain the various Oracle DB elements, and these are then SnapMirror'd to the DR site.
This SnapMirror process requires a period of time to copy the base data over... This time period is directly proportional to the amount of data involved... E.g. a copy of 20GB may take an hour, while 200GB may take 10 hours...
During this period, the SnapMirror resource is in an 'initializing' state. Once the data copy is complete, then the resource will change to an 'initialized' state.
The next step in the process is then to break the relationship so that the DR end can be used in a R/W mode...

Now, in order to Puppetize this, I need to be able to replicate the above behaviour...
I've got Puppet to create and initialize the relationship, and that works as expected. However, Puppet doesn't currently care about the relationship state. That's easy enough to add as a new property on the type/provider.
However, what I'm struggling to understand is how, or whether it's even possible, to automate the switch from the 'Initialized' state to the 'Broken' state upon completion of the initialization stage.

Now these database definitions are currently driven from a YAML backend which maintains information such as database name, volume information, primary NetApp controller, replication NetApp controller, etc... Currently, this YAML file is just a file on the puppet master... However, there are ambitions to move this into a more dynamic backend, such as CouchDB or similar... That opens up the possibility of automatically updating the resource state in the backend... However, Puppet still needs to be able to support updating that backend based on the information it gets from the actual resource...
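For illustration, a single entry in that backend might look something like this (field names are illustrative, not our exact schema):

```ruby
require 'yaml'

# One hypothetical backend entry for a database (field names illustrative).
raw = <<~YAML
  db01:
    primary_controller: nactl01
    replication_controller: nactl02
    volumes:
      - v_db01_data
      - v_db01_redo
    replication_state: initializing
YAML

backend = YAML.safe_load(raw)
db = backend['db01']
```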

So to flow it out:
  1. Create a new database in backend ->
  2. Puppet creates volumes on primary ->
  3. Data is added to volumes ->
  4. Backend updated to indicate replication is required ->
  5. Puppet creates volumes on Secondary and adds Snapmirror relationship ->
  6. Snapmirror initializes in background ->
  7. Puppet periodically runs against network device and checks resource state ->
  8. Backend resource state is updated following each run? ->
  9. Snapmirror initialization completes ->
  10. Puppet runs, detects new resource state and then triggers break?
  11. Backend resource state updated to 'broken'?

Now 1 to 7 above are fine, but 8 to 11 are where I get a bit unsure... 
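To put steps 8 to 11 another way, each Puppet run would effectively be making a decision like this (a plain-Ruby sketch; the state and action names are just my guesses, not any real API):

```ruby
# Decide what a single Puppet run should do with a SnapMirror
# relationship, given the state the device reports (names hypothetical).
def next_action(reported_state)
  case reported_state
  when 'absent'       then :create   # step 5: create the relationship
  when 'initializing' then :wait     # steps 6-8: base copy still running
  when 'initialized'  then :break    # step 10: copy done, break for R/W
  when 'broken'       then :noop     # step 11: nothing left to do
  else raise ArgumentError, "unknown state: #{reported_state}"
  end
end
```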

So, that's the challenge... Am I barking up the wrong tree, or is this something that Puppet could manage?

Cheers in advance for any responses.

Regards
Gavin


Andy Parker

unread,
Feb 18, 2013, 5:50:44 AM2/18/13
to puppe...@googlegroups.com
I just took a look and see that you got no responses on puppet-users. That is unfortunate :(

On Mon, Feb 18, 2013 at 12:13 AM, Gavin Williams <fatm...@gmail.com> wrote:
Now, in order to Puppetize this, I need to be able to replicate the above behaviour...
I've got Puppet to create and initialize the relationship, and that works as expected. However Puppet doesn't currently care about the relationship state. Now that's easy enough to add in as a new property against the type/provider.

Based on how you are describing this, I'm not sure that expressing it as a parameter is best. It sounds like you are describing a situation where there are a few states that you care about, but transitioning between those states requires sitting in other "non-interesting" states for a while. Describing the "non-interesting" states pushes the management of those state transitions outside of puppet and possibly makes them harder to work with.
 
However what I'm struggling to understand is how, or if it's even possible, to automate the switch from 'Initialized' state to a 'Broken' state upon completion of the initialization stage???


Yeah. Normally puppet deals with achieving the desired state in a single run of puppet. So one possible solution is to have puppet block! I really don't think that would be a good idea in this situation, since it would leave everything else on the machine unmanaged for an unknown length of time.
 
Now these database definitions are currently driven from a YAML backend which maintains information such as database name, volume information, primary NetApp controller, replication NetApp controller, etc... Currently, this YAML file is just a file on the puppet master... However, there are ambitions to move this into a more dynamic backend, such as CouchDB or similar... That opens up the possibility of automatically updating the resource state in the backend... However, Puppet still needs to be able to support updating that backend based on the information it gets from the actual resource...

So to flow it out:
  1. Create a new database in backend ->
  2. Puppet creates volumes on primary ->
  3. Data is added to volumes ->
  4. Backend updated to indicate replication is required ->
  5. Puppet creates volumes on Secondary and adds Snapmirror relationship ->
  6. Snapmirror initializes in background ->
  7. Puppet periodically runs against network device and checks resource state ->
  8. Backend resource state is updated following each run? ->
  9. Snapmirror initialization completes ->
  10. Puppet runs, detects new resource state and then triggers break?
  11. Backend resource state updated to 'broken'?

Now 1 to 7 above are fine, but 8 to 11 are where I get a bit unsure... 

I think you have most of the picture here. Puppet manages some of the transitions between states in order to get to that final "broken" state. Using defined resource types or parameterized classes won't get you there since the information about whether the next step of the management of the resource can be taken is on the node. As you said earlier, it is once the snapmirror process reaches the "initialized" state that puppet should finish its job.

Since the data needs to come from the node, then there are a couple of choices:
  * a custom fact: doesn't seem good since you would be encoding in facter the presence of particular resources
  * an ENC that probes the SnapMirror system: seems doable, but once again encodes the presence of particular resources outside the manifests
  * a custom type: probably the best solution, the replication itself is a kind of resource that you want to manage, and what needs to be done is heavily dependent on the current state and desired state of the resource

So I would suggest creating a custom type and provider for a "replicated data" resource, or even try splitting it up into several different resources. Doing this will let you make the final transition without having to change the catalog.

I'll admit, though, that puppet doesn't really have a concept of an "in progress" convergence of a resource, so I'm not sure how the report will work out for these kinds of resources. I suspect that it would show a change every time that puppet runs and the replication is still in progress.
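To make that concrete, here is a rough plain-Ruby model of the idea (not real Puppet type/provider code; every name here is hypothetical): each run inspects the current state and takes at most one step toward the final state, and the transition out of the in-progress state only happens when the device itself reports completion.

```ruby
# Rough model of a "replicated data" resource converging over several
# runs rather than one (all names hypothetical, not Puppet API).
class ReplicatedData
  attr_reader :state

  def initialize(state = :absent)
    @state = state
  end

  # Simulates one agent run: take at most one step toward :broken.
  def converge!
    case @state
    when :absent       then @state = :initializing  # create + start copy
    when :initializing then @state                  # copy in progress: no-op
    when :initialized  then @state = :broken        # safe to break now
    end
    @state
  end

  # Out-of-band event: the device finishes the base copy by itself.
  def copy_complete!
    @state = :initialized if @state == :initializing
  end
end
```

Each agent run calls converge! once; no amount of extra runs moves the resource forward until copy_complete! has happened on the device side.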



--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.
To post to this group, send email to puppe...@googlegroups.com.
Visit this group at http://groups.google.com/group/puppet-dev?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

fatmcgav

unread,
Feb 18, 2013, 6:21:52 AM2/18/13
to puppe...@googlegroups.com
Andy,

Cheers for taking the time to respond...

Comments in-line below...

Cheers
Gavin

On 18 February 2013 10:50, Andy Parker <an...@puppetlabs.com> wrote:
I just took a look and see that you got no responses on puppet-users. That is unfortunate :(

Now, in order to Puppetize this, I need to be able to replicate the above behaviour...
I've got Puppet to create and initialize the relationship, and that works as expected. However Puppet doesn't currently care about the relationship state. Now that's easy enough to add in as a new property against the type/provider.

Based on how you are describing this, I'm not sure that expressing it as a parameter is best. It sounds like you are describing a situation where there are a few states that you care about, but transitioning between those states requires sitting in other "non-interesting" states for a while. Describing the "non-interesting" states pushes the management of those state transitions outside of puppet and possibly makes them harder to work with.
 
Ok, that makes sense... Unless I do lots of masking and mapping of the intermediate statuses into something that Puppet knows, but again, that adds complication etc...
 
 
However what I'm struggling to understand is how, or if it's even possible, to automate the switch from 'Initialized' state to a 'Broken' state upon completion of the initialization stage???


Yeah. Normally puppet deals with achieving the desired state in a single run of puppet. So one possible solution is to have puppet block! I really don't think that in this situation that would be a good idea, since it would leave everything else on the machine unmanaged for an unknown length of time.

Yeah, we could be looking at transfer times of 24-48 hours on some of our larger datasets, so we wouldn't want Puppet blocking for that long a period...
 
 
I think you have most of the picture here. Puppet manages some of the transitions between states in order to get to that final "broken" state. Using defined resource types or parameterized classes won't get you there since the information about whether the next step of the management of the resource can be taken is on the node. As you said earlier, it is once the snapmirror process reaches the "initialized" state that puppet should finish its job.

Since the data needs to come from the node, then there are a couple of choices:
  * a custom fact: doesn't seem good since you would be encoding in facter the presence of particular resources
  * an ENC that probes the SnapMirror system: seems doable, but once again encodes the presence of particular resources outside the manifests
  * a custom type: probably the best solution, the replication itself is a kind of resource that you want to manage, and what needs to be done is heavily dependent on the current state and desired state of the resource

So I would suggest creating a custom type and provider for a "replicated data" resource, or even try splitting it up into several different resources. Doing this will let you make the final transition without having to change the catalog.

I'll admit, though, that puppet doesn't really have a concept of an "in progress" convergence of a resource, so I'm not sure how the report will work out for these kinds of resources. I suspect that it would show a change every time that puppet runs and the replication is still in progress.

Ok, I think I get where you're coming from... I guess what makes this one just slightly more complicated (oh the joy) is that the device being managed is a network device... Will mock some code up and see where I can get to...

Sounds like 'in progress' convergence support *could* be an interesting feature...

Nan Liu

unread,
Feb 18, 2013, 8:47:00 PM2/18/13
to puppet-dev
On Mon, Feb 18, 2013 at 3:21 AM, fatmcgav <fatm...@gmail.com> wrote:

However what I'm struggling to understand is how, or if it's even possible, to automate the switch from 'Initialized' state to a 'Broken' state upon completion of the initialization stage???


Yeah. Normally puppet deals with achieving the desired state in a single run of puppet. So one possible solution is to have puppet block! I really don't think that in this situation that would be a good idea, since it would leave everything else on the machine unmanaged for an unknown length of time.

Yeh, we could be looking at transfer times of 24-48 hours on some of our larger datasets, so wouldn't want Puppet blocking for that long a period...

So just to explore this a bit: an ensurable resource by default has present/absent states, and the transition between them is pretty straightforward.

present -> absent (def destroy)
absent -> present (def create)

I'm assuming present -> absent is short enough that you can wait for the process to complete, so create is the only problematic state.

For now the closest thing appears to be a transition state that fails (intending to block dependent resources):

absent -> initializing -> present

The custom ensurable block:
  ensurable do
    newvalue(:present) do
      if provider.initialized?
        provider.progress
      else
        provider.create
      end
    end

    newvalue(:absent) do
      provider.destroy
    end

    newvalue(:initializing) do
      provider.progress
    end
  end

So when a resource is in an initializing state, just report back the progress status and fail:
  def progress
    # Demo only: the file's size stands in for real transfer progress.
    percent = File.stat(resource[:name]).size / 100.0
    fail("Creation in progress #{percent}% complete.")
  end

Here's the output example:

# initial create
$ puppet apply tests/transition.pp 
err: /Stage[main]//Transition[/tmp/demo_a]/ensure: change from absent to present failed: Creation in progress 0.0% complete.
notice: /Stage[main]//Notify[complete]: Dependency Transition[/tmp/demo_a] has failures: true
warning: /Stage[main]//Notify[complete]: Skipping because of failed dependencies
notice: Finished catalog run in 0.08 seconds

# in progress
$ puppet apply tests/transition.pp
err: /Stage[main]//Transition[/tmp/demo_a]/ensure: change from absent to present failed: Creation in progress 12% complete.
notice: /Stage[main]//Notify[complete]: Dependency Transition[/tmp/demo_a] has failures: true
warning: /Stage[main]//Notify[complete]: Skipping because of failed dependencies
notice: Finished catalog run in 0.08 seconds

# finished:
$ puppet apply tests/transition.pp
notice: complete
notice: /Stage[main]//Notify[complete]/message: defined 'message' as 'complete'
notice: Finished catalog run in 0.08 seconds


I'll admit, though, that puppet doesn't really have a concept of an "in progress" convergence of a resource, so I'm not sure how the report will work out for these kinds of resources. I suspect that it would show a change every time that puppet runs and the replication is still in progress.

The problem is that failing is a bit misleading. It would certainly be an interesting use case if we could mark the resource as pending and have subsequent resources simply no-op, but as it stands we can't do anything like this:

$ puppet apply tests/transition.pp       
warning: Could not retrieve fact fqdn
notice: /Stage[main]//Transition[/tmp/demo_a]/ensure: current_value absent, should be present Progress: 0.0 % (pending)
notice: /Stage[main]//Notify[complete]/message: current_value absent, should be complete (noop)
notice: Finished catalog run in 0.08 seconds

Andy, is this worth filing a feature request?

Thanks,

Nan

Andrew Parker

unread,
Feb 19, 2013, 3:20:39 AM2/19/13
to puppe...@googlegroups.com
On Feb 19, 2013, at 2:47 AM, Nan Liu <nan...@gmail.com> wrote:

I'll admit, though, that puppet doesn't really have a concept of an "in progress" convergence of a resource, so I'm not sure how the report will work out for these kinds of resources. I suspect that it would show a change every time that puppet runs and the replication is still in progress.

The problem is that failing is a bit misleading. It would certainly be an interesting use case if we could mark the resource as pending and have subsequent resources simply no-op, but as it stands we can't do anything like this:

$ puppet apply tests/transition.pp       
warning: Could not retrieve fact fqdn
notice: /Stage[main]//Transition[/tmp/demo_a]/ensure: current_value absent, should be present Progress: 0.0 % (pending)
notice: /Stage[main]//Notify[complete]/message: current_value absent, should be complete (noop)
notice: Finished catalog run in 0.08 seconds

Andy, is this worth filing a feature request?

I think we can record this in a feature request. There has been this case of wanting to control slow resources, and another one from Dan where he wanted to control virtual machines.


Thanks,

Nan

fatmcgav

unread,
Feb 19, 2013, 3:40:42 AM2/19/13
to puppe...@googlegroups.com
Nan

That looks like a great way of working around this requirement...

One quick q - how does 'provider.initialized?' get set?
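My guess is that it's just a predicate method on the provider that queries the relationship status from the filer, something like the sketch below (where snapmirror_status and the fake status table stand in for the real NetApp API call), but correct me if I'm off:

```ruby
# Sketch only: FAKE_FILER and snapmirror_status stand in for whatever
# call actually asks the filer for a relationship's state (hypothetical).
FAKE_FILER = { 'db01' => 'initialized', 'db02' => 'initializing' }

def snapmirror_status(relationship)
  FAKE_FILER.fetch(relationship, 'absent')
end

def initialized?(relationship)
  snapmirror_status(relationship) == 'initialized'
end
```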

Other than that, looks great... Will start working up some code now :)

Cheers
Gav



fatmcgav

unread,
Feb 19, 2013, 3:41:34 AM2/19/13
to puppe...@googlegroups.com
Andy

I would certainly +1 that request... Could use this case as an example as well if you wanted...

Cheers
Gav

Gavin Williams

unread,
Feb 19, 2013, 9:31:03 AM2/19/13
to puppe...@googlegroups.com
Ok, I've started trying to work up some code to outline the change below...

However I'm hitting an issue whereby the catalogue is failing due to:
"Error: Failed to apply catalog: Parameter ensure failed on Netapp_snapmirror[actint-star-nactl02:/vol/v_puppet_db01_redo/q_puppet_db01_redo]: Invalid value "initializing". Valid values are present, absent."

I've updated my ensurable block to be:
  ensurable do
    desc "Netapp Snapmirror resource state. Valid values are: present, absent, initializing."

    defaultto(:present)

    newvalue(:present) do
      if provider.initialized
        #provider.create
        provider.progress
      else
        # provider.create
        provider.progress
      end
    end

    newvalue(:absent) do
      provider.destroy
    end

    newvalue(:initializing) do
      provider.progress
    end

  end

I've tried restarting the puppetmaster service several times, however I keep coming back to the same error :(

Any ideas??

Cheers
Gavin

Nan Liu

unread,
Feb 20, 2013, 11:41:55 AM2/20/13
to puppet-dev
I realized there's an issue with the PoC code going from initializing to absent, so I haven't posted it yet. In the ensurable block I forgot to include attr_reader :initializing, which is causing the error. I'll send a follow-up when I have some time to dig through this again.

Thanks,

Nan

fatmcgav

unread,
Feb 20, 2013, 11:43:24 AM2/20/13
to puppe...@googlegroups.com
Nan,

Cheers for the feedback...

Will keep an eye out for your email :)

Cheers
Gavin



fatmcgav

unread,
Feb 27, 2013, 3:50:21 PM2/27/13
to puppe...@googlegroups.com
Nan

Did you have any joy with updating the code?

Cheers
Gavin

Nan Liu

unread,
Mar 1, 2013, 4:40:48 PM3/1/13
to puppet-dev
On Wed, Feb 27, 2013 at 12:50 PM, fatmcgav <fatm...@gmail.com> wrote:
Did you have any joy with updating the code?

Sorry for the delay, I've been swamped lately. I think this gist should give you an idea:

Thanks,

Nan 

fatmcgav

unread,
Mar 1, 2013, 4:42:50 PM3/1/13
to puppe...@googlegroups.com

No worries, I guessed as much given all the recent puppet vmware news :-)

Will take a look, and have a play.

Cheers
Gav


fatmcgav

unread,
Mar 1, 2013, 5:35:20 PM3/1/13
to puppe...@googlegroups.com
Quickly tried to mock some code up this eve, and kept getting the following error...

Debug: /Filebucket[puppet]: Skipping host resources because running on a device
Debug: Node[actint-star-nactl02]: Skipping host resources because running on a device
Error: /Stage[main]//Node[actint-star-nactl02]/Netapp_snapmirror[/vol/v_puppet_db02_data/q_puppet_db02_data]: Could not evaluate: No ability to determine if netapp_snapmirror exists
Debug: Finishing transaction 70040102432860

Not sure if it's because I'm running against a network device...

Will get some code up first thing in the morning, as it's time for bed here :)

Cheers
Gav