Do you use a linear or non-linear version control system model?


Mike Svoboda

Apr 6, 2015, 9:09:33 AM
to help-c...@googlegroups.com
We're currently in the process of switching our version control system from SVN to GIT.  With SVN, we've always kept a linear model of pushing changes to production...  So, our very simple workflow was something like this...


Linear model:
1.  Develop policies on some test machine
2.  Commit developed policies to a "testing branch" with a policy server, some clients, etc.  This is a small deployment to verify the change will work as expected on all platforms.
3.  Merge from our "testing branch" into trunk.  Use class statements in policy to ramp up/down changes to the various datacenters.

In this respect, our workflow for promoting changes has always been linear.  If we need to fix a policy, it's a "roll forward": we develop the policy on the testing machines and merge into trunk.  We never have to worry about merge conflicts, rebasing, etc., because the test branch is always either at trunk's revision or above it.  Trunk is a snapshot of the testing branch at some point in time.
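
As a rough illustration of step 3's ramping (the datacenter class names and bundle here are invented, not our real policy):

    # hypothetical sketch -- class names and bundle are invented, and the
    # guard widens commit by commit: dc_lva:: -> dc_lva|dc_ela:: -> any::
    bundle agent ramp_new_feature
    {
      methods:
        dc_lva|dc_ela::
          "new_feature" usebundle => new_feature_policy;
    }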

Now that we're moving to GIT, we're throwing around the idea of supporting a branch-per-datacenter.  In this model, we may not have a linear revision history.  Individual changes would be tracked per branch instead of the linear concept above.

Non-linear model:
1.  Develop policies on some test machine
2.  Commit change to GIT master branch
3.  git cherry-pick the change from master into datacenter branch X
4.  Another engineer commits policies to datacenter X via git cherry-pick

In this method, instead of performing a branch merge from master to the datacenter branch, each change is tracked individually.  Changes aren't ramped up using class statements in policy; they become activated once they are cherry-picked into a Git branch.  In a sense, the policy execution of datacenter X could be different from datacenter Y, depending on which changes had been cherry-picked into their respective branches.

I'm curious what experience folks have had with the non-linear model.  Our greatest concerns at this point are the rise in complexity and the possibility of conflicts when change commits are cherry-picked.  If you use the non-linear model and track commits per branch, what methodologies do you use to verify a commit has been cherry-picked into all branches?  Any advice from past experience / pain here would be deeply appreciated.
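
Concretely, the per-branch audit we'd have to automate would be something like this (branch names are hypothetical):

    # promote one change into a datacenter branch
    git checkout dc-x
    git cherry-pick <commit-sha>

    # audit: list commits on master with no patch-equivalent in dc-x
    # (lines prefixed "+" have not been picked yet)
    git cherry -v dc-x master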

Cheers
Mike


Neil Watson

Apr 6, 2015, 9:21:02 AM
to help-c...@googlegroups.com
Mike,

My biggest client is using git and has sandbox, dev, qa, and prod
branches. It is mostly linear, moving from sandbox to prod. Usually it's
a straight merge, but sometimes there are cherry picks if not all
changes are ready. I agree that it's better to push small changes often
than larger changes infrequently. The former can be challenging if the
organization has old-style change management meetings, but I doubt that
is a problem at LinkedIn.

I think it's best to stick with the linear model. Dealing with complex
merges and conflicts is frustrating and error prone. Use the KISS
method and rock on.

--
Neil H Watson
Sr. Partner, Architecture and Infrastructure
CFEngine reporting: https://github.com/evolvethinking/delta_reporting
CFEngine policy: https://github.com/evolvethinking/evolve_cfengine_freelib
CFEngine and vim: https://github.com/neilhwatson/vim_cf3
CFEngine support: http://evolvethinking.com

Jarle Bjørgeengen

Apr 7, 2015, 5:44:11 AM
to help-c...@googlegroups.com
We use Gerrit [1] for version control, code review, and basic automatic testing, by integrating Jenkins with Gerrit. We have three roles: read-only, contributor, and gatekeeper.

The workflow is (roughly) like this:
  • A user with the contributor (or gatekeeper) role pushes to the magic for-master branch in Gerrit (the push is shown below).
  • Gerrit creates a new "change id".
  • Jenkins listens for events and detects the new change.
  • Jenkins clones the current production policy, patches it with that particular change, and runs our test script on it.
  • If the script returns 0, Jenkins reports OK/verified on the change inside Gerrit.
  • A gatekeeper reviews the change (in the web GUI) and submits it to production if it is justified and the structure fits with the existing code.
  • If another change is submitted to production before this one makes it, it can be rebased easily in Gerrit's web GUI. This re-triggers the Jenkins test.
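
The push in the first step is Gerrit's standard review ref, e.g.:

    # open a new change in Gerrit targeting master
    git push origin HEAD:refs/for/master
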
We try to make small and frequent changes in order to reduce the risk of breaking things and the risk of not being able to correct mistakes if something happens. We moved away from DEV->TEST->PROD staging because it made it more difficult to keep the pace of small and frequent changes. This method has served us very well, and we now have 26 contributors and 6 gatekeepers.

The testing of each change involves (a skeleton is sketched below):
  • rolling back a VMware snapshot of a RHEL6 machine
  • destroying some basic stuff that we know CFEngine should fix
  • running cf-agent a couple of times
  • verifying that the agent did fix the stuff
  • and of course, if cf-promises fails, the script instantly returns a failure
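
A rough skeleton of that script (the snapshot helper, host name, and the specific breakage/checks are invented placeholders):

    #!/bin/sh
    # hypothetical skeleton of the per-change test script
    set -e

    # roll the RHEL6 test VM back to a known-dirty snapshot
    revert_vmware_snapshot rhel6-test baseline   # hypothetical helper

    # fail fast if the policy does not even parse
    cf-promises -f /var/cfengine/inputs/promises.cf

    # break something CFEngine is expected to repair
    ssh rhel6-test 'rm -f /etc/motd'

    # run the agent a couple of times; the second run should converge
    ssh rhel6-test 'cf-agent -K'
    ssh rhel6-test 'cf-agent -K'

    # verify the agent actually fixed the damage
    ssh rhel6-test 'test -f /etc/motd'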


[1] https://code.google.com/p/gerrit/

Ted Zlatanov

Apr 9, 2015, 3:49:02 PM
to help-c...@googlegroups.com
On Mon, 6 Apr 2015 09:09:28 -0400 Mike Svoboda <michael....@gmail.com> wrote:

MS> Now that we're moving to GIT, we're throwing around the idea of supporting
MS> a branch-per-datacenter.
...
MS> In a sense, the policy execution of datacenter X could be different
MS> than datacenter Y, depending which changes had been cherry picked
MS> into their respective branches.

Do you do this today? I would continue your current practices unless
there's a strong reason to change them. The VCS branching model
shouldn't control your build+release pipeline.

Generally I'd rather ship the same code everywhere and configure it
differently for each data center, so the linear model is more appealing
to me. If I *had* to write different code for each data center, I would
hide it behind an abstract interface.

Ted

Jason Tucker

Apr 9, 2015, 4:02:11 PM
to help-c...@googlegroups.com

We do something similar to this. Development/testing happens in a branch, and once the submitter is happy, it gets pushed to gerrit for code review and merging into the master branch.

I guess what makes our environment a bit different is that all of our cfengine clients pull their policy directly from a git repo, rather than via cf-serverd on the policy hub. In this way, we can have individual test clients pointed at a particular branch for testing, and then put them back on master after the merge is complete.
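
Roughly, each client's update step is a fetch and reset rather than a cf-serverd copy; a sketch (the repo details and the per-host override file are invented):

    # run before cf-agent; /etc/cfengine-branch is a hypothetical per-host
    # override, absent on hosts that should follow master
    BRANCH=$(cat /etc/cfengine-branch 2>/dev/null || echo master)
    cd /var/cfengine/inputs
    git fetch origin
    git reset --hard "origin/$BRANCH"
    cf-agent -K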

__Jason

William Orr

Apr 9, 2015, 6:32:29 PM
to help-c...@googlegroups.com
Reply inline

On 4/6/15 9:20 AM, Neil Watson wrote:
> Mike,
>
> My biggest client is using git and has sandbox, dev, qa, and prod
> branches. It is mostly linear, moving from sandbox to prod. Usually it's
> a straight merge, but sometimes there are cherry picks if not all
> changes are ready.

This is more or less the model that we're considering wrt. non-linear
VCS. Have you experienced any major issues with merge conflicts from the
occasional cherry pick?

> I agree that it's better to push small changes often
> than larger changes infrequently. The former can be challenging if the
> organization has old style change management meetings, but I doubt that
> is a problem at Linkedin.
>
> I think it's best to stick with the linear model. Dealing with complex
> merges and conflicts is frustrating and error prone. Use the KISS
> method and rock on.
>

Thanks,
William Orr


Neil Watson

Apr 9, 2015, 7:44:50 PM
to help-c...@googlegroups.com
On Thu, Apr 09, 2015 at 06:32:27PM -0400, William Orr wrote:
>> My biggest client is using git and has sandbox, dev, qa, and prod
>> branches. It is mostly linear, moving from sandbox to prod. Usually it's
>> a straight merge, but sometimes there are cherry picks if not all
>> changes are ready.
>
>This is more or less the model that we're considering wrt. non-linear
>VCS. Have you experienced any major issues with merge conflicts from the
>occasional cherry pick?

We still cherry pick in the same direction, so the operation is still a
merge, just a smaller one. We've had no problems.

Moore, Joe

Apr 10, 2015, 11:55:12 AM
to help-c...@googlegroups.com
William Orr Wrote:
> On 4/6/15 9:20 AM, Neil Watson wrote:
> > Mike,
> >
> > My biggest client is using git and has sandbox, dev, qa, and prod
> > branches. It is mostly linear, moving from sandbox to prod. Usually it's
> > a straight merge, but sometimes there are cherry picks if not all
> > changes are ready.
>
> This is more or less the model that we're considering wrt. non-linear
> VCS. Have you experienced any major issues with merge conflicts from the
> occasional cherry pick?

Personally, I try to avoid cherry-picking, because it makes a mess of the commit log: "Merge a twig from branch dev-foo into sandbox", followed by "Merge another little leaf from dev-foo", and finally "Merge dev-foo into sandbox".

I prefer that I (and my developers) create a new branch with just the small change that they need (possibly by branching their dev branch, making the changes, and rebasing those against prod, creating a branch that doesn't depend on commits in their dev branch).

Also, while I think it is possible to have multiple "prod" branches for different datacenters, it may cause problems if a changeset targeted for datacenters 1 and 2 relies on commits that have been committed into DC1, but not DC2. The changeset would have to be rebased against the latest common ancestor of DC1 and DC2, so that it could be applied to both.

Computing and updating the latest common ancestor across all the datacenters, and relying on your devs to start/pull from only that base for all their feature branches, could be a headache. But maybe if this LCA is named "master", it would be the default, and once a commit shows up in every "git log --oneline master...DCx", that commit can be automatically added to master.
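
A sketch of that bookkeeping (branch names hypothetical):

    # latest common ancestor of all the datacenter branches
    git merge-base --octopus dc1 dc2 dc3

    # commits on dc1 that master lacks, ignoring patch-equivalent ones
    git log --oneline --cherry-pick --right-only master...dc1

    # which branches already contain a given commit (by ancestry;
    # cherry-picked copies need git cherry instead)
    git branch --contains <sha>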

And I would still recommend putting guards in the CF code to make sure an errant merge doesn't cause problems in a non-targeted DC. Those guards could be relaxed or eliminated when the feature is promoted into another DC, as a new commit and merge.

--Joe

Danny Sauer

Apr 10, 2015, 12:14:42 PM
to help-c...@googlegroups.com
This is a problem I have with git - people switch to it because they hired some kids who just aren't happy unless whatever they're using is currently "hip", and then they proceed to break all sorts of processes just to make git fit. ;)

My counterparts who maintain Puppet code in the same environment where I manage CFEngine code (don't ask) use git, while I use Subversion.  They switched to Git because developers don't communicate and end up with merge conflicts.  I set up Trac and required that people document what they're doing so they don't create conflicts in the first place.

Both environments end up with huge piles of abandoned branches, and both environments implemented some form of automated promotion mechanism.  Both also essentially merge individual files / changesets into a trunk which then gets promoted out through a series of directories.  The git folks use branches for those directories, while I use tags and svn externals.  In both cases, the release candidate (trunk) is always a single choke point, and there is a fixed process for the entire policy to go from RC testing on a few test machines to the sequential test environments and then production.

If you don't have a single path from test to production, then your testing is potentially invalid, so I don't see a benefit to promoting code to production when the unchanged part of the codebase might differ between test and prod.  That seems like it could well promote instability.  Treating the entire policy as a single entity seems like it would be substantially more reliable.

--Danny

Moore, Joe

Apr 10, 2015, 1:05:05 PM
to help-c...@googlegroups.com

Danny Sauer Wrote:

> If you don't have a single path from test to production, then your testing is potentially invalid, so I don't see a benefit to promoting code to production when the unchanged part of the codebase might differ between test and prod.  That seems like it could well promote instability.  Treating the entire policy as a single entity seems like it would be substantially more reliable.

This is another good point.

While CFE does try to make changes convergent, and our developers try not to write incompatible/conflicting policies, what happens if a feature implemented in DC1 does not play nicely with a feature implemented in DC2, when both of those features are implemented in DC3?

Unless there's a test/qa branch for each of the DCs' configurations (or an exponential number of test cases), a feature combination hasn't been tested.

In Summary:

Go linear; use guards (DC1:: inputs => feature.cf) to protect new code, but have it pushed everywhere. Loosen the guards (DC1|DC2:: inputs => feature.cf) when the policy can be rolled out to another DC, until your guard is "any::", and then you can take the guard out. The progression is sketched below.
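
Spelled out as policy, the progression would look something like this (an untested sketch; file names are illustrative):

    body common control
    {
      # stage 1:   DC1::       inputs => { "promises.cf", "feature.cf" };
      # stage 2:   (DC1|DC2):: inputs => { "promises.cf", "feature.cf" };
      # final: the guard has widened to any::, so it can be dropped entirely
      any::
        inputs => { "promises.cf", "feature.cf" };
    }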

 

Oh, and make a post-commit hook for the master branch to tell your developers to "git pull" into their working branches. :)

 

--Joe

William Orr

Apr 10, 2015, 9:11:06 PM
to help-c...@googlegroups.com

My worry, and I say this because I've seen it happen at our site, is
that this additional code change can be a source of errors in and of itself.

When you have a large number of datacenters, and the guards become
slightly complex, then it becomes non-trivial for some contributors to
ramp a change.

Using a model where changes are pushed out incrementally prevents this
from happening.

Another point against this is that these guards are opt-in. We make a
lot of changes per day. If these guards have to be opt-in, it's unlikely
that "trivial" changes will get them. Those "trivial" changes are
sometimes sources of catastrophic errors.

I think it definitely helps *not* to have each datacenter necessarily
run the same codebase, because you have opt-out ramping and you reduce
code churn (which may introduce new bugs).


Brian Bennett

Apr 10, 2015, 9:14:27 PM
to Danny Sauer, help-c...@googlegroups.com
Don't blame git here. It sounds like what makes your repo successful and theirs not is code review and a gatekeeper.

You can have clean or messy history no matter which VCS you use. It's simply a tool. The process your team builds around it will make it successful or not.


-- 
Brian Bennett
Looking for CFEngine training?
http://www.verticalsysadmin.com/


Danny Sauer

Apr 13, 2015, 11:55:40 AM
to help-c...@googlegroups.com, dannysa...@gmail.com, brian....@verticalsysadmin.com
I should probably clarify that I was saying both repositories work (and get messy) pretty much the same way for pretty much the same reasons, and that they both are able to work because we both established a process first and then selected a tool to fit the process rather than the other way around.  I think that was my point, anyway. :)

--Danny

Mike Svoboda

Apr 18, 2015, 8:06:13 PM
to Danny Sauer, help-c...@googlegroups.com, Brian Bennett
Thanks everyone for adding in your $0.02!  This was a lot of help!  I really appreciate it.

Cheers
Mike


Nick Anderson

Jun 22, 2015, 5:08:39 PM
to help-c...@googlegroups.com, dannysa...@gmail.com, brian....@verticalsysadmin.com
It's been a few months and I am interested in what you decided to do. How is the new workflow working for you?

One thing that I noticed in this thread was the use of deep class guards to keep subsets of infrastructure from picking up the newly released policy. The thing that struck me was that we are using those deep class guards to protect against a change.

I was working on a similar thing recently. I wanted to support rolling out changes in a more controlled way and avoid the use of deep class guards. I thought the most reliable way to NOT deliver new policy to a host would actually be to NOT deliver any new policy: I wanted to be able to release a policy change for select sets of hosts without any change to the others.

I ended up using checkouts based on git tags. One host (in my case the hub) manages all of the various checkouts. We start with a very simple /var/cfengine/masterfiles and /var/cfengine/old_masterfiles. All remote clients get switched to /var/cfengine/old_masterfiles, which starts off pointing to the same tag as /var/cfengine/masterfiles. Once verified, /var/cfengine/masterfiles is pointed to a new tag of the code we want to release. Clients look themselves up in a simple registry and point to masterfiles or old_masterfiles as appropriate.

This way we can be selective about which specific hosts get the new policy and which hosts continue to run the old policy, until we have transitioned all hosts back over to /var/cfengine/masterfiles. Once complete, the process can start over.
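
As a rough sketch of the mechanics on the hub (the tag names and the registry file are invented):

    # advance the release: old_masterfiles keeps the prior tag while
    # masterfiles moves to the new one
    git -C /var/cfengine/old_masterfiles checkout release-41
    git -C /var/cfengine/masterfiles checkout release-42

    # each client consults the registry and takes policy from the matching
    # checkout (early_adopters.txt is a hypothetical flat file of hostnames)
    if grep -qx "$(hostname)" /var/cfengine/early_adopters.txt; then
        SRC=/var/cfengine/masterfiles
    else
        SRC=/var/cfengine/old_masterfiles
    fi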

We still view it as a single policy and try to keep it linear, moving changes from one environment to the next in sequence.

Initially I started by using branches, but it seemed too easy to just merge new changes in. Tags seemed to provide a more specific trail of crumbs to follow.

Aleksey Tsalolikhin

Jun 23, 2015, 3:18:07 PM
to Nick Anderson, help-c...@googlegroups.com, Danny Sauer, brian....@verticalsysadmin.com


On Mon, Jun 22, 2015 at 2:08 PM, Nick Anderson <nick.a...@cfengine.com> wrote:

I was working on a similar thing recently. ... we can be selective about which specific hosts get the new policy and which hosts continue to run the old policy until we have transitioned all hosts over to /var/cfengine/masterfiles again.

Handy!  What about automatic transitioning from old_masterfiles to masterfiles based on the success of the transition of prior hosts?  

There are two ways to automatically verify success (a crude sketch of the first follows the list):
1. Internal: all hosts that received the new masterfiles are stably at 100% promise compliance -- let's say for 10 consecutive runs.
2. External, using a 3rd-party test framework. Neil Watson uses Serverspec to test changes to EFL, for example.
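
For the internal check, something like this could run on each host (the log heuristic, threshold, and paths are all invented; proper compliance reporting would be more robust):

    # count consecutive clean agent runs; after 10, flag the host as ready
    # to move from old_masterfiles back to masterfiles
    COUNTER=/var/cfengine/state/clean_runs
    if cf-agent -K 2>&1 | grep -qiE 'error|fail'; then
        echo 0 > "$COUNTER"
    else
        echo $(( $(cat "$COUNTER" 2>/dev/null || echo 0) + 1 )) > "$COUNTER"
    fi
    [ "$(cat "$COUNTER")" -ge 10 ] && touch /var/cfengine/ready_for_new_policy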

What do you think about that?

Danny Sauer

Jun 23, 2015, 9:06:34 PM
to help-c...@googlegroups.com
This is part of why I'm learning Python.  I've got the Subversion hooks written which update the path string in the common control section when it's updated, and the policy reports which version is running as part of the regular output.  Centralized logging is gathering that and identifying which systems are currently running which version.  Versions are based on the Subversion tag or branch name.  Systems know which segment of the network they're in (test, production, etc).

The /cfengine directory uses subversion externals to map a specific tag to a directory with the same name as the target environment.  Systems in test get their policy from /cfengine/test, in production they go to /cfengine/prod, etc.  To promote a tag, it's just an edit to the one line in the /cfengine directory's svn:externals definitions.  A system can be mapped to an alternate development location by a config file which contains a list of hostnames and the development branch name that they're in (which is also done with an external definition; /cfengine/branches/branchname).

Since the policy version contains the name of the tag or branch which it was last checked in under, centralized logging can identify when X% of the given environment has started running the new version.  That's where the Python comes in; it has a pretty nice set of tooling for manipulating a Subversion repo.

I'm currently integrating with our enterprise change management system to identify when a deployment change window is active.  When the change window starts, it uses the Subversion hooks to update the externals definition (based on the tag indicated in the record), which triggers a revprop-change hook to update the /cfengine filesystem from the svn repo, and marks the change as "in progress".  It then starts polling the centralized logging system to see how much of the environment has moved to the new tag.  When the environment is over some percentage migrated, the change is marked as complete.  If it's not complete within the expected timeframe (well before the end of the service window), an alert is generated to get someone to see what went wrong.
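
The promotion edit itself stays small; a sketch with an invented repo layout, assuming a working copy of the /cfengine directory at /srv/deploy-wc:

    # repoint the prod external at the new tag and commit; the hooks
    # take it from there
    cd /srv/deploy-wc
    svn propget svn:externals . > externals.txt
    sed -i 's|^prod .*|prod https://svn.example.com/cfengine/tags/release-42|' externals.txt
    svn propset svn:externals -F externals.txt .
    svn commit -m "Promote tag release-42 to prod"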

Because the deployment is done with essentially symlinks to tags, it's pretty simple to switch the link (external definition) back to the previous tag.  The failsafe.cf is set up to always get whatever's currently production, and only runs if the main policy fails (which is somewhat contrary to the way it's used in the policy that comes out of the box with CFEngine).  So, if some error finds its way to a deployment, it's just a matter of putting a working tag in production (which is how production should *always* be anyway), and it'll self-resolve.

Mostly that's useful for development branches; break something, and the system automatically goes back to production, and then tries the branch again.  It keeps on that cycle until the developer fixes it or the branch override is removed from the system - so that also keeps production policy checking stuff periodically even with a broken dev branch.  Managing everything within the version control system makes it easy to identify who deployed what when, revert mistakes, and automate. :)