Google Groups

feasibility and implementation of a tree-structured development process


Tom Roche Apr 22, 2012 5:06 PM
Posted in group: Git for human beings

summary: background, motivation, and plan for a tree-structured development process are presented. My question is (roughly), should the "code bucket" to which code is committed at each level of the process be implemented by a separate repository, a separate branch on a single repository, something else, or does it matter?

details:

Apologies for the length of this post, but it seems there's a lot to explain. In reverse order, I have a question about a plan, for which I present the motivation and some background: feel free to skip over parts, but I suspect it "all ties together."

BACKGROUND

I recently began to work with a group that recently had an embarrassingly extended release. The short story is, we kept throwing what we thought was good code over the wall to the beta testers, who kept throwing it back--for 5 months. The long story/etiology includes:

1 Our people are software engineers only by default. They're really scientists who code, and who have learned a bit about software engineering "just by doing." But their focus has been on what they do with their code, not their tools or development process (until now), which can seem pretty crude (at least, to a coder who's starting to learn the science, like me).

2 We have a very centralized dev process. There is one CVS repo to which everyone commits. (Technically, there are several: since they don't know how to branch or create read-only users, they just clone the filesystem every time they want to freeze something. But for commits there is only one repo.) Everyone commits to HEAD, for the same reasons most CVS users don't branch. Theoretically everyone runs a big bucket o' tests before committing; in practice, there's a small group (2 guys) who manage releases and actually/reliably test.

3 We have a very long release cycle: several years, for which there are apparently some legitimate reasons. But we don't do intermediate integrations, or manage dependencies; ISTM, that's just slack, and means that pre-release testing follows a painful pre-release integration of code from our many contributors.

Related to this etiology are the following continuing constraints:

4 Resource: our funding is flat, and our group's headcount is actually declining (retirees are not being replaced). We are supplied with contractors who service our clusters (more below), but no other computing support (other than desktop support for "productivity apps" like Lotus Notes). We need more contributions from our community of users (which I suspect many could/would give), but, for legal reasons (not related to licensing--the code is open-source), it's hard for us to enable access to code that has not been "fully reviewed." (More on excessive security below.) These are longer-term problems :-(

5 Automated testing of large-scale scientific models seems inherently hard. (If there's anyone out there working on this problem, please ping me offline--I'd like to learn more.) There are ways to attack this in which I'm definitely interested, but that's also a longer-term problem.

6 We are not mobile developers. We run and test our code on a couple of clusters which are behind some exceedingly strict firewalls--so strict that few folks have the ability to VPN (aggravated by the resource constraint), and it's painful for those that do. We can't ssh or https out of the cluster, which complicates sharing of code (via, e.g., github) and data. Hence folks work on code almost entirely from their desks (which are on LANs that have cluster access) and not from home or on the road. This is also not likely to change anytime soon.

MOTIVATION

My group intensively uses our tool for our scientific work (we majorly "eat our own dogfood"), but we also have a significant external community of users. The 5-month delay of an announced release was therefore rather embarrassing, and we also realize that it wasted lots of time/effort. Now that we're planning for the next release, I'm proposing some process upgrades to address those problems. Some proposals are no-brainers, or at least are off-topic for this post:

* CVS -> git: the following plan presupposes we do this. This is not quite a no-brainer, since we'll hafta train folks how to use git, but I can't see disadvantages to migration that aren't outweighed by the advantages of git.

* dependency management (a bit more on dependencies below)

* shorter release cycle || intermediate integration builds using specified dependencies (and that's a boolean 'or')

(If you've got reasons not to do those, please post to me separately, and not on this thread/Subject.)

PLAN

My final proposal is more complex. I'd appreciate comments on it, particularly regarding an implementation detail discussed below. This implementation detail reflects the similarities and differences between git repositories (or remotes) and their branches. Since in git the difference to the user between {pushing code to, pulling code from} any particular branch on any particular repository can be made fairly transparent (am I missing something?), I'll just use the term "code bucket" to refer to something from/to which one can pull/push.
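To make the "fairly transparent" claim concrete, here is a minimal shell sketch, assuming a reasonably recent git on the PATH. The paths and the names team, coder, and aerosol are hypothetical, invented only for illustration. It shows that push and pull each name a location plus a branch, whether the bucket is a whole repository or one branch of a shared one.

```shell
#!/bin/sh
# Sketch: to git, a "code bucket" is just a (location, branch) pair.
# All paths and names below (team.git, coder, aerosol) are hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A shared bare repository standing in for a team-level bucket.
git init -q --bare "$work/team.git"

# A coder's own repository with one commit on master.
git init -q "$work/coder"
cd "$work/coder"
git symbolic-ref HEAD refs/heads/master
echo "v1" > model.txt
git add model.txt
git commit -q -m "initial model"

# Bucket as "its own repository": push master to the team repository.
git remote add team "$work/team.git"
git push -q team master

# Bucket as "a branch on a shared repository": push a topic branch there.
git checkout -q -b aerosol
echo "v2" > model.txt
git commit -q -am "aerosol change"
git push -q team aerosol

# Pulling likewise names a location and a branch.
git checkout -q master
git pull -q --ff-only team aerosol
```

In both cases the commands have the same shape; only the branch name (and possibly the remote) changes.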

For better testing and evaluation, I'm proposing that we move from a centralized process/repository to a tree structure. The release managers (who have other jobs--they do this "on the side") are empirically overwhelmed, so ISTM we need better "division of labor," i.e., distribution of test and integration effort. Furthermore, we already have workgroups which discuss and prioritize big function chunks (e.g., chemistry, meteorology, land cover), and project groups working on smaller ones (e.g., aerosol nucleation), "in between" the individual scientist/coder and the top-level management/repository. (Note that everyone belongs to more than one workgroup and project team: software is modular, but nature is not.) So I'm trying to leverage those groups to get the necessary integration/test work done, and give the release managers "fewer throats to choke." The proposal is, bottom up:

1 Each coder gets her/his own bucket, for her/his own code, on which s/he tests as s/he will. The main difference between that and the status quo (besides cvs -> git) is, s/he will be required to publicly declare (on our group's wiki) what test(s) s/he runs.

2 Each project (i.e., one or a few function points we want to add or modify) gets assigned to a project team (PT). Each PT

* has a declared lead, who is responsible for that project, and represents the PT at workgroup meetings.

* must declare what test(s) it runs on its code.

* has its own separate code bucket. When a member coder wants to "commit," s/he requests pull from her/his PT lead, who pulls/merges/tests. The PT evaluates the results; if satisfactory, the PT lead commits to that PT's bucket.

3 Each workgroup (WG) is like a super-PT: a WG integrates the code from its member PTs in the way that each PT integrates its team members. A WG

* has a declared lead, who is responsible for its set of function, and represents the WG when meeting with the release managers.

* must declare what test(s) it runs on its code.

* has its own separate code bucket. When a member PT wants to "commit," it requests pull from its WG lead, who pulls/merges/tests. The WG evaluates the results; if satisfactory, the WG lead commits.

4 The release managers (RMs) integrate the code from the workgroups. The RMs collectively determine, for a given release or integration build (IB),

* dates

* what its dependencies will be (i.e., on what versions of (e.g.) libraries and compilers that release or IB must run)
 
* what function goes in (the determination and arbitration of which seems to consume lotsa work)

The RMs also

* must declare what test(s) they run on the release or IB

* manage the top-level (separate) code bucket. When a WG wants to "commit," it requests pull from an RM, who pulls/merges/tests. The RMs evaluate the results; if satisfactory, the RMs commit.
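The same "request pull, then pull/merge/test, then commit" step recurs at every level of the tree. A minimal sketch of one such step at the PT level follows, assuming each bucket is a separate repository. The names pt.git, lead, coder, and candidate are hypothetical, and run_tests is a made-up stand-in for whatever tests the team has declared.

```shell
#!/bin/sh
# Sketch of one "request pull -> merge -> test -> commit" step.
# Hypothetical names: pt.git (team bucket), lead/, coder/, candidate.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Made-up stand-in for the tests the PT has declared on the wiki.
run_tests() { grep -q "nucleation" model.txt; }

# The PT bucket, seeded by the lead with a baseline commit.
git init -q --bare "$work/pt.git"
git init -q "$work/lead"
cd "$work/lead"
git symbolic-ref HEAD refs/heads/master
echo "base" > model.txt
git add model.txt
git commit -q -m "baseline"
git remote add pt "$work/pt.git"
git push -q pt master

# A coder clones the PT bucket and commits in a private repository.
git clone -q -b master "$work/pt.git" "$work/coder"
cd "$work/coder"
echo "nucleation" >> model.txt
git commit -q -am "add nucleation term"

# The lead fetches the coder's work and merges it on a scratch branch,
# so the lead's master is untouched until the tests have run.
cd "$work/lead"
git fetch -q "$work/coder" master
git checkout -q -b candidate
git merge -q --no-ff -m "merge coder work" FETCH_HEAD

# Only a passing candidate is promoted to the PT bucket.
if run_tests; then
    git checkout -q master
    git merge -q --ff-only candidate
    git push -q pt master
fi
```

The WG and RM levels would repeat the same pattern, with the PT (or WG) bucket playing the role the coder's repository plays here.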

QUESTION

My general questions are, does the plan above seem

* feasible, given our constraints?

* solvent: does it seem likely to solve the problems described above? (notably, that the centralization of our process is overwhelming the folks at the center)

My specific question regards the implementation of the "code bucket" at each of the levels above: should it be implemented by

* a separate repository

* a separate branch on a shared repository

* Something Completely Different

? I'm leaning toward separate repositories, but am wondering if there are performance or operational details of which I'm unaware, given the following constraints. To be more specific, the implementation I currently favor is, for each level:

1 Each coder gets a separate git repository on her/his desktop, which is on a LAN that can ssh (and therefore run protocol=git) and https into the cluster. Unfortunately these are mostly Windows (XP), but I'm presuming git runs well enough on that--am I missing something? (I run Debian, and am mostly blissfully ignorant of platforms != Linux.) Coders would also be free to create repositories on the /home filesystems on our clusters (which run RHEL 5, but may soon be moving to CentOS 6). On their repositories, coders would be free to create branches and tags as desired.

2 Each project team lead gets a separate repository on one or both of the clusters. (We can ssh/git between the clusters, and between the clusters and the desktop LAN, but can neither ssh/git nor https from either cluster to the outside world.) PT leads are also free to branch and tag at will on their repo.

3 Each workgroup lead gets a separate repository on one or both of the clusters. WG leads are also free to branch and tag at will on their repo.

4 The release managers would maintain a separate repository on one or both of the clusters. Branch=master would, at any given time, hold the latest release or integration build. Immediately before a release or integration is declared (only following its successful testing!), the current contents of branch=master would be branched with the date of the integration, or the release number; then the contents of the current release/IB would be committed to branch=master. RMs may also create other branches or tags to facilitate integration and release.
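One possible reading of the RM step in item 4, as a shell sketch: preserve the outgoing contents of master under a dated branch, then advance master to the newly tested build. The repository name rm, the branch name ib-2012-04-22, and wg-candidate are all hypothetical, and the sketch assumes the tested build is a descendant of the current master.

```shell
#!/bin/sh
# Sketch of the RM step: preserve the old master, then advance it.
# Hypothetical names: rm/ (RM repository), ib-2012-04-22, wg-candidate.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# The RM repository, with master holding the previous release.
git init -q "$work/rm"
cd "$work/rm"
git symbolic-ref HEAD refs/heads/master
echo "release 1" > model.txt
git add model.txt
git commit -q -m "previous release"

# Tested integration work, as it might arrive from the workgroups.
git checkout -q -b wg-candidate
echo "release 2" > model.txt
git commit -q -am "tested integration build"

# Preserve the outgoing master under a dated branch name...
git branch ib-2012-04-22 master

# ...then move master forward to the newly tested build.
git checkout -q master
git merge -q --ff-only wg-candidate
```

If the preserved snapshot never needs further commits, a tag (git tag) in place of the dated branch would record the same point in history.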

your review is appreciated, Tom Roche <Tom_...@pobox.com>