Hi Anon,
A couple quick notes about your anonymity before I respond to your questions. First and foremost, we want to help. There are plenty of folks here who are just a phone call away, and who would be more than happy to help you engage with your leadership, but we can't do that if we don't know who you are. Second, by its very nature, developing in public means you have to be open to the possibility of "looking bad" occasionally (some might say frequently), so sooner or later you’re going to have to take that leap of faith one way or another. :) Perhaps taking that risk with a sympathetic community like ours might actually give you the opportunity to demonstrate to your organization the positive effects of doing so?
So, with that being said I will do my best to address your concerns as openly and honestly as your anonymity will allow.
Preface: All concepts that follow depend on a high degree of faith in your engineers' professional judgment. If your relationship with your dev, and ops, and security engineers is at all adversarial then you have bigger problems that might need to be addressed first.
The first thing I'll say is that iterative development and continuous integration (or as close as you can get to it) are your friends. I think high-frequency, small releases help solve a lot of the problems you've raised.
Put simply, the more frequently you release, the less code goes into each release, which reduces the burden of review. This benefits you in two ways. First, it allows you to integrate a lighter-weight review process into your development workflow, which, if done well, also happens to get you better code all around. Second, I presume the "review process" you refer to means an external review (presumably by security/comms personnel). Producing a high volume of review requests for those external stakeholders, the vast majority of which cover small, less significant commits, allows you to a) quickly establish a pattern of trust with those stakeholders and b) solidify the notion that not all commits are equal. Thus, hopefully over a reasonably short time, this will allow you to shift away from a policy of "all commits must be reviewed" to "critical commits must be reviewed." And with that trust established, hopefully the revised policy will trust the engineers to determine which commits are "critical," or at least establish a bright line of some sort, e.g. all commits over N lines must be reviewed, or all commits that touch authentication, or any commit messages or issue updates that touch on policy, timeline, roadmaps, etc.
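To make the "bright line" idea a bit more concrete, here is a minimal sketch in Python of a rule that decides whether a commit needs the higher-level review. Everything in it is an assumption for illustration: the `Commit` fields, the 200-line threshold standing in for N, the `auth/` path prefix, and the keyword list are hypothetical, not anyone's actual policy.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds and keywords; agree on the real ones with your stakeholders.
MAX_UNREVIEWED_LINES = 200             # stands in for "N lines"
SENSITIVE_PATH_PREFIXES = ("auth/",)   # e.g. code that touches authentication
SENSITIVE_WORDS = ("policy", "timeline", "roadmap")


@dataclass
class Commit:
    message: str
    lines_changed: int
    paths: list = field(default_factory=list)


def needs_external_review(commit: Commit) -> bool:
    """Return True when the commit crosses any of the bright lines."""
    # Bright line 1: the change is larger than the agreed size.
    if commit.lines_changed > MAX_UNREVIEWED_LINES:
        return True
    # Bright line 2: the change touches a sensitive area of the code base.
    if any(path.startswith(SENSITIVE_PATH_PREFIXES) for path in commit.paths):
        return True
    # Bright line 3: the commit message mentions a sensitive topic.
    message = commit.message.lower()
    return any(word in message for word in SENSITIVE_WORDS)
```

A rule like this is deliberately crude. The point is less the exact checks than having something written down that both the engineers and the external reviewers can read and argue about.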
The rest of my thoughts are in-line below.
TL;DR - These are in order of decreasing importance:
1. Can developers be allowed to push directly to GitHub, without every commit going through a review process?
No. In fact I submit you actually don't want this anyway. But per above, it all depends on how you shape the review process itself. A development workflow that requires peer review of all commits allows for an opportunity to flag potential commits that require a higher level review, as well as common sense review of commit messages. One option here is to use a combination of private and public repos so you can do your code review in the private repo then pull to public when it passes review.
Also, you get better code.
2. What to say to those who don't want to expose our messy coding practices?
A. Say "let’s stop using messy coding practices." As you mentioned a couple times, if you separate the problem of existing code from new code, this becomes a bit more realistic. If you establish in the culture that you're building for open source, even with code that will never be part of a public repo, you essentially challenge your engineers to develop and indoctrinate the kinds of practices that no longer raise those types of concerns. As Ben points out in the post I just linked, there's also a level of self-awareness that occurs— people who KNOW their code is going to be open source write code differently.
And again, you get better code. And,
B. Accept that on some level, all code is always messier than you'd like it to be. One of the core concepts of open-source development is transparency— it's as much about community-building as it is about the code itself— and with that comes a level of intellectual honesty. In that context, trying to present the world with fully polished code and perfect coding practices and hiding away the warts is the open-source equivalent of astroturfing.
3. Won't it be a lot more work to support community interaction?
For this one, it's really as much or as little as you want it to be. Obviously more begets more, and less begets less. But one viable strategy is to time box it. Basically have one engineer for a set period of time, say four or eight hours per week, triage the issue queues and pull requests. If you rotate that responsibility weekly across your dev team, the impact ends up being surprisingly low. Theoretically, if your engagement is effective this time box will need to grow over time, but hopefully by then not only will your leadership be more bought in to it, but you’ll also have a better sense of the value added to your own programs by the community with which you’re engaging, and have a solid footing on which to base any cost-benefit tradeoffs.
4. Isn't it okay (good, even) to put unfinished / unpolished code out in the open?
Yes. See response to #2 above. This resistance is always present initially, especially when dealing with legacy code. Engineers always want more time to "polish the code" before making it public: "I don't want my name associated with this crappy code," "this is embarrassing and will make us look bad," etc. But let me ask you, have you ever met a developer who said "this piece of code is exactly how I want it"? :) As I mentioned in my intro, there's a point at which you just have to take the leap of faith. There will always be detractors, no matter how good the code is, and that simply can't be helped. But at the end of the day the community can tell the difference between honest engagement and "checking the open source box," and it's been my experience that the former invites allies and the latter invites detractors.
Also, one of the other core concepts is that open source means that you get help with your code. If you're only publishing polished code you're actually denying yourself one of the biggest benefits, so why bother in the first place?
To sum up, there are two separate levels of concern here. The first is institutional fear, and I think the only way to overcome it is to embrace that fear by engaging with those concerned and welcoming a level of oversight that both allays those fears and gets you better code. If you implement that oversight right, it will be less burdensome than you expect, and it will also lighten over time as leadership becomes more comfortable with the notion. The second is engineering fear – the "I don't want my bad code out there" argument – which I think you tackle by challenging the engineers to evolve. If the current processes are too messy for their standards, then you can engage with them and challenge them to come up with, and adhere to, new ones.
I guess what I’m saying is that the cultural changes cannot be achieved simply in the context of a desire for an open-source culture, but that they need to be part of a larger holistic shift in how you develop code. But then I’m probably stating the obvious there.
Anyway, I know this was long, but I hope it helps, and I’d be happy to talk more, if only I knew who you were…
-L
From: Eric Mill [mailto:er...@sunlightfoundation.com]
Sent: Tuesday, February 18, 2014 11:15 PM
To: Heyman, Leigh
Cc: Anony Moose; us-govern...@googlegroups.com
Subject: Re: Help with making the case for opening up software development
On Tue, Feb 18, 2014 at 10:52 PM, Heyman, Leigh <Leigh_...@oa.eop.gov> wrote:
1. Can developers be allowed to push directly to GitHub, without every commit going through a review process?
No. In fact I submit you actually don't want this anyway. But per above, it all depends on how you shape the review process itself. A development workflow that requires peer review of all commits allows for an opportunity to flag potential commits that require a higher level review, as well as common sense review of commit messages. One option here is to use a combination of private and public repos so you can do your code review in the private repo then pull to public when it passes review.
Also, you get better code.
This whole email was an *excellent* response, and I'm not (and haven't ever been) a government employee, so I can't truly speak to the poster's circumstances, take it with grains of salt, etc.
Thanks!
However, this particular answer is mixing a bunch of things together - you should always be able to arrive at a place where developers push directly to Github, without every commit being reviewed.
* Peer code reviews are A+, but a fundamental part of Github's entire pull request workflow is doing those code reviews *inside Github*. Individual developers can perform work on their own branches or forks, and discuss that work before merging upstream. That's much different than not allowing code to appear publicly at all without review.
That's all, I agreed with the general thrust of the answer -- it just seems very grim to say that government agencies should not allow their developers to push directly to Github as a blanket statement.
Right, sorry, I agree completely, I should have been more explicit. In fact I’m kicking myself a little here because I had an earlier draft that was more specific about how I meant this but it was already getting too long. The version that I cut stated specifically that you don’t want engineers pushing directly to your org’s public/production version of a repo without review. That’s more or less what I was getting at with the recommendation of devs working on forks of a private repo before pulling to a public one. I am definitely not recommending against leveraging github’s workflows.
One final general statement: working in public, treating it like breathing, changes human behavior in all sorts of ways, and in my experience, all for the better. It'd be difficult for me now to work any other way.
Very well put!
2. What to say to those who don't want to expose our messy coding practices?
3. Won't it be a lot more work to support community interaction?
It can be. More realistically, do you anticipate the project being so popular that community interaction will be an issue? This is actually a great position to be in, if you get there. Take an extremely popular project like Bootstrap, with two core developers (@mdo (https://github.com/mdo) and @fat (https://github.com/fat)) and around 500 total contributors. The reward is more feedback, greater quality, and ultimately a successful project:
https://github.com/twbs/bootstrap/issues
https://github.com/twbs/bootstrap/pulls (over 23,000 forks)
4. Isn't it okay (good, even) to put unfinished / unpolished code out in the open?
Absolutely. You want feedback and improvements early. This is the hardest thing for developers not used to the open source community to get used to. You'll find that a lot of the embarrassment goes away for rough code when you at least commit a reasonable number of unit tests with it and a decent readme.
The main concern is with security. Now, I understand that we can't expose our internal infrastructure, such as server names and directory paths (not to mention credentials).
You should not check in these types of dependencies. Spell out what the dependencies are so that everyone can set up their own environment. See this link about configuration for Twelve-Factor apps:
Environment variables can contain the values or the paths to the data necessary to get an application correctly configured and provisioned with credentials, etc. As the link explains, configuration files can be used, but it's easy and tempting to pass them around and possibly check into version control accidentally.
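As a minimal sketch of that approach in Python: the application reads its settings from the environment at startup, so nothing environment-specific lives in the repository. The variable names (`APP_DB_HOST`, `APP_DB_PASSWORD`, `APP_DB_PORT`) and the default port are hypothetical and only there for illustration.

```python
import os


def load_settings(env=os.environ):
    """Build settings from environment variables instead of checked-in files."""
    # Fail fast with a clear message when something required is not set.
    required = ("APP_DB_HOST", "APP_DB_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return {
        "db_host": env["APP_DB_HOST"],
        "db_password": env["APP_DB_PASSWORD"],
        # Optional value with a hypothetical default.
        "db_port": int(env.get("APP_DB_PORT", "5432")),
    }
```

The README then only needs to list which variables must be set, and each developer or server supplies its own values. Server names and credentials never need to appear in a commit.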
Bottom line, it's great that you are championing efforts to increase productivity and quality through change and transparency in your software culture. Best of luck!
Tony
@subfuzion
C) Also be sure to dive into these two:
D) There's also now a Government Open Source listserve that's worth hitting up in the future as well. I already told them about this thread.
E) For specific examples of what you're looking for, there are a number of projects that you can often detect through # of forks and/or activity. Check out these two mashups.
Along the lines of Leigh's points about anonymity, feel free to directly get in touch and I'm happy to give more specific examples directly.
Gray
Thanks to all the posters. There's some great collective data here!
One thing I know government security teams care about is the ability to satisfy federal audits. Audits can be very rigorous and authoritative, and they are key measures for many CISOs. Audits involve presenting artifacts surrounding controls as defined by NIST 800-53. Well-defined processes and procedures are critical pieces to answering audits. These need to be in writing *and* you need to demonstrate that they are followed. In other words, you need to align what's done in practice with documented policies and procedures.
With respect to source code, there are a couple of tactical approaches:
(1) Communicate and demonstrate that source code in and of itself is *not* executing software in a production environment (e.g., if hosted on GitHub).
(2) Stand on the shoulders of giants: look at the stellar OSS work of the DoD (http://dodcio.defense.gov/OpenSourceSoftwareFAQ.aspx), who formally defined source code as "data", which may dictate that it can be handled differently.
(4) Conversations are important, but you need concrete artifacts. I started a generalized template that might help: http://if.io/open-source-program-template/index.html. It has some policies and procedures that security teams appreciate. It also has an open source policy, a checklist, and a link to a proof-of-concept tool that inspects git repositories for string patterns that shouldn't be there (such as PII, or personally identifiable information): https://github.com/virtix/clouseau (We're actually thinking about making this a plugin for Travis and have implemented a simple post-commit hook that blocks commits when they contain certain strings. These are the kinds of demonstrable controls that help with adoption.)
(5) Start small. Get buy-in to pilot something small and very low risk.
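For a sense of what a string-pattern check like that can look like, here is a minimal sketch in Python. It is not clouseau's implementation; the pattern list, the `internal.example.gov` domain, and the function name are all hypothetical stand-ins for whatever a security team would actually want flagged.

```python
import re

# Hypothetical patterns; a real list would come from your security team.
FORBIDDEN_PATTERNS = {
    "ssn-like number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "internal hostname": re.compile(r"\b[\w-]+\.internal\.example\.gov\b"),
    "hard-coded password": re.compile(
        r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


def find_violations(text):
    """Return (line_number, label) pairs for each forbidden pattern found."""
    violations = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in FORBIDDEN_PATTERNS.items():
            if pattern.search(line):
                violations.append((number, label))
    return violations
```

Run against the text being committed, a non-empty result is the signal to stop. If this were wired into a git pre-commit hook, exiting with a non-zero status in that case is what makes git abort the commit, and the list of line numbers and labels gives the auditors a concrete artifact to look at.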
On the subject of anonymity, it should be a matter of principle and ethics—if you're not willing to be accountable for something you say or write, then you're either not ready to say it, or, perhaps, it shouldn't be said. The amount of change you can make is directly proportional to your willingness to deal with friction and heartbreak.
-bill
Thanks for all your responses. Sorry for staying anonymous; no one appreciates the irony more than I do. I feel like I've been fighting a real uphill battle. I have in the past tried to push the envelope and did more harm than good: for example, I carelessly exposed an internal server name on a GitHub repo, and now our security folks have an example of why they need to keep things locked down.
But unless we're going to have a mix of private/public repos, then the branches and the pull requests should happen on the same public repository.
But that is what our security folks are saying that we cannot allow. I didn't see anything in your response to suggest that, from a security standpoint, every developer commit must be reviewed before being pushed.
On 2/20/14 12:56 AM, "Anony Moose" <feicha...@gmail.com> wrote:
But unless we're going to have a mix of private/public repos, then the branches and the pull requests should happen on the same public repository.
So why not use a mix of public/private? It lets your engineers develop the kind of workflows and coding practices that will get you better code even if it never ends up in a public repo; it lets you build workflows that integrate security checks/reviews into them. It lets you partner with the security team to collaborate on a set of workflows that they will (eventually) approve of. Open source isn't an all-or-nothing proposition.
A bit slow to respond, but a great conversation and lots of great advice here from Leigh and James, among others.
every change-set would have to get reviewed before being pushed to public
Hold the security review not for when code hits the open source master branch (which doesn’t impact the organization’s security posture) but for just before it hits production (where it’s no longer an abstract project, but production code). As you do with modules within the software, separate the organization-specific concerns from the code itself. Abstraction is both a best practice and your friend here.
So why not use a mix of public/private? It lets your engineers develop the kind of workflows and coding practices that will get you better code even if it never ends up in a public repo; it lets you build workflows that integrate security checks/reviews into them. It lets you partner with the security team to collaborate on a set of workflows that they will (eventually) approve of. Open source isn’t an all-or-nothing proposition.
Whatever route you take, it’s important to realize that the technology’s the easy part. A move like this is simply leveraging technology as an excuse to motivate cultural change. It’s a different workflow and a different power dynamic than many primarily proprietary organizations are accustomed to. Starting small and privately, with something insignificant, just to go through the motions and find the friction points between collaborative development and your organization’s culture is hugely valuable in and of itself. Make a repo with the best places to get lunch nearby, or a style guide, or some other non-code thing to get a feel for how things work before starting the conversation to go public. Baby steps.
I was talking about the ability to use GitHub as the main development repository, independent of the specific workflow that’s established for a project. So, yes, devs should work in a branch, and code should be reviewed before merged to master, say. But unless we’re going to have a mix of private/public repos, then the branches and the pull requests should happen on the same public repository.
One word of caution: having an internal private repository (or other version control workflow) and an external public repository is not open source, and can be an especially bad experience if you have an internal bug tracker to which the open source community is not privy. How’d you like it if you dedicated 6 hours of your time to add a feature or submit a bug only to hear “oh yeah, we’ve been secretly working on that. We’ll push it directly to master next week”? You’d never contribute again. Likewise, you’d want more eyes on your code review, since when you open source your project, you expand the nexus of stakeholders beyond those within your organization. An imbalance of information between sides of the firewall is one of the best ways to ensure an open source project fails.
OP, more than glad to chat confidentially if you have questions about specifics / logistics of using GitHub. Feel free to reach out to gover...@github.com any time.
Cheers and open source,
- Ben
I am interested in exploring the specific use case of developing a specific site or application in the open. This is somewhat different than traditional open source development, which is mostly focused on generic, reusable libraries that are not applications in themselves but components that can be used to create a specific application. Of course, there are some notable exceptions like the Firefox web browser (really all of Mozilla's stuff). There are also some "in between" things where the open source software is a fully functioning application, but can be easily modified and re-purposed because it addresses a very common use case and is designed to be modified and configured (e.g. WordPress, Discourse, and CKAN).
It is much less common to see a specific application with specific and unique business requirements developed in the open. Do people have thoughts on why this might be? I have a few ideas:
In my mind, the benefits of developing an app in the open are:
I am curious whether people feel as ambivalent about this decision as I do. Do we definitely think developing applications in the open is a best practice? Should all government software be developed in the open by default? Or is it more nuanced than that?
Are there examples (government or otherwise) of specific applications developed in the open? The only one that comes to mind for me is Data.gov. There must be others?