|Commit loss prevention||Kohsuke Kawaguchi||11/11/13 10:25 PM|
Now that the commits have been recovered and things are almost back to normal, I think it's time to think about how to prevent this kind of incident in the future.
|Re: Commit loss prevention||lucamilanesio||11/11/13 11:05 PM|
Seems a very good idea, it is basically a remote audit trail.
The only concern is the throttling on the GitHub API: it would be better, then, to do the scripting on a local mirror of the GitHub repos. When you receive a forced update you still have all the previous commits and the full reflog.
However, as you said, by being triggered via webhook the number of API calls can be reduced to a minimum.
I would submit a proposal to the Git mailing list for a "fetch by SHA-1" capability, which is a missing feature in Git IMHO.
Thanks to everyone including GitHub for the help and cooperation in getting this sorted out !!
|Re: Commit loss prevention||Christopher||11/12/13 2:05 AM|
On 12/11/13 07:25, Kohsuke Kawaguchi wrote:
I agree that the policy of allowing everyone to have a repo and to
commit relatively freely remains a good idea, but giving new
developers push access to 1100 repositories, due to how GitHub
teams and our IRC bot work, is an issue that has been raised before:
Would it be reasonable to suggest that we remove the option to add
people to the "Everyone" team from IRC and, if GitHub still adds
newly-forked repos to every team by default, that we have some sort of
process to automatically clean up the teams, as mentioned in that thread?
|Re: Commit loss prevention||Stephen Connolly||11/12/13 2:16 AM|
I think part of the issue is that our canonical repositories are on github...
I would favour jenkins-ci.org being master of its own destiny... hence I would recommend hosting canonical repos on project-owned hardware and using GitHub as a mirror of those canonical repositories... much like the way the ASF uses Git. That would allow us to implement policies such as preventing forced pushes to specific branches, etc...
Of course that would be another pom.xml <scm> update change, namely the <developerConnection> would point to the canonical repo while the <connection> would point to the github repo... (with some use of http://developer.github.com/v3/users/keys/#list-public-keys-for-a-user we should be able to let users just register their keys in github)
e.g. the <scm> details would look like:
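Reconstructing from the fragments quoted later in the thread, the `<scm>` block might look something like this (`[plugin name]` is a placeholder from the original mail; the exact hostnames and URL schemes are assumptions):

```xml
<scm>
  <!-- read-only clone URL: points at the GitHub mirror -->
  <connection>scm:git:git://github.com/jenkinsci/[plugin name]-plugin.git</connection>
  <!-- developer/release push URL: points at the canonical repo -->
  <developerConnection>scm:git:git@git.jenkins-ci.org:jenkinsci/[plugin name]-plugin.git</developerConnection>
  <url>http://github.com/jenkinsci/[plugin name]-plugin</url>
</scm>
```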
Maven will then do the "right thing" for pushing releases *even if you check out from GitHub*... and we just have the canonical repos force-push to GitHub and put proper permission sets on the canonical repos... most developers will thus see no effective difference :-)
|Re: Commit loss prevention||Dariusz Łuksza||11/12/13 3:19 AM|
At CollabNet we have already implemented so-called History Protection. We put some thought into this topic and came up with a solution for unintended force pushes and branch deletions. Maybe you can reuse some of our approaches. Here is a short description of the feature.
History Protection is an extension to Gerrit Code Review that creates a special ref whenever somebody deletes a branch or force-pushes.
When a force push occurs, our plugin creates a special ref under refs/rewrites. This ref points to the old version of the branch, and its name contains additional information such as the rewritten branch name, the SHA-1 of the new head commit, a timestamp, and the name of the user who actually did the push.
The same goes for branch deletion, but in that case the new ref is created under refs/deleted.
Our plugin also blocks write access to refs/rewrites and refs/deleted (so nobody can modify them), but anybody can read those refs to recreate deleted or overwritten history.
Other than that, it sends an email to the Gerrit Administrators group and puts an entry in the audit log.
You can find more about this in my blog posts and YouTube video.
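A minimal sketch of the backup-ref scheme described above. The exact ref layout CollabNet uses is not documented in this thread, so the naming below is an assumption; the point is that every piece of information mentioned (branch, new head SHA-1, timestamp, user) ends up encoded in the ref name:

```python
from datetime import datetime

def backup_ref_name(kind, branch, new_sha, user, when):
    """Build a ref name under refs/rewrites or refs/deleted that encodes
    the rewritten/deleted branch, the new head SHA-1, a timestamp, and
    the user who performed the push (hypothetical layout)."""
    assert kind in ("rewrites", "deleted")
    stamp = when.strftime("%Y%m%d-%H%M%S")
    # e.g. refs/rewrites/master/20131112-081500/abc123/luca
    return "refs/{}/{}/{}/{}/{}".format(kind, branch, stamp, new_sha, user)

# In a server-side receive hook, on a non-fast-forward update of
# refs/heads/<branch>, the plugin would then create this ref pointing
# at the *old* branch tip, e.g.:
#   git update-ref <backup_ref_name(...)> <old_sha>
```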
|Re: Commit loss prevention||Kevin Fleming (BLOOMBERG/ 731 LEXIN)||11/12/13 6:40 AM|
When you say 'canonical' in this proposal, do you mean the repositories used for making releases, or the repositories where development (and especially, pull requests) would be handled?
If it's the former, I could see that being worthwhile, especially if *nobody* has permissions to push to the canonical repositories; if a developer pushes code to the master branch of their repo on GitHub, they'd have to wait a short time for that update to be mirrored to the release repo before they could make a release. Of course, this would put extra pressure on the people who are maintaining the project infrastructure, to be sure that this mirroring process is working reliably all the time.
|Re: Commit loss prevention||Stephen Connolly||11/12/13 7:12 AM|
On 12 November 2013 14:40, Kevin Fleming (BLOOMBERG/ 731 LEXIN) <kpfl...@bloomberg.net> wrote:
I mean that they are the "official" repositories and all others are just mirrors... this is the way Git at the ASF works...
No, I'd let developers be able to push to the canonical repositories... but just not `git push --force`. There are a set of git permissions that basically ensure you cannot rewrite the past, and those would be applied to the canonical repositories. I would then perhaps prevent developers from pushing to github... but there are possibly ways to permit that.
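On a self-hosted canonical server, the "cannot rewrite the past" permissions can be sketched with plain receive-side Git config (a minimal example; hosting layers such as Gitolite offer finer-grained per-branch rules):

```shell
# On the canonical server, inside each bare repository:
git config receive.denyNonFastForwards true   # reject `git push --force`
git config receive.denyDeletes true           # reject branch deletion
```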
Pull requests, forking etc would still work at github though, so no major change there... this would just introduce a set of "one true repositories"
|Re: Commit loss prevention||Kevin Fleming (BLOOMBERG/ 731 LEXIN)||11/12/13 7:43 AM|
Well, that would mean that merging a pull request on GitHub (especially the quick way, using the web UI) wouldn't update the canonical repository; the repo maintainer would need to push that change to the canonical repository, potentially dealing with a second round of merge conflicts if that repo's master branch has moved on. Sounds a bit complex :-)
There's been some discussion about using Gerrit as a front-end for all the repository activity, and I'd definitely support that move. The GitHub repos would then just be a distribution/forking point, but the workflow would be through Gerrit.
|Re: Commit loss prevention||Stephen Connolly||11/12/13 8:48 AM|
I am less keen on Gerrit. If anything, this recent experience has me feeling that I don't want Gerrit anywhere near my workflow.
|Re: Commit loss prevention||slide||11/12/13 10:43 AM|
|Re: Commit loss prevention||Kohsuke Kawaguchi||11/13/13 10:56 AM|
With respect to throttling, the events API is designed for polling,
so we just need to poll the events for the entire jenkinsci org and
we'll have the whole history.
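A sketch of what such a poller might record from the org events feed. The field names follow the GitHub v3 PushEvent payload; pagination, rate-limit handling, and persistence are left out:

```python
import json
from urllib.request import urlopen

EVENTS_URL = "https://api.github.com/orgs/jenkinsci/events"

def extract_ref_updates(events):
    """From a list of GitHub v3 events, keep the PushEvents and record
    which ref moved from which SHA to which SHA in which repo."""
    updates = []
    for ev in events:
        if ev.get("type") != "PushEvent":
            continue
        payload = ev["payload"]
        updates.append({
            "repo": ev["repo"]["name"],
            "ref": payload["ref"],
            "before": payload["before"],  # old tip: what we need to recover
            "after": payload["head"],     # new tip after the push
        })
    return updates

def poll_once():
    # One poll covering the whole org; run every minute and persist the
    # result so old ref values survive a later forced update.
    with urlopen(EVENTS_URL) as resp:
        return extract_ref_updates(json.load(resp))
```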
We already do an equivalent of local mirrors of the GitHub repos in
http://git.jenkins-ci.org/. The problem is that reflogs do not record
remote ref updates, so it will not protect against accidental ref
removals or rewrites.
It does help for the purpose of retaining commit objects, however, so we
need to keep this.
My recollection is that this was intentional, for security reasons: if a
push is made accidentally and then removed, those objects shouldn't
remain accessible.
I think what's useful and safe is to allow us to create a ref remotely
on an object that doesn't exist locally. Again, the transport level
protocol allows this, so it'd be nice to expose this.
Kohsuke Kawaguchi | CloudBees, Inc. | http://cloudbees.com/
Try Jenkins Enterprise, our professional version of Jenkins
|Re: Commit loss prevention||Kohsuke Kawaguchi||11/13/13 11:00 AM|
OK, that's a fair point.
I do recall writing a daemon that cleans up access control on
repositories (among other things, like disabling the issue tracker), but I'm
not sure whether we are running it regularly or not.
Maybe we can extend https://jenkins-ci.org/account so that people can
add/remove access to repositories by themselves? That would also get rid
of the need to ask on the mailing list.
|Re: Commit loss prevention||lucamilanesio||11/13/13 11:55 PM|
Yes, it would be nice to allow people to remove themselves from push permissions on the repos they do not use.
For instance, I normally push to no more than 5-6 repos; I should then be able to restrict myself to only those.
|Re: Commit loss prevention||lucamilanesio||11/13/13 11:58 PM|
We need to run some tests on the scalability of the events API because:
1) we need to monitor over 1000 repos (one call per repo? one call for all?)
2) when monitoring the entire jenkinsci org, 300 events may not be enough in case of a catastrophic event
What about working at the webhook level? I'll investigate further the reliability/scalability of the API (on a series of *test* repos *OUTSIDE* the Jenkins CI organisation).
|Re: Commit loss prevention||Kohsuke Kawaguchi||11/14/13 9:50 AM|
Hmm, I don't fully understand the Maven implications of such a setup, but
there's a whole lot more to switching canonical repositories from one
location to another than mass-updating pom.xml (such as communication,
infrastructure management, pull requests, access control, and backup), so I'm
pretty certain it's not as easy as you make it sound...
And I'm not yet sensing the appetite in the community for moving away
from GitHub.
|Re: Commit loss prevention||Kohsuke Kawaguchi||11/14/13 9:53 AM|
A feature like this makes a lot of sense for a Git hosting service. Good to
see that we are neither the first nor the only ones to cause trouble
like this.
|Re: Commit loss prevention||Dominik Bartholdi||11/14/13 9:54 AM|
I think this was an exception and we should treat it as such…
Sure, this could happen again, but by doing some backups we should be fine. Maybe we should ask GitHub why they provide the feature to block forced pushes only in their Enterprise offering.
|Re: Commit loss prevention||Kohsuke Kawaguchi||11/14/13 10:02 AM|
OK, so the flow would be:
- Our IRC bot would put users into the "pre-approved" team, which by
itself doesn't grant access to any repositories, but is used to keep
track of who can add/remove themselves to other repositories.
- We'll improve http://jenkins-ci.org/account to allow people in the
  "pre-approved" team to add/remove themselves to the "Everyone" team
  (which grants access to all the repos) and to the individual plugin teams.
So if you are like me and want to maintain access to all the repos,
I can do that, but if you only want to work on a small number of repositories
you can do it that way, too.
This has the benefit of not getting bombarded by notification e-mails
for repositories you don't care about.
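The self-service add/remove could be a thin wrapper over the GitHub v3 team-membership endpoint (`PUT`/`DELETE /teams/:team_id/memberships/:username`); the team id and token handling below are placeholders:

```python
from urllib.request import Request, urlopen

API = "https://api.github.com"

def membership_url(team_id, username):
    """URL for the v3 team-membership endpoint."""
    return "{}/teams/{}/memberships/{}".format(API, team_id, username)

def set_team_membership(team_id, username, token, join=True):
    """Add (PUT) or remove (DELETE) a user from a GitHub team."""
    req = Request(membership_url(team_id, username),
                  method="PUT" if join else "DELETE",
                  headers={"Authorization": "token " + token})
    with urlopen(req) as resp:
        return resp.status

# The account app would call, e.g.:
#   set_team_membership(EVERYONE_TEAM_ID, "kohsuke", ADMIN_TOKEN, join=False)
# when a user opts out of the "Everyone" team.
```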
I think this is actually tangential to the commit loss prevention, as I
can make the same mistake Luca did and mass update all the remote refs,
so we still need a measure to protect us from that.
|Re: Commit loss prevention||Dominik Bartholdi||11/14/13 10:07 AM|
I did not quite get what the default is… I think that by default no one should have access to all repos, but everyone should be able to grant themselves those rights.
|Re: Commit loss prevention||Kohsuke Kawaguchi||11/14/13 10:11 AM|
On 11/14/2013 09:54 AM, domi wrote:
> I think this was an exception and we should treat it as such…
Yes, I agree. And we were able to recover all the commits after all, so
I don't think we need to throw the baby out with the bath water.
Yes, we will ask about this feature. But even if GitHub disables forced
push, it's still not enough to prevent accidental or malicious data loss.
For example, if you look at a similar incident that happened a few years
ago at Eclipse, I bet those happened by mass deletion, not forced
updates. (Thanks Dariusz for this pointer!)
What I think we want GitHub to consider is the equivalent of the "History
Protection" Dariusz wrote about, as implemented at CollabNet.
But until that comes, I guess we are on our own to emulate that without
direct access to the server.
|Re: Commit loss prevention||Kohsuke Kawaguchi||11/14/13 10:23 AM|
On 11/13/2013 11:58 PM, Luca Milanesio wrote:
The good news is that a push that removes/alters refs also takes time.
I have the notification e-mails from your push to 186 repos, and they span
over an hour.
So I'm hoping that polling 300 events every minute would cover us pretty
well. And like you say, a webhook can help us reduce this window down further.
There's another reason I'm optimistic about this scheme.
Suppose you are maliciously trying to cause data loss. If we are
regularly recording refs, you have to mount an attack immediately after
some commits go in so as to overwhelm the 300 event buffer, then keep
that saturation going so that your ref updates/removals will also be
dropped from the event buffer. And even with this much effort you can
only cause the data loss of the commits that went in right before yours.
So I think this makes the attack so ineffective that we can tolerate the
risk, and I find it unlikely that any accident would look like this.
|Re: Commit loss prevention||lucamilanesio||11/15/13 12:51 AM|
True: the notifications took an hour, but the push itself was pretty fast, still only around 25/min. 300 events per minute should then be fairly enough :-)
The only way to go over that limit is a parallel push from multiple accounts... but that, I would say, is very unlikely.
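A quick sanity check of that margin, using the numbers from this thread (roughly 25 ref updates per minute during the incident, a 300-event buffer, one poll per minute):

```python
push_rate = 25      # observed ref updates per minute during the incident
buffer_size = 300   # events retained by the org events feed
poll_interval = 1   # minutes between polls

# Events generated between two consecutive polls at the observed rate:
events_per_poll = push_rate * poll_interval

# Headroom: how many times faster than the incident an attacker would
# have to push to overflow the buffer between two polls.
headroom = buffer_size / events_per_poll
assert headroom == 12.0  # 12x the incident's push rate
```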