Re: gitolite: locking wildcard repositories

Sitaram Chamarty

17 Oct 2010, 12:43:31

to: Matthew Trumbell, gito...@googlegroups.com, mitc...@kde.org

Hi Matthew,

[apologies for adding the mailing list to cc; I hope you
don't mind. Also, kept your entire email below my reply to
preserve context for list members.]

[Teemu, Hiren, Jeff, Eli, and Kevin: in particular, I need
your inputs as you guys use wildrepos heavily]

This is a great idea -- wish I'd thought of it right at the
start!

I made a couple of changes (mainly -- changed the name of
the file from "lock" to "gl-lock", just in case git.git ever
decides to use lock for their own purposes, and made the
gl-post-init go into the documentation (README) file for
this feature).

But now I'm having second thoughts: why not make this the
*default*?

----

Here's a question for you and for everyone else out there
using wild repos.

The solution you outlined still depends on the admin putting
the right things in place -- setting up the gl-post-init
hook and also manually fixing it for existing repos.

Would it hurt too much if I made this backward
*in*compatible? That is, if -- the next time you upgrade
the rmrepo "adc" script -- your users are *forced* to first
unlock their repos before rmrepo works?

In other words, would it be such a big problem if, instead
of:

if gl-lock file exists
    reject the deletion attempt

we say

if gl-rm-ok file does not exist
    reject the deletion attempt

This will make it "safe" as soon as you upgrade rmrepo
itself, as well as make it unnecessary to have a
gl-post-init hook to create that file; now it is the *lack*
of a file (named with the opposite sense) that protects the
repo from accidental deletion.
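
A minimal sketch of the inverted check described above, assuming the marker file lives directly inside the repository directory. The function names `rm_allowed` and `safe_rm` are made up for illustration and are not part of gitolite:

```shell
#!/bin/sh
# Hypothetical helpers; only the file name gl-rm-ok comes from the proposal.

# Succeeds only when the owner has explicitly marked the repo removable.
rm_allowed() {
    [ -f "$1/gl-rm-ok" ]
}

# Remove the repo directory $1 only if gl-rm-ok is present in it.
safe_rm() {
    if rm_allowed "$1"; then
        rm -rf "$1"
        echo "$1 has been removed"
    else
        echo "$1 is locked; unlock it first" >&2
        return 1
    fi
}
```

Because the protection is the absence of a file, a repo created before the upgrade is refused by `safe_rm` without any hook having touched it.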

I would much prefer this, since it protects everyone
immediately and much more simply.

Thoughts?

On Sun, Oct 17, 2010 at 10:08:26AM -0500, Matthew Trumbell wrote:
> I just wanted to share a small feature I added to gitolite through ADC
> and hooks. You may or may not find it useful, any feedback on it (or
> alternate implementation suggestions) is very welcome.
>
> My problem was that I wanted a way for my users to be able to create
> and delete repositories at will, while still building in a layer of
> protection against doing something completely stupid. gitolite
> obviously handles repository creation very well and through ADC,
> handles deletion as well. My problem with rmrepo, as defined, is that
> it is very easy for a user to inadvertently delete a repository. In
> situations where the repositories are short lived, that might be an
> acceptable risk. But my users will be creating repositories that live
> on indefinitely, so an inadvertent delete might be very inconvenient
> for them. I do backups, but prefer to use them in disaster scenarios
> only. I wanted to avoid the disaster here.
>
> My solution was to build a simple locking mechanism for repositories:
> when locked, deletion fails. On creation, repositories are locked as
> a default state. Users are free to lock/unlock repositories with an
> ADC for those operations. When unlocked, deletion works as expected.
> Implementation is pretty naive, but works well in my scenario. I've
> attached a tarball with my lock/unlock/rmrepo ADC scripts and my
> gl-post-init hook script.
>
> And many thanks for all the work you've done on gitolite. I've
> deployed it both at my workplace and on a personal git server and it
> works splendidly in both scenarios. Great job!
>
> Matthew Trumbell

Eli Barzilay

17 Oct 2010, 14:41:51

to: Sitaram Chamarty, Matthew Trumbell, gito...@googlegroups.com, mitc...@kde.org

An hour and a half ago, Sitaram Chamarty wrote:
> Hi Matthew,

>
> [Teemu, Hiren, Jeff, Eli, and Kevin: in particular, I need your
> inputs as you guys use wildrepos heavily]
>
> This is a great idea -- wish I'd thought of it right at the start!
>
> I made a couple of changes (mainly -- changed the name of the file
> from "lock" to "gl-lock", just in case git.git ever decides to use
> lock for their own purposes, and made the gl-post-init go into the
> documentation (README) file for this feature.
>
> But now I'm having second thoughts: why not make this the *default*?

Here's what I have about this:

* I also really don't want people to delete a repo by mistake, but I
deal with it differently. My delete command (a sort of a
predecessor to Sitaram's rmrepo) doesn't really delete the repo --
it just moves it to a `tmp' directory. The whole thing is backed up
daily, which means that after someone deletes a repo they can no
longer use it -- so the daily backup picks it up after a day, and
it just stays there inactive and inaccessible.

If someone tells me that they had a mistake, it's easy to restore
the directory by moving it back in. (I didn't make a nice `undelete'
interface for that since it'll be very rare (never happened yet).)
If they tell me that after it got deleted from the `tmp' directory,
I still have my backup -- and since the repos were in there for a
while, the backup will have the last good version of the repo.

The name of the directory in `tmp' is the path of the original one
(minus the standard prefix directory), with slashes replaced by
stars -- so if I remove my "eli/foo/bar" repo, it'll move to
"eli*foo*bar" in the `tmp' directory. This makes name clashes
inexistent.

The only real way that this kind of protection loses is if you
create some repo, delete it, create a new one with the same name,
then delete the new one (within a day), then ask me to restore the
first. But chances of making such a mistake *and* having something
worthwhile in a less-than-a-day-old repo are negligible.

(Oh, and BTW, my delete command tells people that it has been moved,
and it tells them in the rare case that there was a previous deleted
directory.)

* In contrast, just having that lock file seems like insufficient
protection. It sounds to me like an equivalent of an "are you
sure?" question -- an intentionally long and inconvenient version --
but I can still see people blindly digging up an "unlock; delete"
command line, changing the repo name and regretting it as usual.

* Unsurprisingly now, I like how I did it more. If I wanted to take
it to a proper level of a real solution, I'd probably change very
little: mainly setup some known expiration policy for removing
deleted repos that are too old, and possibly investing the 5 minutes
it would take to rename an old removed repo when a new one with the
same name is removed.

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!

Jeff Mitchell

19 Oct 2010, 12:37:31

to: Eli Barzilay, Sitaram Chamarty, Matthew Trumbell, gito...@googlegroups.com

I think we can all agree that having a way to prevent accidental or
hasty deletes is good. I don't think that any complex system that
everyone can agree upon will be possible, however -- this is a product
being used by large companies and organizations and they're all going to
have different data retention policies and degrees of trust in their
users. So it seems to me that a simple locking feature is low-hanging
fruit -- it would suffice for some organizations, and those that need
more stringent data-protection systems can code in their own relatively
easily.

That all said, my suggestion is: implement the locking feature, but do
not turn it on for existing installations.

I would suggest a value in gitolite.rc to control the default behavior
for new repositories.

--Jeff


Sitaram Chamarty

19 Oct 2010, 12:55:46

to: Jeff Mitchell, Eli Barzilay, Matthew Trumbell, gito...@googlegroups.com

On Tue, Oct 19, 2010 at 12:37:31PM -0400, Jeff Mitchell wrote:
> I think we can all agree that having a way to prevent accidental or
> hasty deletes is good. I don't think that any complex system that
> everyone can agree upon will be possible, however -- this is a product
> being used by large companies and organizations and they're all going to
> have different data retention policies and degrees of trust in their
> users. So it seems to me that a simple locking feature is low-hanging
> fruit -- it would suffice for some organizations, and those that need
> more stringent data-protection systems can code in their own relatively
> easily.

Perfect -- almost exactly what I was thinking.

> That all said, my suggestion is: implement the locking feature, but do
> not turn it on for existing installations.

Hmmm... would that be so bad? Even if it changes the
behaviour for old installations, it's changing it in a safer
way. A mistake is a mistake; can happen just as easily to
old installs, and if they don't like it they can remove it
when copying the new version to their GL_ADC_PATH.

I guess the advantage I have here is that an ADC is always
an explicit, manual, install -- it won't happen behind the
scenes via an upgrade. And a lot of admins have their own
ADCs that are variations of the shipped defaults so they
won't be blindly copying them too.

Just in case, I'll add a separate CHANGELOG to the adc
directory, and add a note to the top of
doc/admin-defined-commands.mkd that admins should be sure to
read that CHANGELOG before copying stuff over blindly.

That should cover it.

> I would suggest a value in gitolite.rc to control the default behavior
> for new repositories.

And I'd rather not load config info for ADCs into the RC
file. It just doesn't seem the right place -- "core" versus
"non-core".

Hiren D. Patel

19 Oct 2010, 13:48:40

to: Jeff Mitchell, Eli Barzilay, Sitaram Chamarty, Matthew Trumbell, gito...@googlegroups.com

My uses are similar to that described by Eli. Typically, I do not let my users (students) delete the repos. I've had some cases where they say "Oh no, I wish I had file X from the repos that no longer exists." As a result, I also "move" the repos to a temporary location and keep it there for some time.

Do the locking and "move to tmp" approaches need to be mutually exclusive? Wouldn't it make sense to confirm with the locking mechanism, and then move it once it has been erased?

-- Hiren

Jeff Mitchell

19 Oct 2010, 13:58:28

to: Hiren D. Patel, Eli Barzilay, Sitaram Chamarty, Matthew Trumbell, gito...@googlegroups.com

On 10/19/2010 1:48 PM, Hiren D. Patel wrote:
> My uses are similar to that described by Eli. Typically, I do not let
> my users (students) delete the repos. I've had some cases where they
> say "Oh no, I wish I had file X from the repos that no longer exists."
> As a result, I also "move" the repos to a temporary location and keep it
> there for some time.
>
> Do the locking and "move to tmp approach" mechanisms need to be mutually
> exclusive. Wouldn't it make sense to confirm with the locking
> mechanism, and then move it once it has been erased?

Sitaram,

Is it possible to have the ADC read input in?

If so, then instead of an unlock command, the user would simply have to
type "yes" to confirm the deletion. A second step, but a less roundabout
one, so this would serve the same purpose fairly well.

Then, for those that need more data integrity, an erased area could
maybe be built in as a further feature.
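
A rough sketch of the confirmation step, assuming the ADC can read the user's reply from standard input. The function name `confirm_delete` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: prompt on stderr, read one line from stdin,
# and succeed only if the reply is exactly "yes".
confirm_delete() {
    printf 'Really delete %s? Type "yes" to confirm: ' "$1" >&2
    read -r answer || return 1   # no input at all counts as "no"
    [ "$answer" = "yes" ]
}
```

An rm-style ADC could then run something like `confirm_delete "$repo" && rm -rf "$repo.git"`, where `$repo` is whatever name the ADC was given.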

--Jeff


Sitaram Chamarty

19 Oct 2010, 21:07:25

to: Jeff Mitchell, Hiren D. Patel, Eli Barzilay, Matthew Trumbell, gito...@googlegroups.com

On Tue, Oct 19, 2010 at 01:58:28PM -0400, Jeff Mitchell wrote:
> On 10/19/2010 1:48 PM, Hiren D. Patel wrote:
> > My uses are similar to that described by Eli. Typically, I do not let
> > my users (students) delete the repos. I've had some cases where they
> > say "Oh no, I wish I had file X from the repos that no longer exists."
> > As a result, I also "move" the repos to a temporary location and keep it
> > there for some time.
> >
> > Do the locking and "move to tmp approach" mechanisms need to be mutually
> > exclusive. Wouldn't it make sense to confirm with the locking
> > mechanism, and then move it once it has been erased?
>
> Sitaram,
>
> Is it possible to have the ADC read input in?

sure it is...

> If so, then instead of an unlock command, the user would simply have to
> type "yes" to confirm the deletion. A second step, but a less roundabout
> one, so this would serve the same purpose fairly well.
>
> Then, for those that need more data integrity, an erased area could
> maybe be built in as a further feature.

ok we have 4 different approaches suggested now:

- move to some tmp dir
- ask for a "yes"
- lock/unlock
- (and the current approach: just remove it)

Matthew: look what you started -- I hope you're happy now
;-)

Just kidding... I like that you all have different opinions
on this. It validates the idea that ADCs are (a) not core,
and (b) manually installed by the admin after due
consideration.

Here's what I am going to do. It's a time-honored tradition
in such circumstances called punting ;-)

rmrepo will now have all 4 code pieces, and the admin can
choose which one he wants to enable by setting a flag on
top.

I like the lock/unlock approach best because the 2-step
process reflects the *in*frequent nature of this operation
so that will be the default.

I'll get to it on or before the weekend.

regards,

sitaram

PS: The "tmp" option will need not just choosing it, but a
certain amount of local customisation. Where is your "tmp"?
What if he creates and deletes the same reponame multiple
times -- do you timestamp the moved versions or preserve
only the last? What is a good time gap to clean up moved
repos? What about monitoring disk space taken by deleted
repos and warning someone there are too many? And when that
happens how do you choose which ones go? (By size? Age?)

Eli: if you want to share your code for this I'd be happy to
put it in contrib somewhere.

Hiren D. Patel

19 Oct 2010, 21:25:38

to: Sitaram Chamarty, Jeff Mitchell, Eli Barzilay, Matthew Trumbell, gito...@googlegroups.com

> - move to some tmp dir
> - ask for a "yes"
> - lock/unlock
> - (and the current approach: just remove it)

I was suggesting a combination of the first and the second, or the first and the third: Get some confirmation (or lock/unlock) when deleting, and then move it to a temporary directory.

-- H

Eli Barzilay

19 Oct 2010, 21:42:52

to: Hiren D. Patel, Sitaram Chamarty, Jeff Mitchell, Matthew Trumbell, gito...@googlegroups.com

9 hours ago, Jeff Mitchell wrote:
> [...] this is a product being used by large companies and

> organizations and they're all going to have different data retention
> policies and degrees of trust in their users.

...which is why it shouldn't be left to the mercy of that policy.
What I suggested is writing code that will (a) move on deletion, (b)
expire after a configurable time period, which (c) admins will want to
set to a period long enough for the removed directory to stick around
in backup.

> So it seems to me that a simple locking feature is low-hanging fruit
> -- it would suffice for some organizations, and those that need more
> stringent data-protection systems can code in their own relatively
> easily.

Well, the rename thing is not really that high either. You can even
simplify it by leaving the junk around -- "mv $REPO $REPO.deleted".
The periodic cleanup becomes a simple one-line combination of `find',
`xargs', and `rm' (or some perl equivalent). Even better: it becomes
easy to provide commands to undelete such a repo.
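
One possible shape for that cleanup, as a sketch only. It assumes deleted repos end in `.deleted`, that the directory's mtime was refreshed (for example with `touch`) at deletion time, and that `find` and `xargs` support `-print0` and `-0` (GNU and BSD versions do). The function name and its arguments are made up:

```shell
#!/bin/sh
# Hypothetical cleanup: $1 = directory to scan, $2 = minimum age in days.
purge_deleted() {
    # -prune stops find from descending into a directory it is about to
    # hand to rm; -print0 / -0 keep unusual file names intact.
    find "$1" -type d -name '*.deleted' -mtime +"$2" -prune -print0 |
        xargs -0 rm -rf --
}
```

Run from cron as, say, `purge_deleted /path/to/repositories 30`, this would keep each deleted repo for about a month. A plain `mv` does not change the moved directory's own mtime, which is why the age assumption above matters.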

And even simpler than that: provide a `rename' command, make `rmrepo'
do a `rename' instead (unless it's already called *.junk) and also
make it announce what it did to the user. This gives you a
verification step that is completely equivalent to the lock-file
solution:

$ ssh myserver unlock eli/foo/bar
eli/foo/bar can now be removed
$ ssh myserver rmrepo eli/foo/bar
eli/foo/bar has been removed

versus

$ ssh myserver rmrepo eli/foo/bar
Renamed eli/foo/bar -> eli/foo/bar.deleted.
You can now rmrepo eli/foo/bar.deleted to delete it forever.
(A deleted repo will be held for a period of 30 days.)
$ ssh myserver rmrepo eli/foo/bar.deleted
eli/foo/bar has been evaporated

But now that this is done, it's easy to do a bunch of more stuff,
like:

$ ssh myserver rmrepo eli/foo/bar
Renamed eli/foo/bar -> eli/foo/bar.deleted4
# I had removed that name three times earlier -- and now I have all
# four copies available for restoration

$ ssh myserver expand
hello eli, [...] the following repos on the server:
...
... no mention of *.deleted* repos
...
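
The numbered names in the example above could be picked with a small helper along these lines. This is a sketch: the function name is hypothetical, and the scheme (`.deleted`, then `.deleted2`, `.deleted3`, ...) is inferred from the sample output rather than taken from any existing code:

```shell
#!/bin/sh
# Hypothetical helper: print the first unused "deleted" name for path $1.
next_deleted_name() {
    candidate=$1.deleted
    n=1
    while [ -e "$candidate" ]; do
        n=$((n + 1))
        candidate=$1.deleted$n
    done
    printf '%s\n' "$candidate"
}
```

With `bar.deleted`, `bar.deleted2` and `bar.deleted3` already present, `next_deleted_name bar` prints `bar.deleted4`, matching the fourth removal shown above.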

8 hours ago, Sitaram Chamarty wrote:
>
> And I'd rather not load config info for ADCs into the RC
> file. It just doesn't seem the right place -- "core" versus
> "non-core".

The ADCs are similar to my hack right? So you'd already have a
directory with their code, which should also be a good place for any
configuration options for them.

8 hours ago, Hiren D. Patel wrote:
> My uses are similar to that described by Eli. Typically, I do not
> let my users (students) delete the repos. I've had some cases where
> they say "Oh no, I wish I had file X from the repos that no longer
> exists." As a result, I also "move" the repos to a temporary
> location and keep it there for some time.

That's what made me think of the above alternative -- especially in a
situation of running a lab with potentially lots of people asking to
revive mistakes -- doing that explicit rename to something that users
can still reach means that they can do their own restorations. They
can also shoot their own feet, of course, but going through the
process means that a "tough luck" is a perfectly valid answer.

> Do the locking and "move to tmp approach" mechanisms need to be
> mutually exclusive. Wouldn't it make sense to confirm with the
> locking mechanism, and then move it once it has been erased?

(Personally I don't see any point in doing so...)

15 minutes ago, Sitaram Chamarty wrote:
>
> Here's what I am going to do. It's a time-honored tradition
> in such circumstances called punting ;-)
>
> rmrepo will now have all 4 code pieces, and the admin can
> choose which one he wants to enable by setting a flag on
> top.

Actually that would be the *opposite* of punting, no?

Eli Barzilay

19 Oct 2010, 21:51:32

to: Hiren D. Patel, Sitaram Chamarty, Jeff Mitchell, Matthew Trumbell, gito...@googlegroups.com

A few minutes ago, Eli Barzilay wrote:
>
> $ ssh myserver expand
> hello eli, [...] the following repos on the server:
> ...
> ... no mention of *.deleted* repos
> ...

(I should have added that doing *that*, and another command that does
show deleted repos too, is just extra fluff that will require gitolite
to change -- unless `expand' is also a custom script...)

Sitaram:

> Eli: if you want to share your code for this I'd be happy to
> put it in contrib somewhere.

Forgot about that one. My script is below -- it's different than the
whole ADC thing, since I'm still using my own thing, but probably
close enough to be useful. Note that it doesn't do any expunging -- I
conveniently decided on a policy of "whenever I remember to do it
myself" as one that is long enough. The only bit that helps me do
that is the `touch' -- so I can have a reliable way to see when a
deleted repo is old enough.

-------------------------------------------------------------------------------
#!/bin/bash

#> delete <repo>

if [[ "$#" != 1 ]]; then echo "Usage: delete <repo>"; exit 1; fi

# $PERMS is expected to be set by the caller; perms[1] is its first word
perms=(- $PERMS)
case "${perms[1]}" in
  ( *C* ) echo "Repository $1 does not exist" ;;
  ( *D* ) echo "Deleting $1! (a backup copy will be kept temporarily)"
          # flatten the repo path: slashes become stars
          target="$HOME/tmp/$(echo "$1" | sed 's_/_*_g').git"
          if [[ -d "$target" ]]; then
            echo "  (Deleting previous backup)"
            rm -rf "$target"
          fi
          mv "$1.git" "$target"
          # touch it, so it's easy to see when it was deleted
          touch "$target"
          echo "  Gone!"
          ;;
  ( *R* | *W* ) echo "You do not own $1!" ;;
  ( * ) echo "Unknown repository: $1" ;;
esac
-------------------------------------------------------------------------------

Sitaram Chamarty

22 Oct 2010, 08:12:02

to: Eli Barzilay, Hiren D. Patel, Jeff Mitchell, Matthew Trumbell, gito...@googlegroups.com

ok, here's what I will be pushing out soon: I replaced the old
"rmrepo" with two new ones: "rm" and "trash" (the name derives from
the ~/.Trash concept).

(what follows below is the README).

regards,

sitaram

----

By default, the old 'rmrepo' ADC (admin-defined command) just went and
deleted the repo -- no questions asked! Sometimes, that could be a
disaster -- you lose the whole thing in one mad moment of typo-ing or
frustration. Ouch.

This has been replaced by 2 families of ADCs. I say "families"
because each has one main command and 2 ancillary ones. Admins can
choose to install either, both, or neither family of commands.

Local settings for these ADCs can be found in the common settings file
"adc.common-functions".

1. 'rm' will remove the repo. If USE_LOCK_UNLOCK is set, rm will
refuse to remove a locked repo. All repos are locked by default, and
you have to explicitly 'unlock' a repo to remove it. You can also
'lock' it again instead of removing it of course.

There's also ARE_YOU_SURE, for situations where a simple warning suffices.

You can also use both these flags if you wish.

2. 'trash' will move the repo to a safe location. There are settings
for where this location is and what suffix is added to the repo name.
You can 'list-trash' to see what trash you have collected, and you can
'undelete' one of the listed repos.

It's easy to automatically clean out the trash occasionally. By
default, entries in the trash look like this:

foo/r1/2010-10-22_13:14:24
foo/r1/2010-10-22_13:14:50

This shows a repo foo/r1 that was created and trashed twice.
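
For illustration, a move that produces entries in this layout might look like the sketch below. It is not the shipped 'trash' ADC: `REPO_BASE` and the function name are assumptions, and only `TRASH_CAN` and the timestamp format come from this README:

```shell
#!/bin/sh
# Hypothetical sketch: move $REPO_BASE/<repo>.git to
# $TRASH_CAN/<repo>/<timestamp>, e.g. foo/r1/2010-10-22_13:14:24.
trash_repo() {
    stamp=$(date +%Y-%m-%d_%H:%M:%S)
    mkdir -p "$TRASH_CAN/$1" &&
        mv "$REPO_BASE/$1.git" "$TRASH_CAN/$1/$stamp"
}
```

Because each trashed copy gets its own timestamped name, trashing the same repo name again does not overwrite the earlier copy (unless both happen within the same second).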

Since the date appears in the name, you can use it with a cutoff
to clean up old repos. Untested example:

cutoff=`date -I -d '28 days ago'`
find "$TRASH_CAN" -type d -name "20??-??-??_*" | while read -r r
do
    d=`basename "$r"`
    # date-stamped names sort lexicographically, so a string compare works
    [[ $d < $cutoff ]] && rm -rf "$r"
done

Put this in cron to run once a day and that should be it.

Kevin P. Fleming

4 Nov 2010, 17:17:32

to: gito...@googlegroups.com

I've just now gotten to read this thread, and I like this solution a
lot. I hadn't actually thought about the 'trash' approach before I saw
this, but it makes a lot of sense.

This does bring one problem I've seen with gitolite to mind though:
there are multiple operations that can modify the gitweb projects.list
file: a push to the admin repo, creation of a new repo, removal/trashing
of an existing repo. At this time there is no lock used to ensure that
these processes don't step on each other (multiple pushes to the admin
repo will be serialized by the internal git lock). I'd really like to
find a way to resolve this, and I think the right way would be to have
ADCs that are going to modify the visible set of repos take some sort of
lock on the admin repo while they are running, so that they will be
serialized along with admin repo pushes.
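
As a sketch of the general idea only: the helper below serializes commands with a lock directory created by `mkdir`, which is a different mechanism from taking git's own lock on the admin repo. The function name, the `LOCK_TRIES` variable and the lock path are all made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: run a command while holding a lock directory.
# Usage: with_lock <lockdir> <command> [args...]
with_lock() {
    lockdir=$1
    shift
    max_tries=${LOCK_TRIES:-10}
    tries=0
    # mkdir either creates the directory or fails, so only one
    # process at a time can get past this loop.
    until mkdir "$lockdir" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "could not acquire $lockdir" >&2
            return 1
        fi
        sleep 1
    done
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
}
```

An ADC that rewrites projects.list could wrap that step as `with_lock "$HOME/projects.list.lock" some_update_command`. A process killed while holding the lock leaves the directory behind, so a real version would need some stale-lock handling.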

--
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
skype: kpfleming | jabber: kfle...@digium.com
Check us out at www.digium.com & www.asterisk.org

Sitaram Chamarty

7 Nov 2010, 22:54:28

to: Kevin P. Fleming, gito...@googlegroups.com

on Friday 05 November 2010 02:47 AM Kevin P. Fleming wrote:

> This does bring one problem I've seen with gitolite to mind though:
> there are multiple operations that can modify the gitweb
> projects.list file: a push to the admin repo, creation of a new repo,

Hmm I thought I'd replied to this but apparently not, since I now can't find the email I thought I sent.

The log file is much more contentious in terms of frequency of access, and -- try as I might -- I cannot simulate a race condition on it that loses information. I have no idea what I'm doing wrong.

I'd be happy to accept patches that fix locking for the log file (much more important than the projects.list, IMO) as well as for projects.list and so on. However, what I would *really* appreciate is some way to simulate an error, then switch on the locking code and show that the error does not occur.

Any help is welcome :-)
