quotas


Ken Dreyer

Jan 23, 2013, 11:56:10 PM
to gito...@googlegroups.com
Hi folks,

I just had a developer fill up my Gitorious partition with a very
large git repo. The problem was compounded by the fact that I
neglected to put /var/gitorious on its own partition... whoops. I
remedied that in hindsight :)

I'm trying to brainstorm the best way to prevent one repo or project
from filling up and blocking service for any other projects. I still
want the project and repo provisioning process to be as close to
self-service as possible - I'd rather not babysit something if I can
help it.

I was wondering if anyone had any information or tips in relation to
enforcing quotas?

- Ken

Marius Mårnes Mathiesen

Jan 24, 2013, 6:14:25 AM
to gito...@googlegroups.com

Ken Dreyer writes:

> Hi folks,
>
> I just had a developer fill up my Gitorious partition with a very
> large git repo. The problem was compounded by the fact that I
> neglected to put /var/gitorious on its own partition... whoops. I
> remedied that in hindsight :)

Ouch!

> I'm trying to brainstorm the best way to prevent one repo or project
> from filling up and blocking service for any other projects. I still
> want the project and repo provisioning process to be as close to
> self-service as possible - I'd rather not babysit something if I can
> help it.
>
> I was wondering if anyone had any information or tips in relation to
> enforcing quotas?

Interesting question. From an application standpoint, there's one
situation where such a thing could be enforced. After pushing to a
repository, the PushProcessor will calculate the disk space used by a
repository. You wouldn't be able to stop the push from being received,
but at least we could trigger some alarms at this point.
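
A minimal sketch of such an alarm check, assuming we simply sum file sizes under the repository directory. The names `repository_disk_usage` and `over_quota?` are made up for illustration and are not part of the PushProcessor; hard-linked objects shared between clones would be counted once per repository here.

```ruby
require "find"

# Hypothetical helper: total size in bytes of all regular files under repo_path.
# Symlinks are skipped; hard-linked files are counted in every repository
# that links to them.
def repository_disk_usage(repo_path)
  total = 0
  Find.find(repo_path) do |entry|
    stat = File.lstat(entry)
    total += stat.size if stat.file?
  end
  total
end

# Hypothetical check to run after a push has been processed.
def over_quota?(repo_path, quota_bytes)
  repository_disk_usage(repo_path) > quota_bytes
end
```

Since this can only run after the push has been received, it is useful for raising an alarm, not for blocking the data from landing on disk.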

Another approach could be to use e.g. btrfs and set up subvolumes for new
projects in the repository root, assuming you're running with un-sharded
paths. Let's say a user creates a project "gitorious" and the first
repository "mainline": when generating the repository on disk in
Gitorious we could set up a subvolume for the project, so:

/var/www/gitorious/repositories/gitorious

would become a btrfs subvolume where we could enforce quotas. Achieving
this would require a hook inside the routine where a repository is
created in Gitorious, but it would definitely make sense (as long as
you're ready to trust btrfs with your data). ZFS could be an alternative
to btrfs, but the license situation is a little problematic here.
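
Roughly, the hook would have to run something like the following. This is only a sketch that builds the command lines without executing them; the method name `btrfs_quota_commands`, the mount point and the 1G limit are assumptions for illustration, and quotas must be enabled on the filesystem before a limit has any effect.

```ruby
# Builds (but does not run) the btrfs commands for a per-project subvolume
# with a size limit. All argument values are supplied by the caller.
def btrfs_quota_commands(mount_point, project_path, limit)
  [
    ["btrfs", "quota", "enable", mount_point],          # once per filesystem
    ["btrfs", "subvolume", "create", project_path],     # one subvolume per project
    ["btrfs", "qgroup", "limit", limit, project_path]   # cap the subvolume's size
  ]
end

commands = btrfs_quota_commands(
  "/var/www/gitorious/repositories",            # assumed mount point
  "/var/www/gitorious/repositories/gitorious",
  "1G"                                          # assumed limit
)
commands.each { |cmd| puts cmd.join(" ") }
```

Actually running these would need root privileges, e.g. by passing each array to `system(*cmd)` from a privileged helper.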

Cheers,
- Marius

anapsix

Jan 24, 2013, 8:35:55 AM
to gito...@googlegroups.com
Would a pre-receive hook get file sizes from the client when a push is initiated? If so, calculating the available space (quota allowance) and rejecting the push based on that would be trivial.

Ken Dreyer

Jan 24, 2013, 10:23:20 AM
to gito...@googlegroups.com
On Thu, Jan 24, 2013 at 4:14 AM, Marius Mårnes Mathiesen
<marius.m...@gmail.com> wrote:
> Another approach could be to use eg. btrfs and set up subvolumes for new
> projects in the repository root, assuming you're running with un-sharded
> paths. Let's say a user creates a project "gitorious" and the first
> repository "mainline": when generating the repository on disk in
> Gitorious we could set up a subvolume for the project, so:
>
> /var/www/gitorious/repositories/gitorious
>
> would become a btrfs subvolume where we could enforce quotas. Achieving
> this would require a hook inside the routine where a repository is
> created in Gitorious, but it would definitely make sense (as long as
> you're ready to trust btrfs with your data). Zfs could be an alternative
> to btrfs, but the license situation is a little problematic here.

Thank you! I was imagining something vaguely along the same lines.
I'll probably avoid btrfs at this point :) but maybe I can hook up
something with LVM.


It looks like I should modify these two functions in app/models/repository.rb?

def self.create_git_repository(path)
  full_path = full_path_from_partial_path(path)
  git_backend.create(full_path)
  self.create_hooks(full_path)
end

def self.clone_git_repository(target_path, source_path, options = {})
  full_path = full_path_from_partial_path(target_path)
  Grit::Git.with_timeout(nil) do
    git_backend.clone(full_path,
                      full_path_from_partial_path(source_path))
  end
  self.create_hooks(full_path) unless options[:skip_hooks]
end


What would be the optimal way to modify these in a way that I could
get it accepted upstream? I'm guessing that the
LV-creation-and-mounting code should live in a separate module to
handle all the different commands and the privilege escalation, etc.
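
To make the idea concrete, here is a rough sketch of what such a module might wrap. It only builds the command lines and runs nothing; `lvm_quota_commands`, the volume group name and the choice of ext4 are my own assumptions, not anything that exists in Gitorious.

```ruby
# Builds (but does not run) the commands to give one project its own
# logical volume of a fixed size, mounted at the project's repository path.
def lvm_quota_commands(volume_group, project_slug, size, mount_path)
  device = "/dev/#{volume_group}/#{project_slug}"
  [
    ["lvcreate", "-L", size, "-n", project_slug, volume_group], # fixed-size LV
    ["mkfs.ext4", device],                                      # assumed filesystem
    ["mkdir", "-p", mount_path],
    ["mount", device, mount_path]
  ]
end

lvm_quota_commands(
  "vg_git",                                     # hypothetical volume group
  "gitorious",
  "1G",                                         # hypothetical size
  "/var/www/gitorious/repositories/gitorious"
).each { |cmd| puts cmd.join(" ") }
```

The size of the volume then acts as the quota: a push that fills it fails for that project only.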

- Ken

Ken Dreyer

Jan 24, 2013, 10:33:24 AM
to gito...@googlegroups.com
Yes, I was wondering the same thing. On the other hand, it looks like I
cannot obtain the file sizes server-side until the server receives the
push, in which case it's too late. Do I have that right?

- Ken

Marius Mårnes Mathiesen

Jan 28, 2013, 7:53:13 AM
to gito...@googlegroups.com

Ken Dreyer writes:

> On Thu, Jan 24, 2013 at 4:14 AM, Marius Mårnes Mathiesen
> <marius.m...@gmail.com> wrote:
>> Another approach could be to use eg. btrfs and set up subvolumes for new
>> projects in the repository root, assuming you're running with un-sharded
>> paths. Let's say a user creates a project "gitorious" and the first
>> repository "mainline": when generating the repository on disk in
>> Gitorious we could set up a subvolume for the project, so:
>>
>> /var/www/gitorious/repositories/gitorious
>>
>> would become a btrfs subvolume where we could enforce quotas. Achieving
>> this would require a hook inside the routine where a repository is
>> created in Gitorious, but it would definitely make sense (as long as
>> you're ready to trust btrfs with your data). Zfs could be an alternative
>> to btrfs, but the license situation is a little problematic here.
>
> Thank you! I was imagining something vaguely along the same lines.
> I'll probably avoid btrfs at this point :) but maybe I can hook up
> something with LVM.

Sounds great. I'm really looking forward to using btrfs on
gitorious.org, but we'll probably wait a year or so :-)

> It looks like I should modify these two functions in app/models/repository.rb?
>
> def self.create_git_repository(path)
>   full_path = full_path_from_partial_path(path)
>   git_backend.create(full_path)
>   self.create_hooks(full_path)
> end
>
> def self.clone_git_repository(target_path, source_path, options = {})
>   full_path = full_path_from_partial_path(target_path)
>   Grit::Git.with_timeout(nil) do
>     git_backend.clone(full_path,
>                       full_path_from_partial_path(source_path))
>   end
>   self.create_hooks(full_path) unless options[:skip_hooks]
> end
>
>
> What would be the optimal way to modify these in a way that I could
> get it accepted upstream? I'm guessing that the
> LV-creation-and-mounting code should live in a separate module to
> handle all the different commands and the privilege escalation, etc.

The GitBackend class would be the best place to place this kind of
logic. It is set up with some "hooks" performed after a repository has
been created, an approach we should be able to use in a similar fashion
*before* creating the repository on disk. What if the GitBackend class
had a class method or equivalent for registering hooks from the outside
that are performed before a repository is created on disk, which will
receive the path to be created:

class GitQuotaManager
  def self.before_create_path(path)
    # set up LVM etc
  end
end

and then in an initializer register this hook:

GitBackend.register_before_hook(GitQuotaManager)

and extending the GitBackend class like this:

class GitBackend
  def create(repos_path, set_export_ok = true)
    before_hook.before_create_path(repos_path)
    # existing logic here
  end
end

Would that make sense (thinking out loud here)?
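
Put together as something runnable, it could look roughly like this. The `GitBackend` below is a stand-in with only the hook plumbing, not the real class; I've assumed class-level methods and a list of hooks rather than a single `before_hook`, and `prepared_paths` exists only so the sketch has something observable.

```ruby
# Stand-in for Gitorious' GitBackend; only the proposed hook plumbing is shown.
class GitBackend
  class << self
    # Hooks registered from the outside, run before a repository is created.
    def before_hooks
      @before_hooks ||= []
    end

    def register_before_hook(hook)
      before_hooks << hook
    end

    def create(repos_path, set_export_ok = true)
      before_hooks.each { |hook| hook.before_create_path(repos_path) }
      # existing repository-creation logic would run here
      repos_path
    end
  end
end

class GitQuotaManager
  # Illustration only: records the paths it was asked to prepare.
  def self.prepared_paths
    @prepared_paths ||= []
  end

  def self.before_create_path(path)
    # set up LVM etc; this sketch just records the path
    prepared_paths << path
  end
end

# In an initializer:
GitBackend.register_before_hook(GitQuotaManager)

# Later, when a repository is created:
GitBackend.create("/var/www/gitorious/repositories/gitorious/mainline.git")
```

Keeping a list means several hooks can coexist, and an installation that registers none behaves exactly as before.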

Cheers,
- Marius

Marius Mårnes Mathiesen

Jan 28, 2013, 7:57:05 AM
to gito...@googlegroups.com
Pretty much. The amount of disk space introduced by a series of commits
would be hard to calculate correctly: git uses hard links between cloned
repositories, and variations between file systems would introduce some
uncertainty here. But we would be able to stop pushes to a repository
that has already grown beyond its limits using the disk_usage attribute
from the database.
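
As a rough sketch, the decision inside a pre-receive hook could be as small as this. The method names are made up, and I'm assuming `disk_usage` and the quota are in the same unit (bytes here); how the hook would look up those values is left out.

```ruby
# Hypothetical decision: allow the push unless the repository has already
# reached its quota. A nil quota means no limit is configured.
def push_allowed?(disk_usage, quota)
  return true if quota.nil?
  disk_usage < quota
end

# Returns the exit status a pre-receive hook would use. Git rejects the
# push when the hook exits non-zero and relays stderr to the pusher.
def pre_receive_exit_code(disk_usage, quota, err = $stderr)
  return 0 if push_allowed?(disk_usage, quota)
  err.puts "Push rejected: repository uses #{disk_usage} bytes, " \
           "quota is #{quota} bytes"
  1
end
```

This does not stop the one push that takes a repository over its limit, only the ones after it.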

Cheers,
- Marius