Preserve/Prune Old Pack Files


jme...@codeaurora.org

Jan 3, 2017, 4:49:10 PM
to repo-d...@googlegroups.com
We’ve noticed Stale File Handle exceptions occurring during git
operations. They can affect users of NFS-hosted repos when those repos
are repacked.

To address this issue, we’ve added two new options to the JGit GC
command:

--preserve-oldpacks: moves old pack files into the preserved
subdirectory instead of deleting them after repacking

--prune-preserved: prunes old pack files from the preserved subdirectory
after repacking, but before potentially moving the latest old pack files
to this subdirectory

The strategy is to keep old pack files around until the next repack, in
the hope that no running process still references them by then, so that
finally deleting (pruning) them does not cause exceptions.
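
The ordering of the two options can be illustrated with a minimal Python
sketch. This is not the JGit implementation: the function name
retire_old_packs, its parameters, and the choice to keep file names
unchanged inside the preserved directory are all assumptions made for
illustration.

```python
from pathlib import Path

# A pack and the files that accompany it.
PACK_SUFFIXES = (".pack", ".idx", ".bitmap")


def retire_old_packs(pack_dir: Path, old_pack_stems: list[str],
                     preserve: bool, prune_preserved: bool) -> None:
    """Dispose of packs made redundant by a repack (hypothetical helper)."""
    preserved = pack_dir / "preserved"

    # Step 1: prune first, so that files preserved by the *previous*
    # repack are deleted before this run's old packs are moved in.
    if prune_preserved and preserved.is_dir():
        for entry in preserved.iterdir():
            if entry.is_file():
                entry.unlink()

    # Step 2: move the packs this repack replaced, or delete them
    # outright when preservation is not requested.
    for stem in old_pack_stems:
        for suffix in PACK_SUFFIXES:
            src = pack_dir / (stem + suffix)
            if not src.exists():
                continue
            if preserve:
                preserved.mkdir(exist_ok=True)
                src.rename(preserved / src.name)
            else:
                src.unlink()
```

With both flags set on every run, each old pack survives exactly one
further repack cycle before it is deleted.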

Change is uploaded for review here: https://git.eclipse.org/r/#/c/87969/

Thanks,
James

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum,
a Linux Foundation Collaborative Project

Martin Fick

Jan 4, 2017, 11:12:00 AM
to repo-d...@googlegroups.com, jme...@codeaurora.org, jgit...@eclipse.org, g...@vger.kernel.org
I am replying to this email across lists because I wanted to
highlight to the git community this jgit change to repacking
that we have up for review

https://git.eclipse.org/r/#/c/87969/

This change introduces a new convention for how to preserve
old pack files in a staging area
(.git/objects/pack/preserved) before deleting them. I
wanted to ensure that the proposed convention would be
done in a way that is satisfactory to the git
community as a whole, so that it would be easier to
provide the same behavior in git eventually. The preserved
pack files (and accompanying index and bitmap files) are not
only moved but also renamed, so that they will no longer
match recursive finds looking for pack files.

I look forward to any review (it need not happen on the
change, replies to this email would be fine also), in
particular with respect to the approach and naming
conventions.

Thanks,

-Martin


On Tuesday, January 03, 2017 02:46:12 PM
jme...@codeaurora.org wrote:
> We’ve noticed cases where Stale File Handle Exceptions
> occur during git operations, which can happen on users of
> NFS repos when repacking is done on them.
>
> To address this issue, we’ve added two new options to the
> JGit GC command:
>
> --preserve-oldpacks: moves old pack files into the
> preserved subdirectory instead of deleting them after
> repacking
>
> --prune-preserved: prunes old pack files from the
> preserved subdirectory after repacking, but before
> potentially moving the latest old pack files to this
> subdirectory
>
> The strategy is to preserve old pack files around until
> the next repack with the hopes that they will become
> unreferenced by then and not cause any exceptions to
> running processes when they are finally deleted (pruned).
>
> Change is uploaded for review here:
> https://git.eclipse.org/r/#/c/87969/
>
> Thanks,
> James

--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Martin Fick

Jan 9, 2017, 11:18:04 AM
to Jeff King, repo-d...@googlegroups.com, jme...@codeaurora.org, jgit...@eclipse.org, g...@vger.kernel.org
On Monday, January 09, 2017 01:21:37 AM Jeff King wrote:
> On Wed, Jan 04, 2017 at 09:11:55AM -0700, Martin Fick wrote:
> > I am replying to this email across lists because I
> > wanted to highlight to the git community this jgit
> > change to repacking that we have up for review
> >
> > https://git.eclipse.org/r/#/c/87969/
> >
> > This change introduces a new convention for how to
> > preserve old pack files in a staging area
> > (.git/objects/pack/preserved) before deleting them. I
> > wanted to ensure that the new proposed convention would
> > be done in a way that would be satisfactory to the git
> > community as a whole so that it would be more easy to
> > provide the same behavior in git eventually. The
> > preserved pack files (and accompanying index and bitmap
> > files), are not only moved, but they are also renamed
> > so that they no longer will match recursive finds
> > looking for pack files.
> It looks like objects/pack/pack-123.pack becomes
> objects/pack/preserved/pack-123.old-pack,

Yes, that's the idea.

> and so forth. Which seems reasonable, and I'm happy that:
>
> find objects/pack -name '*.pack'
>
> would not find it. :)

Cool.
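
The renaming can be sketched in a few lines of Python. Only the
pack-123.pack to pack-123.old-pack mapping is confirmed above; applying
the same "old-" prefix to the index and bitmap extensions is an
assumption, and the helper names preserved_path and found_by_pack_glob
are hypothetical.

```python
import fnmatch
from pathlib import PurePosixPath


def preserved_path(pack_file: str) -> str:
    """Map objects/pack/<name>.<ext> to objects/pack/preserved/<name>.old-<ext>."""
    p = PurePosixPath(pack_file)
    ext = p.suffix.lstrip(".")  # "pack", "idx" or "bitmap"
    return str(p.parent / "preserved" / f"{p.stem}.old-{ext}")


def found_by_pack_glob(path: str) -> bool:
    """Mimic the name test of: find objects/pack -name '*.pack'."""
    return fnmatch.fnmatch(PurePosixPath(path).name, "*.pack")


old = "objects/pack/pack-123.pack"
new = preserved_path(old)
print(new, found_by_pack_glob(old), found_by_pack_glob(new))
```

The preserved name ends in "-pack" rather than ".pack", which is why the
glob no longer matches it even though find still descends into the
preserved subdirectory.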

> I suspect the name-change will break a few tools that you
> might want to use to look at a preserved pack (like
> verify-pack). I know that's not your primary use case,
> but it seems plausible that somebody may one day want to
> use a preserved pack to try to recover from corruption. I
> think "git index-pack --stdin
> <objects/pack/preserved/pack-123.old-pack" could always
> be a last-resort for re-admitting the objects to the
> repository.

Or even a simple manual rename/move back to its original
place?
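
A manual rename back could look roughly like the Python sketch below.
The helper name restore_preserved and the ".old-idx" and ".old-bitmap"
file names are assumptions; only ".old-pack" is confirmed in this
thread. The index is moved last on the assumption that readers discover
a pack through its .idx file, so it should not appear before the pack
itself is in place.

```python
from pathlib import Path

# Restore order: the .idx goes last (assumption: packs are discovered
# via their index, so the pack should already be in place by then).
RESTORE_ORDER = ("pack", "bitmap", "idx")


def restore_preserved(pack_dir: Path, stem: str) -> list[Path]:
    """Move preserved/<stem>.old-<ext> back to <pack_dir>/<stem>.<ext>."""
    preserved = pack_dir / "preserved"
    restored = []
    for ext in RESTORE_ORDER:
        src = preserved / f"{stem}.old-{ext}"
        if src.exists():
            dst = pack_dir / f"{stem}.{ext}"
            src.rename(dst)
            restored.append(dst)
    return restored
```

For example, restore_preserved(Path(".git/objects/pack"), "pack-123")
would re-admit that one pack and leave any other preserved files alone.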

> I notice this doesn't do anything for loose objects. I
> think they technically suffer the same issue, though the
> race window is much shorter (we mmap them and zlib
> inflate immediately, whereas packfiles may stay mapped
> across many object requests).

Hmm, yeah that's the next change, didn't you see it? :) No,
actually I forgot about those. Our server tends to not have
too many of those (loose objects), and I don't think we have
seen any exceptions yet for them. But, of course, you are
right, they should get fixed too. I will work on a followup
change to do that.

Where would you suggest we store those? Maybe under
".git/objects/preserved/<xx>/<sha1>"? Do they need to be
renamed also somehow to avoid a find?

...
> I've wondered if we could make object pruning more atomic
> by speculatively moving items to be deleted into some
> kind of "outgoing" object area.
...
> I don't have a solution here. I don't think we want to
> solve it by locking the repository for updates during a
> repack. I have a vague sense that a solution could be
> crafted around moving the old pack into a holding area
> instead of deleting (during which time nobody else would
> see the objects, and thus not reference them), while the
> repacking process checks to see if the actual deletion
> would break any references (and rolls back the deletion
> if it would).
>
> That's _way_ more complicated than your problem, and as I
> said, I do not have a finished solution. But it seems
> like they touch on a similar concept (a post-delete
> holding area for objects). So I thought I'd mention it in
> case it spurs any brilliance.

I agree, this is a problem I have wanted to solve also. I
think having a "preserved" directory does open the door to
such "recovery" solutions, although I think you would
actually want to modify the many read code paths to fall
back to looking at the preserved area and performing
immediate "recovery" of the pack file if it ends up being
needed. That's a lot of work, but having the packs (and
eventually the loose objects) preserved into a location
where no new references will be built to depend on them is
likely the first step. Does the name "preserved" work well
for that use case too, or would some other name be better?
What would a transactional system call them?

Thanks for the review, Peff!

-Martin