man page comments: rm, gc, fsck, validate*

Greg Troxel

May 24, 2026, 9:21:28 AM
to bup-...@googlegroups.com
I've never really used rm/gc, and I have long been fuzzy on
fsck/validate. I'm trying to understand; this message has comments
about what I find confusing. Ideally, absent me being confused in a way
that nobody else would be, we'd clarify the man pages rather than
explaining here.

I offer rewrites with the understanding that I may be wrong in the
details, but I hope that saying it more precisely is on the path to
precise and correct.

* bup rm

This indicates that one can remove "saves" in addition to branches.

I am guessing that if you have done save with "-n foo" repeatedly and
you ask rm to remove foo, then refs/heads/foo will go away, with no
other changes.

It doesn't explain whether removing a save means passing the sha1 of
the save commit or some symbolic name for it, or how you would get such
a name.

If I wanted to remove the most recent save for foo, presumably I'd pass
the sha1 for that commit, and expect the commit to be somehow removed.
Perhaps that means moving the ref to the parent, without changing any
objects.

If I wanted to remove the previous save (foo~1), keeping the current
save, foo~2, and all older ancestors, then I don't see how that would
work unless rm writes a new save commit that refers to the same tree as
the current one but has parent foo~2 instead of foo~1, and moves foo to
point to this new commit. This is effectively an interactive rebase
dropping a commit.
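
To make that guess concrete, here is a toy Python sketch of dropping one
commit from a linear history. It is only a model of what I imagine rm
would have to do, not bup's code: commit_id, build_chain, and
drop_commit are made-up names, and the ids are hashes of a simplified
string rather than real git commit objects.

```python
import hashlib


def commit_id(tree, parent):
    """Toy commit id: hash of tree id and parent id (not git's format)."""
    data = f"tree {tree}\nparent {parent}\n".encode()
    return hashlib.sha1(data).hexdigest()


def build_chain(trees):
    """Build a linear history, oldest first.

    Returns (commits, head) where commits maps id -> (tree, parent).
    """
    commits = {}
    parent = None
    for tree in trees:
        cid = commit_id(tree, parent)
        commits[cid] = (tree, parent)
        parent = cid
    return commits, parent


def drop_commit(commits, head, victim):
    """Return a new head whose history omits `victim`.

    Commits newer than the victim are rewritten (same tree, new parent);
    the victim itself and all older commits are left untouched.
    """
    # Walk back from head to the victim, collecting the newer commits.
    newer = []
    cur = head
    while cur != victim:
        if cur is None:
            raise KeyError("victim is not an ancestor of head")
        newer.append(cur)
        cur = commits[cur][1]
    # Re-parent the newer commits, oldest first, onto the victim's parent.
    new_parent = commits[victim][1]
    for cid in reversed(newer):
        tree = commits[cid][0]
        new_id = commit_id(tree, new_parent)
        commits[new_id] = (tree, new_parent)
        new_parent = new_id
    return new_parent
```

In this model, dropping the newest save just moves the ref to its
parent, while dropping an older save forces every newer commit to be
rewritten with the same tree and a new parent.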

It's not clear how or why bup rm behaves differently when passed foo
versus the sha1 that foo points to. It seems it must, but I'm boggled
about the semantics (in a way that a person without CS training might
not be).

There's a compression argument, but I don't see what would be compressed
in any invocation of rm.

There's a comment about bup gc or git gc. That's an implication that
it's ok to run git gc on a bup repo. That's a big statement, and there
are a lot of implications.

* bup gc

The threshold description is confusing:

only rewrite a packfile if it's over N percent garbage and contains no
unreachable trees or commits.

The term garbage is not defined, and if I didn't have the Lisp history
I'd wonder what it meant. Not a big deal.

I would guess that N percent is about bytes, not objects, but it doesn't
say.
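
The distinction matters. A toy Python sketch with invented sizes shows
the two readings disagreeing; garbage_percent and the pack layout are
hypothetical and have nothing to do with how bup gc actually measures a
pack.

```python
def garbage_percent(pack, live_ids):
    """Return (percent_by_bytes, percent_by_objects) for one pack.

    `pack` maps object id -> stored size in bytes; `live_ids` is the set
    of reachable object ids. Both are invented inputs for illustration.
    """
    total_bytes = sum(pack.values())
    dead = [oid for oid in pack if oid not in live_ids]
    dead_bytes = sum(pack[oid] for oid in dead)
    return (100.0 * dead_bytes / total_bytes,
            100.0 * len(dead) / len(pack))


# One large live object and two small dead ones.
pack = {"a": 950, "b": 25, "c": 25}
by_bytes, by_objects = garbage_percent(pack, live_ids={"a"})
print(by_bytes, round(by_objects, 1))
```

With a threshold of 10, this pack would be rewritten under the
per-object reading (about two thirds of the objects are dead) and left
alone under the per-byte reading (5% of the bytes are dead).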

I wonder why it is 10. If we don't have a good reason, then perhaps say
there is no strong justification. It seems high to me, and scanning
takes a really long time. (I'm doing gc on a USB3-attached SATA HDD,
with a repo of 729G, and it's been going an hour.) 10% of my space is a
really big deal, but maybe the distribution of garbage is such that
packs are mostly garbage (written during a removed save) or mostly not
garbage (written during a not-removed save). But with get --rewrite,
which I haven't run yet, I'm not sure.

"contains no unreachable trees or commits": this doesn't make sense.
Unreachable objects are precisely what garbage is, and protecting an
entire pack because it has an unreachable tree, and not just unreachable
blobs, doesn't make sense to me. I wonder if this is a phrase-o, or if
I'm confused.

There's no discussion of merging small packs to bigger ones. git gc
would do this. I'm not complaining that this doesn't happen, but I
think it should be stated one way or another.

There is a compression level. But this is keeping or not keeping
objects, and the blobs are compressed. So what is gc compressing, that
it needs a level?

There is no mention of what happens with par2 with newly-written packs.
I bet it just needs a statement that newly-written packs do not have
par2 sidecar files automatically, and that those using par2 likely want
to run bup fsck -g afterwards. And probably "want par2" should be a repo
property, so that par2 generation (as with fsck -g) happens for each
pack as it is written, just as the par2 sidecar of a removed pack
should be removed.

* bup validate-object-links

This uses "broken links" in quotes; insert standard rant about adding
quotes to say that the words don't mean what they say. The loose object
exclusion is unclear. Here's a rewrite -- that I'm not sure is
accurate.

`bup validate-object-links` examines all objects in the repository and
for each, checks that any referenced object (i.e. from a commit or a
tree) also exists in the repository. Currently, it ignores loose
objects (those not in packfiles -- which git may create, but bup
doesn't). It does not include loose objects in the scan, and it will
not find them when searching for an object. It does not examine tag
objects (which bup also doesn't create).

I think it should also say one of the following:

`bup validate-object-links` does not read blob objects; if an index
says an object is present, that's good enough. Thus, it can be fooled by a
broken idx or midx. It does not necessarily detect the inability to
read blocks from a packfile.

`bup validate-object-links` does not read blob objects; if a packfile
says that an object is present, that's good enough. It does not
necessarily detect the inability to read blocks from a packfile.
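
As a check on my reading, this is the behavior I think the rewrite
describes, as a toy Python model. The in-memory dict stands in for the
idx/pack lookup and find_missing_links is a made-up name; this is not
bup's implementation.

```python
def find_missing_links(objects):
    """Return sorted (referrer, missing_id) pairs.

    `objects` maps id -> (kind, [referenced ids]); kind is "commit",
    "tree", or "blob". Only presence is checked: a referenced id counts
    as fine if it is a key in `objects`, and its contents are never read.
    """
    missing = []
    for oid, (kind, refs) in objects.items():
        if kind == "blob":
            continue  # blobs reference nothing, so there is nothing to chase
        for ref in refs:
            if ref not in objects:
                missing.append((oid, ref))
    return sorted(missing)


objects = {
    "c1": ("commit", ["t1"]),
    "t1": ("tree", ["b1", "b2"]),  # b2 is referenced but absent
    "b1": ("blob", []),
}
print(find_missing_links(objects))
```

Note that every object is scanned, reachable or not, and that a blob
which is listed but unreadable would pass.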

* bup validate-refs

(It was news to me that validate-ref-links is deprecated in 0.33.4; not
a complaint but context in case that helps spot confusion on my part.)

It is not clear if this does anything that validate-object-links does
not. I am guessing that it checks that every sha1 in refs/heads/* is in
the repo, on top of validate-object-links.

It seems one can pass a name foo, looked up to a sha1 in refs/heads/foo,
and then only objects reachable from that sha1 are checked.

Or, one can pass a sha1 corresponding to a commit (save) that is older.
Does it then check the parent of that commit, or just the tree of that
commit?
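
The two possible answers differ only in whether parents are followed. A
toy Python sketch, with a hypothetical reachable function and a
hand-built object table, not anything taken from bup:

```python
def reachable(objects, start, follow_parents=True):
    """Return the set of ids reachable from `start`.

    `objects` maps id -> dict with a "kind" key; commits also have
    "tree" and "parents", trees have "entries".
    """
    seen = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = objects[oid]
        if obj["kind"] == "commit":
            stack.append(obj["tree"])
            if follow_parents:
                stack.extend(obj["parents"])
        elif obj["kind"] == "tree":
            stack.extend(obj["entries"])
    return seen


objects = {
    "c1": {"kind": "commit", "tree": "t1", "parents": []},
    "c2": {"kind": "commit", "tree": "t2", "parents": ["c1"]},
    "t1": {"kind": "tree", "entries": ["b1"]},
    "t2": {"kind": "tree", "entries": ["b2"]},
    "b1": {"kind": "blob"},
    "b2": {"kind": "blob"},
}
whole_history = reachable(objects, "c2")
just_this_save = reachable(objects, "c2", follow_parents=False)
```

Here whole_history contains all six objects, while just_this_save
contains only c2, t2, and b2. The man page should say which of these a
sha1 argument gets.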

Presumably the same rules about index presence vs pack presence vs not
reading the blob apply. It doesn't say that.

A nit about bupm: the word abridged is confusing. If you mean "checks,
for every directory and file in a save, that the appropriate bupm file
has metadata for that directory or file", say that.


* bup validate-refs and demux

My tree has .1 and .1.html files but no .md for these. This is baffling. demux
is not important; I know I don't need to understand that.

* bup fsck

It says:

`bup fsck` is a tool for validating bup repositories in the
same way that `git fsck` validates git repositories.

but doesn't really explain.

I would therefore expect:

probably excludes loose objects and tags

starts with every refs/heads/foo

chases all references from trees/commits

not only checks that the object is in an index, but also in a
packfile, reads the object, and computes the hash and makes sure it
matches

prints unreachable objects. perhaps prints only the head of
unreachable objects, and not all the children of the disconnected
subtree.

checks that idx files are correct

does not check that midx files are correct
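
The "reads the object and computes the hash" item is the one I'd most
like stated. For a SHA-1 git repository the id is the SHA-1 of a header
("<type> <size>" followed by a NUL byte) plus the uncompressed
contents, so the check I'd expect looks roughly like this Python sketch.
git_object_id and verify are my own names, and whether bup fsck does
this per object is exactly what I don't know.

```python
import hashlib

# Well-known id of the empty blob in a SHA-1 git repository.
EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"


def git_object_id(kind, payload):
    """Compute the SHA-1 object id for uncompressed `payload` bytes."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def verify(expected_id, kind, payload):
    """Return True if the payload actually hashes to the id it claims."""
    return git_object_id(kind, payload) == expected_id


print(verify(EMPTY_BLOB, "blob", b""))           # contents match the id
print(verify(EMPTY_BLOB, "blob", b"corrupted"))  # contents do not match
```

A presence-only check (index or pack says the id exists) can never
notice the second case; only reading and rehashing can.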


It doesn't explain whether -r is just a par2 operation, or if it's only
triggered if there is an object failure. It seems obvious to me that
par2 generation/checking/recovery is a logically separate operation from
chasing pointers. It is far from obvious that it is separate in bup
fsck.

The discussion of git verify-pack is confusing. git doesn't know about
par2, as I understand it, so I don't see why that's relevant. It's
unclear to me why this is done, or why it's safe to omit, and whether
the chasing of objects happens also. I can't really tell what git
verify-pack does from its man page.

The $64K question: Other than par2 generation/checking/recovery, what
does bup fsck do that git fsck doesn't, what does git fsck do that bup
fsck doesn't, and why would you choose one or the other?

(And, why is the par2 stuff in fsck? Isn't that conceptually separate,
sort of like "store your repo in ZFS with raidz1"?)